Elasticsearch 8.8 Native Vector Search Performance Test

Notes

The problems and solutions described in this article also apply to Tencent Cloud Elasticsearch Service (ES).

This article also uses Tencent Cloud Virtual Machine (CVM).

Environment

vespa-fbench client environment

Versions

Linux: CentOS 7.9

Python: 3.8.7

Pip: pip 20.2.3 from pip (python 3.8)

Java: openjdk version 1.8.0_302 (build 1.8.0_302-b08)

Git: 2.7.5

Configuration

Memory: 32 GB

Disk: Enhanced SSD cloud disk, 50 GB

Number of CPUs: 1

CPU cores: 32

Elasticsearch server environment

Versions

Linux: CentOS 7.9

Java: openjdk version 11.0.9.1-ga (build 11.0.9.1-ga 1, mixed mode)

Elasticsearch: 8.8.1 (Tencent Cloud Elasticsearch Service, Platinum edition)

Configuration

Nodes: 3

Memory: 128 GB

Disk: local NVMe SSD, 3.5 TB * 2

Number of CPUs: 1

CPU cores: 32

CPU model: Intel(R) Xeon(R) Platinum 8255C CPU @ 2.50GHz

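Background

Tencent Cloud Big Data Elasticsearch Service was the first to launch ES 8.8.1. It provides cloud-based AI enhancement and vector search capabilities, and supports natural language processing, vector search, and integration with large models within an end-to-end search and analytics platform. Average response latency for vector search over a billion vectors is kept at the millisecond level, which helps customers build AI-driven advanced search. This article describes how to benchmark vector search in ES 8.8 with the vespa-fbench load-testing tool.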

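Benchmark setup

Dataset

This article uses the GIST dataset, which is commonly used to evaluate the performance and accuracy of ANN search. The dataset comes from ann-benchmarks.

ES index schema

The index definition is adapted from index.json: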

{
  "settings": {
    "number_of_shards": 6,
    "number_of_replicas": 1
  },
  "mappings": {
    "properties": {
      "id": {
        "type": "keyword"
      },
      "vector": {
          "type": "dense_vector",
          "dims": 960,             // up to 2048 dimensions are supported
          "index": true,
          "similarity": "cosine",  // supported: cosine, dot_product, l2_norm
          "element_type": "float", // supported: float, byte
          "index_options": {       // advanced HNSW parameters
            "type": "hnsw",
            "m": 16,
            "ef_construction": 100
          }
      }
    }
  }
}
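
The schema above carries its constraints only as comments. As a minimal sketch, the same definition can be built in Python with those constraints checked before the body is sent. The helper name `build_knn_index_body` and its parameter names are hypothetical; the limits are copied from the comments in the schema, not taken from an API.

```python
import json

# Limits copied from the comments in the schema above (ES 8.8).
MAX_DIMS = 2048
SIMILARITIES = {"cosine", "dot_product", "l2_norm"}
ELEMENT_TYPES = {"float", "byte"}


def build_knn_index_body(dims=960, similarity="cosine", element_type="float",
                         m=16, ef_construction=100, shards=6, replicas=1):
    """Return the index-creation body as a plain dict (hypothetical helper)."""
    if not 1 <= dims <= MAX_DIMS:
        raise ValueError(f"dims must be between 1 and {MAX_DIMS}, got {dims}")
    if similarity not in SIMILARITIES:
        raise ValueError(f"unsupported similarity: {similarity}")
    if element_type not in ELEMENT_TYPES:
        raise ValueError(f"unsupported element_type: {element_type}")
    return {
        "settings": {"number_of_shards": shards, "number_of_replicas": replicas},
        "mappings": {
            "properties": {
                "id": {"type": "keyword"},
                "vector": {
                    "type": "dense_vector",
                    "dims": dims,
                    "index": True,
                    "similarity": similarity,
                    "element_type": element_type,
                    "index_options": {
                        "type": "hnsw",
                        "m": m,
                        "ef_construction": ef_construction,
                    },
                },
            }
        },
    }


if __name__ == "__main__":
    # Prints strict JSON (no comments) that could be saved as an index definition.
    print(json.dumps(build_knn_index_body(), indent=2))
```

Generating the body this way yields comment-free JSON, which is convenient when the definition is also read by tools that reject `//` comments.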

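Sample benchmark request

The query file contains 1,000 queries. One of them is shown below, with query_vector shortened to 16 values for readability (the indexed vectors have 960 dimensions):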

/doc_knn/_search
{"size": 10, "timeout": "15s", "_source": {"exclude": ["vector"]}, "knn": [{"field": "vector", "query_vector": [0.011699999682605267, 0.011500000022351742, 0.008700000122189522, 0.009999999776482582, 0.07850000262260437, 0.10000000149011612, 0.07840000092983246, 0.05299999937415123, 0.052400000393390656, 0.08190000057220459, 0.0658000037074089, 0.057999998331069946, 0.01590000092983246, 0.017000000923871994, 0.04610000178217888, 0.02419999986886978], "k": 10, "num_candidates": 100, "boost": 1}]}

Benchmark results

Clients	QPS	Average latency (ms)	P95 latency (ms)	CPU utilization (%)
100	100.03	10.95	13.70	5
300	300.12	11.39	14.20	17
500	500.21	11.94	14.70	32
700	700.29	12.50	15.40	45
900	900.38	14.21	22.50	59
1200	1200.52	23.21	52.10	79
1300	1300.56	61.32	266.90	87
1400	1400.57	210.46	730.00	98

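As the table shows, latency stays low while CPU utilization is below 80%: even at 1,200 clients (79% CPU) the average is 23.21 ms. Once CPU utilization exceeds 80%, latency rises sharply, to an average of 61.32 ms at 87% and 210.46 ms at 98%.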
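
To make the saturation point easier to read off, the following sketch takes the rows of the results table and reports the highest measured QPS that still meets a given P95 latency budget, plus the growth of average latency between consecutive load levels. The function names and the budget values are illustrative assumptions; the numbers are copied from the table.

```python
# (clients, qps, avg_ms, p95_ms, cpu_pct) copied from the results table.
RESULTS = [
    (100, 100.03, 10.95, 13.70, 5),
    (300, 300.12, 11.39, 14.20, 17),
    (500, 500.21, 11.94, 14.70, 32),
    (700, 700.29, 12.50, 15.40, 45),
    (900, 900.38, 14.21, 22.50, 59),
    (1200, 1200.52, 23.21, 52.10, 79),
    (1300, 1300.56, 61.32, 266.90, 87),
    (1400, 1400.57, 210.46, 730.00, 98),
]


def max_qps_within(results, p95_budget_ms):
    """Highest measured QPS whose P95 latency fits the budget, or None."""
    ok = [qps for _, qps, _, p95, _ in results if p95 <= p95_budget_ms]
    return max(ok) if ok else None


def latency_growth(results):
    """(clients, ratio) pairs: average latency relative to the previous level."""
    return [(cur[0], cur[2] / prev[2]) for prev, cur in zip(results, results[1:])]


if __name__ == "__main__":
    for budget in (20, 100):
        print(f"P95 <= {budget} ms: up to {max_qps_within(RESULTS, budget)} QPS")
    for clients, ratio in latency_growth(RESULTS):
        print(f"{clients} clients: average latency x{ratio:.2f}")
```

With a 100 ms P95 budget the last level that qualifies is the 1,200-client run, which matches the 80% CPU threshold noted above.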

Benchmark parameters

# Clients 100
vespa-fbench -P -H "Authorization: Basic $(echo -n 'elastic:password' | base64)" -H "Content-Type:application/json" -q data/elastic/knn_queries.txt -s 180 -n 100 -c 1000 -i 20 -o /tmp/result.esknn_100.txt 10.0.0.12 9200
# Clients 300
vespa-fbench -P -H "Authorization: Basic $(echo -n 'elastic:password' | base64)" -H "Content-Type:application/json" -q data/elastic/knn_queries.txt -s 180 -n 300 -c 1000 -i 20 -o /tmp/result.esknn_300.txt 10.0.0.12 9200
# Clients 500
vespa-fbench -P -H "Authorization: Basic $(echo -n 'elastic:password' | base64)" -H "Content-Type:application/json" -q data/elastic/knn_queries.txt -s 180 -n 500 -c 1000 -i 20 -o /tmp/result.esknn_500.txt 10.0.0.12 9200
# Clients 700
vespa-fbench -P -H "Authorization: Basic $(echo -n 'elastic:password' | base64)" -H "Content-Type:application/json" -q data/elastic/knn_queries.txt -s 180 -n 700 -c 1000 -i 20 -o /tmp/result.esknn_700.txt 10.0.0.12 9200
# Clients 900
vespa-fbench -P -H "Authorization: Basic $(echo -n 'elastic:password' | base64)" -H "Content-Type:application/json" -q data/elastic/knn_queries.txt -s 180 -n 900 -c 1000 -i 20 -o /tmp/result.esknn_900.txt 10.0.0.12 9200
# Clients 1200
vespa-fbench -P -H "Authorization: Basic $(echo -n 'elastic:password' | base64)" -H "Content-Type:application/json" -q data/elastic/knn_queries.txt -s 180 -n 1200 -c 1000 -i 20 -o /tmp/result.esknn_1200.txt 10.0.0.12 9200
# Clients 1300
vespa-fbench -P -H "Authorization: Basic $(echo -n 'elastic:password' | base64)" -H "Content-Type:application/json" -q data/elastic/knn_queries.txt -s 180 -n 1300 -c 1000 -i 20 -o /tmp/result.esknn_1300.txt 10.0.0.12 9200
# Clients 1400
vespa-fbench -P -H "Authorization: Basic $(echo -n 'elastic:password' | base64)" -H "Content-Type:application/json" -q data/elastic/knn_queries.txt -s 180 -n 1400 -c 1000 -i 20 -o /tmp/result.esknn_1400.txt 10.0.0.12 9200
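
The eight commands above differ only in the client count and the output file. As a sketch, they could be generated rather than typed by hand. The helper `fbench_args` and its parameter names are hypothetical; the flags, paths, host, and port are the ones used in the commands above, and the authorization header is passed in as a ready-made string.

```python
import shlex

CLIENT_COUNTS = [100, 300, 500, 700, 900, 1200, 1300, 1400]


def fbench_args(clients, auth_header=None, host="10.0.0.12", port=9200,
                query_file="data/elastic/knn_queries.txt",
                seconds=180, cycle_ms=1000, ignore=20):
    """Build the vespa-fbench argument list for one client count."""
    args = ["vespa-fbench", "-P"]
    if auth_header:
        # Expected form: "Authorization: Basic <base64 of user:password>"
        args += ["-H", auth_header]
    args += [
        "-H", "Content-Type:application/json",
        "-q", query_file,
        "-s", str(seconds),
        "-n", str(clients),
        "-c", str(cycle_ms),
        "-i", str(ignore),
        "-o", f"/tmp/result.esknn_{clients}.txt",
        host, str(port),
    ]
    return args


if __name__ == "__main__":
    for n in CLIENT_COUNTS:
        print(f"# Clients {n}")
        # shlex.join quotes each argument so the line can be pasted into a shell.
        print(shlex.join(fbench_args(n)))
```

The argument list can also be handed directly to `subprocess.run`, which avoids shell quoting altogether.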

Parameter description

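-s 180: run for 180 seconds; the default is 60, and -1 means run forever
-n 100: number of concurrent clients (100 to 1400 in the runs above); the default is 10
-c 1000: cycle time in milliseconds; each client sends one request per cycle, so 100 clients produce about 100 QPS. The default is 1000, so the flag can be omitted. With -c 0, a client sends its next query as soon as the previous response arrives
-i 20: exclude the first 20 responses of each client from the statistics, as a warm-up; the default is 0
-q: the query file, generated by make-queries.py
-P: send requests with HTTP POST
-H: add an extra HTTP header to every request; here it sets the content type to JSON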
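
The effect of the client count, the cycle time, and the warm-up setting can be checked with simple arithmetic. The sketch below is an illustration with hypothetical function names. The utilization formula is an assumption that happens to agree with the appendix output (for example 10.95 ms average with a 1000 ms cycle against the reported 1.10 %), not something taken from the tool's documentation.

```python
def target_qps(clients, cycle_ms=1000):
    """Request rate aimed for when each client sends one request per cycle."""
    if cycle_ms <= 0:
        raise ValueError("this estimate only applies to a positive cycle time")
    return clients * 1000.0 / cycle_ms


def ignored_responses(clients, ignore_count=20):
    """Responses left out of the statistics by the warm-up setting."""
    return clients * ignore_count


def client_utilization_pct(avg_response_ms, cycle_ms=1000):
    """Assumed meaning of 'utilization': share of each cycle spent waiting."""
    return 100.0 * avg_response_ms / cycle_ms


if __name__ == "__main__":
    # (clients, HTTP 200 count, successful requests) from the appendix output.
    for clients, total_200, logged in [(100, 17983, 15983), (300, 53990, 47990)]:
        matches = total_200 - ignored_responses(clients) == logged
        print(clients, target_qps(clients), matches)
```

This explains why the QPS column in the results table tracks the client count so closely: with a 1000 ms cycle, each client contributes one query per second until the cluster can no longer keep up.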

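Important

vespa-fbench has no dedicated option or configuration setting for HTTP authentication. If the ES cluster requires authentication, add the credentials to the request headers of the benchmark command: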

/opt/vespa/bin/vespa-fbench -P -H "Content-Type:application/json" -H "Authorization: Basic $(echo -n 'elastic:changeme' | base64)" -q data/elastic/knn_queries.txt -s 180 -n 1500 -c 1000 -i 20 -o /tmp/result.esknn_1500.txt 10.0.0.12 9200
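
The `$(echo -n 'elastic:changeme' | base64)` part of the command builds an HTTP Basic token; `-n` matters because a trailing newline would otherwise be encoded into the token. The same header can be produced in Python, as in this small sketch (the function name is hypothetical):

```python
import base64


def basic_auth_header(user, password):
    """Return the full 'Authorization: Basic ...' header line."""
    raw = f"{user}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return f"Authorization: Basic {token}"


if __name__ == "__main__":
    # Pass the printed value to vespa-fbench with -H.
    print(basic_auth_header("elastic", "changeme"))
```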

Benchmark procedure

1. Install the benchmark tool vespa-fbench

# Add the yum repository
[root@centos ~]# yum-config-manager --add-repo https://copr.fedorainfracloud.org/coprs/g/vespa/vespa/repo/epel-7/group_vespa-vespa-epel-7.repo
[root@centos ~]# yum -y install epel-release centos-release-scl
# Install vespa
[root@centos ~]# yum -y install vespa

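After installation, the executables are placed in /opt/vespa/bin. The command we need is vespa-fbench.

2. Clone the dense-vector-ranking-performance project

We use this project to create the benchmark index in the ES cluster, import the dataset, and generate the benchmark requests.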

[root@centos ~]# git clone https://github.com/jobergum/dense-vector-ranking-performance.git
Cloning into 'dense-vector-ranking-performance'...
remote: Enumerating objects: 149, done.
remote: Counting objects: 100% (2/2), done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 149 (delta 0), reused 0 (delta 0), pack-reused 147
Receiving objects: 100% (149/149), 532.09 MiB | 725.00 KiB/s, done.
Resolving deltas: 100% (56/56), done.
[root@centos ~]# cd dense-vector-ranking-performance
[root@centos dense-vector-ranking-performance]# ll
total 52
drwxr-xr-x 5 root root  4096 May 10 13:45 bin
drwxr-xr-x 5 root root  4096 May 10 13:45 config
drwxr-xr-x 5 root root  4096 May 10 13:45 data
-rw-r--r-- 1 root root   187 May 10 13:45 Dockerfile.elastic
-rw-r--r-- 1 root root   344 May 10 13:45 Dockerfile.opendistroforelasticsearch
-rw-r--r-- 1 root root   102 May 10 13:45 Dockerfile.vespa
-rw-r--r-- 1 root root 11357 May 10 13:45 LICENSE
-rw-r--r-- 1 root root 15017 May 10 13:45 README.md
[root@centos ~]#

3. Prepare the GIST dataset

Because the dataset is hosted overseas, downloading it can take more than a day.

[root@centos dense-vector-ranking-performance]# wget http://ann-benchmarks.com/gist-960-euclidean.hdf5

To make the download easier, I have uploaded the dataset to CSDN as a split archive, which you can download yourself:

Part 1: gist-960-euclidean.zip.001

Part 2: gist-960-euclidean.zip.002

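4. Modify the configuration

By default, dense-vector-ranking-performance generates its configuration for a local environment. Because we want to benchmark an existing server, the configuration has to be changed.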

Two files need to be created and two modified.

4.1 Create config/elastic/index_knn.json

Define the properties of the benchmark index:

[root@centos dense-vector-ranking-performance]# cat config/elastic/index_knn.json 
{
  "settings": {
    "index": {
      "refresh_interval": "10s",
      "number_of_shards": "6"
    }
  },
  "mappings": {
      "properties": {
        "postTime": {
          "index": false,
          "type": "date"
        },
        "vector": {
           "type": "dense_vector",
           "similarity": "cosine",   // supported: cosine, dot_product, l2_norm
           "index": true,
           "dims": 960,              // up to 2048 dimensions are supported
           "element_type": "float",  // supported: float, byte
           "index_options": {        // advanced HNSW parameters
              "type": "hnsw",
              "m": 16,
              "ef_construction": 100
            }
        },
        "id": {
          "type": "keyword"
        }
      }
  }
}
[root@centos dense-vector-ranking-performance]#

4.2 Create bin/elastic/create_knn-index.sh

Create the index from the definition above:

[root@centos dense-vector-ranking-performance]# cat bin/elastic/create_knn-index.sh
#!/bin/sh
curl -uelastic:password -s -X PUT "http://10.0.0.12:9200/doc_knn?pretty" -H "Content-Type:application/json" -d @config/elastic/index_knn.json
[root@centos dense-vector-ranking-performance]#

4.3 Modify bin/make-feed.py

Import the dataset:

[root@centos dense-vector-ranking-performance]# cat bin/make-feed.py
import h5py
import sys
import concurrent.futures 
import requests

file= sys.argv[1]
train = h5py.File(file, 'r')['train']
username = 'elastic'
password = 'password'

def feed_to_es_and_vespa(data):
  docid,vector = data
  vector = vector.tolist()

  vespa_body = {
    "fields": {
      'vector': {
        'values': vector 
      },
      'id': docid
    }
  }
  es_body={
    'id': docid,
    'vector': vector 
  }

  auth = requests.auth.HTTPBasicAuth(username, password)
  response = requests.post('http://10.0.0.12:9200/doc_knn/_doc/%i' %docid, json=es_body, auth=auth)
  response.raise_for_status()

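# Index documents with 32 worker threads, one HTTP request per document.
nthreads=32
with concurrent.futures.ThreadPoolExecutor(max_workers=nthreads) as executor:
  futures = [executor.submit(feed_to_es_and_vespa,data) for data in enumerate(train)]
  for result in concurrent.futures.as_completed(futures):
    # Re-raise any indexing error instead of silently dropping it.
    result.result()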

[root@centos dense-vector-ranking-performance]#
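
The script above sends one HTTP request per document. A common alternative, which is not what the article used, is the Elasticsearch `_bulk` API, where each request carries many documents as newline-delimited JSON: an action line followed by a source line for every document. The sketch below only builds the request bodies; the function names and the batch size of 500 are assumptions chosen for illustration.

```python
import json


def bulk_lines(docs, index="doc_knn"):
    """Yield NDJSON lines: one action line, then one source line per document."""
    for docid, vector in docs:
        # numpy rows need tolist() so that json can serialize the values.
        values = vector.tolist() if hasattr(vector, "tolist") else list(vector)
        yield json.dumps({"index": {"_index": index, "_id": str(docid)}})
        yield json.dumps({"id": docid, "vector": values})


def bulk_bodies(docs, batch_size=500, index="doc_knn"):
    """Group (docid, vector) pairs into bodies of at most batch_size documents."""
    batch = []
    for pair in docs:
        batch.append(pair)
        if len(batch) == batch_size:
            # A bulk body must end with a newline.
            yield "\n".join(bulk_lines(batch, index)) + "\n"
            batch = []
    if batch:
        yield "\n".join(bulk_lines(batch, index)) + "\n"


if __name__ == "__main__":
    sample = enumerate([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])
    for body in bulk_bodies(sample, batch_size=2):
        print(body, end="")
```

Each body would be sent with `requests.post` to `http://10.0.0.12:9200/_bulk` using the header `Content-Type: application/x-ndjson` and the same credentials as in make-feed.py.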

4.4 Modify bin/make-queries.py

Generate the query file used by the benchmark:

[root@centos dense-vector-ranking-performance-master]# cat bin/make-queries.py 
import numpy as np
import json
import h5py
import sys

file= sys.argv[1]
test= h5py.File(file, 'r')['test']

esknn_queries = open('data/elastic/knn_queries.txt', 'w')

for v in test:
  query_vector = v.tolist()

  esknn_script_query = [
    {
      'field': 'vector',
      'query_vector': query_vector,
      'k': 10,
      'num_candidates': 100,
      'boost': 1
    }
  ]

  esknn_body = {
    'size': 10,
    'timeout': '15s',
    '_source': {
      'exclude': [
        'vector'
      ]
    },
    'knn': esknn_script_query
  }
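  # vespa-fbench POST format: the request path on one line, the body on the next.
  esknn_queries.write('/doc_knn/_search\n')
  esknn_queries.write(json.dumps(esknn_body) + '\n')

esknn_queries.close()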

[root@centos dense-vector-ranking-performance]#
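
Before starting a long run it can be worth checking that the generated file has the shape the script intends: a request path on one line, followed by a JSON body whose query vector matches the index dimensions. The checker below is a hypothetical sketch; the function name and its error messages are not part of the project.

```python
import json


def check_query_lines(lines, dims=960, path="/doc_knn/_search"):
    """Validate alternating path/body lines and return the number of queries."""
    lines = [ln.rstrip("\n") for ln in lines]
    if len(lines) % 2 != 0:
        raise ValueError("expected path and body lines in pairs")
    for i in range(0, len(lines), 2):
        if lines[i] != path:
            raise ValueError(f"line {i + 1}: unexpected path {lines[i]!r}")
        body = json.loads(lines[i + 1])
        for clause in body["knn"]:
            if len(clause["query_vector"]) != dims:
                raise ValueError(f"line {i + 2}: query_vector length is not {dims}")
    return len(lines) // 2


if __name__ == "__main__":
    with open("data/elastic/knn_queries.txt") as f:
        print(check_query_lines(f), "queries look well-formed")
```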

5. Import the dataset and generate the query file

5.1 Create the index

[root@centos dense-vector-ranking-performance]# bash bin/elastic/create_knn-index.sh
{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "doc_knn"
}

5.2 Import the dataset

[root@centos dense-vector-ranking-performance]# python3 ./bin/make-feed.py gist-960-euclidean.hdf5

5.3 Merge segments

After the import completes, run a force merge so that the benchmark runs against a single segment per shard.

[root@centos dense-vector-ranking-performance]# curl -uelastic:password -XPOST -s 'http://10.0.0.12:9200/doc_knn/_forcemerge?max_num_segments=1'

5.4 Generate the query file

[root@centos dense-vector-ranking-performance]# python3 ./bin/make-queries.py gist-960-euclidean.hdf5

6. Run the benchmark

[root@centos dense-vector-ranking-performance]# /opt/vespa/bin/vespa-fbench -P -H "Authorization: Basic $(echo -n 'elastic:password' | base64)" -H "Content-Type:application/json" -q data/elastic/knn_queries.txt -s 180 -n 100 -c 1000 -i 20 -o /tmp/result.esknn_100.txt 10.0.0.12 9200

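Appendix

Detailed benchmark output (runs with 100, 300, 500, 700, 900, 1200, and 1400 clients, in that order):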

Starting clients...
[dummydate]: PROGRESS: vespa-fbench: Seconds left 180
[dummydate]: PROGRESS: vespa-fbench: Seconds left 120
[dummydate]: PROGRESS: vespa-fbench: Seconds left 60
Stopping clients
Clients stopped.
....................................................................................................
Clients Joined.
*** HTTP keep-alive statistics ***
connection reuse count -- 17883
***************** Benchmark Summary *****************
clients:                     100
ran for:                     180 seconds
cycle time:                 1000 ms
lower response limit:          0 bytes
skipped requests:              0
failed requests:               0
successful requests:       15983
cycles not held:               0
minimum response time:      6.24 ms
maximum response time:     95.72 ms
average response time:     10.95 ms
25   percentile:              9.70 ms
50   percentile:             10.90 ms
75   percentile:             11.90 ms
90   percentile:             13.10 ms
95   percentile:             13.70 ms
98   percentile:             14.20 ms
99   percentile:             14.60 ms
99.5 percentile:             15.10 ms
99.6 percentile:             15.30 ms
99.7 percentile:             16.31 ms
99.8 percentile:             20.71 ms
99.9 percentile:             61.10 ms
actual query rate:        100.03 Q/s
utilization:                1.10 %
zero hit queries:              0
zero hit percentage:        0.00 %
http request status breakdown:
       200 :    17983 
Starting clients...
[dummydate]: PROGRESS: vespa-fbench: Seconds left 180
[dummydate]: PROGRESS: vespa-fbench: Seconds left 120
[dummydate]: PROGRESS: vespa-fbench: Seconds left 60
Stopping clients
Clients stopped.
............................................................................................................................................................................................................................................................................................................
Clients Joined.
*** HTTP keep-alive statistics ***
connection reuse count -- 53690
***************** Benchmark Summary *****************
clients:                     300
ran for:                     180 seconds
cycle time:                 1000 ms
lower response limit:          0 bytes
skipped requests:              0
failed requests:               0
successful requests:       47990
cycles not held:               0
minimum response time:      6.16 ms
maximum response time:     23.49 ms
average response time:     11.39 ms
25   percentile:             10.10 ms
50   percentile:             11.50 ms
75   percentile:             12.70 ms
90   percentile:             13.70 ms
95   percentile:             14.20 ms
98   percentile:             14.70 ms
99   percentile:             15.00 ms
99.5 percentile:             15.50 ms
99.6 percentile:             15.70 ms
99.7 percentile:             16.40 ms
99.8 percentile:             17.80 ms
99.9 percentile:             19.40 ms
actual query rate:        300.12 Q/s
utilization:                1.14 %
zero hit queries:              0
zero hit percentage:        0.00 %
http request status breakdown:
       200 :    53990 
Starting clients...
[dummydate]: PROGRESS: vespa-fbench: Seconds left 180
[dummydate]: PROGRESS: vespa-fbench: Seconds left 120
[dummydate]: PROGRESS: vespa-fbench: Seconds left 60
Stopping clients
Clients stopped.
....................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
Clients Joined.
*** HTTP keep-alive statistics ***
connection reuse count -- 89449
***************** Benchmark Summary *****************
clients:                     500
ran for:                     180 seconds
cycle time:                 1000 ms
lower response limit:          0 bytes
skipped requests:              0
failed requests:               0
successful requests:       79949
cycles not held:               0
minimum response time:      5.97 ms
maximum response time:     63.04 ms
average response time:     11.94 ms
25   percentile:             10.60 ms
50   percentile:             12.10 ms
75   percentile:             13.40 ms
90   percentile:             14.30 ms
95   percentile:             14.70 ms
98   percentile:             15.20 ms
99   percentile:             15.70 ms
99.5 percentile:             17.00 ms
99.6 percentile:             18.10 ms
99.7 percentile:             19.00 ms
99.8 percentile:             20.00 ms
99.9 percentile:             21.50 ms
actual query rate:        500.21 Q/s
utilization:                1.19 %
zero hit queries:              0
zero hit percentage:        0.00 %
http request status breakdown:
       200 :    89949 
Starting clients...
[dummydate]: PROGRESS: vespa-fbench: Seconds left 180
[dummydate]: PROGRESS: vespa-fbench: Seconds left 120
[dummydate]: PROGRESS: vespa-fbench: Seconds left 60
Stopping clients
Clients stopped.
............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
Clients Joined.
*** HTTP keep-alive statistics ***
connection reuse count -- 125241
***************** Benchmark Summary *****************
clients:                     700
ran for:                     180 seconds
cycle time:                 1000 ms
lower response limit:          0 bytes
skipped requests:              0
failed requests:               0
successful requests:      111941
cycles not held:               0
minimum response time:      6.26 ms
maximum response time:     73.44 ms
average response time:     12.50 ms
25   percentile:             11.10 ms
50   percentile:             12.60 ms
75   percentile:             13.90 ms
90   percentile:             14.80 ms
95   percentile:             15.40 ms
98   percentile:             17.00 ms
99   percentile:             20.00 ms
99.5 percentile:             22.20 ms
99.6 percentile:             22.70 ms
99.7 percentile:             23.40 ms
99.8 percentile:             24.50 ms
99.9 percentile:             27.10 ms
actual query rate:        700.29 Q/s
utilization:                1.25 %
zero hit queries:              0
zero hit percentage:        0.00 %
http request status breakdown:
       200 :   125941 
Starting clients...
[dummydate]: PROGRESS: vespa-fbench: Seconds left 180
[dummydate]: PROGRESS: vespa-fbench: Seconds left 120
[dummydate]: PROGRESS: vespa-fbench: Seconds left 60
Stopping clients
Clients stopped.
....................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
Clients Joined.
*** HTTP keep-alive statistics ***
connection reuse count -- 161021
***************** Benchmark Summary *****************
clients:                     900
ran for:                     180 seconds
cycle time:                 1000 ms
lower response limit:          0 bytes
skipped requests:              0
failed requests:               0
successful requests:      143921
cycles not held:               0
minimum response time:      6.14 ms
maximum response time:    331.34 ms
average response time:     14.21 ms
25   percentile:             11.50 ms
50   percentile:             13.20 ms
75   percentile:             14.80 ms
90   percentile:             18.60 ms
95   percentile:             22.50 ms
98   percentile:             26.90 ms
99   percentile:             30.80 ms
99.5 percentile:             35.90 ms
99.6 percentile:             37.90 ms
99.7 percentile:             43.90 ms
99.8 percentile:            100.52 ms
99.9 percentile:            186.46 ms
actual query rate:        900.38 Q/s
utilization:                1.42 %
zero hit queries:              0
zero hit percentage:        0.00 %
http request status breakdown:
       200 :   161921 
Starting clients...
[dummydate]: PROGRESS: vespa-fbench: Seconds left 180
[dummydate]: PROGRESS: vespa-fbench: Seconds left 120
[dummydate]: PROGRESS: vespa-fbench: Seconds left 60
Stopping clients
Clients stopped.
................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
Clients Joined.
*** HTTP keep-alive statistics ***
connection reuse count -- 214707
***************** Benchmark Summary *****************
clients:                    1200
ran for:                     180 seconds
cycle time:                 1000 ms
lower response limit:          0 bytes
skipped requests:              0
failed requests:               0
successful requests:      191907
cycles not held:               0
minimum response time:      6.21 ms
maximum response time:    123.93 ms
average response time:     23.21 ms
25   percentile:             12.70 ms
50   percentile:             18.10 ms
75   percentile:             30.10 ms
90   percentile:             43.60 ms
95   percentile:             52.10 ms
98   percentile:             62.40 ms
99   percentile:             70.10 ms
99.5 percentile:             77.30 ms
99.6 percentile:             79.60 ms
99.7 percentile:             82.70 ms
99.8 percentile:             86.40 ms
99.9 percentile:             93.20 ms
actual query rate:       1200.52 Q/s
utilization:                2.32 %
zero hit queries:              0
zero hit percentage:        0.00 %
http request status breakdown:
       200 :   215907 
Starting clients...
[dummydate]: PROGRESS: vespa-fbench: Seconds left 180
[dummydate]: PROGRESS: vespa-fbench: Seconds left 120
[dummydate]: PROGRESS: vespa-fbench: Seconds left 60
Stopping clients
Clients stopped.
........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
Clients Joined.
*** HTTP keep-alive statistics ***
connection reuse count -- 250505
***************** Benchmark Summary *****************
clients:                    1400
ran for:                     180 seconds
cycle time:                 1000 ms
lower response limit:          0 bytes
skipped requests:              0
failed requests:               0
successful requests:      223905
cycles not held:              62
minimum response time:      6.39 ms
maximum response time:   1246.07 ms
average response time:    210.46 ms
25   percentile:             21.30 ms
50   percentile:             94.00 ms
75   percentile:            351.40 ms
90   percentile:            618.90 ms
95   percentile:            730.00 ms
98   percentile:            806.40 ms
99   percentile:            841.90 ms
99.5 percentile:            866.30 ms
99.6 percentile:            873.68 ms
99.7 percentile:            883.33 ms
99.8 percentile:            902.02 ms
99.9 percentile:            932.71 ms
actual query rate:       1400.57 Q/s
utilization:               21.05 %
zero hit queries:              0
zero hit percentage:        0.00 %
http request status breakdown:
       200 :   244665 
       429 :     7240
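
The summaries above can be turned into table rows automatically. The following sketch parses a single vespa-fbench summary with regular expressions and also computes the share of responses with a given HTTP status. The patterns are written against the output format shown in this appendix and the function names are hypothetical, so treat it as a starting point rather than a general parser.

```python
import re

FIELDS = {
    "clients": r"^clients:\s+(\d+)",
    "avg_ms": r"^average response time:\s+([\d.]+) ms",
    "p95_ms": r"^95\s+percentile:\s+([\d.]+) ms",
    "qps": r"^actual query rate:\s+([\d.]+) Q/s",
}
# Lines of the form "       200 :   244665" in the status breakdown.
STATUS = re.compile(r"^\s*(\d{3}) :\s+(\d+)\s*$", re.M)


def parse_summary(text):
    """Extract the headline metrics from one benchmark summary."""
    row = {}
    for name, pattern in FIELDS.items():
        match = re.search(pattern, text, re.M)
        if match:
            row[name] = float(match.group(1))
    if "clients" in row:
        row["clients"] = int(row["clients"])
    row["status"] = {int(code): int(count) for code, count in STATUS.findall(text)}
    return row


def status_share(row, code=429):
    """Fraction of all responses that carried the given HTTP status."""
    total = sum(row["status"].values())
    return row["status"].get(code, 0) / total if total else 0.0


if __name__ == "__main__":
    import sys
    summary = parse_summary(sys.stdin.read())
    print(summary, f"429 share: {status_share(summary):.2%}")
```

Applied to the 1,400-client run, this shows that roughly 2.9% of responses were HTTP 429 (7,240 of 251,905), so at that load the cluster was already rejecting some searches.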
