Notes
The issues and solutions described in this article also apply to Tencent Cloud Elasticsearch Service (ES).
Also used: Tencent Cloud Cloud Virtual Machine (CVM).
Environment
vespa-fbench client environment
- Versions
Linux: CentOS 7.9
Python: 3.8.7
Pip: pip 20.2.3 from pip (python 3.8)
Java: openjdk version 1.8.0_302 (build 1.8.0_302-b08)
Git: 2.7.5
- Configuration
Memory: 32 GB
Disk: Enhanced SSD cloud disk, 50 GB
CPU sockets: 1
CPU cores: 32
Elasticsearch server environment
- Versions
Linux: CentOS 7.9
Java: openjdk version 11.0.9.1-ga (build 11.0.9.1-ga 1, mixed mode)
Elasticsearch: 8.8.1 (Tencent Cloud Elasticsearch Service, Platinum edition)
- Configuration
Nodes: 3
Memory: 128 GB
Disk: local NVMe SSD, 3.5 TB × 2
CPU sockets: 1
CPU cores: 32
CPU model: Intel(R) Xeon(R) Platinum 8255C CPU @ 2.50GHz
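Background
Tencent Cloud Big Data Elasticsearch Service was the first to launch ES 8.8.1, which provides cloud-side AI enhancements and vector search. It supports natural language processing, vector search, and integration with large models within an end-to-end search and analytics platform, and keeps the average response latency of vector search over one billion vectors at the millisecond level, helping customers build AI-driven advanced search. This article describes how to benchmark the vector search performance of ES 8.8 with the vespa-fbench load-testing tool.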
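Benchmark information
Dataset
This article uses the GIST dataset, which is commonly used to evaluate the performance and accuracy of ANN search. The dataset comes from ann-benchmarks.
ES index schema
The index definition is adapted from index.json: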
{
  "settings": {
    "number_of_shards": 6,
    "number_of_replicas": 1
  },
  "mappings": {
    "properties": {
      "id": {
        "type": "keyword"
      },
      "vector": {
        "type": "dense_vector",
        "dims": 960, // up to 2048 dimensions are supported
        "index": true,
        "similarity": "cosine", // supported: cosine, dot_product, l2_norm
        "element_type": "float", // supported: float, byte
        "index_options": { // advanced HNSW parameters
          "type": "hnsw",
          "m": 16,
          "ef_construction": 100
        }
      }
    }
  }
}
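When several benchmark runs vary `m` or `ef_construction`, it can help to build the index definition in code rather than editing JSON by hand. The following is a minimal sketch and not part of the original benchmark: the helper name `build_knn_mapping` is hypothetical, and the limits it checks are simply the ones stated in the schema comments above (up to 2048 dimensions; `cosine`, `dot_product`, `l2_norm`; `float` or `byte`).

```python
import json

# Limits copied from the comments in the schema above (ES 8.8.1).
MAX_DIMS = 2048
SIMILARITIES = {"cosine", "dot_product", "l2_norm"}
ELEMENT_TYPES = {"float", "byte"}


def build_knn_mapping(dims=960, similarity="cosine", element_type="float",
                      m=16, ef_construction=100, shards=6, replicas=1):
    """Return an index definition equivalent to the schema shown above."""
    if not 1 <= dims <= MAX_DIMS:
        raise ValueError(f"dims must be between 1 and {MAX_DIMS}, got {dims}")
    if similarity not in SIMILARITIES:
        raise ValueError(f"unsupported similarity: {similarity}")
    if element_type not in ELEMENT_TYPES:
        raise ValueError(f"unsupported element_type: {element_type}")
    return {
        "settings": {"number_of_shards": shards, "number_of_replicas": replicas},
        "mappings": {
            "properties": {
                "id": {"type": "keyword"},
                "vector": {
                    "type": "dense_vector",
                    "dims": dims,
                    "index": True,
                    "similarity": similarity,
                    "element_type": element_type,
                    "index_options": {
                        "type": "hnsw",
                        "m": m,
                        "ef_construction": ef_construction,
                    },
                },
            }
        },
    }


if __name__ == "__main__":
    # Print the default definition so it can be saved to a file for curl.
    print(json.dumps(build_knn_mapping(), indent=2))
```

The output contains no comments, so it is plain JSON that can be passed to `curl -d @file`.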
Sample benchmark request
The query file contains 1,000 queries. One of them is shown below:
/doc_knn/_search
{"size": 10, "timeout": "15s", "_source": {"exclude": ["vector"]}, "knn": [{"field": "vector", "query_vector": [0.011699999682605267, 0.011500000022351742, 0.008700000122189522, 0.009999999776482582, 0.07850000262260437, 0.10000000149011612, 0.07840000092983246, 0.05299999937415123, 0.052400000393390656, 0.08190000057220459, 0.0658000037074089, 0.057999998331069946, 0.01590000092983246, 0.017000000923871994, 0.04610000178217888, 0.02419999986886978], "k": 10, "num_candidates": 100, "boost": 1}]}
Benchmark results
Clients | QPS | Average Latency (ms) | P95 Latency (ms) | CPU util (%) |
---|---|---|---|---|
100 | 100.03 | 10.95 | 13.70 | 5 |
300 | 300.12 | 11.39 | 14.20 | 17 |
500 | 500.21 | 11.94 | 14.70 | 32 |
700 | 700.29 | 12.50 | 15.40 | 45 |
900 | 900.38 | 14.21 | 22.50 | 59 |
1200 | 1200.52 | 23.21 | 52.10 | 79 |
1300 | 1300.56 | 61.32 | 266.90 | 87 |
1400 | 1400.57 | 210.46 | 730.00 | 98 |
As the table shows, request latency stays low while CPU utilization is below 80%. Once CPU utilization exceeds 80%, latency rises sharply.
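The saturation point can also be read off the table programmatically. The sketch below copies the rows from the results table and reports the first run whose CPU utilization exceeds a threshold, together with how much the average latency grew from one run to the next. The function names and the 80% default are illustrative choices, not part of the benchmark tooling.

```python
# (clients, qps, avg_latency_ms, p95_latency_ms, cpu_util_percent), from the table above.
RESULTS = [
    (100, 100.03, 10.95, 13.70, 5),
    (300, 300.12, 11.39, 14.20, 17),
    (500, 500.21, 11.94, 14.70, 32),
    (700, 700.29, 12.50, 15.40, 45),
    (900, 900.38, 14.21, 22.50, 59),
    (1200, 1200.52, 23.21, 52.10, 79),
    (1300, 1300.56, 61.32, 266.90, 87),
    (1400, 1400.57, 210.46, 730.00, 98),
]


def first_saturated_row(rows, cpu_threshold=80):
    """Return the first row whose CPU utilization exceeds the threshold, or None."""
    for row in rows:
        if row[4] > cpu_threshold:
            return row
    return None


def latency_growth(rows):
    """Return (clients, ratio) pairs: average latency relative to the previous run."""
    return [(cur[0], cur[2] / prev[2]) for prev, cur in zip(rows, rows[1:])]


if __name__ == "__main__":
    print("first run above 80% CPU:", first_saturated_row(RESULTS))
    for clients, ratio in latency_growth(RESULTS):
        print(f"{clients} clients: average latency x{ratio:.2f} vs previous run")
```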
Benchmark parameters
# Clients 100
vespa-fbench -P -H "Authorization: Basic $(echo -n 'elastic:password' | base64)" -H "Content-Type:application/json" -q data/elastic/knn_queries.txt -s 180 -n 100 -c 1000 -i 20 -o /tmp/result.esknn_100.txt 10.0.0.12 9200
# Clients 300
vespa-fbench -P -H "Authorization: Basic $(echo -n 'elastic:password' | base64)" -H "Content-Type:application/json" -q data/elastic/knn_queries.txt -s 180 -n 300 -c 1000 -i 20 -o /tmp/result.esknn_300.txt 10.0.0.12 9200
# Clients 500
vespa-fbench -P -H "Authorization: Basic $(echo -n 'elastic:password' | base64)" -H "Content-Type:application/json" -q data/elastic/knn_queries.txt -s 180 -n 500 -c 1000 -i 20 -o /tmp/result.esknn_500.txt 10.0.0.12 9200
# Clients 700
vespa-fbench -P -H "Authorization: Basic $(echo -n 'elastic:password' | base64)" -H "Content-Type:application/json" -q data/elastic/knn_queries.txt -s 180 -n 700 -c 1000 -i 20 -o /tmp/result.esknn_700.txt 10.0.0.12 9200
# Clients 900
vespa-fbench -P -H "Authorization: Basic $(echo -n 'elastic:password' | base64)" -H "Content-Type:application/json" -q data/elastic/knn_queries.txt -s 180 -n 900 -c 1000 -i 20 -o /tmp/result.esknn_900.txt 10.0.0.12 9200
# Clients 1200
vespa-fbench -P -H "Authorization: Basic $(echo -n 'elastic:password' | base64)" -H "Content-Type:application/json" -q data/elastic/knn_queries.txt -s 180 -n 1200 -c 1000 -i 20 -o /tmp/result.esknn_1200.txt 10.0.0.12 9200
# Clients 1300
vespa-fbench -P -H "Authorization: Basic $(echo -n 'elastic:password' | base64)" -H "Content-Type:application/json" -q data/elastic/knn_queries.txt -s 180 -n 1300 -c 1000 -i 20 -o /tmp/result.esknn_1300.txt 10.0.0.12 9200
# Clients 1400
vespa-fbench -P -H "Authorization: Basic $(echo -n 'elastic:password' | base64)" -H "Content-Type:application/json" -q data/elastic/knn_queries.txt -s 180 -n 1400 -c 1000 -i 20 -o /tmp/result.esknn_1400.txt 10.0.0.12 9200
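The eight commands above differ only in the client count and the output file name, so they can be generated instead of typed out. The following is a small sketch under the same assumptions as the commands (host `10.0.0.12`, port `9200`, user `elastic`, password `password`). The helper `fbench_command` is hypothetical, and the script only prints the commands; it does not run them.

```python
import base64
import shlex

CLIENT_COUNTS = [100, 300, 500, 700, 900, 1200, 1300, 1400]


def fbench_command(clients, host="10.0.0.12", port=9200,
                   user="elastic", password="password",
                   query_file="data/elastic/knn_queries.txt"):
    """Return the vespa-fbench argument list for one run with `clients` clients."""
    # Same value as: echo -n 'user:password' | base64
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return [
        "vespa-fbench", "-P",
        "-H", f"Authorization: Basic {token}",
        "-H", "Content-Type:application/json",
        "-q", query_file,
        "-s", "180", "-n", str(clients), "-c", "1000", "-i", "20",
        "-o", f"/tmp/result.esknn_{clients}.txt",
        host, str(port),
    ]


if __name__ == "__main__":
    for n in CLIENT_COUNTS:
        print(f"# Clients {n}")
        print(shlex.join(fbench_command(n)))  # shlex.join needs Python 3.8+
```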
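Parameter description
-s 180: run for 180 seconds; the default is 60, and -1 means run forever
-n 1500: run 1500 concurrent clients; the default is 10
-c 0: no pause between requests, so each client sends its next query as soon as the previous response returns. The value is the cycle time in milliseconds; the default is 1000 (one request per client per second), and leaving it at the default is recommended
-i 20: ignore the latency of the first 20 queries (they are not counted in the results) as a warm-up; the default is 0
-q: the query file, generated by make-queries.py
-P: send requests with HTTP POST
-H: add an extra HTTP header to each request (here, Content-Type:application/json for the JSON body)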
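With a fixed cycle time, the offered load follows from `-n` and `-c`: each client issues at most one request per cycle, so the target rate is roughly clients × 1000 / cycle time, as long as responses return within the cycle. This is consistent with the results table, where QPS tracks the client count at a 1000 ms cycle time. A minimal sketch of the arithmetic (the function name is illustrative only):

```python
def target_qps(clients, cycle_ms=1000):
    """Approximate request rate offered by vespa-fbench for a positive cycle time."""
    if cycle_ms <= 0:
        # With no cycle time the rate depends on response latency instead.
        raise ValueError("a fixed target rate only exists for a positive cycle time")
    return clients * 1000.0 / cycle_ms


if __name__ == "__main__":
    for n in (100, 900, 1400):
        print(n, "clients ->", target_qps(n), "requests/s")
```

When the cluster cannot answer within the cycle, the achieved rate falls behind this target, which is what the "cycles not held" counter in the summary output reports.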
Important note
vespa-fbench has no parameter or configuration option for HTTP authentication, so when the ES cluster requires authentication, the credentials must be added as a request header in the benchmark command.
/opt/vespa/bin/vespa-fbench -P -H "Content-Type:application/json" -H "Authorization: Basic $(echo -n 'elastic:changeme' | base64)" -q data/elastic/knn_queries.txt -s 180 -n 1500 -c 1000 -i 20 -o /tmp/result.esknn_1500.txt 10.0.0.12 9200
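The `-n` flag of `echo` matters in the command above: without it, the shell appends a newline to the credentials before they are Base64-encoded, and the resulting token no longer decodes to `user:password`. The sketch below illustrates the difference in Python; `basic_token` is a hypothetical helper, and the credentials are the placeholder ones from the command.

```python
import base64


def basic_token(credentials: bytes) -> str:
    """Base64-encode raw credential bytes for an HTTP Basic Authorization header."""
    return base64.b64encode(credentials).decode("ascii")


# What `echo -n 'elastic:changeme' | base64` encodes.
with_n = basic_token(b"elastic:changeme")
# What `echo 'elastic:changeme' | base64` encodes (note the trailing newline).
without_n = basic_token(b"elastic:changeme\n")

if __name__ == "__main__":
    print("Authorization: Basic " + with_n)
    print("tokens differ:", with_n != without_n)
```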
Benchmark procedure
1. Install the benchmark tool vespa-fbench
# Add the yum repository
[root@centos ~]# yum-config-manager --add-repo https://copr.fedorainfracloud.org/coprs/g/vespa/vespa/repo/epel-7/group_vespa-vespa-epel-7.repo
[root@centos ~]# yum -y install epel-release centos-release-scl
# Install vespa
[root@centos ~]# yum -y install vespa
After installation, the executables are placed in the /opt/vespa/bin directory. The command we need is vespa-fbench.
2. Clone the dense-vector-ranking-performance project
We use this project to create the index to be benchmarked in the ES cluster, import the dataset, and generate the benchmark requests.
[root@centos ~]# git clone https://github.com/jobergum/dense-vector-ranking-performance.git
Cloning into 'dense-vector-ranking-performance'...
remote: Enumerating objects: 149, done.
remote: Counting objects: 100% (2/2), done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 149 (delta 0), reused 0 (delta 0), pack-reused 147
Receiving objects: 100% (149/149), 532.09 MiB | 725.00 KiB/s, done.
Resolving deltas: 100% (56/56), done.
[root@centos ~]# cd dense-vector-ranking-performance
[root@centos dense-vector-ranking-performance]# ll
total 52
drwxr-xr-x 5 root root 4096 May 10 13:45 bin
drwxr-xr-x 5 root root 4096 May 10 13:45 config
drwxr-xr-x 5 root root 4096 May 10 13:45 data
-rw-r--r-- 1 root root 187 May 10 13:45 Dockerfile.elastic
-rw-r--r-- 1 root root 344 May 10 13:45 Dockerfile.opendistroforelasticsearch
-rw-r--r-- 1 root root 102 May 10 13:45 Dockerfile.vespa
-rw-r--r-- 1 root root 11357 May 10 13:45 LICENSE
-rw-r--r-- 1 root root 15017 May 10 13:45 README.md
[root@centos ~]#
3. Prepare the GIST dataset
The dataset is hosted overseas, so the download can take more than a day.
[root@centos dense-vector-ranking-performance]# wget http://ann-benchmarks.com/gist-960-euclidean.hdf5
To make downloading easier, I have uploaded the dataset to CSDN as a split archive:
Part1: gist-960-euclidean.zip.001
Part2: gist-960-euclidean.zip.002
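4. Modify the configuration
By default, dense-vector-ranking-performance generates its configuration for a local environment. Since we want to benchmark an existing cluster, the configuration has to be adjusted.
Two files need to be created and two files need to be modified.
4.1 Create config/elastic/index_knn.json
This file defines the properties of the benchmark index:
[root@centos dense-vector-ranking-performance]# cat config/elastic/index_knn.json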
{
  "settings": {
    "index": {
      "refresh_interval": "10s",
      "number_of_shards": "6"
    }
  },
  "mappings": {
    "properties": {
      "postTime": {
        "index": false,
        "type": "date"
      },
      "vector": {
        "type": "dense_vector",
        "similarity": "cosine", // supported: cosine, dot_product, l2_norm
        "index": true,
        "dims": 960, // up to 2048 dimensions are supported
        "element_type": "float", // supported: float, byte
        "index_options": { // advanced HNSW parameters
          "type": "hnsw",
          "m": 16,
          "ef_construction": 100
        }
      },
      "id": {
        "type": "keyword"
      }
    }
  }
}
[root@centos dense-vector-ranking-performance]#
4.2 Create bin/elastic/create_knn-index.sh
This script creates the index from the definition above:
[root@centos dense-vector-ranking-performance]# cat bin/elastic/create_knn-index.sh
#!/bin/sh
curl -uelastic:password -s -X PUT "http://10.0.0.12:9200/doc_knn?pretty" -H "Content-Type:application/json" -d @config/elastic/index_knn.json
[root@centos dense-vector-ranking-performance]#
4.3 Modify bin/make-feed.py
This script imports the dataset:
[root@centos dense-vector-ranking-performance]# cat bin/make-feed.py
import h5py
import sys
import concurrent.futures
import requests

file = sys.argv[1]
train = h5py.File(file, 'r')['train']
username = 'elastic'
password = 'password'


def feed_to_es_and_vespa(data):
    docid, vector = data
    vector = vector.tolist()
    # Kept from the upstream script; not sent anywhere in this ES-only setup.
    vespa_body = {
        "fields": {
            'vector': {
                'values': vector
            },
            'id': docid
        }
    }
    es_body = {
        'id': docid,
        'vector': vector
    }
    auth = requests.auth.HTTPBasicAuth(username, password)
    # Index one document per request, using the row number as the document ID.
    response = requests.post('http://10.0.0.12:9200/doc_knn/_doc/%i' % docid, json=es_body, auth=auth)
    response.raise_for_status()


nthreads = 32
with concurrent.futures.ThreadPoolExecutor(max_workers=nthreads) as executor:
    futures = [executor.submit(feed_to_es_and_vespa, data) for data in enumerate(train)]
    for result in concurrent.futures.as_completed(futures):
        pass
[root@centos dense-vector-ranking-performance]#
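Indexing one document per HTTP request is simple but slow for a large dataset. A common alternative is the Elasticsearch `_bulk` API, which accepts newline-delimited JSON. The sketch below is not what the script above does; it only shows how such a request body could be assembled. `bulk_body`, `batches`, and the batch size are hypothetical choices, and sending the body would be a `POST` to `/_bulk` with the `Content-Type: application/x-ndjson` header.

```python
import json


def bulk_body(index, docs):
    """Build an NDJSON _bulk body: one action line plus one source line per document."""
    lines = []
    for docid, vector in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": str(docid)}}))
        # float() converts NumPy scalars as well as plain Python numbers.
        lines.append(json.dumps({"id": docid, "vector": [float(x) for x in vector]}))
    return "\n".join(lines) + "\n"  # the _bulk body must end with a newline


def batches(items, size):
    """Yield consecutive lists of at most `size` items."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


if __name__ == "__main__":
    sample = [(0, [0.1, 0.2]), (1, [0.3, 0.4]), (2, [0.5, 0.6])]
    for batch in batches(sample, 2):
        print(bulk_body("doc_knn", batch), end="")
```

In the feed script, `enumerate(train)` could be passed through `batches` to produce the `(docid, vector)` pairs.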
4.4 Modify bin/make-queries.py
This script generates the file of query request bodies used for the benchmark:
[root@centos dense-vector-ranking-performance]# cat bin/make-queries.py
import numpy as np
import json
import h5py
import sys

file = sys.argv[1]
test = h5py.File(file, 'r')['test']

# Each query takes two lines: the URL path, then the JSON request body.
with open('data/elastic/knn_queries.txt', 'w') as esknn_queries:
    for v in test:
        query_vector = v.tolist()
        esknn_script_query = [
            {
                'field': 'vector',
                'query_vector': query_vector,
                'k': 10,
                'num_candidates': 100,
                'boost': 1
            }
        ]
        esknn_body = {
            'size': 10,
            'timeout': '15s',
            '_source': {
                'exclude': [
                    'vector'
                ]
            },
            'knn': esknn_script_query
        }
        esknn_queries.write('/doc_knn/_search\n')
        esknn_queries.write(json.dumps(esknn_body) + '\n')
[root@centos dense-vector-ranking-performance]#
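The query file written by this script alternates between a URL path line and a JSON body line, so a missing newline or a truncated vector corrupts it without any visible error until the benchmark runs. The following sketch checks a generated file before use. `check_query_lines` is a hypothetical helper, and the 960-dimension check assumes the GIST dataset used in this article.

```python
import json


def check_query_lines(lines, expected_dims=None):
    """Validate alternating URL/body lines and return the number of queries."""
    lines = [line.rstrip("\n") for line in lines]
    if len(lines) % 2 != 0:
        raise ValueError("expected an even number of lines (URL + body per query)")
    for i in range(0, len(lines), 2):
        url = lines[i]
        if not url.startswith("/") or " " in url:
            raise ValueError(f"line {i + 1}: not a URL path: {url!r}")
        body = json.loads(lines[i + 1])  # raises ValueError on malformed JSON
        for clause in body["knn"]:
            dims = len(clause["query_vector"])
            if expected_dims is not None and dims != expected_dims:
                raise ValueError(f"line {i + 2}: expected {expected_dims} dims, got {dims}")
    return len(lines) // 2


if __name__ == "__main__":
    with open("data/elastic/knn_queries.txt") as handle:
        print("queries:", check_query_lines(handle, expected_dims=960))
```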
5. Import the dataset and generate the query file
5.1 Create the index
[root@centos dense-vector-ranking-performance]# bash bin/elastic/create_knn-index.sh
{
"acknowledged" : true,
"shards_acknowledged" : true,
"index" : "doc_knn"
}
5.2 Import the dataset
[root@centos dense-vector-ranking-performance]# python3 ./bin/make-feed.py gist-960-euclidean.hdf5
5.3 Merge segments
After the dataset has been imported, run a force merge once so that the benchmark runs against merged segments.
[root@centos dense-vector-ranking-performance]# curl -uelastic:password -XPOST -s '10.0.0.12:9200/doc_knn/_forcemerge?max_num_segments=1'
5.4 Generate the query file
[root@centos dense-vector-ranking-performance]# python3 ./bin/make-queries.py gist-960-euclidean.hdf5
6. Run the benchmark
[root@centos dense-vector-ranking-performance]# /opt/vespa/bin/vespa-fbench -P -H Content-Type:application/json -q data/elastic/knn_queries.txt -s 180 -n 100 -c 1000 -i 20 -o /tmp/result.esknn_100.txt 10.0.0.12 9200
Appendix
Detailed benchmark output:
Starting clients...
[dummydate]: PROGRESS: vespa-fbench: Seconds left 180
[dummydate]: PROGRESS: vespa-fbench: Seconds left 120
[dummydate]: PROGRESS: vespa-fbench: Seconds left 60
Stopping clients
Clients stopped.
....................................................................................................
Clients Joined.
*** HTTP keep-alive statistics ***
connection reuse count -- 17883
***************** Benchmark Summary *****************
clients: 100
ran for: 180 seconds
cycle time: 1000 ms
lower response limit: 0 bytes
skipped requests: 0
failed requests: 0
successful requests: 15983
cycles not held: 0
minimum response time: 6.24 ms
maximum response time: 95.72 ms
average response time: 10.95 ms
25 percentile: 9.70 ms
50 percentile: 10.90 ms
75 percentile: 11.90 ms
90 percentile: 13.10 ms
95 percentile: 13.70 ms
98 percentile: 14.20 ms
99 percentile: 14.60 ms
99.5 percentile: 15.10 ms
99.6 percentile: 15.30 ms
99.7 percentile: 16.31 ms
99.8 percentile: 20.71 ms
99.9 percentile: 61.10 ms
actual query rate: 100.03 Q/s
utilization: 1.10 %
zero hit queries: 0
zero hit percentage: 0.00 %
http request status breakdown:
200 : 17983
Starting clients...
[dummydate]: PROGRESS: vespa-fbench: Seconds left 180
[dummydate]: PROGRESS: vespa-fbench: Seconds left 120
[dummydate]: PROGRESS: vespa-fbench: Seconds left 60
Stopping clients
Clients stopped.
............................................................................................................................................................................................................................................................................................................
Clients Joined.
*** HTTP keep-alive statistics ***
connection reuse count -- 53690
***************** Benchmark Summary *****************
clients: 300
ran for: 180 seconds
cycle time: 1000 ms
lower response limit: 0 bytes
skipped requests: 0
failed requests: 0
successful requests: 47990
cycles not held: 0
minimum response time: 6.16 ms
maximum response time: 23.49 ms
average response time: 11.39 ms
25 percentile: 10.10 ms
50 percentile: 11.50 ms
75 percentile: 12.70 ms
90 percentile: 13.70 ms
95 percentile: 14.20 ms
98 percentile: 14.70 ms
99 percentile: 15.00 ms
99.5 percentile: 15.50 ms
99.6 percentile: 15.70 ms
99.7 percentile: 16.40 ms
99.8 percentile: 17.80 ms
99.9 percentile: 19.40 ms
actual query rate: 300.12 Q/s
utilization: 1.14 %
zero hit queries: 0
zero hit percentage: 0.00 %
http request status breakdown:
200 : 53990
Starting clients...
[dummydate]: PROGRESS: vespa-fbench: Seconds left 180
[dummydate]: PROGRESS: vespa-fbench: Seconds left 120
[dummydate]: PROGRESS: vespa-fbench: Seconds left 60
Stopping clients
Clients stopped.
....................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
Clients Joined.
*** HTTP keep-alive statistics ***
connection reuse count -- 89449
***************** Benchmark Summary *****************
clients: 500
ran for: 180 seconds
cycle time: 1000 ms
lower response limit: 0 bytes
skipped requests: 0
failed requests: 0
successful requests: 79949
cycles not held: 0
minimum response time: 5.97 ms
maximum response time: 63.04 ms
average response time: 11.94 ms
25 percentile: 10.60 ms
50 percentile: 12.10 ms
75 percentile: 13.40 ms
90 percentile: 14.30 ms
95 percentile: 14.70 ms
98 percentile: 15.20 ms
99 percentile: 15.70 ms
99.5 percentile: 17.00 ms
99.6 percentile: 18.10 ms
99.7 percentile: 19.00 ms
99.8 percentile: 20.00 ms
99.9 percentile: 21.50 ms
actual query rate: 500.21 Q/s
utilization: 1.19 %
zero hit queries: 0
zero hit percentage: 0.00 %
http request status breakdown:
200 : 89949
Starting clients...
[dummydate]: PROGRESS: vespa-fbench: Seconds left 180
[dummydate]: PROGRESS: vespa-fbench: Seconds left 120
[dummydate]: PROGRESS: vespa-fbench: Seconds left 60
Stopping clients
Clients stopped.
............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
Clients Joined.
*** HTTP keep-alive statistics ***
connection reuse count -- 125241
***************** Benchmark Summary *****************
clients: 700
ran for: 180 seconds
cycle time: 1000 ms
lower response limit: 0 bytes
skipped requests: 0
failed requests: 0
successful requests: 111941
cycles not held: 0
minimum response time: 6.26 ms
maximum response time: 73.44 ms
average response time: 12.50 ms
25 percentile: 11.10 ms
50 percentile: 12.60 ms
75 percentile: 13.90 ms
90 percentile: 14.80 ms
95 percentile: 15.40 ms
98 percentile: 17.00 ms
99 percentile: 20.00 ms
99.5 percentile: 22.20 ms
99.6 percentile: 22.70 ms
99.7 percentile: 23.40 ms
99.8 percentile: 24.50 ms
99.9 percentile: 27.10 ms
actual query rate: 700.29 Q/s
utilization: 1.25 %
zero hit queries: 0
zero hit percentage: 0.00 %
http request status breakdown:
200 : 125941
Starting clients...
[dummydate]: PROGRESS: vespa-fbench: Seconds left 180
[dummydate]: PROGRESS: vespa-fbench: Seconds left 120
[dummydate]: PROGRESS: vespa-fbench: Seconds left 60
Stopping clients
Clients stopped.
....................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
Clients Joined.
*** HTTP keep-alive statistics ***
connection reuse count -- 161021
***************** Benchmark Summary *****************
clients: 900
ran for: 180 seconds
cycle time: 1000 ms
lower response limit: 0 bytes
skipped requests: 0
failed requests: 0
successful requests: 143921
cycles not held: 0
minimum response time: 6.14 ms
maximum response time: 331.34 ms
average response time: 14.21 ms
25 percentile: 11.50 ms
50 percentile: 13.20 ms
75 percentile: 14.80 ms
90 percentile: 18.60 ms
95 percentile: 22.50 ms
98 percentile: 26.90 ms
99 percentile: 30.80 ms
99.5 percentile: 35.90 ms
99.6 percentile: 37.90 ms
99.7 percentile: 43.90 ms
99.8 percentile: 100.52 ms
99.9 percentile: 186.46 ms
actual query rate: 900.38 Q/s
utilization: 1.42 %
zero hit queries: 0
zero hit percentage: 0.00 %
http request status breakdown:
200 : 161921
Starting clients...
[dummydate]: PROGRESS: vespa-fbench: Seconds left 180
[dummydate]: PROGRESS: vespa-fbench: Seconds left 120
[dummydate]: PROGRESS: vespa-fbench: Seconds left 60
Stopping clients
Clients stopped.
................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
Clients Joined.
*** HTTP keep-alive statistics ***
connection reuse count -- 214707
***************** Benchmark Summary *****************
clients: 1200
ran for: 180 seconds
cycle time: 1000 ms
lower response limit: 0 bytes
skipped requests: 0
failed requests: 0
successful requests: 191907
cycles not held: 0
minimum response time: 6.21 ms
maximum response time: 123.93 ms
average response time: 23.21 ms
25 percentile: 12.70 ms
50 percentile: 18.10 ms
75 percentile: 30.10 ms
90 percentile: 43.60 ms
95 percentile: 52.10 ms
98 percentile: 62.40 ms
99 percentile: 70.10 ms
99.5 percentile: 77.30 ms
99.6 percentile: 79.60 ms
99.7 percentile: 82.70 ms
99.8 percentile: 86.40 ms
99.9 percentile: 93.20 ms
actual query rate: 1200.52 Q/s
utilization: 2.32 %
zero hit queries: 0
zero hit percentage: 0.00 %
http request status breakdown:
200 : 215907
Starting clients...
[dummydate]: PROGRESS: vespa-fbench: Seconds left 180
[dummydate]: PROGRESS: vespa-fbench: Seconds left 120
[dummydate]: PROGRESS: vespa-fbench: Seconds left 60
Stopping clients
Clients stopped.
........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
Clients Joined.
*** HTTP keep-alive statistics ***
connection reuse count -- 250505
***************** Benchmark Summary *****************
clients: 1400
ran for: 180 seconds
cycle time: 1000 ms
lower response limit: 0 bytes
skipped requests: 0
failed requests: 0
successful requests: 223905
cycles not held: 62
minimum response time: 6.39 ms
maximum response time: 1246.07 ms
average response time: 210.46 ms
25 percentile: 21.30 ms
50 percentile: 94.00 ms
75 percentile: 351.40 ms
90 percentile: 618.90 ms
95 percentile: 730.00 ms
98 percentile: 806.40 ms
99 percentile: 841.90 ms
99.5 percentile: 866.30 ms
99.6 percentile: 873.68 ms
99.7 percentile: 883.33 ms
99.8 percentile: 902.02 ms
99.9 percentile: 932.71 ms
actual query rate: 1400.57 Q/s
utilization: 21.05 %
zero hit queries: 0
zero hit percentage: 0.00 %
http request status breakdown:
200 : 244665
429 : 7240
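The figures in the results table can be extracted from summaries like the ones above instead of being copied by hand. The following is a minimal sketch that assumes the summary printed by vespa-fbench has been captured as text (for example by redirecting standard output to a file). The function name `parse_summary` and the choice of fields are illustrative.

```python
import re

# One regular expression per field of interest, matched at the start of a line.
SUMMARY_FIELDS = {
    "clients": r"^[ \t]*clients:\s+(\d+)",
    "avg_ms": r"^[ \t]*average response time:\s+([\d.]+) ms",
    "p95_ms": r"^[ \t]*95 percentile:\s+([\d.]+) ms",
    "qps": r"^[ \t]*actual query rate:\s+([\d.]+) Q/s",
}


def parse_summary(text):
    """Return the selected summary fields as floats; missing fields are omitted."""
    result = {}
    for name, pattern in SUMMARY_FIELDS.items():
        match = re.search(pattern, text, flags=re.MULTILINE)
        if match:
            result[name] = float(match.group(1))
    return result


if __name__ == "__main__":
    sample = (
        "clients: 100\n"
        "average response time: 10.95 ms\n"
        "95 percentile: 13.70 ms\n"
        "99.5 percentile: 15.10 ms\n"
        "actual query rate: 100.03 Q/s\n"
    )
    print(parse_summary(sample))
```

Anchoring each pattern at the start of a line keeps `95 percentile` from matching the `99.5 percentile` row.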