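The problems and solutions described in this article also apply to Tencent Cloud Elasticsearch Service (ES).
This article also uses Tencent Cloud Cloud Virtual Machine (CVM).
This article follows the previous one: Pitfalls in Deploying the Elasticsearch Benchmarking Tool esrally (Part 2)
This article is continued in:
Elasticsearch 7.10.1 Benchmark Report: 3-Node Cluster, 4 Cores/16 GB (Intel)
Elasticsearch 7.10.1 Benchmark Comparison (4 Cores/16 GB × 3, AMD vs Intel)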
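Environment
Note: this is the environment configuration, including versions, that was verified for this article. To avoid pitfalls, match these configurations and versions as closely as possible.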
Esrally client environment
- Versions
Linux: CentOS 7.9
Python: 3.8.7
Pip: pip 20.2.3 from pip (python 3.8)
Java: openjdk version 1.8.0_302 (build 1.8.0_302-b08)
Git: 2.7.5
Esrally: 2.3.0
- Hardware
Memory: 32 GB
Disk: 100 GB SSD cloud disk
CPUs: 1
CPU cores: 16
Elasticsearch server environment
- Versions
Linux: CentOS 7.2
Java: openjdk version 11.0.9.1-ga (build 11.0.9.1-ga 1, mixed mode)
Elasticsearch: 7.10.1 (Tencent Cloud Elasticsearch Service, Platinum edition)
- Hardware
Nodes: 3
Memory per node: 16 GB
Disk per node: 1 TB SSD cloud disk
CPUs per node: 1
CPU cores per node: 4
CPU model: AMD EPYC 7K62 48-Core Processor
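Background
In the era of big data, business volumes keep growing, and a single day can easily produce hundreds of gigabytes or even terabytes of data, so a high-performance Elasticsearch cluster is essential. Before a service goes live, we need to simulate heavy read and write traffic of network logs and user-behavior logs, measure the key performance metrics, and locate the cluster's bottlenecks. This tells us what hardware configuration and which business-side optimizations are needed to handle the current workload.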
Benchmarking
esrally terminology and parameters
"Rally" refers to a car rally, so Rally's terminology is borrowed from rally racing.
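- track: the "race track", i.e. the sample data and the benchmark strategy. List the available tracks with `esrally list tracks`. Rally's built-in tracks can be browsed at https://github.com/elastic/rally-tracks; each track's directory contains a README.md that describes the data and the parameters in detail. If no track is specified, the geonames track is used by default.
- target-hosts: the IP and port of the remote Elasticsearch cluster, given as ip:port.
- pipeline: a benchmark workflow; list the available pipelines with `esrally list pipelines`. The benchmark-only pipeline leaves management of Elasticsearch to the user and uses Rally only to run the benchmark. Use it when benchmarking an existing cluster.
- track-params: overrides the track's default parameters.
- user-tag: a tag attached to this benchmark run.
- client-options: client connection options, such as the username and password.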
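To make the parameter formats above concrete, here is a minimal sketch that assembles the race command from Python, for example inside a benchmarking script. `build_race_args` and `kv_list` are hypothetical helpers, not part of esrally. They only format the options described above. The sketch assumes, as in the command used in this article, that `--track-params` and `--client-options` take comma-separated `key:value` lists.

```python
import shlex
from typing import Dict, List


def kv_list(options: Dict[str, object]) -> str:
    """Format a dict as the comma-separated key:value list used by
    --track-params and --client-options (assumed format, see lead-in)."""
    return ",".join(f"{key}:{value}" for key, value in options.items())


def build_race_args(track: str, target_hosts: str, user_tag: str,
                    track_params: Dict[str, object],
                    client_options: Dict[str, object]) -> List[str]:
    """Hypothetical helper: build an `esrally race` argv list that
    benchmarks an existing cluster (benchmark-only pipeline)."""
    return [
        "esrally", "race",
        f"--track={track}",
        f"--target-hosts={target_hosts}",
        "--pipeline=benchmark-only",
        f"--track-params={kv_list(track_params)}",
        f"--user-tag={user_tag}",
        f"--client-options={kv_list(client_options)}",
    ]


if __name__ == "__main__":
    args = build_race_args(
        track="geonames",
        target_hosts="10.0.10.4:9200",
        user_tag="version:AMD_4C16G_1T*3",
        track_params={"number_of_shards": 3, "number_of_replicas": 1},
        # The single quotes are literal, matching what the shell passes
        # through from the double-quoted --client-options argument.
        client_options={"basic_auth_user": "'elastic'",
                        "basic_auth_password": "'your_password'"},
    )
    # shlex.join (Python 3.8+) quotes each argument for pasting into a shell.
    print(shlex.join(args))
```

Passing the list directly to `subprocess.run(args)` avoids shell quoting altogether, because each element reaches esrally as one argument.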
Benchmark command
```shell
# Benchmark an existing cluster with the geonames track; replace your_password.
esrally race \
  --track=geonames \
  --target-hosts=10.0.10.4:9200 \
  --pipeline=benchmark-only \
  --track-params="number_of_shards:3, number_of_replicas:1" \
  --user-tag="version:AMD_4C16G_1T*3" \
  --client-options="basic_auth_user:'elastic', basic_auth_password:'your_password'"
```
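A benchmark-only race assumes the cluster is already running. Before a long race, it can therefore help to confirm that the target host and credentials actually work. The following is a minimal sketch that uses only the Python standard library and calls the standard Elasticsearch `GET /_cluster/health` API. It assumes the cluster is reachable over plain HTTP with basic authentication, as in the command above. `health_request` and `cluster_status` are hypothetical helper names.

```python
import base64
import json
import urllib.request


def health_request(host: str, user: str, password: str) -> urllib.request.Request:
    """Build a GET /_cluster/health request with an HTTP Basic auth header.
    Assumes plain HTTP (no TLS) on host, given as ip:port."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"http://{host}/_cluster/health",
        headers={"Authorization": f"Basic {token}"},
    )


def cluster_status(host: str, user: str, password: str, timeout: float = 10.0) -> str:
    """Query the cluster and return its status: "green", "yellow" or "red".
    Raises urllib.error.URLError / HTTPError if the host or credentials are wrong."""
    with urllib.request.urlopen(health_request(host, user, password), timeout=timeout) as resp:
        return json.load(resp)["status"]
```

For example, `cluster_status("10.0.10.4:9200", "elastic", "your_password")` should return "green" for a healthy 3-node cluster. An authentication error at this point saves you from starting a race that would fail immediately.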
Benchmark report
Metric | Task | Value | Unit |
---|---|---|---|
Cumulative indexing time of primary shards | | 16.6515 | min |
Min cumulative indexing time across primary shards | | 0 | min |
Median cumulative indexing time across primary shards | | 0.001258 | min |
Max cumulative indexing time across primary shards | | 5.89373 | min |
Cumulative indexing throttle time of primary shards | | 0 | min |
Min cumulative indexing throttle time across primary shards | | 0 | min |
Median cumulative indexing throttle time across primary shards | | 0 | min |
Max cumulative indexing throttle time across primary shards | | 0 | min |
Cumulative merge time of primary shards | | 5.12393 | min |
Cumulative merge count of primary shards | | 113 | |
Min cumulative merge time across primary shards | | 0 | min |
Median cumulative merge time across primary shards | | 0.001775 | min |
Max cumulative merge time across primary shards | | 1.8119 | min |
Cumulative merge throttle time of primary shards | | 0.954067 | min |
Min cumulative merge throttle time across primary shards | | 0 | min |
Median cumulative merge throttle time across primary shards | | 0 | min |
Max cumulative merge throttle time across primary shards | | 0.367133 | min |
Cumulative refresh time of primary shards | | 1.98815 | min |
Cumulative refresh count of primary shards | | 1037 | |
Min cumulative refresh time across primary shards | | 0 | min |
Median cumulative refresh time across primary shards | | 0.007558 | min |
99th percentile service time | phrase | 4.84451 | ms |
99.9th percentile service time | phrase | 22.3893 | ms |
100th percentile service time | phrase | 38.3952 | ms |
error rate | phrase | 0 | % |
Min Throughput | country_agg_uncached | 2.99 | ops/s |
Mean Throughput | country_agg_uncached | 2.99 | ops/s |
Median Throughput | country_agg_uncached | 2.99 | ops/s |
Max Throughput | country_agg_uncached | 2.99 | ops/s |
50th percentile latency | country_agg_uncached | 263.52 | ms |
90th percentile latency | country_agg_uncached | 274.372 | ms |
99th percentile latency | country_agg_uncached | 298.735 | ms |
100th percentile latency | country_agg_uncached | 307.146 | ms |
50th percentile service time | country_agg_uncached | 262.61 | ms |
90th percentile service time | country_agg_uncached | 273.913 | ms |
99th percentile service time | country_agg_uncached | 297.431 | ms |
100th percentile service time | country_agg_uncached | 306.319 | ms |
error rate | country_agg_uncached | 0 | % |
Min Throughput | country_agg_cached | 97.24 | ops/s |
Mean Throughput | country_agg_cached | 97.97 | ops/s |
Median Throughput | country_agg_cached | 98.03 | ops/s |
Max Throughput | country_agg_cached | 98.48 | ops/s |
50th percentile latency | country_agg_cached | 2.28737 | ms |
90th percentile latency | country_agg_cached | 3.5575 | ms |
99th percentile latency | country_agg_cached | 3.89999 | ms |
99.9th percentile latency | country_agg_cached | 17.5393 | ms |
100th percentile latency | country_agg_cached | 23.545 | ms |
50th percentile service time | country_agg_cached | 1.53483 | ms |
90th percentile service time | country_agg_cached | 1.83408 | ms |
99th percentile service time | country_agg_cached | 2.41103 | ms |
99.9th percentile service time | country_agg_cached | 6.71132 | ms |
100th percentile service time | country_agg_cached | 23.0925 | ms |
error rate | country_agg_cached | 0 | % |
Min Throughput | scroll | 20.03 | pages/s |
Mean Throughput | scroll | 20.03 | pages/s |
Median Throughput | scroll | 20.03 | pages/s |
Max Throughput | scroll | 20.04 | pages/s |
50th percentile latency | scroll | 603.447 | ms |
90th percentile latency | scroll | 617.022 | ms |
99th percentile latency | scroll | 619.746 | ms |
100th percentile latency | scroll | 629.479 | ms |
50th percentile service time | scroll | 601.664 | ms |
90th percentile service time | scroll | 615.662 | ms |
99th percentile service time | scroll | 618.066 | ms |
100th percentile service time | scroll | 627.276 | ms |
error rate | scroll | 0 | % |
Min Throughput | expression | 1.5 | ops/s |
Mean Throughput | expression | 1.5 | ops/s |
Median Throughput | expression | 1.5 | ops/s |
Max Throughput | expression | 1.5 | ops/s |
50th percentile latency | expression | 480.244 | ms |
90th percentile latency | expression | 497.217 | ms |
99th percentile latency | expression | 535.208 | ms |
100th percentile latency | expression | 677.822 | ms |
50th percentile service time | expression | 478.77 | ms |
90th percentile service time | expression | 496.211 | ms |
99th percentile service time | expression | 534.702 | ms |
100th percentile service time | expression | 677.301 | ms |
error rate | expression | 0 | % |
Min Throughput | painless_static | 1.4 | ops/s |
Mean Throughput | painless_static | 1.4 | ops/s |
Median Throughput | painless_static | 1.4 | ops/s |
Max Throughput | painless_static | 1.4 | ops/s |
50th percentile latency | painless_static | 610.906 | ms |
90th percentile latency | painless_static | 650.136 | ms |
99th percentile latency | painless_static | 731.019 | ms |
100th percentile latency | painless_static | 770.706 | ms |
50th percentile service time | painless_static | 610.065 | ms |
90th percentile service time | painless_static | 648.115 | ms |
99th percentile service time | painless_static | 730.345 | ms |
100th percentile service time | painless_static | 769.865 | ms |
error rate | painless_static | 0 | % |
Min Throughput | painless_dynamic | 1.4 | ops/s |
Mean Throughput | painless_dynamic | 1.4 | ops/s |
Median Throughput | painless_dynamic | 1.4 | ops/s |
Max Throughput | painless_dynamic | 1.4 | ops/s |
50th percentile latency | painless_dynamic | 608.727 | ms |
90th percentile latency | painless_dynamic | 649.741 | ms |
99th percentile latency | painless_dynamic | 695.298 | ms |
100th percentile latency | painless_dynamic | 702.601 | ms |
50th percentile service time | painless_dynamic | 608.169 | ms |
90th percentile service time | painless_dynamic | 649.34 | ms |
99th percentile service time | painless_dynamic | 694.455 | ms |
100th percentile service time | painless_dynamic | 701.651 | ms |
error rate | painless_dynamic | 0 | % |
Min Throughput | decay_geo_gauss_function_score | 1 | ops/s |
Mean Throughput | decay_geo_gauss_function_score | 1 | ops/s |
Median Throughput | decay_geo_gauss_function_score | 1 | ops/s |
Max Throughput | decay_geo_gauss_function_score | 1 | ops/s |
50th percentile latency | decay_geo_gauss_function_score | 560.088 | ms |
90th percentile latency | decay_geo_gauss_function_score | 616.046 | ms |
99th percentile latency | decay_geo_gauss_function_score | 644.189 | ms |
100th percentile latency | decay_geo_gauss_function_score | 652.326 | ms |
50th percentile service time | decay_geo_gauss_function_score | 558.796 | ms |
90th percentile service time | decay_geo_gauss_function_score | 614.672 | ms |
99th percentile service time | decay_geo_gauss_function_score | 643.052 | ms |
100th percentile service time | decay_geo_gauss_function_score | 650.823 | ms |
error rate | decay_geo_gauss_function_score | 0 | % |
Min Throughput | decay_geo_gauss_script_score | 1 | ops/s |
Mean Throughput | decay_geo_gauss_script_score | 1 | ops/s |
Median Throughput | decay_geo_gauss_script_score | 1 | ops/s |
Max Throughput | decay_geo_gauss_script_score | 1 | ops/s |
50th percentile latency | decay_geo_gauss_script_score | 575.714 | ms |
90th percentile latency | decay_geo_gauss_script_score | 602.96 | ms |
99th percentile latency | decay_geo_gauss_script_score | 629.875 | ms |
100th percentile latency | decay_geo_gauss_script_score | 643.619 | ms |
50th percentile service time | decay_geo_gauss_script_score | 574.411 | ms |
90th percentile service time | decay_geo_gauss_script_score | 602.263 | ms |
99th percentile service time | decay_geo_gauss_script_score | 628.526 | ms |
100th percentile service time | decay_geo_gauss_script_score | 641.251 | ms |
error rate | decay_geo_gauss_script_score | 0 | % |
Min Throughput | field_value_function_score | 1.5 | ops/s |
Mean Throughput | field_value_function_score | 1.5 | ops/s |
Median Throughput | field_value_function_score | 1.5 | ops/s |
Max Throughput | field_value_function_score | 1.5 | ops/s |
50th percentile latency | field_value_function_score | 231.966 | ms |
90th percentile latency | field_value_function_score | 268.346 | ms |
99th percentile latency | field_value_function_score | 332.754 | ms |
100th percentile latency | field_value_function_score | 334.69 | ms |
50th percentile service time | field_value_function_score | 230.874 | ms |
90th percentile service time | field_value_function_score | 267.318 | ms |
99th percentile service time | field_value_function_score | 332.027 | ms |
100th percentile service time | field_value_function_score | 333.704 | ms |
error rate | field_value_function_score | 0 | % |
Min Throughput | field_value_script_score | 1.5 | ops/s |
Mean Throughput | field_value_script_score | 1.5 | ops/s |
Max Throughput | desc_sort_with_after_geonameid | 6.01 | ops/s |
50th percentile latency | desc_sort_with_after_geonameid | 125.9 | ms |
90th percentile latency | desc_sort_with_after_geonameid | 151.684 | ms |
99th percentile latency | desc_sort_with_after_geonameid | 185.673 | ms |
100th percentile latency | desc_sort_with_after_geonameid | 200.655 | ms |
50th percentile service time | desc_sort_with_after_geonameid | 124.833 | ms |
90th percentile service time | desc_sort_with_after_geonameid | 148.707 | ms |
99th percentile service time | desc_sort_with_after_geonameid | 185.15 | ms |
100th percentile service time | desc_sort_with_after_geonameid | 200.042 | ms |
error rate | desc_sort_with_after_geonameid | 0 | % |
Min Throughput | asc_sort_geonameid | 6.02 | ops/s |
Mean Throughput | asc_sort_geonameid | 6.02 | ops/s |
Median Throughput | asc_sort_geonameid | 6.02 | ops/s |
Max Throughput | asc_sort_geonameid | 6.03 | ops/s |
50th percentile latency | asc_sort_geonameid | 5.46044 | ms |
90th percentile latency | asc_sort_geonameid | 6.02821 | ms |
99th percentile latency | asc_sort_geonameid | 7.26891 | ms |
100th percentile latency | asc_sort_geonameid | 7.97036 | ms |
50th percentile service time | asc_sort_geonameid | 4.58443 | ms |
90th percentile service time | asc_sort_geonameid | 5.08835 | ms |
99th percentile service time | asc_sort_geonameid | 6.91502 | ms |
100th percentile service time | asc_sort_geonameid | 7.10789 | ms |
error rate | asc_sort_geonameid | 0 | % |
Min Throughput | asc_sort_with_after_geonameid | 6.01 | ops/s |
Mean Throughput | asc_sort_with_after_geonameid | 6.01 | ops/s |
Median Throughput | asc_sort_with_after_geonameid | 6.01 | ops/s |
Max Throughput | asc_sort_with_after_geonameid | 6.01 | ops/s |
50th percentile latency | asc_sort_with_after_geonameid | 112.296 | ms |
90th percentile latency | asc_sort_with_after_geonameid | 132.813 | ms |
99th percentile latency | asc_sort_with_after_geonameid | 156.594 | ms |
100th percentile latency | asc_sort_with_after_geonameid | 176.157 | ms |
50th percentile service time | asc_sort_with_after_geonameid | 111.349 | ms |
90th percentile service time | asc_sort_with_after_geonameid | 132.107 | ms |
99th percentile service time | asc_sort_with_after_geonameid | 155.66 | ms |
100th percentile service time | asc_sort_with_after_geonameid | 175.446 | ms |
error rate | asc_sort_with_after_geonameid | 0 | % |
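Every query task in the report lists both latency and service time. In Rally's terminology, service time covers only the time Elasticsearch spends processing a request. Latency additionally includes the time a request waited before it could be sent, which matters when a task runs at a target throughput. The gap between the two therefore hints at client-side queuing. The following is a minimal sketch that parses rows in the table's `metric | task | value | unit |` format and computes that gap. `parse_rows` and `wait_ms` are hypothetical helpers. Subtracting two percentiles only approximates the wait at that percentile.

```python
from typing import Dict, Tuple

# A few rows copied verbatim from the report table above.
REPORT = """\
Metric | Task | Value | Unit |
---|---|---|---|
50th percentile latency | scroll | 603.447 | ms |
50th percentile service time | scroll | 601.664 | ms |
50th percentile latency | country_agg_cached | 2.28737 | ms |
50th percentile service time | country_agg_cached | 1.53483 | ms |
"""


def parse_rows(text: str) -> Dict[Tuple[str, str], float]:
    """Map (metric, task) -> numeric value for 'metric | task | value | unit |' rows."""
    rows: Dict[Tuple[str, str], float] = {}
    for line in text.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        if len(cells) < 4 or not cells[1]:
            continue  # skip blank lines and cluster-wide rows that have no task
        try:
            rows[(cells[0], cells[1])] = float(cells[2])
        except ValueError:
            continue  # skip the header and the --- separator row
    return rows


def wait_ms(rows: Dict[Tuple[str, str], float], task: str, pct: str = "50th") -> float:
    """Approximate wait time in ms: latency minus service time at one percentile."""
    latency = rows[(f"{pct} percentile latency", task)]
    service = rows[(f"{pct} percentile service time", task)]
    return latency - service


if __name__ == "__main__":
    rows = parse_rows(REPORT)
    for task in ("scroll", "country_agg_cached"):
        print(f"{task}: ~{wait_ms(rows, task):.3f} ms waiting at p50")
```

For scroll, the p50 gap is about 1.8 ms (603.447 - 601.664), so nearly all of its roughly 600 ms latency is time spent inside Elasticsearch rather than waiting on the client side.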