This article looks at Rally, the official benchmarking tool for Elasticsearch clusters, and how to use it; the benchmark scheme comes from the official repository at https://github.com/elastic/rally-tracks.
Installing Rally on an isolated intranet is difficult and involves resolving a complex set of dependencies, so this article focuses on running Rally from a container to benchmark the cluster.
1. Offline Rally installation and data preparation
To install Rally offline, download the following:
1. The Rally Docker image, saved as rally2.2.1.tar.bz2.
2. The Rally benchmark track repository, either as a packaged archive or via git:
git clone git@github.com:elastic/rally-tracks.git
Save it as rally-tracks-7.12.
3. The data set for the track(s) you want to run. The corpora are large, so download only what you actually need; this article downloads only the http_logs corpus and stores it under rally/benchmarks/data, following the expected directory layout.
The corpora can be downloaded manually as follows; the same approach works for any of the other data sets.
The data sets are described in the official elastic repository on GitHub: https://github.com/elastic/rally-tracks
Looking at rally-tracks/download.sh, every corpus is downloaded from the same Amazon S3 host.
Reading the script shows that the base path for the corpora is http://benchmarks.elasticsearch.org.s3.amazonaws.com/corpora
The file rally-tracks/geonames/files.txt shows that the geonames corpus consists of documents-2.json.bz2 and documents-2-1k.json.bz2.
Combining the base path and the file name gives a URL that can be opened directly in a browser:
http://benchmarks.elasticsearch.org.s3.amazonaws.com/corpora/geonames/documents-2.json.bz2
The file downloads normally, and the other corpora can be fetched manually in exactly the same way.
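Putting this together, the following sketch downloads a corpus and assembles an offline track directory under rally/benchmarks/data (the rally-tracks-7.12 directory name and the geonames track are just the example from above; adjust them to the track you need):

```bash
# Assemble an offline copy of the geonames track (sketch; adjust paths to your own layout)
TRACK=geonames
BASE_URL=http://benchmarks.elasticsearch.org.s3.amazonaws.com/corpora

# Start from the track definition shipped in the rally-tracks repository
mkdir -p rally/benchmarks/data
cp -r "rally-tracks-7.12/${TRACK}" "rally/benchmarks/data/${TRACK}"

# Download every corpus file listed in the track's files.txt next to the track definition
while read -r f; do
  wget -c "${BASE_URL}/${TRACK}/${f}" -O "rally/benchmarks/data/${TRACK}/${f}"
done < "rally-tracks-7.12/${TRACK}/files.txt"
```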
4. Configure the es.sh benchmark script
WORKSPACE=$(pwd)
echo "${WORKSPACE}"
ESADDRESS=$1
######### Check that the Rally Docker image has been loaded
DOCKER=$(docker images | grep elastic/rally | awk '{print $3}')
if [ -z "${DOCKER}" ]; then
    echo "Please load the Docker image first, e.g. docker load -i <image file>"
    exit 1
fi
if [ -z "${ESADDRESS}" ]; then
    echo "Please provide the ES cluster address"
    exit 1
fi
TESTTYPE=$2
if [ -z "${TESTTYPE}" ]; then
    echo "Please provide the test type; currently supported types are geonames and http_logs"
    exit 1
fi
REPORTFILE=$3
if [ -z "${REPORTFILE}" ]; then
    REPORTFILE=result.csv
    echo "No report name given; using the default result.csv at ${WORKSPACE}/rally/benchmarks/result.csv"
fi
echo "The benchmark report will be written to ${WORKSPACE}/rally/benchmarks/${REPORTFILE}"
docker="docker"
EXECCMD="${docker} run --network host -v ${WORKSPACE}/rally:/rally/.rally elastic/rally:2.2.1 race --pipeline=benchmark-only --target-hosts=${ESADDRESS} --track-path=/rally/.rally/benchmarks/data/${TESTTYPE} --track-params=bulk_indexing_clients:20 --offline --report-format=csv --report-file=/rally/.rally/benchmarks/${REPORTFILE}"
echo "${EXECCMD}"
${docker} run --network host -v ${WORKSPACE}/rally:/rally/.rally elastic/rally:2.2.1 race --pipeline=benchmark-only --target-hosts=${ESADDRESS} --track-path=/rally/.rally/benchmarks/data/${TESTTYPE} --track-params="bulk_indexing_clients:20" --offline --report-format=csv --report-file=/rally/.rally/benchmarks/${REPORTFILE}
# Remove the finished Rally container
docker rm -f $(docker ps -a | grep elastic/rally | awk '{print $1}')
echo "Benchmark run finished"
rally2.2.1.tar.bz2 is the Docker image file.
rally is the directory mounted into the container; the benchmark data lives in ./rally/benchmarks/data and the results are written to ./rally/benchmarks/*.csv.
After completing the four steps, the full package is laid out as follows:
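A sketch of the expected layout (the exact file names under data/ depend on the tracks and corpora you downloaded):

```
.
├── es.sh
├── rally2.2.1.tar.bz2
└── rally/
    └── benchmarks/
        ├── data/
        │   └── http_logs/        # track.json plus the downloaded corpus files
        └── result.csv            # report written here after a run
```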
2. Running the cluster benchmark
2.1 How to use the benchmark script
es.sh takes the ES cluster address and the data set to test as arguments, for example:
./es.sh xx.xx.xx.xx:9200,xx.xx.xx.xx:9200,xx.xx.xx.xx:9200 http_logs
xx.xx.xx.xx:9200,xx.xx.xx.xx:9200,xx.xx.xx.xx:9200 is the list of ES cluster addresses.
http_logs is the data set to test.
2.2 Benchmarking the ES cluster
1. Benchmark results differ greatly between clusters because of the underlying hardware, so the disks' I/O performance should be benchmarked alongside the ES benchmark.
Install fio first (skip this if it is already installed):
yum install -y fio
Disk random-write benchmark:
fio -filename=/data1/fio.txt -ioengine=libaio -direct=1 -iodepth 1 -thread -rw=randwrite -bs=4k -size=100G -numjobs=48 -runtime=300 -group_reporting -name=mytest
Disk random-read benchmark:
fio -filename=/data1/fio.txt -ioengine=libaio -direct=1 -iodepth 1 -thread -rw=randread -bs=4k -size=100G -numjobs=48 -runtime=300 -group_reporting -name=mytest
2. Pick a server and install Docker (yum install -y docker).
3. Upload the tool package, unpack it, and load the image with docker load -i rally2.2.1.tar.bz2.
4. Run es.sh:
./es.sh xx.xx.xx.xx:9200,xx.xx.xx.xx:9200,xx.xx.xx.xx:9200 http_logs
The data sets differ greatly in structure and characteristics, so choose and test the track that best matches your own workload; this article uses http_logs.
According to the official documentation, the available data sets include:
1. Geonames: for evaluating the performance of structured data.
2. Geopoint: for evaluating the performance of geo queries.
3. Percolator: for evaluating the performance of percolation queries.
4. PMC: for evaluating the performance of full text search.
5. NYC taxis: for evaluating the performance for highly structured data.
6. Nested: for evaluating the performance for nested documents.
7. http_logs: for evaluating the performance of (Web) server logs.
8. noaa: for evaluating the performance of range fields.
2.3 Interpreting the benchmark results
The run produces a large number of metrics; the ones to watch are:
- throughput: the throughput of each operation, e.g. index or search
- latency: the response time of each operation, measured from submitting the request to receiving the complete response (it therefore includes any time the request spends waiting because of the target throughput schedule); the companion service time metric covers only the request/response round trip itself
- Heap used for x: heap memory used by segments, doc values, terms, and so on
Metric | Task | Value | Unit | Note
---|---|---|---|---
Cumulative indexing time of primary shards | | 13.36938333 | min | lower is better
Min cumulative indexing time across primary shards | | 0 | min | lower is better
Median cumulative indexing time across primary shards | | | min | lower is better
Max cumulative indexing time across primary shards | | | min | lower is better
Cumulative indexing throttle time of primary shards | | 0 | min | lower is better
Min cumulative indexing throttle time across primary shards | | 0 | min | lower is better
Median cumulative indexing throttle time across primary shards | | 0 | min | lower is better
Max cumulative indexing throttle time across primary shards | | 0 | min | lower is better
Cumulative merge time of primary shards | | 4.2677 | min | lower is better
Cumulative merge count of primary shards | | 57 | | lower is better
Min cumulative merge time across primary shards | | 0 | min | lower is better
Median cumulative merge time across primary shards | | 1.348033333 | min | lower is better
Max cumulative merge time across primary shards | | 1.464033333 | min | lower is better
Cumulative merge throttle time of primary shards | | 1.065866667 | min | lower is better
Min cumulative merge throttle time across primary shards | | 0 | min | lower is better
Median cumulative merge throttle time across primary shards | | 0.328816667 | min | lower is better
Max cumulative merge throttle time across primary shards | | 0.3759 | min | lower is better
Cumulative refresh time of primary shards | | 0.798716667 | min | lower is better
Cumulative refresh count of primary shards | | 320 | | lower is better
Min cumulative refresh time across primary shards | | 1.67E-05 | min | lower is better
Median cumulative refresh time across primary shards | | 0.250933333 | min | lower is better
Max cumulative refresh time across primary shards | | 0.266416667 | min | lower is better
Cumulative flush time of primary shards | | 0.584383333 | min | lower is better
Cumulative flush count of primary shards | | 46 | | lower is better
Min cumulative flush time across primary shards | | 0 | min | lower is better
Median cumulative flush time across primary shards | | 0.159566667 | min | lower is better
Max cumulative flush time across primary shards | | 0.1632 | min | lower is better
Total Young Gen GC time | | 4.179 | s | lower is better
Total Young Gen GC count | | 961 | | lower is better
Total Old Gen GC time | | 0.221 | s | lower is better
Total Old Gen GC count | | 4 | | lower is better
Store size | | 3.018052787 | GB | lower is better
Translog size | | 4.10E-07 | GB | lower is better
Heap used for segments | | 0.461437225 | MB | lower is better
Heap used for doc values | | 0.021503448 | MB | lower is better
Heap used for terms | | 0.356811523 | MB | lower is better
Heap used for norms | | 0.048034668 | MB | lower is better
Heap used for points | | 0 | MB | lower is better
Heap used for stored fields | | 0.035087585 | MB | lower is better
Segment count | | 71 | | lower is better
error rate | index-append | 0 | % |
Min Throughput | index-stats | 90.02 | ops/s | higher is better
Mean Throughput | index-stats | 90.03 | ops/s | higher is better
Median Throughput | index-stats | 90.03 | ops/s | higher is better
Max Throughput | index-stats | 90.06 | ops/s | higher is better
50th percentile latency | index-stats | 2.688714827 | ms | lower is better; time between submitting the request and receiving the complete response (50% of requests within this time)
90th percentile latency | index-stats | 3.594806814 | ms | ||
99th percentile latency | index-stats | 6.877146151 | ms | ||
99.9th percentile latency | index-stats | 12.57476813 | ms | ||
100th percentile latency | index-stats | 19.47905542 | ms | ||
50th percentile service time | index-stats | 1.454657991 | ms | ||
90th percentile service time | index-stats | 1.97627194 | ms | ||
99th percentile service time | index-stats | 5.543909213 | ms | ||
99.9th percentile service time | index-stats | 10.26782569 | ms | ||
100th percentile service time | index-stats | 18.59820995 | ms | ||
error rate | index-stats | 0 | % | ||
Min Throughput | node-stats | 90.02 | ops/s | ||
Mean Throughput | node-stats | 90.05 | ops/s | ||
Median Throughput | node-stats | 90.04 | ops/s | ||
Max Throughput | node-stats | 90.14 | ops/s | ||
50th percentile latency | node-stats | 2.815647516 | ms | ||
90th percentile latency | node-stats | 4.044909403 | ms | ||
99th percentile latency | node-stats | 5.212370545 | ms | ||
99.9th percentile latency | node-stats | 6.852936187 | ms | ||
100th percentile latency | node-stats | 6.934299599 | ms | ||
50th percentile service time | node-stats | 1.92963396 | ms | ||
90th percentile service time | node-stats | 2.280614187 | ms | ||
99th percentile service time | node-stats | 4.373069127 | ms | ||
99.9th percentile service time | node-stats | 5.121724201 | ms | ||
100th percentile service time | node-stats | 5.12892101 | ms | ||
error rate | node-stats | 0 | % | ||
Min Throughput | default | 50.02 | ops/s | ||
Mean Throughput | default | 50.04 | ops/s | ||
Median Throughput | default | 50.04 | ops/s | ||
Max Throughput | default | 50.07 | ops/s | ||
50th percentile latency | default | 3.442207992 | ms | ||
90th percentile latency | default | 4.541033355 | ms | ||
99th percentile latency | default | 5.171663366 | ms | ||
99.9th percentile latency | default | 9.028199148 | ms | ||
100th percentile latency | default | 9.637624957 | ms | ||
50th percentile service time | default | 2.594712481 | ms | ||
90th percentile service time | default | 3.050701669 | ms | ||
99th percentile service time | default | 3.448219185 | ms | ||
99.9th percentile service time | default | 8.483097347 | ms | ||
100th percentile service time | default | 9.405504912 | ms | ||
error rate | default | 0 | % | ||
Min Throughput | term | 100.01 | ops/s | ||
Mean Throughput | term | 100.02 | ops/s | ||
Median Throughput | term | 100.02 | ops/s | ||
Max Throughput | term | 100.04 | ops/s | ||
50th percentile latency | term | 3.199955565 | ms | ||
90th percentile latency | term | 4.159100866 | ms | ||
99th percentile latency | term | 9.006197074 | ms | ||
99.9th percentile latency | term | 20.99158259 | ms | ||
100th percentile latency | term | 21.52055805 | ms | ||
50th percentile service time | term | 2.48551101 | ms | ||
90th percentile service time | term | 3.239720117 | ms | ||
99th percentile service time | term | 7.17226712 | ms | ||
99.9th percentile service time | term | 15.9544915 | ms | ||
100th percentile service time | term | 19.73530301 | ms | ||
error rate | term | 0 | % | ||
Min Throughput | phrase | 109.99 | ops/s | ||
Mean Throughput | phrase | 110 | ops/s | ||
Median Throughput | phrase | 110 | ops/s | ||
Max Throughput | phrase | 110.01 | ops/s | ||
50th percentile latency | phrase | 3.169040603 | ms | ||
90th percentile latency | phrase | 3.634604893 | ms | ||
99th percentile latency | phrase | 4.35058805 | ms | ||
99.9th percentile latency | phrase | 16.27933249 | ms | ||
100th percentile latency | phrase | 17.08333869 | ms | ||
50th percentile service time | phrase | 2.451517503 | ms | ||
90th percentile service time | phrase | 2.724279161 | ms | ||
99th percentile service time | phrase | 3.216251438 | ms | ||
99.9th percentile service time | phrase | 9.749228635 | ms | ||
100th percentile service time | phrase | 15.46012098 | ms | ||
error rate | phrase | 0 | % | ||
Min Throughput | country_agg_uncached | 3 | ops/s | ||
Mean Throughput | country_agg_uncached | 3 | ops/s | ||
Median Throughput | country_agg_uncached | 3 | ops/s | ||
Max Throughput | country_agg_uncached | 3 | ops/s | ||
50th percentile latency | country_agg_uncached | 265.1378055 | ms | ||
90th percentile latency | country_agg_uncached | 268.3491967 | ms | ||
99th percentile latency | country_agg_uncached | 282.9874858 | ms | ||
100th percentile latency | country_agg_uncached | 299.8582891 | ms | ||
50th percentile service time | country_agg_uncached | 264.1177385 | ms | ||
90th percentile service time | country_agg_uncached | 267.2917891 | ms | ||
99th percentile service time | country_agg_uncached | 282.0132841 | ms | ||
100th percentile service time | country_agg_uncached | 298.699945 | ms | ||
error rate | country_agg_uncached | 0 | % | ||
Min Throughput | country_agg_cached | 97.64 | ops/s | ||
Mean Throughput | country_agg_cached | 98.26 | ops/s | ||
Median Throughput | country_agg_cached | 98.32 | ops/s | ||
Max Throughput | country_agg_cached | 98.7 | ops/s | ||
50th percentile latency | country_agg_cached | 2.175618487 | ms | ||
90th percentile latency | country_agg_cached | 3.358712979 | ms | ||
99th percentile latency | country_agg_cached | 3.663528312 | ms | ||
99.9th percentile latency | country_agg_cached | 4.533531366 | ms | ||
100th percentile latency | country_agg_cached | 9.735687054 | ms | ||
50th percentile service time | country_agg_cached | 1.210322545 | ms | ||
90th percentile service time | country_agg_cached | 1.381615282 | ms | ||
99th percentile service time | country_agg_cached | 1.652208896 | ms | ||
99.9th percentile service time | country_agg_cached | 3.39570541 | ms | ||
100th percentile service time | country_agg_cached | 9.514000965 | ms | ||
error rate | country_agg_cached | 0 | % | ||
Min Throughput | scroll | 20.05 | pages/s | ||
Mean Throughput | scroll | 20.06 | pages/s | ||
Median Throughput | scroll | 20.06 | pages/s | ||
Max Throughput | scroll | 20.08 | pages/s | ||
50th percentile latency | scroll | 273.2520165 | ms | ||
90th percentile latency | scroll | 301.6026772 | ms | ||
99th percentile latency | scroll | 347.1331405 | ms | ||
100th percentile latency | scroll | 349.3009 | ms | ||
50th percentile service time | scroll | 271.233834 | ms | ||
90th percentile service time | scroll | 298.9778046 | ms | ||
99th percentile service time | scroll | 345.1081409 | ms | ||
100th percentile service time | scroll | 346.241483 | ms | ||
error rate | scroll | 0 | % | ||
Min Throughput | expression | 1.5 | ops/s | ||
Mean Throughput | expression | 1.5 | ops/s | ||
Median Throughput | expression | 1.5 | ops/s | ||
Max Throughput | expression | 1.5 | ops/s | ||
50th percentile latency | expression | 464.535454 | ms | ||
90th percentile latency | expression | 470.8226439 | ms | ||
99th percentile latency | expression | 485.6872773 | ms | ||
100th percentile latency | expression | 487.582457 | ms | ||
50th percentile service time | expression | 463.644907 | ms | ||
90th percentile service time | expression | 469.5449809 | ms | ||
99th percentile service time | expression | 484.4586398 | ms | ||
100th percentile service time | expression | 486.768786 | ms | ||
error rate | expression | 0 | % | ||
Min Throughput | painless_static | 1.4 | ops/s | ||
Mean Throughput | painless_static | 1.4 | ops/s | ||
Median Throughput | painless_static | 1.4 | ops/s | ||
Max Throughput | painless_static | 1.4 | ops/s | ||
50th percentile latency | painless_static | 581.6272671 | ms | ||
90th percentile latency | painless_static | 588.2054265 | ms | ||
99th percentile latency | painless_static | 597.229797 | ms | ||
100th percentile latency | painless_static | 601.7254018 | ms | ||
50th percentile service time | painless_static | 580.774506 | ms | ||
90th percentile service time | painless_static | 587.0630695 | ms | ||
99th percentile service time | painless_static | 595.7945851 | ms | ||
100th percentile service time | painless_static | 600.6218339 | ms | ||
error rate | painless_static | 0 | % | ||
Min Throughput | painless_dynamic | 1.4 | ops/s | ||
Mean Throughput | painless_dynamic | 1.4 | ops/s | ||
Median Throughput | painless_dynamic | 1.4 | ops/s | ||
Max Throughput | painless_dynamic | 1.4 | ops/s | ||
50th percentile latency | painless_dynamic | 598.3268638 | ms | ||
90th percentile latency | painless_dynamic | 604.6501834 | ms | ||
99th percentile latency | painless_dynamic | 618.8403735 | ms | ||
100th percentile latency | painless_dynamic | 619.2588332 | ms | ||
50th percentile service time | painless_dynamic | 597.337956 | ms | ||
90th percentile service time | painless_dynamic | 603.6431402 | ms | ||
99th percentile service time | painless_dynamic | 617.5273529 | ms | ||
100th percentile service time | painless_dynamic | 618.3759769 | ms | ||
error rate | painless_dynamic | 0 | % | ||
Min Throughput | decay_geo_gauss_function_score | 1 | ops/s | ||
Mean Throughput | decay_geo_gauss_function_score | 1 | ops/s | ||
Median Throughput | decay_geo_gauss_function_score | 1 | ops/s | ||
Max Throughput | decay_geo_gauss_function_score | 1 | ops/s | ||
50th percentile latency | decay_geo_gauss_function_score | 558.662883 | ms | ||
90th percentile latency | decay_geo_gauss_function_score | 566.1635245 | ms | ||
99th percentile latency | decay_geo_gauss_function_score | 576.7578347 | ms | ||
100th percentile latency | decay_geo_gauss_function_score | 577.7786931 | ms | ||
50th percentile service time | decay_geo_gauss_function_score | 557.0170344 | ms | ||
90th percentile service time | decay_geo_gauss_function_score | 565.1927938 | ms | ||
99th percentile service time | decay_geo_gauss_function_score | 575.6546767 | ms | ||
100th percentile service time | decay_geo_gauss_function_score | 576.90977 | ms | ||
error rate | decay_geo_gauss_function_score | 0 | % | ||
Min Throughput | decay_geo_gauss_script_score | 1 | ops/s | ||
Mean Throughput | decay_geo_gauss_script_score | 1 | ops/s | ||
Median Throughput | decay_geo_gauss_script_score | 1 | ops/s | ||
Max Throughput | decay_geo_gauss_script_score | 1 | ops/s | ||
50th percentile latency | decay_geo_gauss_script_score | 575.896866 | ms | ||
90th percentile latency | decay_geo_gauss_script_score | 584.6959502 | ms | ||
99th percentile latency | decay_geo_gauss_script_score | 595.1810607 | ms | ||
100th percentile latency | decay_geo_gauss_script_score | 610.31794 | ms | ||
50th percentile service time | decay_geo_gauss_script_score | 574.895048 | ms | ||
90th percentile service time | decay_geo_gauss_script_score | 583.542251 | ms | ||
99th percentile service time | decay_geo_gauss_script_score | 594.0682872 | ms | ||
100th percentile service time | decay_geo_gauss_script_score | 608.403309 | ms | ||
error rate | decay_geo_gauss_script_score | 0 | % | ||
Min Throughput | field_value_function_score | 1.5 | ops/s | ||
Mean Throughput | field_value_function_score | 1.5 | ops/s | ||
Median Throughput | field_value_function_score | 1.5 | ops/s | ||
Max Throughput | field_value_function_score | 1.5 | ops/s | ||
50th percentile latency | field_value_function_score | 217.4870086 | ms | ||
90th percentile latency | field_value_function_score | 221.4966101 | ms | ||
99th percentile latency | field_value_function_score | 256.0486869 | ms | ||
100th percentile latency | field_value_function_score | 263.0984769 | ms | ||
50th percentile service time | field_value_function_score | 216.1670045 | ms | ||
90th percentile service time | field_value_function_score | 220.499306 | ms | ||
99th percentile service time | field_value_function_score | 254.5428219 | ms | ||
100th percentile service time | field_value_function_score | 261.8149639 | ms | ||
error rate | field_value_function_score | 0 | % | ||
Min Throughput | field_value_script_score | 1.5 | ops/s | ||
Mean Throughput | field_value_script_score | 1.5 | ops/s | ||
Median Throughput | field_value_script_score | 1.5 | ops/s | ||
Max Throughput | field_value_script_score | 1.5 | ops/s | ||
50th percentile latency | field_value_script_score | 287.0456218 | ms | ||
90th percentile latency | field_value_script_score | 290.0809773 | ms | ||
99th percentile latency | field_value_script_score | 298.1395952 | ms | ||
100th percentile latency | field_value_script_score | 312.1123726 | ms | ||
50th percentile service time | field_value_script_score | 285.789164 | ms | ||
90th percentile service time | field_value_script_score | 288.8581588 | ms | ||
99th percentile service time | field_value_script_score | 296.5342737 | ms | ||
100th percentile service time | field_value_script_score | 311.1719809 | ms | ||
error rate | field_value_script_score | 0 | % | ||
Min Throughput | large_terms | 1.1 | ops/s | ||
Mean Throughput | large_terms | 1.1 | ops/s | ||
Median Throughput | large_terms | 1.1 | ops/s | ||
Max Throughput | large_terms | 1.1 | ops/s | ||
50th percentile latency | large_terms | 572.2508298 | ms | ||
90th percentile latency | large_terms | 580.3001306 | ms | ||
99th percentile latency | large_terms | 620.8813236 | ms | ||
100th percentile latency | large_terms | 626.353689 | ms | ||
50th percentile service time | large_terms | 563.7678955 | ms | ||
90th percentile service time | large_terms | 572.1782421 | ms | ||
99th percentile service time | large_terms | 613.3370135 | ms | ||
100th percentile service time | large_terms | 617.420621 | ms | ||
error rate | large_terms | 0 | % | ||
Min Throughput | large_filtered_terms | 1.1 | ops/s | ||
Mean Throughput | large_filtered_terms | 1.1 | ops/s | ||
Median Throughput | large_filtered_terms | 1.1 | ops/s | ||
Max Throughput | large_filtered_terms | 1.1 | ops/s | ||
50th percentile latency | large_filtered_terms | 589.2866509 | ms | ||
90th percentile latency | large_filtered_terms | 593.4173963 | ms | ||
99th percentile latency | large_filtered_terms | 598.5252649 | ms | ||
100th percentile latency | large_filtered_terms | 602.3230727 | ms | ||
50th percentile service time | large_filtered_terms | 581.2035115 | ms | ||
90th percentile service time | large_filtered_terms | 585.5575252 | ms | ||
99th percentile service time | large_filtered_terms | 590.5933169 | ms | ||
100th percentile service time | large_filtered_terms | 594.4011461 | ms | ||
error rate | large_filtered_terms | 0 | % | ||
Min Throughput | large_prohibited_terms | 1.1 | ops/s | ||
Mean Throughput | large_prohibited_terms | 1.1 | ops/s | ||
Median Throughput | large_prohibited_terms | 1.1 | ops/s | ||
Max Throughput | large_prohibited_terms | 1.1 | ops/s | ||
50th percentile latency | large_prohibited_terms | 589.4530075 | ms | ||
90th percentile latency | large_prohibited_terms | 596.0567744 | ms | ||
99th percentile latency | large_prohibited_terms | 624.6372295 | ms | ||
100th percentile latency | large_prohibited_terms | 636.1257123 | ms | ||
50th percentile service time | large_prohibited_terms | 581.6967285 | ms | ||
90th percentile service time | large_prohibited_terms | 587.9331864 | ms | ||
99th percentile service time | large_prohibited_terms | 616.5220673 | ms | ||
100th percentile service time | large_prohibited_terms | 628.309642 | ms | ||
error rate | large_prohibited_terms | 0 | % | ||
Min Throughput | desc_sort_population | 1.5 | ops/s | ||
Mean Throughput | desc_sort_population | 1.51 | ops/s | ||
Median Throughput | desc_sort_population | 1.51 | ops/s | ||
Max Throughput | desc_sort_population | 1.51 | ops/s | ||
50th percentile latency | desc_sort_population | 103.1405666 | ms | ||
90th percentile latency | desc_sort_population | 105.2754088 | ms | ||
99th percentile latency | desc_sort_population | 131.8258836 | ms | ||
100th percentile latency | desc_sort_population | 152.3099904 | ms | ||
50th percentile service time | desc_sort_population | 101.670836 | ms | ||
90th percentile service time | desc_sort_population | 104.0073033 | ms | ||
99th percentile service time | desc_sort_population | 130.6022178 | ms | ||
100th percentile service time | desc_sort_population | 150.8698669 | ms | ||
error rate | desc_sort_population | 0 | % | ||
Min Throughput | asc_sort_population | 1.5 | ops/s | ||
Mean Throughput | asc_sort_population | 1.51 | ops/s | ||
Median Throughput | asc_sort_population | 1.51 | ops/s | ||
Max Throughput | asc_sort_population | 1.51 | ops/s | ||
50th percentile latency | asc_sort_population | 107.5372407 | ms | ||
90th percentile latency | asc_sort_population | 110.8386073 | ms | ||
99th percentile latency | asc_sort_population | 116.6895737 | ms | ||
100th percentile latency | asc_sort_population | 119.4045231 | ms | ||
50th percentile service time | asc_sort_population | 106.1783125 | ms | ||
90th percentile service time | asc_sort_population | 109.3649962 | ms | ||
99th percentile service time | asc_sort_population | 115.3436784 | ms | ||
100th percentile service time | asc_sort_population | 118.1872 | ms | ||
error rate | asc_sort_population | 0 | % | ||
Min Throughput | asc_sort_with_after_population | 1.5 | ops/s | ||
Mean Throughput | asc_sort_with_after_population | 1.5 | ops/s | ||
Median Throughput | asc_sort_with_after_population | 1.5 | ops/s | ||
Max Throughput | asc_sort_with_after_population | 1.51 | ops/s | ||
50th percentile latency | asc_sort_with_after_population | 129.1767997 | ms | ||
90th percentile latency | asc_sort_with_after_population | 133.4439944 | ms | ||
99th percentile latency | asc_sort_with_after_population | 140.8711791 | ms | ||
100th percentile latency | asc_sort_with_after_population | 144.9907233 | ms | ||
50th percentile service time | asc_sort_with_after_population | 127.6733635 | ms | ||
90th percentile service time | asc_sort_with_after_population | 131.2300396 | ms | ||
99th percentile service time | asc_sort_with_after_population | 140.3493805 | ms | ||
100th percentile service time | asc_sort_with_after_population | 143.983128 | ms | ||
error rate | asc_sort_with_after_population | 0 | % | ||
Min Throughput | desc_sort_geonameid | 6.01 | ops/s | ||
Mean Throughput | desc_sort_geonameid | 6.01 | ops/s | ||
Median Throughput | desc_sort_geonameid | 6.01 | ops/s | ||
Max Throughput | desc_sort_geonameid | 6.02 | ops/s | ||
50th percentile latency | desc_sort_geonameid | 6.548634556 | ms | ||
90th percentile latency | desc_sort_geonameid | 7.124439673 | ms | ||
99th percentile latency | desc_sort_geonameid | 8.067587848 | ms | ||
100th percentile latency | desc_sort_geonameid | 8.096768637 | ms | ||
50th percentile service time | desc_sort_geonameid | 5.541916529 | ms | ||
90th percentile service time | desc_sort_geonameid | 5.901245272 | ms | ||
99th percentile service time | desc_sort_geonameid | 6.820803307 | ms | ||
100th percentile service time | desc_sort_geonameid | 6.879838067 | ms | ||
error rate | desc_sort_geonameid | 0 | % | ||
Min Throughput | desc_sort_with_after_geonameid | 5.99 | ops/s | ||
Mean Throughput | desc_sort_with_after_geonameid | 6 | ops/s | ||
Median Throughput | desc_sort_with_after_geonameid | 6 | ops/s | ||
Max Throughput | desc_sort_with_after_geonameid | 6 | ops/s | ||
50th percentile latency | desc_sort_with_after_geonameid | 142.7790278 | ms | ||
90th percentile latency | desc_sort_with_after_geonameid | 151.9306856 | ms | ||
99th percentile latency | desc_sort_with_after_geonameid | 208.632983 | ms | ||
100th percentile latency | desc_sort_with_after_geonameid | 211.4377066 | ms | ||
50th percentile service time | desc_sort_with_after_geonameid | 141.9006125 | ms | ||
90th percentile service time | desc_sort_with_after_geonameid | 149.5498388 | ms | ||
99th percentile service time | desc_sort_with_after_geonameid | 178.1799831 | ms | ||
100th percentile service time | desc_sort_with_after_geonameid | 210.2229249 | ms | ||
error rate | desc_sort_with_after_geonameid | 0 | % | ||
Min Throughput | asc_sort_geonameid | 6.02 | ops/s | ||
Mean Throughput | asc_sort_geonameid | 6.02 | ops/s | ||
Median Throughput | asc_sort_geonameid | 6.02 | ops/s | ||
Max Throughput | asc_sort_geonameid | 6.02 | ops/s | ||
50th percentile latency | asc_sort_geonameid | 6.162967999 | ms | ||
90th percentile latency | asc_sort_geonameid | 6.680636853 | ms | ||
99th percentile latency | asc_sort_geonameid | 7.167303486 | ms | ||
100th percentile latency | asc_sort_geonameid | 7.649931009 | ms | ||
50th percentile service time | asc_sort_geonameid | 5.219853483 | ms | ||
90th percentile service time | asc_sort_geonameid | 5.514943344 | ms | ||
99th percentile service time | asc_sort_geonameid | 5.816583 | ms | ||
100th percentile service time | asc_sort_geonameid | 6.203371915 | ms | ||
error rate | asc_sort_geonameid | 0 | % | ||
Min Throughput | asc_sort_with_after_geonameid | 6 | ops/s | ||
Mean Throughput | asc_sort_with_after_geonameid | 6 | ops/s | ||
Median Throughput | asc_sort_with_after_geonameid | 6 | ops/s | ||
Max Throughput | asc_sort_with_after_geonameid | 6.01 | ops/s | ||
50th percentile latency | asc_sort_with_after_geonameid | 130.5534603 | ms | ||
90th percentile latency | asc_sort_with_after_geonameid | 131.7300497 | ms | ||
99th percentile latency | asc_sort_with_after_geonameid | 135.3648191 | ms | ||
100th percentile latency | asc_sort_with_after_geonameid | 139.0438636 | ms | ||
50th percentile service time | asc_sort_with_after_geonameid | 129.4173571 | ms | ||
90th percentile service time | asc_sort_with_after_geonameid | 130.443844 | ms | ||
99th percentile service time | asc_sort_with_after_geonameid | 133.3877408 | ms | ||
100th percentile service time | asc_sort_with_after_geonameid | 137.657303 | ms | ||
error rate | asc_sort_with_after_geonameid | 0 | % |
3. ES tuning recommendations
3.1 ES memory allocation
When a machine has less than 64 GB of RAM, follow the general rule: give 50% to the ES heap and leave 50% to Lucene.
When a machine has more than 64 GB of RAM, follow these guidelines:
If the main workload is full-text search, give the ES heap 4-32 GB and leave the remaining memory to the operating system for Lucene (the segments cache), which provides faster query performance.
If the main workload is aggregation or sorting, mostly on numerics, dates, geo_points and not_analyzed string fields, a 4-32 GB ES heap is also enough; leave the rest to the operating system for Lucene to provide fast document-based aggregation and sorting.
If the workload is aggregation or sorting over analyzed string data, more heap is needed; in that case it is better to run multiple ES instances on the machine, keeping each instance's heap at no more than 50% of its share of memory (and below 32 GB, since below 32 GB the JVM can use compressed object pointers to save space), leaving the other 50% or more to Lucene.
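For example, on a 64 GB data node a common configuration (a sketch; the exact value depends on your workload) is a 31 GB heap set in config/jvm.options, leaving the rest of the memory to the OS page cache for Lucene:

```
# config/jvm.options -- keep the heap below the ~32 GB compressed-oops threshold
-Xms31g
-Xmx31g
```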
3.2 Disable swap
Disable swap: once memory can be swapped out to disk, performance degrades fatally. Set bootstrap.memory_lock: true in elasticsearch.yml so the JVM keeps its memory locked and ES performance is preserved.
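A minimal sketch of the host-side settings (a systemd-managed package install additionally needs the memlock limit raised, e.g. LimitMEMLOCK=infinity):

```bash
# Disable swap on the host, or at least make the kernel reluctant to swap
sudo swapoff -a
sudo sysctl -w vm.swappiness=1

# And in elasticsearch.yml:
# bootstrap.memory_lock: true
```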
3.3 ES shard recommendations
Too few shards is not necessarily good: with a large data volume each shard becomes very large, which hurts query performance.
Too many shards is not necessarily good either: every query is fanned out to all shards and the results are then merged, so an excessive shard count also hurts query performance. The right shard count therefore has to be measured for your own workload; the official recommendation is that a single shard should not exceed 50 GB.
The official documentation has a section on capacity planning; the suggested procedure is:
1. Build a single-node cluster using the production hardware configuration.
2. Create an index with a single primary shard and no replicas, with the mappings you will use in production.
3. Load real documents into the index created in step 2.
4. Run the queries you will actually use in production.
5. Watch the relevant metrics during the test, such as indexing and query performance; the point at which a metric exceeds your acceptable threshold tells you the maximum shard size that still meets your expectations.
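A sketch of steps 2 and 4 (the index name, mapping and query below are placeholders for your own):

```bash
# Step 2: one primary shard, no replicas, with the production mapping (placeholder fields)
curl -X PUT 'localhost:9200/shard-sizing-test' -H 'Content-Type: application/json' -d '{
  "settings": { "number_of_shards": 1, "number_of_replicas": 0 },
  "mappings": { "properties": { "timestamp": { "type": "date" }, "message": { "type": "text" } } }
}'

# Step 3: bulk-load real documents with the _bulk API or your usual ingest pipeline
# Step 4: replay representative queries while watching indexing and query latency as the shard grows
curl -X GET 'localhost:9200/shard-sizing-test/_search' -H 'Content-Type: application/json' -d '{
  "query": { "match": { "message": "error" } }
}'
```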
4. ES capacity model recommendations
1. [Public-cloud ES best practices]
1.1 Total shards in the cluster < 30,000; a single index shard should not exceed 50 GB; shards per node < 4,000.
1.2 Add dedicated master nodes once the cluster exceeds 20 nodes, i.e. keep data:master ≤ 20:1.
1.3 Recommended cpu/mem/disk ratios
Search scenarios: 1:2:32
Logging scenarios: 1:4:192 to 1:4:384
1.4 Single-node performance reference
Write performance: a 16c/64g node with a 32 GB JVM heap can sustain about 20,000 docs/s of indexing.
Storage capacity = source data × (1 + number of replicas) × 1.45 × (1 + 0.5) ≈ 2.2 × source data × (1 + number of replicas)
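As a worked example, 1 TB of source data with one replica needs roughly 1 TB × (1 + 1) × 1.45 × 1.5 ≈ 4.4 TB of disk.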
2. [Community recommendations]
2.1 Disk capacity per node
Data acceleration, query aggregation and similar scenarios: maximum disk per node = node memory (GB) × 10.
Log ingestion, offline analysis and similar scenarios: maximum disk per node = node memory (GB) × 50.
Typical case: maximum disk per node = node memory (GB) × 30.
2.2 Shard count per data node
Shards per data node = node memory (GB) × 30 (guideline for small instances)
Shards per data node = node memory (GB) × 50 (guideline for large instances)
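For example, a 64 GB data node would be capped at roughly 64 × 30 ≈ 1,900 shards under the small-instance guideline and, per 2.1, at about 64 × 30 ≈ 1.9 TB of disk in the typical case.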
3. Other metrics [recommended monitoring thresholds]
CPU < 60%
JVM heap usage < 80%
Disk util < 60%
Disk usage < 70%
Every index in the cluster must have at least 1 primary + 1 replica shard
Cluster read/write rejection rate < 0.1%
No node in the cluster undergoing old GC
Maximum data per node < 1 TB
ES version >= 6.8
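Several of these thresholds can be spot-checked with the _cat APIs (a sketch; column availability varies slightly between versions):

```bash
# Heap, CPU and disk usage per node
curl -s 'localhost:9200/_cat/nodes?v&h=name,heap.percent,cpu,disk.used_percent'

# Shard count and disk usage per data node
curl -s 'localhost:9200/_cat/allocation?v'

# Write/search thread-pool rejections, for tracking the rejection rate
curl -s 'localhost:9200/_cat/thread_pool/write,search?v&h=node_name,name,active,queue,rejected'
```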