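- Requirement: after the ES cluster had been running for a while, data growth drove disk usage up until the originally planned 6-month retention no longer fit. The daily data growth and the size of the capacity expansion need to be re-evaluated.
1. Filter a single index (stored by month) and compute its daily growth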
ES1:/home/test > curl -s -u 'elastic:123456' 'http://127.0.0.1:9200/_cat/indices/*_my_2024_04*?v&h=i,s,ss&bytes=mb&s=store.size:desc'|head
i s ss
linux_my_2024_04 open 599126
sell_my_2024_04_26 open 87909
ES1:/home/test > curl -s -u 'elastic:123456' 'http://127.0.0.1:9200/_cat/indices/*_my_2024_04*?v&h=i,s,ss,dc&bytes=mb&s=store.size:desc' \
|grep open |awk '{print $3/1024, $4}' \
|awk '{sum+=$1; sum2+=$2} END{printf "days=%ld; disk: sum=%ld(G), avg=%.1f(G/day); docs: sum=%ld, avg=%.1f(docs/day)\n", NR, sum, sum/NR, sum2, sum2/NR }'
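The average G/day from the pipeline above is the input to the sizing question in the requirement. A minimal sketch of that arithmetic follows. The function name `estimate_capacity`, the 100 GB/day figure, and the 80% usage ceiling are all hypothetical values chosen for illustration, not numbers from this cluster. Since `store.size` in `_cat/indices` reports primaries plus replicas, the sketch assumes no extra replica multiplier is needed.

```shell
#!/bin/sh
# estimate_capacity AVG_GB_PER_DAY RETENTION_DAYS MAX_USED_PCT
# Prints the data volume held over the retention window and the disk
# capacity needed so that this data fills at most MAX_USED_PCT of the disk.
# (Hypothetical helper; all inputs are assumptions supplied by the caller.)
estimate_capacity() {
  awk -v avg="$1" -v days="$2" -v pct="$3" 'BEGIN {
    data = avg * days          # GB on disk once the retention window is full
    need = data * 100 / pct    # GB of disk so that data occupies pct percent
    printf "data=%.0fGB capacity=%.0fGB\n", data, need
  }'
}

# Hypothetical example: 100 GB/day, 180 days (about 6 months), 80% ceiling.
estimate_capacity 100 180 80
```

Subtracting the current cluster capacity from the printed `capacity` value gives a rough expansion size. The usage ceiling is worth keeping below the disk watermarks configured on the cluster.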
2. All indices: total size grouped by index name
|grep -v close |grep my |grep '2024_07'
test1_my_2024_07 open 35
userlog_my_2024_07_29 open 7
userlog_my_2024_07_28 open 15
userlog_my_2024_07_09 open 8
[root@test ~]
|grep -v close |grep my|grep '2024_07' \
|awk -F'_2024_07| ' '{print $1,$NF}'\
|awk '{arr[$1]+=$2;} END{for(i in arr) print(i " " arr[i]"MB")}'
test1_my 35MB
userlog_my 30MB
[root@test ~]
|grep -v close |grep my |grep '2024_07' \
|awk -F'_2024_07| ' '{print $1,$NF}' \
|awk '{arr[$1] += $2; cntarr[$1]++} END {for(i in arr){print(i,arr[i]"/"cntarr[i]"="arr[i]/cntarr[i]"MB")}}'
test1_my 35/1=35MB
userlog_my 30/3=10MB
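The two awk stages above can also be folded into a single pass, which makes the grouping easier to try out offline. The sketch below is one possible rewrite, not the original command: `group_by_prefix` is a hypothetical helper, it reads rows in the `index status size_mb` layout shown above from stdin, and it keeps only `open` rows rather than using `grep -v close`. The sample rows are copied from the listing above.

```shell
#!/bin/sh
# group_by_prefix MONTH
# Reads "index status size_mb" rows on stdin and prints, per index-name
# prefix (the text before _MONTH), total size / index count = average size.
# (Hypothetical helper for illustration.)
group_by_prefix() {
  awk -v month="$1" '
    $2 == "open" && index($1, "_" month) {
      prefix = substr($1, 1, index($1, "_" month) - 1)  # strip _YYYY_MM suffix
      total[prefix] += $NF                              # size is the last column
      count[prefix]++
    }
    END {
      for (p in total)
        printf "%s %d/%d=%.1fMB\n", p, total[p], count[p], total[p] / count[p]
    }' | sort   # awk array order is unspecified, so sort for stable output
}

# Sample rows copied from the listing above.
group_by_prefix 2024_07 <<'EOF'
test1_my_2024_07 open 35
userlog_my_2024_07_29 open 7
userlog_my_2024_07_28 open 15
userlog_my_2024_07_09 open 8
EOF
```

In real use the here-document would be replaced by the `_cat/indices` output piped into `group_by_prefix`.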