--------Logs----------
Checksum failure: the shard is not allowed to come online
Error in the log:
org.apache.lucene.index.CorruptIndexException: checksum failed (hardware problem?) : expected=1wawfr3 actual=ux64oi (resource=name [_dem.fdt], length [2731835920], checksum [1wawfr3], writtenBy [8.7.0]) (resource=VerifyingIndexOutput(_dem.fdt))
Analysis:
The shard files were corrupted, usually by a disk or OS problem, so Elasticsearch's checksum verification fails.
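For context on what "checksum failed" means: Lucene ends each index file with a codec footer that stores a CRC32 of the bytes before it, and the check fails when the recomputed value differs from the stored one. The expected/actual values in the log above appear to be those CRC32 values printed in base 36, which is how Elasticsearch renders checksums. The sketch below recomputes the footer checksum of a single file, for example a copy of _dem.fdt. The footer layout constants are an assumption based on Lucene 8.x's CodecUtil, and `verify_file` and `to_base36` are hypothetical helpers, not part of any Elasticsearch or Lucene tool.

```python
import os
import struct
import zlib

# Assumed Lucene 8.x codec footer layout: the last 16 bytes of each index file
# are FOOTER_MAGIC (uint32), an algorithm ID (uint32, 0 = CRC32) and the CRC32
# checksum (int64), all big-endian. The CRC covers every byte except the final 8.
FOOTER_MAGIC = 0xC02893E8  # bitwise NOT of CODEC_MAGIC 0x3FD76C17
FOOTER_LEN = 16
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def to_base36(value: int) -> str:
    """Render a non-negative checksum in radix 36, the format seen in the log."""
    if value == 0:
        return "0"
    out = []
    while value:
        value, rem = divmod(value, 36)
        out.append(DIGITS[rem])
    return "".join(reversed(out))


def verify_file(path: str, chunk_size: int = 1 << 20) -> tuple[bool, str, str]:
    """Return (ok, expected, actual) for one segment file, e.g. _dem.fdt."""
    size = os.path.getsize(path)
    if size < FOOTER_LEN:
        raise ValueError(f"{path}: too short to hold a codec footer")
    crc = 0
    remaining = size - 8  # everything before the stored checksum
    with open(path, "rb") as f:
        # Stream the file so multi-GB .fdt files are never fully loaded into memory.
        while remaining > 0:
            chunk = f.read(min(chunk_size, remaining))
            if not chunk:
                raise ValueError(f"{path}: unexpected end of file")
            crc = zlib.crc32(chunk, crc)
            remaining -= len(chunk)
        f.seek(size - FOOTER_LEN)
        magic, algorithm, expected = struct.unpack(">IIq", f.read(FOOTER_LEN))
    if magic != FOOTER_MAGIC or algorithm != 0:
        raise ValueError(f"{path}: codec footer not found")
    return expected == crc, to_base36(expected), to_base36(crc)
```

A mismatch only says the bytes changed after Lucene wrote them; it does not say which documents are affected, which is what CheckIndex reports.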
Solutions:
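1. Conservative repair, following the official docs: https://www.elastic.co/guide/en/elasticsearch/reference/current/shard-tool.html. If the cluster keeps rebalancing, change cluster.routing.allocation.allow_rebalance to indices_primaries_active.
2. Offline repair (with the node stopped): https://www.elastic.co/guide/en/elasticsearch/reference/current/shard-tool.html
3. Online repair (no downtime): I tried the tool that ships with Lucene, mainly following this article:
   https://mincong.io/cn/elasticsearch-corrupted-index/
   I then went through the steps below. It felt like a detour; in theory, simply removing the stale shard copy would have been enough.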
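   1. Copy the problematic shard's data out to a separate directory.
   2. Check only the segment named in the error log. The bad segment is _dem, and the check does report errors:
      /data/c_log/repository/jdk/kona11.0.9.1.b1/bin/java -cp lib/lucene-core-8.7.0.jar:lib/ohc-core-0.7.0.jar -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex /data1/containers/data_bak/index -segment _dem
   3. Repair the index. -exorcise writes a new segments_N file that drops every corrupted segment, so the documents in those segments are permanently lost:
      /data/c_log/repository/jdk/kona11.0.9.1.b1/bin/java -cp lib/lucene-core-8.7.0.jar:lib/ohc-core-0.7.0.jar -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex /data1/containers/data_bak/index -exorcise
   4. Close the index and replace the shard's index directory with the repaired one.
   5. Reopen the index.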
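The settings change in item 1 and the close/reopen in steps 4 and 5 are ordinary Elasticsearch REST calls. The minimal sketch below only builds the requests so they can be reviewed before sending; it assumes a node at the hypothetical address http://localhost:9200 with no authentication, and the helper names are illustrative. Replacing the shard directory between the close and the open is a file-system operation on the data node and is not covered here.

```python
import json
import urllib.request
from typing import Optional

# Hypothetical endpoint: any node of the affected cluster, no auth assumed.
ES_URL = "http://localhost:9200"


def es_request(method: str, path: str, body: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a request to the Elasticsearch REST API."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    req = urllib.request.Request(ES_URL + path, data=data, method=method)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


def allow_rebalance_request() -> urllib.request.Request:
    # Item 1: only allow rebalancing once all primaries are allocated.
    return es_request(
        "PUT",
        "/_cluster/settings",
        {"persistent": {"cluster.routing.allocation.allow_rebalance": "indices_primaries_active"}},
    )


def close_index_request(index: str) -> urllib.request.Request:
    # Step 4: close the index before its files are replaced.
    return es_request("POST", f"/{index}/_close")


def open_index_request(index: str) -> urllib.request.Request:
    # Step 5: reopen once the repaired directory is in place.
    return es_request("POST", f"/{index}/_open")
```

Each request can then be sent with `urllib.request.urlopen(close_index_request("my-index"))`, where `my-index` stands in for the real index name.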
---------Explain API-----------
Disk full
Error message:
the node is above the high watermark cluster setting [cluster.routing.allocation.disk.watermark.high=95%], using more disk space than the maximum allowed [95.0%], actual free: [4.05%]
Solution:
Add disk capacity or delete data so that usage drops back below the high watermark.
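The message above is a percentage comparison: the high watermark limits used space, so 4.05% free means 95.95% used, which exceeds the configured 95%. The sketch below reproduces that arithmetic and estimates how many bytes must be deleted to get back under the watermark. It assumes the watermark is given as a percentage (Elasticsearch also accepts absolute byte values, which this sketch does not handle), and the function names are hypothetical.

```python
import math


def above_high_watermark(free_percent: float, high_watermark_percent: float) -> bool:
    """The high watermark limits *used* space, so compare 100% - free% against it."""
    return 100.0 - free_percent > high_watermark_percent


def bytes_to_delete(total_bytes: int, free_bytes: int, high_watermark_percent: float) -> int:
    """Bytes to delete so that usage is at or below the watermark (percent form only)."""
    required_free = total_bytes * (100.0 - high_watermark_percent) / 100.0
    return max(0, math.ceil(required_free - free_bytes))
```

For the node in the log, `above_high_watermark(4.05, 95.0)` is true. Adding disk also changes the total, so that option is not captured by `bytes_to_delete`.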
Allocation failed: document count exceeds the maximum limit
Error message:
failure IllegalArgumentException[number of documents in the index cannot exceed 2147483519
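Solution:
Write data to a new index instead, and size its shards sensibly. The limit applies to each shard, because every shard is its own Lucene index.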
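The number in the error is Lucene's hard limit `IndexWriter.MAX_DOCS`, which equals `Integer.MAX_VALUE - 128` = 2147483519. Because each Elasticsearch shard is one Lucene index, the limit is per shard, and nested objects are indexed as separate Lucene documents, so they count toward it too. The hypothetical sizing helper below picks a primary shard count for the new index from an expected document total; the 200 million docs-per-shard target is an assumed example value, not an Elasticsearch recommendation.

```python
# Lucene's hard per-index limit, IndexWriter.MAX_DOCS = Integer.MAX_VALUE - 128.
LUCENE_MAX_DOCS = 2**31 - 1 - 128

# Hypothetical target, chosen well below the hard limit for illustration only.
DEFAULT_DOCS_PER_SHARD = 200_000_000


def min_primary_shards(expected_docs: int, docs_per_shard: int = DEFAULT_DOCS_PER_SHARD) -> int:
    """Smallest primary shard count that keeps every shard at or below docs_per_shard,
    assuming documents are spread evenly across shards."""
    if not 0 < docs_per_shard <= LUCENE_MAX_DOCS:
        raise ValueError("docs_per_shard must be in (0, LUCENE_MAX_DOCS]")
    return max(1, -(-expected_docs // docs_per_shard))  # ceiling division
```

The primary shard count of an existing index is fixed at creation (short of the split/shrink APIs), which is why the fix is to write into a new index sized up front. Custom routing can skew documents onto a few shards, so the even-spread assumption may not hold.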