Fault Analysis
Zabbix raised an alert that ES index writes were abnormal. Logging into Kibana confirmed that no new data was being written; none of the indices had any new documents. The indices are fed by Logstash hosts at different sites, each pulling its corresponding topic from the Kafka/ZooKeeper cluster. Zabbix monitoring of the ES cluster itself showed no alerts, and a routine check of the cluster status found nothing abnormal. The Kafka/ZooKeeper cluster was then suspected, but a test consumption of data worked fine. Finally, checking the Logstash logs revealed the error messages that pinpointed the fault.
Zabbix event alert
Index data writes to the ES cluster are abnormal
Check the ES cluster status: no anomalies found
GET _cluster/health?pretty
{
"cluster_name" : "elk-cluster",
"status" : "green",
"timed_out" : false,
"number_of_nodes" : 4,
"number_of_data_nodes" : 4,
"active_primary_shards" : 2035,
"active_shards" : 4070,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 100.0
}
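It can also help to compare the number of shards already open with the per-node distribution; a minimal sketch in the same console style (the filter_path value is only illustrative, and the _cat column layout varies slightly by version):

GET _cluster/stats?filter_path=indices.shards.total
GET _cat/allocation?v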
Kafka/ZooKeeper
List the topics
./bin/kafka-topics.sh --list --zookeeper 192.168.99.232:2181,192.168.99.233:2181,192.168.99.221:2181
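If a particular topic looks suspect, its partition and replica layout can be inspected as well; a sketch assuming the same ZooKeeper quorum and the networklogs topic used by the Logstash pipeline below:

./bin/kafka-topics.sh --describe --zookeeper 192.168.99.232:2181,192.168.99.233:2181,192.168.99.221:2181 --topic networklogs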
Test consuming data
./bin/kafka-console-consumer.sh --bootstrap-server 192.168.99.233:9092,192.168.99.232:9092,192.168.99.221:9092 --topic networklogs --from-beginning
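Since consuming from the beginning works, the remaining question is whether Logstash itself is keeping up; consumer-group lag can be checked with kafka-consumer-groups.sh (a sketch that assumes the Logstash kafka input uses the default group_id of logstash):

./bin/kafka-consumer-groups.sh --bootstrap-server 192.168.99.233:9092,192.168.99.232:9092,192.168.99.221:9092 --describe --group logstash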
Checking the Logstash logs shows the following error
Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"apps_wms-2020.07.16", :routing=>nil, :_type=>"_doc"}, #<LogStash::Event:0x682b3a66>], :response=>{"index"=>{"_index"=>"apps_wms-2020.07.16", "_type"=>"_doc", "_id"=>nil, "status"=>400, "error"=>{"type"=>"illegal_argument_exception", "reason"=>"Validation Failed: 1: this action would add [2] total shards, but this cluster currently has [4000]/[4000] maximum shards open;"}}}}
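On a package-based Logstash installation this error can usually be located by grepping the plain-text log; the path below is the common default and may differ per deployment:

grep "Could not index event to Elasticsearch" /var/log/logstash/logstash-plain.log | tail -n 5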
By default the ES cluster limits open shards to 1,000 per node (cluster.max_shards_per_node); with 4 data nodes that caps the cluster at 4,000 shards, so Logstash could no longer create new indices and index writes to ES failed.
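The currently effective limit can be confirmed from the cluster settings; include_defaults is needed because the default value is not stored explicitly (the filter_path is only illustrative):

GET _cluster/settings?include_defaults=true&filter_path=*.cluster.max_shards_per_node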
Raise the ES cluster's default shard limit and index writes return to normal
PUT /_cluster/settings
{
  "transient": {
    "cluster": {
      "max_shards_per_node": 10000
    }
  }
}
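The new value can then be verified with GET _cluster/settings. Note that a transient setting does not survive a full cluster restart; if the higher limit should stick, the same value can be set under "persistent" instead, shown here with the equivalent flat key form:

PUT /_cluster/settings
{
  "persistent": {
    "cluster.max_shards_per_node": 10000
  }
}

Raising the limit is a stopgap; the longer-term fix is to bring the shard count down, for example by deleting or shrinking old daily indices or using fewer shards per index.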