1. Problem Background
A customer's cloud ES cluster was upgraded from 5.6.4 to 6.8.2, after which data writes began to fail and data appeared to be lost. Urgent assistance was requested.
The customer's write path is Filebeat ----> Logstash ----> ES.
2. Root Cause Analysis
The Logstash logs contained a large number of errors like the following:
Could not index event to Elasticsearch. {:status=>400, :action=>"index", {:_id=>nil, :_index=>"logstash-f1-hq-access-2022.09.02.12", :routing=>nil, :_type=>"doc"}, #<LogStash::Event:0x38205eef>, :response=>{"index"=>{"index"=>"logstash-f1-hq-access-2022.09.02.12", "_type"=>"doc", "_id"=>nil, "status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"Failed to parse mapping [_default]: include_in_all is not allowed for indices created on or after version 6.0.0 as _all is deprecated. As a replacement, you can use an copy_to on mapping fields to create your own catch all field.", "caused_by"=>{"type"=>"mapper_parsing_exception", "reason"=>"include_in_all is not allowed for indices created on or after version 6.0.0 as _all is deprecated. As a replacement, you can use an copy_to on mapping fields to create your own catch all field."}}}}}
The error means that the mapping parameter include_in_all is not allowed on indices created on or after version 6.0 (indices created on 5.x that contain this setting remain compatible after the upgrade to 6.x). For details see the 6.0 breaking change "The include_in_all mapping parameter is now disallowed".
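As the error message suggests, copy_to on individual fields is the 6.x replacement for _all / include_in_all. The snippet below is only an illustrative sketch, not part of the customer's template; the index name my-index, the catch-all field full_text and the user/message fields are made up:
PUT my-index
{
  "mappings": {
    "doc": {
      "properties": {
        "full_text": { "type": "text" },
        "user":      { "type": "keyword", "copy_to": "full_text" },
        "message":   { "type": "text",    "copy_to": "full_text" }
      }
    }
  }
}
Searching on full_text then behaves like the old _all catch-all field.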
The customer writes to ES through indices created from a Logstash index template, so the first step was to inspect the template.
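To confirm what the cluster actually holds, the template can be pulled directly from ES; the template name logstash below is an assumption, so list all templates first if the real name is unknown:
GET _cat/templates?v
GET _template/logstash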
The customer's Logstash index template was as follows:
{
  "order": 0,
  "version": 50001,
  "index_patterns": [
    "logstash-*"
  ],
  "settings": {
    "index": {
      "refresh_interval": "5s"
    }
  },
  "mappings": {
    "_default_": {
      "dynamic_templates": [
        {
          "message_field": {
            "path_match": "message",
            "match_mapping_type": "string",
            "mapping": {
              "type": "text",
              "norms": false
            }
          }
        },
        {
          "string_fields": {
            "match": "*",
            "match_mapping_type": "string",
            "mapping": {
              "type": "text",
              "norms": false,
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            }
          }
        }
      ],
      "properties": {
        "@timestamp": {
          "type": "date",
          "include_in_all": true
        },
        "@version": {
          "type": "keyword",
          "include_in_all": true
        },
        "geoip": {
          "dynamic": true,
          "properties": {
            "ip": {
              "type": "ip"
            },
            "location": {
              "type": "geo_point"
            },
            "latitude": {
              "type": "half_float"
            },
            "longitude": {
              "type": "half_float"
            }
          }
        }
      }
    }
  },
  "aliases": {}
}
3. Solution
1. The @timestamp and @version fields in the customer's template both use include_in_all. After a call with the customer, we changed "include_in_all": true to "include_in_all": false, and writes to ES returned to normal.
The adjusted template:
{
  "order": 0,
  "version": 50001,
  "index_patterns": [
    "logstash-*"
  ],
  "settings": {
    "index": {
      "refresh_interval": "5s"
    }
  },
  "mappings": {
    "_default_": {
      "dynamic_templates": [
        {
          "message_field": {
            "path_match": "message",
            "match_mapping_type": "string",
            "mapping": {
              "type": "text",
              "norms": false
            }
          }
        },
        {
          "string_fields": {
            "match": "*",
            "match_mapping_type": "string",
            "mapping": {
              "type": "text",
              "norms": false,
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            }
          }
        }
      ],
      "properties": {
        "@timestamp": {
          "type": "date",
          "include_in_all": false
        },
        "@version": {
          "type": "keyword",
          "include_in_all": false
        },
        "geoip": {
          "dynamic": true,
          "properties": {
            "ip": {
              "type": "ip"
            },
            "location": {
              "type": "geo_point"
            },
            "latitude": {
              "type": "half_float"
            },
            "longitude": {
              "type": "half_float"
            }
          }
        }
      }
    }
  },
  "aliases": {}
}
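A minimal sketch of pushing the corrected template back to the cluster. The template name logstash is an assumption (use the customer's real template name), and the body here is truncated for readability; in practice the full adjusted JSON above is used as the request body:
PUT _template/logstash
{
  "order": 0,
  "version": 50001,
  "index_patterns": ["logstash-*"],
  "mappings": {
    "_default_": {
      "properties": {
        "@timestamp": { "type": "date", "include_in_all": false },
        "@version": { "type": "keyword", "include_in_all": false }
      }
    }
  }
}
Note that a template only affects indices created after it is updated; existing indices keep their old mappings, so the next logstash-* index to roll over is the first one to pick up the change.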
2. Customer question: new data is now being written successfully, but how can the data that was lost be recovered and backfilled?
The customer's Filebeat sends data continuously, and Filebeat persists its sending state so the data itself is not lost; it only needs to be sent again.
How Filebeat guarantees that file content is not lost (at-least-once delivery):
The registry records the offset each harvester has read up to in its file, and the offset is only advanced after the data has been sent successfully; if sending fails, the events are retried until they succeed.
If Filebeat is shut down while running, it does not wait for the outputs to acknowledge everything in flight; it shuts down immediately. On the next start, the unacknowledged events are sent again (hence at-least-once delivery).
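For reference, each entry in the Filebeat 6.x registry file (data/registry by default) looks roughly like the sketch below; the file path, offset and inode are made up, and the exact field set varies between Filebeat versions:
{
  "source": "/data/logs/f1-hq-access.log",
  "offset": 873211,
  "timestamp": "2022-09-02T19:00:01.123+08:00",
  "ttl": -1,
  "type": "log",
  "FileStateOS": { "inode": 271583, "device": 2049 }
}
The offset only moves forward after the output has acknowledged the events, which is what makes the replay described below possible.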
Based on this, the missing data was backfilled as follows:
- Run a separate copy of Filebeat whose configuration points only at the log files that need to be replayed, and ship them to a new index, e.g. A1.
- Use the reindex API to copy the documents in the missing time window (the timestamps below are examples) from A1 into the target index A2. Example command:
POST _reindex
{
  "source": {
    "index": "A1",
    "query": {
      "range": {
        "@timestamp": {
          "from": "2022-09-02 19:00:00.001",
          "to": "2022-09-02 20:40:00.001",
          "include_lower": true,
          "include_upper": true,
          "time_zone": "+08:00",
          "format": "yyyy-MM-dd HH:mm:ss.SSS"
        }
      }
    }
  },
  "dest": {
    "index": "A2"
  }
}
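Once the reindex completes, the backfill can be sanity-checked by counting documents in the same time window (A1/A2 and the timestamps are the example values used above):
GET A2/_count
{
  "query": {
    "range": {
      "@timestamp": {
        "gte": "2022-09-02 19:00:00.001",
        "lte": "2022-09-02 20:40:00.001",
        "time_zone": "+08:00",
        "format": "yyyy-MM-dd HH:mm:ss.SSS"
      }
    }
  }
}
Running the same query against A1 gives the number of documents that were replayed, so the two counts can be compared.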
4. Upgrade Considerations
When upgrading across major versions, compatibility issues must be sorted out in advance, and every warning and error reported by the upgrade check has to be handled carefully before proceeding.
ES version upgrade check:
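On clusters with X-Pack (5.6 and later), the deprecation info API lists settings and mappings that will be a problem in the next major version; the Kibana Upgrade Assistant surfaces the same information. A minimal example:
GET _xpack/migration/deprecations
The response groups findings under cluster_settings, node_settings and index_settings, each with a severity level; anything reported as critical should be resolved before the upgrade.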