Handling Data Write Failures After Upgrading Elasticsearch from 5.x to 6.x

2022-11-30 14:52:07

1. Problem Background

After upgrading their cloud Elasticsearch cluster from 5.6.4 to 6.8.2, a customer began seeing data write failures and data loss, and requested urgent assistance.

The customer's write pipeline is: Filebeat -> Logstash -> Elasticsearch.

2. Root Cause Analysis

The Logstash logs contained a large number of errors like the following:

```
Could not index event to Elasticsearch. {:status=>400, :action=>"index", {:_id=>nil, :_index=>"logstash-f1-hq-access-2022.09.02.12", :routing=>nil, :_type=>"doc"}, #<LogStash::Event:0x38205eef>, :response=>{"index"=>{"index"=>"logstash-f1-hq-access-2022.09.02.12", "_type"=>"doc", "_id"=>nil, "status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"Failed to parse mapping [_default_]: include_in_all is not allowed for indices created on or after version 6.0.0 as _all is deprecated. As a replacement, you can use an copy_to on mapping fields to create your own catch all field.", "caused_by"=>{"type"=>"mapper_parsing_exception", "reason"=>"include_in_all is not allowed for indices created on or after version 6.0.0 as _all is deprecated. As a replacement, you can use an copy_to on mapping fields to create your own catch all field."}}}}}
```

The error means that the mapping parameter include_in_all is no longer allowed on indices created on or after version 6.0 (indices created on 5.x that contain this setting remain compatible after the upgrade to 6.x). For details, see the 6.0 breaking change "The include_in_all mapping parameter is now disallowed".
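As the error message suggests, copy_to is the replacement for _all/include_in_all: it copies the values of selected fields into a custom catch-all field that can be searched as a whole. A minimal sketch, with hypothetical index and field names (6.x single-type mapping):

```json
PUT my-index
{
  "mappings": {
    "doc": {
      "properties": {
        "message":   { "type": "text",    "copy_to": "catch_all" },
        "hostname":  { "type": "keyword", "copy_to": "catch_all" },
        "catch_all": { "type": "text" }
      }
    }
  }
}
```

Queries that previously targeted _all can then target the catch_all field instead.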

The customer creates indices and writes data to ES through a Logstash index template, so we started by examining the template.

The customer's Logstash index template:

```json
{
  "order": 0,
  "version": 50001,
  "index_patterns": [
    "logstash-*"
  ],
  "settings": {
    "index": {
      "refresh_interval": "5s"
    }
  },
  "mappings": {
    "_default_": {
      "dynamic_templates": [
        {
          "message_field": {
            "path_match": "message",
            "match_mapping_type": "string",
            "mapping": {
              "type": "text",
              "norms": false
            }
          }
        },
        {
          "string_fields": {
            "match": "*",
            "match_mapping_type": "string",
            "mapping": {
              "type": "text",
              "norms": false,
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            }
          }
        }
      ],
      "properties": {
        "@timestamp": {
          "type": "date",
          "include_in_all": true
        },
        "@version": {
          "type": "keyword",
          "include_in_all": true
        },
        "geoip": {
          "dynamic": true,
          "properties": {
            "ip": {
              "type": "ip"
            },
            "location": {
              "type": "geo_point"
            },
            "latitude": {
              "type": "half_float"
            },
            "longitude": {
              "type": "half_float"
            }
          }
        }
      }
    }
  },
  "aliases": {}
}
```

3. Solution

1. The @timestamp and @version fields in the customer's template used include_in_all. After a call with the customer, we changed "include_in_all": true to "include_in_all": false, and writes to ES returned to normal.

The adjusted template:

```json
{
  "order": 0,
  "version": 50001,
  "index_patterns": [
    "logstash-*"
  ],
  "settings": {
    "index": {
      "refresh_interval": "5s"
    }
  },
  "mappings": {
    "_default_": {
      "dynamic_templates": [
        {
          "message_field": {
            "path_match": "message",
            "match_mapping_type": "string",
            "mapping": {
              "type": "text",
              "norms": false
            }
          }
        },
        {
          "string_fields": {
            "match": "*",
            "match_mapping_type": "string",
            "mapping": {
              "type": "text",
              "norms": false,
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            }
          }
        }
      ],
      "properties": {
        "@timestamp": {
          "type": "date",
          "include_in_all": false
        },
        "@version": {
          "type": "keyword",
          "include_in_all": false
        },
        "geoip": {
          "dynamic": true,
          "properties": {
            "ip": {
              "type": "ip"
            },
            "location": {
              "type": "geo_point"
            },
            "latitude": {
              "type": "half_float"
            },
            "longitude": {
              "type": "half_float"
            }
          }
        }
      }
    }
  },
  "aliases": {}
}
```
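Index templates only take effect when a new index is created, so once the adjusted template is uploaded, the next index rollover picks it up automatically. A sketch of how to verify the fix in Kibana Dev Tools (template and index names assumed for illustration):

```json
# Confirm the active template no longer contains include_in_all:
GET _template/logstash*

# Write a test document to a fresh index matching logstash-*;
# this should now return 201 instead of the earlier 400:
POST logstash-smoketest-2022.09.02/doc
{
  "@timestamp": "2022-09-02T19:00:00.001+08:00",
  "message": "write path smoke test"
}
```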

2. Customer question: new data is now being written successfully, but how can the lost portion be recovered?

The customer's Filebeat ships data continuously and persists its delivery state, so no data is lost at the source; the missing portion can simply be resent.

How Filebeat guarantees file contents are not lost (at-least-once delivery)

The registry records, for each harvester, the offset of the last position read from the file; the offset is only recorded once the data has been sent successfully. If sending fails, Filebeat keeps retrying.

If Filebeat is running, it needs to be shut down first. On shutdown, Filebeat does not wait for all receivers to acknowledge outstanding events; it closes immediately. On the next start, the unacknowledged portion is sent again (at-least-once delivery).
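For reference, the Filebeat 6.x registry (by default data/registry under the Filebeat home directory, or /var/lib/filebeat/registry for package installs) is a JSON file mapping each harvested file to its last acknowledged offset. A sketch of one entry, with illustrative values:

```json
[
  {
    "source": "/var/log/nginx/access.log",
    "offset": 123456,
    "timestamp": "2022-09-02T19:00:00.000+08:00",
    "ttl": -1,
    "type": "log",
    "FileStateOS": { "inode": 271154, "device": 2049 }
  }
]
```

Because replay resumes from these offsets, pointing a fresh Filebeat instance (with an empty registry) at the original files re-reads them from the beginning.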

We therefore used the following plan to backfill the missing data:

  1. Spin up a separate copy of Filebeat, point its configuration at the files that need to be replayed, and ship them to a new index, say A1.
  2. Use the Reindex API to copy the missing window, e.g. from 19:01:03 to 21:20:04, from A1 into the target index A2. Reference command:

```json
POST _reindex
{
  "source": {
    "index": "A1",
    "query": {
      "range": {
        "@timestamp": {
          "from": "2022-09-02 19:00:00.001",
          "to": "2022-09-02 20:40:00.001",
          "include_lower": true,
          "include_upper": true,
          "time_zone": "+08:00",
          "format": "yyyy-MM-dd HH:mm:ss.SSS"
        }
      }
    }
  },
  "dest": {
    "index": "A2"
  }
}
```
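Once the reindex completes, the backfill can be sanity-checked by comparing document counts for the same time window in the source and target indices (A1/A2 as above):

```json
GET A2/_count
{
  "query": {
    "range": {
      "@timestamp": {
        "gte": "2022-09-02 19:00:00.001",
        "lte": "2022-09-02 20:40:00.001",
        "time_zone": "+08:00",
        "format": "yyyy-MM-dd HH:mm:ss.SSS"
      }
    }
  }
}
```

Running the same _count against A1 and comparing the two totals confirms the window was copied; because Filebeat is at-least-once, A2 may contain duplicates if the replay overlapped data that had already arrived.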

4. Upgrade Considerations

For cross-major-version upgrades, compatibility issues must be handled up front: every warning and error reported by the pre-upgrade check must be reviewed and resolved carefully.
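Elasticsearch 5.6 and 6.x (with X-Pack) expose a deprecation-check API that surfaces exactly this class of issue before an upgrade; a quick check might look like the following (index name taken from the error log above):

```json
# Cluster-wide deprecation report, run on the old cluster before upgrading:
GET _xpack/migration/deprecations

# Deprecation report scoped to a specific index's settings and mappings:
GET logstash-f1-hq-access-2022.09.02.12/_xpack/migration/deprecations
```

Any index-level warnings about _all or include_in_all in this report would have flagged the template problem before the upgrade.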

Related: ES version upgrade checks
