Notes: Elasticsearch keyword field exception on long text

2021-07-23 11:00:22

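Background: content is a text field with several multi-fields. One of them, content.keyword, uses the keyword type, and importing data into the index threw an exception.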

"content": {
  "type": "text",
  "fields": {
    "ansj": {
      "analyzer": "index_ansj_analyzer",
      "type": "text"
    },
    "trigram": {
      "analyzer": "trigram_analyzer",
      "type": "text"
    },
    "keyword": {
      "type": "keyword"
    }
  }
}

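Exception: content holds long text, and because content.keyword is mapped as keyword, the entire value is indexed as a single term. A single term cannot be longer than 32766 bytes in UTF-8, and this value was 33424 bytes: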

failure in bulk execution: [86]: index [retopic-21.07.06-001232], type [_doc], id [55015922], message [ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=Document contains at least one immense term in field="content.keyword" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped.  Please correct the analyzer to not produce such terms.  The prefix of the first immense term is: '[-24, -65, -98, -27, -92, -89, -27, -92, -85, 44, -27, -66, -120, -24, -66, -101, -24, -117, -90, 44, -23, -103, -92, -28, -70, -122, -28, -72, -118, -25]...', original message: bytes can be at most 32766 in length; got 33424]]; nested: ElasticsearchException[Elasticsearch exception [type=max_bytes_length_exceeded_exception, reason=max_bytes_length_exceeded_exception: bytes can be at most 32766 in length; got 33424]];]
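The error message is easier to read once it is decoded. Below is a minimal Node.js sketch (Node.js and the helper name `exceedsKeywordTermLimit` are assumptions for illustration, not part of the original setup). It turns the signed byte prefix printed in the log back into text and shows that the 32766 limit counts UTF-8 bytes, not characters, so Chinese text reaches it after roughly 10,922 characters.

```javascript
// Lucene's per-term limit, as stated in the error: "bytes can be at most 32766".
const MAX_TERM_BYTES = 32766;

// Hypothetical helper: true if the value, indexed whole as a single keyword
// term, would be longer than the limit. The limit counts UTF-8 bytes, not characters.
function exceedsKeywordTermLimit(text) {
  return Buffer.byteLength(text, 'utf8') > MAX_TERM_BYTES;
}

// The first 30 bytes of the rejected term, copied from the error message.
// Lucene prints them as signed Java bytes (-128..127).
const termPrefix = [
  -24, -65, -98, -27, -92, -89, -27, -92, -85, 44,
  -27, -66, -120, -24, -66, -101, -24, -117, -90, 44,
  -23, -103, -92, -28, -70, -122, -28, -72, -118, -25,
];

// Convert each signed byte back to its unsigned value (0..255) ...
const prefixBytes = Uint8Array.from(termPrefix, (b) => b & 0xff);

// ... and decode as UTF-8. With { stream: true } the decoder holds back the
// final byte 0xE7, the start of a 3-byte character cut off by the 30-byte
// prefix, instead of emitting a U+FFFD replacement character for it.
const decodedPrefix = new TextDecoder('utf-8').decode(prefixBytes, { stream: true });
console.log(decodedPrefix);

// Most Chinese characters take 3 bytes in UTF-8, so at most 10,922 of them fit
// in one term (10,922 * 3 = 32,766), far fewer than 32,766 characters.
console.log(exceedsKeywordTermLimit('苦'.repeat(10922))); // false
console.log(exceedsKeywordTermLimit('苦'.repeat(10923))); // true
```

Applied to the prefix above, the decoder yields 连大夫,很辛苦,除了上, which is ordinary article text. This is consistent with the whole content value having been sent to content.keyword as one term.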

Fix: simply remove the content.keyword multi-field.
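The fix amounts to deleting the keyword entry from the mapping shown earlier. The sketch below uses the same Node.js setting to derive the resulting mapping from the original one and print it as JSON. Elasticsearch does not allow removing a field from an existing index's mapping, so the change would presumably go into the mapping or index template used for newly created indices. That deployment path is an assumption; the original does not say how the change was applied.

```javascript
// Mapping of the content field from above, written as a JS object.
const contentMapping = {
  type: 'text',
  fields: {
    ansj: { analyzer: 'index_ansj_analyzer', type: 'text' },
    trigram: { analyzer: 'trigram_analyzer', type: 'text' },
    keyword: { type: 'keyword' },
  },
};

// Drop the keyword multi-field and keep every other setting unchanged.
// The original object is not modified.
const { keyword: _removedKeyword, ...remainingFields } = contentMapping.fields;
const fixedContentMapping = { ...contentMapping, fields: remainingFields };

// Print the mapping to use for new indices.
console.log(JSON.stringify({ content: fixedContentMapping }, null, 2));
```

The trade-off is that the full content value can no longer be used for exact-match term queries, sorting, or aggregations. Full-text search through content, content.ansj, and content.trigram is unaffected.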
