Adding a Chinese tokenizer to elasticsearch 5.x and later

2022-03-29 14:42:17

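The analyzers that ship with elasticsearch do not handle Chinese text well enough for typical Chinese-language use. Fortunately, the elasticsearch-analysis-ik plugin can be used to improve elasticsearch's handling of Chinese.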

To compare the tokenization results, take the following text as input: 中华人民共和国解放军hello

Using elasticsearch's built-in standard analyzer (which splits the Chinese text into single-character tokens):

Tokenization result with the ik analyzer (word-level tokens instead of single characters):
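The comparison can be reproduced against a running node with the `_analyze` API. The following is a minimal sketch using only the Python standard library; the address `http://localhost:9200` and the helper names `build_analyze_request` and `analyze` are assumptions for illustration, and the `ik_smart` call only works once the ik plugin is installed.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumed address; adjust to your cluster
SAMPLE_TEXT = "中华人民共和国解放军hello"  # sample text from the article


def build_analyze_request(analyzer, text, base_url=ES_URL):
    """Build a POST request for the _analyze API using a named analyzer."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(analyzer, text, base_url=ES_URL):
    """Send the request and return the token strings from the response."""
    request = build_analyze_request(analyzer, text, base_url)
    with urllib.request.urlopen(request) as response:
        result = json.load(response)
    return [token["token"] for token in result["tokens"]]
```

Calling `analyze("standard", SAMPLE_TEXT)` and `analyze("ik_smart", SAMPLE_TEXT)` in turn should make the difference between the two analyzers visible.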

With that said, here is the setup, step by step:

1. Download the ik plugin and place it in the plugins/ik directory under the elasticsearch installation directory; see https://github.com/medcl/elasticsearch-analysis-ik

2. If elasticsearch runs in docker, map plugins/ik into the container's elasticsearch plugin path, /usr/share/elasticsearch/plugins

3. Restart elasticsearch and create the index with a PUT request to http://localhost:9200/index_name

4. Create the mapping with a request to http://172.21.48.16:9200/index_name/_mapping/type_name, using the following body:

{
    "student": {
      "properties": {
        "address": {
          "properties": {
            "city": {
              "type": "text",
              "analyzer": "ik_smart",
              "search_analyzer": "ik_smart",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            },
            "prov": {
              "type": "text",
              "analyzer": "ik_smart",
              "search_analyzer": "ik_smart",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            }
          }
        },
        "age": {
          "type": "long"
        },
        "name": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }
    }
}
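Steps 3 and 4 can also be scripted. The sketch below builds the same mapping as the JSON above and prepares the two PUT requests with the Python standard library. The index name `school`, the address `http://localhost:9200`, and all helper names are hypothetical. The `_mapping/type_name` endpoint is the typed form used by 5.x and 6.x; later major versions dropped mapping types, so treat this as applying to those versions only.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumed address; adjust to your cluster
INDEX = "school"                  # hypothetical index name
DOC_TYPE = "student"              # type name used in the mapping above


def keyword_subfield():
    """The keyword sub-field shared by every text field in the mapping."""
    return {"keyword": {"type": "keyword", "ignore_above": 256}}


def ik_text_field(analyzer="ik_smart"):
    """A text field indexed and searched with the given ik analyzer."""
    return {
        "type": "text",
        "analyzer": analyzer,
        "search_analyzer": analyzer,
        "fields": keyword_subfield(),
    }


def student_mapping():
    """Rebuild the mapping body shown above as a Python dict."""
    return {
        DOC_TYPE: {
            "properties": {
                "address": {
                    "properties": {"city": ik_text_field(), "prov": ik_text_field()}
                },
                "age": {"type": "long"},
                "name": {"type": "text", "fields": keyword_subfield()},
            }
        }
    }


def put_request(path, body=None, base_url=ES_URL):
    """Build a PUT request; the body is JSON-encoded when one is given."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(
        url=f"{base_url}/{path}",
        data=data,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def create_index_and_mapping():
    """Create the index first, then attach the mapping to it."""
    requests = (
        put_request(INDEX),
        put_request(f"{INDEX}/_mapping/{DOC_TYPE}", student_mapping()),
    )
    for request in requests:
        with urllib.request.urlopen(request) as response:
            print(response.status, response.read().decode("utf-8"))
```

Building the two text fields through `ik_text_field` keeps `analyzer` and `search_analyzer` in sync, which is what the hand-written JSON does for `city` and `prov`.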

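PS: In older versions of elasticsearch this configuration went into the elasticsearch.yml file, but from 5.x onward it has to be set through the mapping as shown above.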

5. Verify the tokenization result:
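One way to check that the mapping took effect is to ask the `_analyze` API to tokenize text with the analyzer configured for a specific field. This is a minimal sketch; the index name `school`, the address, and the helper names are assumptions, while `address.city` is one of the fields mapped to `ik_smart` above.

```python
import json
import urllib.request


def build_field_analyze_request(index, field, text, base_url="http://localhost:9200"):
    """Build an index-scoped _analyze request that uses the field's own analyzer."""
    body = json.dumps({"field": field, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def tokens_for_field(index, field, text, base_url="http://localhost:9200"):
    """Return the tokens elasticsearch produces for this field's analyzer."""
    request = build_field_analyze_request(index, field, text, base_url)
    with urllib.request.urlopen(request) as response:
        result = json.load(response)
    return [token["token"] for token in result["tokens"]]
```

For example, `tokens_for_field("school", "address.city", "中华人民共和国解放军hello")` should return word-level tokens if the field is really analyzed with ik, and single characters if it fell back to the standard analyzer.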

Reference: https://github.com/medcl/elasticsearch-analysis-ik
