Method 1:
1) Install the ik analyzer
Note: you cannot use the default `elasticsearch-plugin install xxx.zip` command to install this plugin automatically. Download the release that matches your ES version from https://github.com/medcl/elasticsearch-analysis-ik/releases?after=v6.4.2
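The plugin version must match the Elasticsearch version exactly. If you are unsure which version is running, a quick check (assuming ES is reachable on the host at localhost:9200):

```bash
# "version": { "number": "..." } in the response tells you which plugin release to pick
curl -s localhost:9200
```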
```bash
docker exec -it elasticsearch /bin/bash
```
This drops you into the ES container; the shell starts in /usr/share/elasticsearch by default.
- Download and unzip the release archive
- Move the analyzer into the plugins directory
- Verify the installation from the bin directory
```bash
# download the 5.6.11 release of the ik analyzer
wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v5.6.11/elasticsearch-analysis-ik-5.6.11.zip
unzip elasticsearch-analysis-ik-5.6.11.zip
rm -rf *.zip
# the archive unpacks into an elasticsearch/ directory; move it into plugins/ as "ik"
mv elasticsearch/ /usr/share/elasticsearch/plugins/ik
```
To confirm the analyzer is installed, list the plugins from the bin directory:
```bash
cd /usr/share/elasticsearch/bin
elasticsearch-plugin list
```
This prints the plugins installed on the node; ik should appear in the output.
Then restart Elasticsearch:
```bash
docker restart elasticsearch
```
If the wget is slow, you can download the archive on the host, copy it into the container with `docker cp`, and unzip it there:
```bash
docker cp xxx.txt <container-name-or-id>:/xxx/xxx/xxxx
```
The first argument is the absolute path of the file on the host; the second is the destination path inside the container.
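For example, a sketch assuming the container is named `elasticsearch` (as in the `docker exec` command above) and the archive was saved to /tmp on the host (both paths are illustrative):
```bash
docker cp /tmp/elasticsearch-analysis-ik-5.6.11.zip elasticsearch:/usr/share/elasticsearch/
```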
2) Test the analyzer
Index (create) a document:
```
PUT bank/external/1
{
  "name": "John Doe"
}
```
Using the default analyzer:
```
GET bank/_analyze
{
  "text": "我是中国人"
}
```
Observe the result: the default analyzer splits the Chinese text into single characters.
```json
{
  "tokens": [
    {
      "token": "我",
      "start_offset": 0,
      "end_offset": 1,
      "type": "<IDEOGRAPHIC>",
      "position": 0
    },
    {
      "token": "是",
      "start_offset": 1,
      "end_offset": 2,
      "type": "<IDEOGRAPHIC>",
      "position": 1
    },
    {
      "token": "中",
      "start_offset": 2,
      "end_offset": 3,
      "type": "<IDEOGRAPHIC>",
      "position": 2
    },
    {
      "token": "国",
      "start_offset": 3,
      "end_offset": 4,
      "type": "<IDEOGRAPHIC>",
      "position": 3
    },
    {
      "token": "人",
      "start_offset": 4,
      "end_offset": 5,
      "type": "<IDEOGRAPHIC>",
      "position": 4
    }
  ]
}
```
Using the ik analyzer:
```
GET bank/_analyze
{
  "analyzer": "ik_smart",
  "text": "我是中国人"
}
```
Observe the result: ik_smart keeps 中国人 together as a single word.
```json
{
  "tokens": [
    {
      "token": "我",
      "start_offset": 0,
      "end_offset": 1,
      "type": "CN_CHAR",
      "position": 0
    },
    {
      "token": "是",
      "start_offset": 1,
      "end_offset": 2,
      "type": "CN_CHAR",
      "position": 1
    },
    {
      "token": "中国人",
      "start_offset": 2,
      "end_offset": 5,
      "type": "CN_WORD",
      "position": 2
    }
  ]
}
```
The other ik analyzer, ik_max_word:
```
GET bank/_analyze
{
  "analyzer": "ik_max_word",
  "text": "我是中国人"
}
```
Observe the result: ik_max_word additionally emits every sub-word it recognizes (中国, 国人).
```json
{
  "tokens": [
    {
      "token": "我",
      "start_offset": 0,
      "end_offset": 1,
      "type": "CN_CHAR",
      "position": 0
    },
    {
      "token": "是",
      "start_offset": 1,
      "end_offset": 2,
      "type": "CN_CHAR",
      "position": 1
    },
    {
      "token": "中国人",
      "start_offset": 2,
      "end_offset": 5,
      "type": "CN_WORD",
      "position": 2
    },
    {
      "token": "中国",
      "start_offset": 2,
      "end_offset": 4,
      "type": "CN_WORD",
      "position": 3
    },
    {
      "token": "国人",
      "start_offset": 3,
      "end_offset": 5,
      "type": "CN_WORD",
      "position": 4
    }
  ]
}
```
As you can see, different analyzers tokenize the same text very differently. So when you define a type from now on, don't rely on the default mapping; build the mapping by hand, because the mapping is where you choose the analyzer for each field.
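A minimal sketch of such a mapping (the index, type, and field names here are illustrative, not from this article; indexing with ik_max_word while searching with ik_smart is the combination commonly shown in the ik README):
```
PUT articles
{
  "mappings": {
    "article": {
      "properties": {
        "content": {
          "type": "text",
          "analyzer": "ik_max_word",
          "search_analyzer": "ik_smart"
        }
      }
    }
  }
}
```
With this mapping, the `content` field is tokenized by ik_max_word at index time, so a match query for 中国 can hit documents whose content contains 中国人.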
Method 2:
https://github.com/pipizhang/docker-elasticsearch-analysis-ik
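That repository packages Elasticsearch with the ik plugin already installed. If you prefer to build such an image yourself, a minimal Dockerfile sketch (the base-image tag is an assumption and must match the plugin version; it also assumes you have already downloaded and unzipped the matching release next to the Dockerfile):
```dockerfile
# assumption: this official image tag exists and matches the plugin version below
FROM elasticsearch:5.6.11

# copy the unzipped plugin (the release archive's elasticsearch/ directory) into plugins/ik
COPY elasticsearch/ /usr/share/elasticsearch/plugins/ik/
```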