Overview
Install the ik and pinyin analysis plugins for Elasticsearch. The plugin version must match the Elasticsearch version exactly.
ik analyzer: https://github.com/medcl/elasticsearch-analysis-ik/
pinyin analyzer: https://github.com/medcl/elasticsearch-analysis-pinyin/
This article uses Elasticsearch 5.6.9.
Getting started
Pull the image
docker pull elasticsearch:5.6.9
Download the plugin packages
mkdir docker && cd docker  # create a working directory first
# Download the ik plugin
wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v5.6.9/elasticsearch-analysis-ik-5.6.9.zip
# Unzip it
unzip elasticsearch-analysis-ik-5.6.9.zip -d analysis-ik
# Download the pinyin plugin
wget https://github.com/medcl/elasticsearch-analysis-pinyin/releases/download/v5.6.9/elasticsearch-analysis-pinyin-5.6.9.zip
# Unzip it
unzip elasticsearch-analysis-pinyin-5.6.9.zip -d analysis-pinyin
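The plugin version has to match the Elasticsearch version, so the two downloads above differ only in the plugin name and the version string. Below is a minimal Python sketch of the same download-and-unzip step. It assumes the release URL pattern seen in the wget commands also holds for other tags. The helper names `plugin_url` and `fetch_plugin` are hypothetical.

```python
import urllib.request
import zipfile
from pathlib import Path

# Pattern taken from the two wget commands above; assumed to hold for other tags too.
URL_TEMPLATE = (
    "https://github.com/medcl/elasticsearch-analysis-{name}/releases/download/"
    "v{version}/elasticsearch-analysis-{name}-{version}.zip"
)


def plugin_url(name: str, version: str) -> str:
    """Release URL of the 'ik' or 'pinyin' plugin built for `version`."""
    return URL_TEMPLATE.format(name=name, version=version)


def fetch_plugin(name: str, version: str, workdir: Path) -> Path:
    """Download the plugin zip and unpack it into workdir/analysis-<name>."""
    url = plugin_url(name, version)
    archive = workdir / url.rsplit("/", 1)[-1]  # e.g. elasticsearch-analysis-ik-5.6.9.zip
    urllib.request.urlretrieve(url, str(archive))
    target = workdir / f"analysis-{name}"  # same folder name as `unzip -d`
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)
    return target
```

For example, `fetch_plugin("ik", "5.6.9", Path("."))` should produce the same `analysis-ik` folder as the shell commands. Keeping the version in one variable makes it harder to download a mismatched plugin.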
Create a Dockerfile in the same directory as the unzipped analysis-ik and analysis-pinyin folders:
FROM elasticsearch:5.6.9
ADD analysis-ik /usr/share/elasticsearch/plugins/analysis-ik
ADD analysis-pinyin /usr/share/elasticsearch/plugins/analysis-pinyin
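Elasticsearch will not start with a plugin built for a different version. It also expects `plugin-descriptor.properties` at the root of each plugin folder. Below is a small pre-build check in Python. It assumes the descriptor uses simple `key=value` lines, so full Java properties syntax such as line continuations is not handled. The function names are hypothetical.

```python
from pathlib import Path


def parse_properties(text: str) -> dict:
    """Parse simple key=value lines; comments (# or !) and blank lines are skipped."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def check_plugin(plugin_dir: str, es_version: str) -> None:
    """Fail early if a plugin folder would not load into Elasticsearch `es_version`."""
    descriptor = Path(plugin_dir) / "plugin-descriptor.properties"
    if not descriptor.is_file():
        # The descriptor must sit at the root of the folder copied into plugins/.
        raise FileNotFoundError(f"{descriptor} is missing")
    found = parse_properties(descriptor.read_text(encoding="utf-8")).get("elasticsearch.version")
    if found != es_version:
        raise ValueError(f"{plugin_dir} was built for Elasticsearch {found}, not {es_version}")
```

Before `docker build`, running `check_plugin("analysis-ik", "5.6.9")` and `check_plugin("analysis-pinyin", "5.6.9")` in this directory catches both a version mismatch and a zip that was unpacked into an extra subdirectory.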
Build the image
docker build -f Dockerfile -t elasticsearch-ik-pinyin:5.6.9 .
A successful build prints:
root@Alone88-Uos:~/docker/els6# docker build -f Dockerfile -t elasticsearch-ik-pinyin:5.6.9 .
Sending build context to Docker daemon 18.01MB
Step 1/3 : FROM elasticsearch:5.6.9
---> 5c1e1ecfe33a
Step 2/3 : ADD analysis-ik /usr/share/elasticsearch/plugins/analysis-ik
---> 883cd55df8a8
Step 3/3 : ADD analysis-pinyin /usr/share/elasticsearch/plugins/analysis-pinyin
---> 8c9220f304be
Successfully built 8c9220f304be
Successfully tagged elasticsearch-ik-pinyin:5.6.9
Create the container
docker run -e ES_JAVA_OPTS="-Xms256m -Xmx256m" -d -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" --name elasticsearch_test elasticsearch-ik-pinyin:5.6.9
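-e ES_JAVA_OPTS="-Xms256m -Xmx256m" sets the JVM heap of Elasticsearch to 256 MB. Without it, Elasticsearch 5.x uses the heap configured in jvm.options (2 GB by default), which can be too much for a small machine.
-e "discovery.type=single-node" runs Elasticsearch as a single-node cluster.
elasticsearch-ik-pinyin:5.6.9 is the name and tag of the image built above.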
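The container needs a few seconds before Elasticsearch answers. Below is a hedged Python sketch that polls the `_cat/plugins` API until the node responds, then checks that both plugins are listed. It assumes the plugins register under the names `analysis-ik` and `analysis-pinyin`, and that port 9200 is published as in the `docker run` command. The helper names are hypothetical.

```python
import time
import urllib.request


def installed_components(cat_output: str) -> set:
    """Plugin names from `_cat/plugins?h=component` output (one name per line)."""
    return {line.strip() for line in cat_output.splitlines() if line.strip()}


def wait_for_plugins(base_url: str = "http://127.0.0.1:9200",
                     required=("analysis-ik", "analysis-pinyin"),
                     timeout: float = 120.0) -> set:
    """Poll the node until it answers, then verify the required plugins are loaded."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            url = f"{base_url}/_cat/plugins?h=component"
            with urllib.request.urlopen(url, timeout=5) as resp:
                found = installed_components(resp.read().decode("utf-8"))
            break
        except OSError:  # connection refused / reset / timeout while the node starts
            if time.monotonic() > deadline:
                raise TimeoutError(f"Elasticsearch at {base_url} did not answer in {timeout}s")
            time.sleep(2)
    missing = set(required) - found
    if missing:
        raise RuntimeError(f"node is up but plugins are missing: {sorted(missing)}")
    return found
```

If a plugin folder was copied with the wrong layout, this fails with the missing names instead of leaving the problem to show up later as an unknown analyzer.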
Test the analyzers
Test the pinyin analyzer
Send a POST request to http://127.0.0.1:9200/_analyze with the following body:
{ "text": "中华人民共和国国徽", "analyzer": "pinyin" }
Response:
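Below is the same `_analyze` call from Python, using only the standard library. It is a sketch that assumes Elasticsearch is reachable over plain HTTP at 127.0.0.1:9200. `build_analyze_request` and `analyze` are hypothetical helper names.

```python
import json
import urllib.request


def build_analyze_request(text: str, analyzer: str,
                          base_url: str = "http://127.0.0.1:9200") -> urllib.request.Request:
    """POST /_analyze with a JSON body naming the analyzer to test."""
    body = json.dumps({"text": text, "analyzer": analyzer}, ensure_ascii=False)
    return urllib.request.Request(
        f"{base_url}/_analyze",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(text: str, analyzer: str, base_url: str = "http://127.0.0.1:9200") -> dict:
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(build_analyze_request(text, analyzer, base_url)) as resp:
        return json.load(resp)
```

`analyze("中华人民共和国国徽", "pinyin")` returns the parsed response as a dict. Passing `"ik_max_word"` instead exercises the ik analyzer tested in the next section.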
{
  "tokens": [
    {"token": "zhong", "start_offset": 0, "end_offset": 1, "type": "word", "position": 0},
    {"token": "zhrmghggh", "start_offset": 0, "end_offset": 9, "type": "word", "position": 0},
    {"token": "hua", "start_offset": 1, "end_offset": 2, "type": "word", "position": 1},
    {"token": "ren", "start_offset": 2, "end_offset": 3, "type": "word", "position": 2},
    {"token": "min", "start_offset": 3, "end_offset": 4, "type": "word", "position": 3},
    {"token": "gong", "start_offset": 4, "end_offset": 5, "type": "word", "position": 4},
    {"token": "he", "start_offset": 5, "end_offset": 6, "type": "word", "position": 5},
    {"token": "guo", "start_offset": 6, "end_offset": 7, "type": "word", "position": 6},
    {"token": "guo", "start_offset": 7, "end_offset": 8, "type": "word", "position": 7},
    {"token": "hui", "start_offset": 8, "end_offset": 9, "type": "word", "position": 8}
  ]
}
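In this response every character yields its full pinyin. The first-letter abbreviation of the whole phrase (zhrmghggh) shares position 0 with zhong. Grouping the tokens by position makes that layout easier to see. This is a minimal sketch, and `tokens_by_position` is a hypothetical helper.

```python
from collections import defaultdict


def tokens_by_position(response: dict) -> dict:
    """Map each token position to the tokens emitted at that position, in order."""
    grouped = defaultdict(list)
    for tok in response["tokens"]:
        grouped[tok["position"]].append(tok["token"])
    return dict(grouped)


# First three tokens of the pinyin response above.
sample = {"tokens": [
    {"token": "zhong", "start_offset": 0, "end_offset": 1, "type": "word", "position": 0},
    {"token": "zhrmghggh", "start_offset": 0, "end_offset": 9, "type": "word", "position": 0},
    {"token": "hua", "start_offset": 1, "end_offset": 2, "type": "word", "position": 1},
]}
print(tokens_by_position(sample))  # {0: ['zhong', 'zhrmghggh'], 1: ['hua']}
```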
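Test the ik analyzer
The analyzer field accepts chinese, ik_max_word or ik_smart. chinese is an analyzer built into Elasticsearch, while ik_max_word (finest-grained segmentation) and ik_smart (coarsest segmentation) are provided by the ik Chinese analysis plugin.
Send a POST request to http://127.0.0.1:9200/_analyze with the following body:
{
  "text": "中华人民共和国国徽",
  "analyzer": "ik_max_word"
}
Response: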
{
  "tokens": [
    {"token": "中华人民共和国", "start_offset": 0, "end_offset": 7, "type": "CN_WORD", "position": 0},
    {"token": "中华人民", "start_offset": 0, "end_offset": 4, "type": "CN_WORD", "position": 1},
    {"token": "中华", "start_offset": 0, "end_offset": 2, "type": "CN_WORD", "position": 2},
    {"token": "华人", "start_offset": 1, "end_offset": 3, "type": "CN_WORD", "position": 3},
    {"token": "人民共和国", "start_offset": 2, "end_offset": 7, "type": "CN_WORD", "position": 4},
    {"token": "人民", "start_offset": 2, "end_offset": 4, "type": "CN_WORD", "position": 5},
    {"token": "共和国", "start_offset": 4, "end_offset": 7, "type": "CN_WORD", "position": 6},
    {"token": "共和", "start_offset": 4, "end_offset": 6, "type": "CN_WORD", "position": 7},
    {"token": "国", "start_offset": 6, "end_offset": 7, "type": "CN_CHAR", "position": 8},
    {"token": "国徽", "start_offset": 7, "end_offset": 9, "type": "CN_WORD", "position": 9}
  ]
}
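With ik_max_word the tokens overlap: 中华人民共和国, 中华人民 and 中华 all start at offset 0. Each token's start_offset and end_offset point back into the input text, and the sketch below checks that. Lucene counts offsets in UTF-16 code units while Python slices by code point. The two agree for the characters used here but can differ for characters outside the Basic Multilingual Plane. `mismatched_tokens` is a hypothetical helper.

```python
def mismatched_tokens(text: str, response: dict) -> list:
    """Tokens whose text is not the slice text[start_offset:end_offset]."""
    return [
        t["token"]
        for t in response["tokens"]
        if text[t["start_offset"]:t["end_offset"]] != t["token"]
    ]


text = "中华人民共和国国徽"
# A few tokens copied from the ik_max_word response above.
sample = {"tokens": [
    {"token": "中华人民共和国", "start_offset": 0, "end_offset": 7, "type": "CN_WORD", "position": 0},
    {"token": "华人", "start_offset": 1, "end_offset": 3, "type": "CN_WORD", "position": 3},
    {"token": "国", "start_offset": 6, "end_offset": 7, "type": "CN_CHAR", "position": 8},
    {"token": "国徽", "start_offset": 7, "end_offset": 9, "type": "CN_WORD", "position": 9},
]}
print(mismatched_tokens(text, sample))  # []
```

The check is only meaningful for ik output. Pinyin tokens such as zhong are transliterations, so they never equal the original slice.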
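Note: this applies to both the pinyin and the ik analyzer. A document can only be found through an analyzer's tokens if its field was analyzed with that analyzer when the document was indexed; otherwise the search will not match it.
Combining ik and pinyin
PUT /my_index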
{
"settings": {
"analysis": {
"analyzer": {
"ik_smart_pinyin": {
"type": "custom",
"tokenizer": "ik_smart",
"filter": ["my_pinyin", "word_delimiter"]
},
"ik_max_word_pinyin": {
"type": "custom",
"tokenizer": "ik_max_word",
"filter": ["my_pinyin", "word_delimiter"]
}
},
"filter": {
"my_pinyin": {
"type" : "pinyin",
"keep_separate_first_letter" : true,
"keep_full_pinyin" : true,
"keep_original" : true,
"limit_first_letter_length" : 16,
"lowercase" : true,
"remove_duplicated_term" : true
}
}
}
}
}
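Elasticsearch rejects the index creation when a custom analyzer references a filter it cannot find. Below is a hedged Python sketch that mirrors the settings above and lists every filter an analyzer uses but the settings do not define. The set of built-in filter names is a deliberately partial assumption. `undefined_filters` and `create_index_request` are hypothetical names.

```python
import json
import urllib.request

# Python mirror of the PUT /my_index body above.
INDEX_SETTINGS = {
    "settings": {
        "analysis": {
            "analyzer": {
                "ik_smart_pinyin": {"type": "custom", "tokenizer": "ik_smart",
                                    "filter": ["my_pinyin", "word_delimiter"]},
                "ik_max_word_pinyin": {"type": "custom", "tokenizer": "ik_max_word",
                                       "filter": ["my_pinyin", "word_delimiter"]},
            },
            "filter": {
                "my_pinyin": {
                    "type": "pinyin",
                    "keep_separate_first_letter": True,
                    "keep_full_pinyin": True,
                    "keep_original": True,
                    "limit_first_letter_length": 16,
                    "lowercase": True,
                    "remove_duplicated_term": True,
                }
            },
        }
    }
}

# Partial list of built-in token filters, assumed for this check only.
BUILTIN_FILTERS = {"lowercase", "word_delimiter", "stop", "asciifolding"}


def undefined_filters(body: dict, builtin=BUILTIN_FILTERS) -> set:
    """Filter names used by custom analyzers but neither defined nor built in."""
    analysis = body["settings"]["analysis"]
    defined = set(analysis.get("filter", {}))
    return {
        name
        for analyzer in analysis.get("analyzer", {}).values()
        for name in analyzer.get("filter", [])
        if name not in defined and name not in builtin
    }


def create_index_request(index: str, body: dict,
                         base_url: str = "http://127.0.0.1:9200") -> urllib.request.Request:
    """PUT /<index> with the settings body; send it with urllib.request.urlopen."""
    return urllib.request.Request(
        f"{base_url}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

The tokenizers `ik_smart` and `ik_max_word` come from the ik plugin and are not checked here.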
When creating the type, set each text field's analyzer property to one of the custom analyzers defined above:
PUT /my_index/my_type/_mapping
{
"my_type":{
"properties": {
"id":{
"type": "integer"
},
"name":{
"type": "text",
"analyzer": "ik_smart_pinyin"
}
}
}
}
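To see the mapping in action, index a document and search the name field. The sketch below assumes the index, type and mapping above already exist. The document values and query strings are made up for illustration. Whether a given pinyin query matches depends on the my_pinyin filter options, so no result is claimed here. All helper names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:9200"


def json_request(method: str, path: str, body: dict) -> urllib.request.Request:
    """Build a JSON request against the local node."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


def index_doc_request(doc_id: int, name: str) -> urllib.request.Request:
    # refresh=true makes the document searchable right away (fine for a quick test).
    return json_request("PUT", f"/my_index/my_type/{doc_id}?refresh=true",
                        {"id": doc_id, "name": name})


def search_name_request(query: str) -> urllib.request.Request:
    # A match query analyzes `query` with the field's analyzer (ik_smart_pinyin here).
    return json_request("POST", "/my_index/my_type/_search",
                        {"query": {"match": {"name": query}}})


def send(req: urllib.request.Request) -> dict:
    """Execute a request and return the parsed JSON response."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

For example, first call `send(index_doc_request(1, "中华人民共和国国徽"))`, then compare `send(search_name_request("国徽"))` with a pinyin query such as `send(search_name_request("guohui"))` by inspecting `hits.hits` in each response.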