1. Install the IK analyzer. Download the plugin release matching your Elasticsearch version: the elasticsearch-analysis-ik Chinese analyzer is actively maintained and its releases track Elasticsearch versions, so just pick the one matching yours. The original IKAnalyzer project is no longer maintained, while Lucene keeps evolving, so integrating IKAnalyzer directly with a newer Lucene would require patching IKAnalyzer yourself.
Download page: https://github.com/medcl/elasticsearch-analysis-ik/releases
Upload the downloaded analyzer to your server, or fetch it directly with wget, whichever you prefer. My IK analyzer version matches my Elasticsearch version. Since mine is a pseudo-distributed cluster on a single machine, installing it once is enough; in a truly distributed cluster, the Chinese analyzer must be installed on every machine.
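For the wget route, the release archive name can be derived from the Elasticsearch version. A minimal sketch (the v-prefixed tag and archive naming follow the pattern used on the releases page above; double-check it against the page for your version):

```shell
# Build the IK release URL from the Elasticsearch version, then fetch it.
ES_VERSION=5.4.3
IK_URL="https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v${ES_VERSION}/elasticsearch-analysis-ik-${ES_VERSION}.zip"
echo "$IK_URL"
# wget "$IK_URL"   # uncomment to actually download
```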
2. Extract the archive: copy elasticsearch-analysis-ik-5.4.3.zip into its own directory and unzip it there to install the IK Chinese analyzer.
[root@slaver4 package]# mkdir elasticsearch-analysis-ik
[root@slaver4 package]# cp elasticsearch-analysis-ik-5.4.3.zip elasticsearch-analysis-ik
[root@slaver4 elasticsearch-analysis-ik]# unzip elasticsearch-analysis-ik-5.4.3.zip
Archive:  elasticsearch-analysis-ik-5.4.3.zip
  inflating: elasticsearch-analysis-ik-5.4.3.jar
  inflating: httpclient-4.5.2.jar
  inflating: httpcore-4.4.4.jar
  inflating: commons-logging-1.2.jar
  inflating: commons-codec-1.9.jar
   creating: config/
   creating: config/custom/
  inflating: config/surname.dic
  inflating: config/preposition.dic
  inflating: config/custom/mydict.dic
  inflating: config/custom/single_word_full.dic
  inflating: config/custom/sougou.dic
  inflating: config/custom/ext_stopword.dic
  inflating: config/custom/single_word.dic
  inflating: config/custom/single_word_low_freq.dic
  inflating: config/main.dic
  inflating: config/IKAnalyzer.cfg.xml
  inflating: config/quantifier.dic
  inflating: config/stopword.dic
  inflating: config/suffix.dic
  inflating: plugin-descriptor.properties
[root@slaver4 elasticsearch-analysis-ik]# ls
commons-codec-1.9.jar  commons-logging-1.2.jar  config  elasticsearch-analysis-ik-5.4.3.jar  elasticsearch-analysis-ik-5.4.3.zip  httpclient-4.5.2.jar  httpcore-4.4.4.jar  plugin-descriptor.properties
[root@slaver4 elasticsearch-analysis-ik]#
Since unzip extracts into the current directory, the elasticsearch-analysis-ik-5.4.3.zip archive can be deleted at this point.
[root@slaver4 elasticsearch-analysis-ik]# ls
commons-codec-1.9.jar  commons-logging-1.2.jar  config  elasticsearch-analysis-ik-5.4.3.jar  elasticsearch-analysis-ik-5.4.3.zip  httpclient-4.5.2.jar  httpcore-4.4.4.jar  plugin-descriptor.properties
[root@slaver4 elasticsearch-analysis-ik]# rm -rf elasticsearch-analysis-ik-5.4.3.zip
[root@slaver4 elasticsearch-analysis-ik]# ls
commons-codec-1.9.jar  commons-logging-1.2.jar  config  elasticsearch-analysis-ik-5.4.3.jar  httpclient-4.5.2.jar  httpcore-4.4.4.jar  plugin-descriptor.properties
[root@slaver4 elasticsearch-analysis-ik]#
Then move the extracted IK directory into Elasticsearch's plugins directory. Remember that all three nodes need it in their plugins directory. As shown below:
Note: the IK analyzer plugin must live in its own subdirectory inside plugins, never as loose files directly under plugins. In my setup, the elasticsearch-analysis-ik directory holds the extracted IK analyzer files.
[root@slaver4 package]# mv elasticsearch-analysis-ik/ /home/hadoop/soft/elasticsearch-5.4.3/plugins
[root@slaver4 package]# cd /home/hadoop/soft/elasticsearch-5.4.3/plugins
[root@slaver4 plugins]# ls
elasticsearch-analysis-ik
[root@slaver4 plugins]#
Elasticsearch provides no shutdown command. kill -9 <pid> is a big taboo in production; if you do use kill there, send the default signal (kill <pid>) so the process can finish its in-flight work before exiting. If your es is running, restart it with the commands below; if it is not running, just start it.
-- on a cluster with one node per machine, run the following on each machine to stop es
[root@slaver4 elasticsearch-analysis-ik]# kill `ps -ef | grep Elasticsearch | grep -v grep | awk '{print $2}'`
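A slightly more careful stop than a bare kill is to wait until the process has actually exited before moving on. A sketch (the graceful_stop function name and the pattern matching are my own helper, not part of the Elasticsearch distribution):

```shell
# Send SIGTERM (kill's default, not -9) to every process matching a pattern,
# then block until each one has actually exited.
graceful_stop() {
    local pattern="$1" pids pid
    pids=$(ps -ef | grep "$pattern" | grep -v grep | awk '{print $2}')
    if [ -z "$pids" ]; then
        echo "no process matching '$pattern'"
        return 0
    fi
    kill $pids    # default signal lets the JVM finish in-flight work
    for pid in $pids; do
        while kill -0 "$pid" 2>/dev/null; do
            sleep 1
        done
    done
    echo "stopped:" $pids
}

# Usage on an Elasticsearch node:
# graceful_stop Elasticsearch
```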
My es pseudo-cluster was not running, so I simply start it here. With a pseudo-cluster, it is best to start all nodes at about the same time; otherwise you will see some errors while the nodes try to discover each other in the cluster, though nothing serious.
When starting the head plugin, it reported this error, which is always irritating: Local Npm module "grunt-contrib-jasmine" not found. Is it installed?
[elsearch@slaver4 soft]$ cd elasticsearch-head-master/
[elsearch@slaver4 elasticsearch-head-master]$ ls
crx         Dockerfile-alpine                   Gruntfile.js       index.html  node_modules  package-lock.json             proxy           _site  test
Dockerfile  elasticsearch-head.sublime-project  grunt_fileSets.js  LICENCE     package.json  plugin-descriptor.properties  README.textile  src
[elsearch@slaver4 elasticsearch-head-master]$ cd node_modules/grunt
[elsearch@slaver4 grunt]$ ls
bin  CHANGELOG  lib  LICENSE  node_modules  package.json  README.md
[elsearch@slaver4 grunt]$ cd bin/
[elsearch@slaver4 bin]$ ls
grunt
[elsearch@slaver4 bin]$ ./grunt -server &
[1] 8602
[elsearch@slaver4 bin]$ >> Local Npm module "grunt-contrib-jasmine" not found. Is it installed?
Warning: Task "jasmine" not found. Use --force to continue.

Aborted due to warnings.

[1]   Exit 3                  ./grunt -server
[elsearch@slaver4 bin]$ jps
8388 Elasticsearch
8457 Elasticsearch
8527 Elasticsearch
8623 Jps
[elsearch@slaver4 bin]$ ls
grunt
[elsearch@slaver4 bin]$ ./grunt server &
[1] 8633
[elsearch@slaver4 bin]$ >> Local Npm module "grunt-contrib-jasmine" not found. Is it installed?

Running "connect:server" (connect) task
Waiting forever...
Started connect web server on http://192.168.110.133:9100
How do you fix the error above? As shown below, install the missing package. Remember to run the command from inside the head directory: npm install grunt-contrib-jasmine.
If any of the packages below are missing, install them with this command:
[elsearch@slaver4 elasticsearch-head-master]$ npm install grunt-contrib-clean grunt-contrib-concat grunt-contrib-watch grunt-contrib-connect grunt-contrib-copy grunt-contrib-jasmine
Or use the Taobao npm registry, which downloads noticeably faster:
[elsearch@slaver4 elasticsearch-head-master]$ npm install grunt-contrib-clean grunt-contrib-concat grunt-contrib-watch grunt-contrib-connect grunt-contrib-copy grunt-contrib-jasmine --registry=https://registry.npm.taobao.org
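To see at a glance which of these grunt plugins are still missing before reinstalling everything, a quick look inside node_modules is enough. A sketch (check_grunt_modules is my own helper, not part of elasticsearch-head; the plugin list is the one installed above):

```shell
# List which grunt plugins used by elasticsearch-head are present under
# DIR/node_modules and which still need an npm install.
check_grunt_modules() {
    local dir="${1:-.}" mod
    for mod in grunt-contrib-clean grunt-contrib-concat grunt-contrib-watch \
               grunt-contrib-connect grunt-contrib-copy grunt-contrib-jasmine; do
        if [ -d "$dir/node_modules/$mod" ]; then
            echo "ok:      $mod"
        else
            echo "missing: $mod"
        fi
    done
}

# Usage, from inside elasticsearch-head-master:
# check_grunt_modules .
```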
[elsearch@slaver4 elasticsearch-head-master]$ ls
crx         Dockerfile-alpine                   Gruntfile.js       index.html  node_modules  package-lock.json             proxy           _site  test
Dockerfile  elasticsearch-head.sublime-project  grunt_fileSets.js  LICENCE     package.json  plugin-descriptor.properties  README.textile  src
[elsearch@slaver4 elasticsearch-head-master]$ npm install grunt-contrib-jasmine
npm WARN elasticsearch-head@0.0.0 license should be a valid SPDX license expression
npm WARN optional SKIPPING OPTIONAL DEPENDENCY: fsevents@1.2.9 (node_modules/fsevents):
npm WARN notsup SKIPPING OPTIONAL DEPENDENCY: Unsupported platform for fsevents@1.2.9: wanted {"os":"darwin","arch":"any"} (current: {"os":"linux","arch":"x64"})

grunt-contrib-jasmine@1.0.3
added 9 packages from 13 contributors in 30.811s
[elsearch@slaver4 elasticsearch-head-master]$
After all that tinkering I managed to break the head plugin so it would not start at all, which was frustrating. The startup error is shown below.
[elsearch@slaver4 bin]$ >> No "clean" targets found.
Warning: Task "clean" failed. Use --force to continue.

Aborted due to warnings.

[1]   Exit 3                  ./grunt -server
Since it would not start, I suspected a broken dependency, so I simply updated all the dependencies to their latest versions.
[elsearch@slaver4 bin]$ npm install grunt@latest
npm ERR! code ENOSELF
npm ERR! Refusing to install package with name "grunt" under a package
npm ERR! also called "grunt". Did you name your project the same
npm ERR! as the dependency you're installing?
npm ERR!
npm ERR! For more information, see:
npm ERR! <https://docs.npmjs.com/cli/install#limitations-of-npms-install-algorithm>

npm ERR! A complete log of this run can be found in:
npm ERR!     /home/elsearch/.npm/_logs/2019-10-20T09_08_08_039Z-debug.log
[elsearch@slaver4 bin]$ npm install grunt-cli@latest
npm notice created a lockfile as package-lock.json. You should commit this file.
grunt-cli@1.3.2
added 148 packages from 121 contributors, updated 2 packages and audited 984 packages in 180.566s
found 0 vulnerabilities

[elsearch@slaver4 bin]$ npm install grunt-contrib-copy@latest
grunt-contrib-copy@1.0.0
added 9 packages from 3 contributors and audited 994 packages in 36.523s
found 0 vulnerabilities

[elsearch@slaver4 bin]$ npm install grunt-contrib-concat@latest
npm WARN grunt-contrib-concat@1.0.1 requires a peer of grunt@>=0.4.0 but none is installed. You must install peer dependencies yourself.

grunt-contrib-concat@1.0.1
added 1 package from 1 contributor and audited 1004 packages in 5.282s
found 0 vulnerabilities

[elsearch@slaver4 bin]$ npm install grunt-contrib-uglify@latest
npm WARN grunt-contrib-concat@1.0.1 requires a peer of grunt@>=0.4.0 but none is installed. You must install peer dependencies yourself.

grunt-contrib-uglify@4.0.1
added 18 packages from 44 contributors and audited 1032 packages in 84.271s
found 0 vulnerabilities

[elsearch@slaver4 bin]$ npm install grunt-contrib-clean@latest
npm WARN grunt-contrib-concat@1.0.1 requires a peer of grunt@>=0.4.0 but none is installed. You must install peer dependencies yourself.
npm WARN grunt-contrib-clean@2.0.0 requires a peer of grunt@>=0.4.5 but none is installed. You must install peer dependencies yourself.

grunt-contrib-clean@2.0.0
added 15 packages from 8 contributors and audited 1050 packages in 16.732s
found 0 vulnerabilities

[elsearch@slaver4 bin]$ npm install grunt-contrib-watch@latest
npm WARN grunt-contrib-clean@2.0.0 requires a peer of grunt@>=0.4.5 but none is installed. You must install peer dependencies yourself.
npm WARN grunt-contrib-concat@1.0.1 requires a peer of grunt@>=0.4.0 but none is installed. You must install peer dependencies yourself.

grunt-contrib-watch@1.1.0
added 24 packages from 28 contributors in 46.459s
[elsearch@slaver4 bin]$ npm install grunt-contrib-connect@latest
npm WARN grunt-contrib-clean@2.0.0 requires a peer of grunt@>=0.4.5 but none is installed. You must install peer dependencies yourself.
npm WARN grunt-contrib-concat@1.0.1 requires a peer of grunt@>=0.4.0 but none is installed. You must install peer dependencies yourself.
npm WARN grunt-contrib-connect@2.1.0 requires a peer of grunt@>=0.4.0 but none is installed. You must install peer dependencies yourself.

grunt-contrib-connect@2.1.0
added 69 packages from 69 contributors and audited 1226 packages in 77.331s
found 0 vulnerabilities

[elsearch@slaver4 bin]$ npm install grunt-contrib-jasmine@latest

> puppeteer@1.20.0 install /home/hadoop/soft/elasticsearch-head-master/node_modules/grunt/node_modules/puppeteer
> node install.js

Downloading Chromium r686378 - 114 Mb [====================] 100% 0.0s
Chromium downloaded to /home/hadoop/soft/elasticsearch-head-master/node_modules/grunt/node_modules/puppeteer/.local-chromium/linux-686378
npm WARN grunt-contrib-clean@2.0.0 requires a peer of grunt@>=0.4.5 but none is installed. You must install peer dependencies yourself.
npm WARN grunt-contrib-concat@1.0.1 requires a peer of grunt@>=0.4.0 but none is installed. You must install peer dependencies yourself.
npm WARN grunt-contrib-connect@2.1.0 requires a peer of grunt@>=0.4.0 but none is installed. You must install peer dependencies yourself.
npm WARN grunt-eslint@22.0.0 requires a peer of grunt@>=1 but none is installed. You must install peer dependencies yourself.

grunt-contrib-jasmine@2.1.0
added 244 packages from 120 contributors and audited 2725 packages in 424.187s
found 3 high severity vulnerabilities
  run `npm audit fix` to fix them, or `npm audit` for details
[elsearch@slaver4 bin]$ ls
grunt
[elsearch@slaver4 bin]$ ./grunt server &
[1] 8297
[elsearch@slaver4 bin]$ Running "connect:server" (connect) task
Waiting forever...
Started connect web server on http://192.168.110.133:9100

[elsearch@slaver4 bin]$
3. Back to the main topic: with the IK Chinese analyzer installed, it is time to test whether the installation succeeded.
First create an index named news. Once it is created, you can see the news index in the head plugin, as shown below:
[elsearch@slaver4 soft]$ ls
elasticsearch-5.4.3  elasticsearch-head-master  el_slave  node-v8.16.2-linux-x64  nohup.out
[elsearch@slaver4 soft]$ curl -XPUT http://192.168.110.133:9200/news
{"acknowledged":true,"shards_acknowledged":true}[elsearch@slaver4 soft]$
[elsearch@slaver4 soft]$
With the index created, create the mapping (the counterpart of a schema in a database: table name, field names, and field types), which specifies the analyzer each field uses. Again, remember that the plugins directory of all three nodes must contain the IK Chinese analyzer.
Note: the text type means the field is tokenized, stored, and indexed. analyzer selects the analyzer used when building the index (here the IK Chinese analyzer); search_analyzer selects the analyzer used at search time.
[elsearch@slaver4 soft]$ curl -XPOST http://192.168.110.133:9200/news/fulltext/_mapping -d'
{
    "properties": {
        "content": {
            "type": "text",
            "analyzer": "ik_max_word",
            "search_analyzer": "ik_max_word"
        }
    }
}'
{"acknowledged":true}[elsearch@slaver4 soft]$
Then test whether Chinese text actually gets segmented. The ik_max_word analyzer is specified in the URL, and -d passes the text to analyze.
[elsearch@slaver4 soft]$ curl -XGET 'http://192.168.110.133:9200/_analyze?pretty&analyzer=ik_max_word' -d '我要成为java高级工程师'
{
  "tokens" : [
    {
      "token" : "我",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "CN_CHAR",
      "position" : 0
    },
    {
      "token" : "要",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "CN_CHAR",
      "position" : 1
    },
    {
      "token" : "成为",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "java",
      "start_offset" : 4,
      "end_offset" : 8,
      "type" : "ENGLISH",
      "position" : 3
    },
    {
      "token" : "高级工程师",
      "start_offset" : 8,
      "end_offset" : 13,
      "type" : "CN_WORD",
      "position" : 4
    },
    {
      "token" : "高级工",
      "start_offset" : 8,
      "end_offset" : 11,
      "type" : "CN_WORD",
      "position" : 5
    },
    {
      "token" : "高级",
      "start_offset" : 8,
      "end_offset" : 10,
      "type" : "CN_WORD",
      "position" : 6
    },
    {
      "token" : "工程师",
      "start_offset" : 10,
      "end_offset" : 13,
      "type" : "CN_WORD",
      "position" : 7
    },
    {
      "token" : "工程",
      "start_offset" : 10,
      "end_offset" : 12,
      "type" : "CN_WORD",
      "position" : 8
    },
    {
      "token" : "师",
      "start_offset" : 12,
      "end_offset" : 13,
      "type" : "CN_CHAR",
      "position" : 9
    }
  ]
}
[elsearch@slaver4 soft]$
ik_smart is the smarter, coarser-grained segmentation mode and the one generally recommended; its output looks like this:
[elsearch@slaver4 soft]$ curl -XGET 'http://192.168.110.133:9200/_analyze?pretty&analyzer=ik_smart' -d '我要成为Java高级工程师'
{
  "tokens" : [
    {
      "token" : "我",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "CN_CHAR",
      "position" : 0
    },
    {
      "token" : "要",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "CN_CHAR",
      "position" : 1
    },
    {
      "token" : "成为",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "java",
      "start_offset" : 4,
      "end_offset" : 8,
      "type" : "ENGLISH",
      "position" : 3
    },
    {
      "token" : "高级工程师",
      "start_offset" : 8,
      "end_offset" : 13,
      "type" : "CN_WORD",
      "position" : 4
    }
  ]
}
[elsearch@slaver4 soft]$
At this point the IK Chinese analyzer setup is complete.
Next, add some documents to the type under the index. Once the inserts succeed, the data can be browsed in the head plugin, as shown below.
[elsearch@slaver4 soft]$ curl -XPOST http://192.168.110.133:9200/news/fulltext/1 -d'
> {"content":"美国留给伊拉克的是个烂摊子吗"}'
{"_index":"news","_type":"fulltext","_id":"1","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"created":true}[elsearch@slaver4 soft]$
[elsearch@slaver4 soft]$ curl -XPOST http://192.168.110.133:9200/news/fulltext/2 -d'
> {"content":"公安部:各地校车将享最高路权"}'
{"_index":"news","_type":"fulltext","_id":"2","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"created":true}[elsearch@slaver4 soft]$
[elsearch@slaver4 soft]$ curl -XPOST http://192.168.110.133:9200/news/fulltext/3 -d'
> {"content":"中韩渔警冲突调查:韩警平均每天扣1艘中国渔船"}'
{"_index":"news","_type":"fulltext","_id":"3","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"created":true}[elsearch@slaver4 soft]$
[elsearch@slaver4 soft]$ curl -XPOST http://192.168.110.133:9200/news/fulltext/4 -d'
> {"content":"中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首"}'
{"_index":"news","_type":"fulltext","_id":"4","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"created":true}[elsearch@slaver4 soft]$
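Inserting documents one curl at a time works fine for a handful of test rows, but for larger loads Elasticsearch's _bulk API takes a single newline-delimited payload: one action line followed by one document line per record. A sketch that only builds such a payload file (the news/fulltext names match the index above; sending it with curl of course requires the cluster to be running):

```shell
# Build an NDJSON payload for the Elasticsearch _bulk API: an action line
# followed by a source line, one pair per document.
BULK_FILE=$(mktemp)
i=1
for content in "美国留给伊拉克的是个烂摊子吗" "公安部:各地校车将享最高路权"; do
    printf '{"index":{"_index":"news","_type":"fulltext","_id":"%s"}}\n' "$i" >> "$BULK_FILE"
    printf '{"content":"%s"}\n' "$content" >> "$BULK_FILE"
    i=$((i + 1))
done
cat "$BULK_FILE"
# curl -XPOST http://192.168.110.133:9200/_bulk --data-binary @"$BULK_FILE"
```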
With the data inserted, you can run a search with highlighting, as shown below:
[elsearch@slaver4 soft]$ curl -XPOST http://192.168.110.133:9200/news/fulltext/_search -d'
> {
>     "query" : { "match" : { "content" : "中国" }},
>     "highlight" : {
>         "pre_tags" : ["<font color='red'>", "<tag2>"],
>         "post_tags" : ["</font>", "</tag2>"],
>         "fields" : {
>             "content" : {}
>         }
>     }
> }'
{"took":194,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":2,"max_score":0.5347766,"hits":[{"_index":"news","_type":"fulltext","_id":"4","_score":0.5347766,"_source":{"content":"中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首"},"highlight":{"content":["<font color=red>中国</font>驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首"]}},{"_index":"news","_type":"fulltext","_id":"3","_score":0.27638745,"_source":{"content":"中韩渔警冲突调查:韩警平均每天扣1艘<font color=red>中国</font>渔船"}}]}}[elsearch@slaver4 soft]$
[elsearch@slaver4 soft]$