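What is Elasticsearch?
You know, for search (and analysis)
Elasticsearch is the distributed search and analytics engine at the heart of the Elastic Stack. Elasticsearch provides near real-time search and analytics for all types of data.
The official introduction highlights the key points: Elasticsearch is a distributed engine for search, storage, and data analytics, and it provides near real-time search and analytics for all types of data. It is powerful and pleasant to use. I plan to write more articles about Elasticsearch later; this one only tries out its search features.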
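Preparation
Install and start Elasticsearch and Kibana
Version: 7.9. Download Elasticsearch and Kibana, unzip them, go to the bin directory of each, and double-click the .bat file to start it (details omitted here). The main goal of this article is to see how pleasant querying with Elasticsearch is; how it works internally is left for later.
Verify that Elasticsearch has started:
Figure: Elasticsearch started successfully
Verify that Kibana has started:
Figure: Kibana started successfully
I also recommend installing elasticsearch-head, which gives a clear view of the state of the ES nodes.
elasticsearch-head provides a visual interface for configuring Elasticsearch and retrieving data. It shows cluster health and shard allocation at a glance, lets you manage indices and the cluster, and offers a quick, convenient search feature.
Install and start elasticsearch-head:
1. Install Node.js (omitted).
2. Unzip elasticsearch-head-master, enter the folder, and edit Gruntfile.js to add hostname: '*' under connect.server.options: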
connect: {
server: {
options: {
hostname: '*',
port: 9100,
base: '.',
keepalive: true
}
}
}
3. Install the dependencies with npm:
D:\Java\elasticsearch-head-master> npm install
4. Start it:
D:\Java\elasticsearch-head-master> npm run start
> elasticsearch-head@0.0.0 start D:\Java\elasticsearch-head-master
> grunt server
Running "connect:server" (connect) task
Waiting forever...
Started connect web server on http://localhost:9100
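Creating indices in ES
After Kibana has started, run the following requests in the Kibana Dev Tools console.
- Create an index
# Create the city index
PUT /city
# Every index can have its own settings, which are defined in the request body
# number_of_shards defaults to 1
# number_of_replicas defaults to 1 (one replica per primary shard)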
PUT /test
{
"settings": {
"index": {
"number_of_shards": 3,
"number_of_replicas": 2
}
}
}
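We can see that the city index has 1 shard and 1 replica, which are the defaults, while the test index was configured through settings with 3 shards and 2 replicas. The screenshot also shows that test has 3 shards, each with 2 replicas.
Why are all the replicas unassigned? ES never places a primary and its replica on the same node, nor two replicas of the same shard on one node, and I only started a single ES node locally.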
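The allocation rule above can be turned into a little arithmetic. The following is a minimal Python sketch, not Elasticsearch code: the function name shard_copy_summary is made up for illustration, and it models only the one rule described here (copies of the same shard never share a node), ignoring every other allocation setting.

```python
def shard_copy_summary(number_of_shards, number_of_replicas, node_count):
    """Count shard copies under one simplified rule:
    two copies of the same shard are never placed on the same node."""
    copies_per_shard = 1 + number_of_replicas            # one primary plus its replicas
    assignable_per_shard = min(copies_per_shard, node_count)
    total = number_of_shards * copies_per_shard
    assigned = number_of_shards * assignable_per_shard
    return {"total": total, "assigned": assigned, "unassigned": total - assigned}


# The test index from this article on a single local node
print(shard_copy_summary(3, 2, node_count=1))
# The same index if three nodes were available
print(shard_copy_summary(3, 2, node_count=3))
# The city index with default settings on a single node
print(shard_copy_summary(1, 1, node_count=1))
```

Under this simplification the test index has 9 shard copies, of which only the 3 primaries fit on a single node; with three nodes every copy could be placed.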
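- Delete an index
In the screenshot above, between test and city there is an index named ilm-history-2-000001. I am not sure what it is (the name suggests it belongs to index lifecycle management), so let's use it to try out deletion.
# Delete the index
DELETE /ilm-history-2-000001
# Result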
{
"acknowledged" : true
}
We can verify this by querying the index:
# Query the index
GET /ilm-history-2-000001
{
"error" : {
"root_cause" : [
{
"type" : "index_not_found_exception",
"reason" : "no such index [ilm-history-2-000001]",
"resource.type" : "index_or_alias",
"resource.id" : "ilm-history-2-000001",
"index_uuid" : "_na_",
"index" : "ilm-history-2-000001"
}
],
"type" : "index_not_found_exception",
"reason" : "no such index [ilm-history-2-000001]",
"resource.type" : "index_or_alias",
"resource.id" : "ilm-history-2-000001",
"index_uuid" : "_na_",
"index" : "ilm-history-2-000001"
},
"status" : 404
}
We can also check in elasticsearch-head:
Figure: verifying the deletion
The ilm-history-2-000001 index is gone.
- Insert data
The syntax is simple; for example:
POST /city/_doc/1
{
"city" : "En shi",
"province" : "Hubei province",
"acreage" : 24111
}
POST /city/_doc/2
{
"city" : "E zhou",
"province" : "Hu bei province",
"acreage" : 1594
}
POST /city/_doc/3
{
"city" : "Zheng zhou, China",
"province" : "Henan province, capital",
"acreage" : 7446
}
POST /city/_doc/4
{
"city" : "Bei jing",
"province" : "Beijing China, capital",
"acreage" : 16410
}
POST /city/_doc/5
{
"city" : "Nan jing",
"province" : "Jiangsu province, capital",
"acreage" : 6587
}
POST /city/_doc/6
{
"city" : "Shen zhen",
"province" : "Guangdong province",
"acreage" : 1997
}
POST /city/_doc/7
{
"city" : "Guang zhou",
"province" : "Guangdong province, capital",
"acreage" : 7434
}
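Sending one request per document works, but the same documents can also be loaded in one round trip through the _bulk endpoint, which this article does not otherwise use. The Python sketch below only builds the newline-delimited JSON body that _bulk expects (one action line followed by one source line per document, with a trailing newline); the helper name build_bulk_payload is made up for illustration, and only the first two cities are included to keep it short.

```python
import json


def build_bulk_payload(index, docs):
    """Build an NDJSON body for the _bulk endpoint:
    an action line followed by the document source, one pair per document."""
    lines = []
    for doc_id, source in docs.items():
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source))
    # The bulk body must end with a newline character
    return "\n".join(lines) + "\n"


cities = {
    "1": {"city": "En shi", "province": "Hubei province", "acreage": 24111},
    "2": {"city": "E zhou", "province": "Hu bei province", "acreage": 1594},
}

payload = build_bulk_payload("city", cities)
print(payload)
```

The printed text can be pasted after POST /_bulk in the Kibana Dev Tools console.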
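With this data in place, let's go through the common ES queries.
ES queries
Elasticsearch offers many kinds of queries; the following is a summary of the commonly used ones.
Query_string
- Query everything
GET /<index>/_search
GET /city/_search
Figure: result of querying everything
All 7 records are returned. relation is eq (equal), meaning the total hit count is exact, and max_score, the highest relevance score, is 1.0.
- Query with parameters
GET /<index>/_search?q=<field>:<value>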
GET /city/_search?q=city: Bei Jing
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 2.8371272,
"hits" : [
{
"_index" : "city",
"_type" : "1",
"_id" : "4",
"_score" : 2.8371272,
"_source" : {
"city" : "Bei jing",
"province" : "Beijing China, capital",
"acreage" : 16410
}
},
{
"_index" : "city",
"_type" : "1",
"_id" : "5",
"_score" : 1.1631508,
"_source" : {
"city" : "Nan jing",
"province" : "Jiangsu province, capital",
"acreage" : 6587
}
}
]
}
This query returned every document whose city contains jing, but with different relevance scores.
- Pagination and sorting
GET /<index>/_search?from=x&size=x&sort=<field>:[asc|desc]
GET /city/_search?from=0&size=3&sort=acreage:asc
"hits" : {
"total" : {
"value" : 7,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "city",
"_type" : "1",
"_id" : "2",
"_score" : null,
"_source" : {
"city" : "E zhou",
"province" : "Hu bei province",
"acreage" : 1594
},
"sort" : [
1594
]
},
{
"_index" : "city",
"_type" : "1",
"_id" : "6",
"_score" : null,
"_source" : {
"city" : "Shen zhen",
"province" : "Guangdong province",
"acreage" : 1997
},
"sort" : [
1997
]
},
{
"_index" : "city",
"_type" : "1",
"_id" : "5",
"_score" : null,
"_source" : {
"city" : "Nan jing",
"province" : "Jiangsu province, capital",
"acreage" : 6587
},
"sort" : [
6587
]
}
]
}
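This returns the first 3 records sorted by acreage, out of 7 in total. Note that the relevance score is null: when results are sorted by a field, scores are not computed by default.
Query DSL
1. match_all: match everything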
GET /city/_search
{
"query": {
"match_all": {}
}
}
The result is the same as GET /city/_search without a request body.
2. match: the given field contains the given term
Find documents whose city field contains zhou:
GET /city/_search
{
"query": {
"match": {
"city": "zhou"
}
}
}
# Excerpt of the result
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 0.8266786,
"hits" : [
{
"_index" : "city",
"_type" : "1",
"_id" : "2",
"_score" : 0.8266786,
"_source" : {
"city" : "E zhou",
"province" : "Hu bei province",
"acreage" : 1594
}
},
{
"_index" : "city",
"_type" : "1",
"_id" : "3",
"_score" : 0.8266786,
"_source" : {
"city" : "Zheng zhou",
"province" : "Henan province, capital",
"acreage" : 7446
}
},
{
"_index" : "city",
"_type" : "1",
"_id" : "7",
"_score" : 0.8266786,
"_source" : {
"city" : "Guang zhou",
"province" : "Guangdong province, capital",
"acreage" : 7434
}
}
]
}
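Three results come back, each with a relevance score.
3. sort: sorting in ascending or descending order
Find documents whose province contains capital, sorted by acreage in descending order: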
GET /city/_search
{
"query": {
"match": {
"province": "capital"
}
},
"sort": [
{
"acreage": {
"order": "desc"
}
}
]
}
Figure: result of the sort query
4. multi_match: search for one keyword across several fields
Find documents whose city or province field contains China:
GET /city/_search
{
"query": {
"multi_match": {
"query": "China",
"fields": ["city", "province"]
}
}
}
Figure: result of the multi_match query
5. _source metadata: choose which fields are returned
Return only the acreage field in the results:
GET /city/_search
{
"query": {
"multi_match": {
"query": "China",
"fields": ["city", "province"]
}
},
"_source": ["acreage"]
}
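Figure: _source returning only the specified field
Only the acreage field is shown.
- Pagination (deep paging)
Sort by acreage in descending order, with 3 records per page, and fetch the first page: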
GET /city/_search
{
"query": {
"match_all": {}
},
"sort": [
{
"acreage": {
"order": "desc"
}
}
],
"from": 0,
"size": 3
}
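The from value for any page follows directly from the page number and page size. Here is a minimal Python sketch that builds the request body above for an arbitrary page; the function name page_request and its parameters are made up for illustration. It also checks from + size against 10000, the default of the index.max_result_window setting, which is the limit that makes deep paging with from and size impractical.

```python
def page_request(page, size, sort_field="acreage", order="desc",
                 max_result_window=10000):
    """Build a search body for a 1-based page number using from/size."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    start = (page - 1) * size
    # By default Elasticsearch rejects requests where from + size exceeds
    # index.max_result_window (10000 unless the index overrides it)
    if start + size > max_result_window:
        raise ValueError("from + size exceeds max_result_window")
    return {
        "query": {"match_all": {}},
        "sort": [{sort_field: {"order": order}}],
        "from": start,
        "size": size,
    }


print(page_request(1, 3))   # first page: from is 0
print(page_request(3, 3))   # third page: from is 6
```

For the 7 documents in this article, page 3 with a size of 3 would hold only the last record.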
Full-text queries
- term query and match query
GET /city/_search
{
"query": {
"term": {
"city": {
"value": "Guang zhou"
}
}
}
}
GET /city/_search
{
"query": {
"match": {
"city": "Guang zhou"
}
}
}
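Result of the term query:
Figure: term query result
No records at all! Now look at the result of the match query:
Figure: match query result
3 records!
- Difference between match and term
Why do the two queries above behave so differently?
A term query does not analyze its input: it looks for the exact term Guang zhou as a whole. The city field, however, is a text field whose content was split into the lowercase terms guang and zhou at index time, so no indexed term equals Guang zhou and nothing matches. A match query analyzes its input in the same way, producing guang and zhou, and returns documents that contain either term, which is why every document whose city contains zhou shows up.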
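The difference can be reproduced with a toy inverted index. The Python sketch below is only an approximation under stated assumptions: analyze is a rough stand-in for the standard analyzer (split on non-word characters, lowercase), and term_query and match_query are illustrative helpers, not Elasticsearch APIs. The data is the city field of the 7 documents inserted earlier.

```python
import re
from collections import defaultdict

CITIES = {
    1: "En shi", 2: "E zhou", 3: "Zheng zhou, China", 4: "Bei jing",
    5: "Nan jing", 6: "Shen zhen", 7: "Guang zhou",
}


def analyze(text):
    """Rough stand-in for the standard analyzer: split on non-word characters, lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def build_inverted_index(docs):
    """Map every analyzed term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyze(text):
            index[token].add(doc_id)
    return index


def term_query(index, value):
    """Look up the value exactly as given, without analyzing it."""
    return sorted(index.get(value, set()))


def match_query(index, text):
    """Analyze the input, then return documents containing any of its terms."""
    hits = set()
    for token in analyze(text):
        hits |= index.get(token, set())
    return sorted(hits)


index = build_inverted_index(CITIES)
print(term_query(index, "Guang zhou"))    # [] because no indexed term equals "Guang zhou"
print(match_query(index, "Guang zhou"))   # [2, 3, 7]
print(term_query(index, "zhou"))          # [2, 3, 7]
```

The last call shows that a term query does work on a text field when the input is already a single lowercase term.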
- Full-text search
Consider this query:
GET /city/_search
{
"query": {
"match": {
"province": "Guangdong province, capital"
}
}
}
Figure: full-text search result
All 7 records are returned, each with a relevance score, sorted from the highest score to the lowest!
Let's verify the tokenization:
GET /_analyze
{
"analyzer": "standard",
"text": "Guangdong province, capital"
}
Result:
{
"tokens" : [
{
"token" : "guangdong",
"start_offset" : 0,
"end_offset" : 9,
"type" : "<ALPHANUM>",
"position" : 0
},
{
"token" : "province",
"start_offset" : 10,
"end_offset" : 18,
"type" : "<ALPHANUM>",
"position" : 1
},
{
"token" : "capital",
"start_offset" : 20,
"end_offset" : 27,
"type" : "<ALPHANUM>",
"position" : 2
}
]
}
As we can see, Guangdong province, capital is split into guangdong, province, and capital. ES searches for each of these terms, so it returns every document that contains guangdong, province, or capital.
Phrase search
GET /city/_search
{
"query": {
"match_phrase": {
"province": "Guangdong province, capital"
}
}
}
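In contrast to the full-text search above, "Guangdong province, capital" is searched as a single phrase, so only one record should come back:
Figure: phrase search result
As the figure shows, that is indeed the case.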
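Phrase matching can be pictured as requiring the analyzed terms to appear next to each other and in order. The following Python sketch illustrates that idea on the province field of the 7 documents inserted earlier; analyze is again a rough stand-in for the standard analyzer, and contains_phrase and match_phrase are illustrative helpers rather than how Elasticsearch implements the query internally.

```python
import re

PROVINCES = {
    1: "Hubei province", 2: "Hu bei province", 3: "Henan province, capital",
    4: "Beijing China, capital", 5: "Jiangsu province, capital",
    6: "Guangdong province", 7: "Guangdong province, capital",
}


def analyze(text):
    """Rough stand-in for the standard analyzer: split on non-word characters, lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def contains_phrase(field_text, phrase):
    """True if the phrase terms occur consecutively and in order in the field."""
    tokens = analyze(field_text)
    wanted = analyze(phrase)
    n = len(wanted)
    return any(tokens[i:i + n] == wanted for i in range(len(tokens) - n + 1))


def match_phrase(docs, phrase):
    """Return the ids of documents whose text contains the whole phrase."""
    return sorted(doc_id for doc_id, text in docs.items()
                  if contains_phrase(text, phrase))


print(match_phrase(PROVINCES, "Guangdong province, capital"))  # [7]
print(match_phrase(PROVINCES, "Guangdong province"))           # [6, 7]
print(match_phrase(PROVINCES, "province Guangdong"))           # [] because the order differs
```

Only document 7 contains all three terms in sequence, which agrees with the single record in the result above.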
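Query and filter
bool combines multiple query clauses. The bool query takes a more-matches-is-better approach, so the scores of all matching must and should clauses are added together to give the relevance score of each document (think of a document as a data row).
- must: must match
The clause (query) must appear in matching documents and contributes to the score.
- filter: must match, no relevance score, cacheable
The clause (query) must appear in matching documents, but unlike must, the score of the query is ignored. Filter clauses are executed in filter context, which means that scoring is ignored and the clauses are considered for caching, so they are fast.
- should: may match (like OR in SQL)
The clause (query) should appear in matching documents, but it does not have to.
- must_not: must not match, no relevance score
The clause (query) must not appear in matching documents. Clauses are executed in filter context, which means that scoring is ignored and the clauses are considered for caching. Because scoring is ignored, a score of 0 is returned.
- minimum_should_match
Example
Add a few more records to the city index: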
POST /city/_doc/8
{
"city" : "Fo shan",
"province" : "Guangdong province",
"acreage" : 3797
}
POST /city/_doc/9
{
"city" : "Dong guan",
"province" : "Guangdong province",
"acreage" : 2460
}
POST /city/_doc/10
{
"city" : "Hui zhou",
"province" : "Guangdong province",
"acreage" : 11347
}
POST /city/_doc/11
{
"city" : "Mei zhou",
"province" : "Guangdong province",
"acreage" : 15864
}
- Find documents whose province contains Guangdong province, whose acreage is greater than 10000, and whose city field contains hui:
GET /city/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"city": "hui"
}
}
],
"filter": [
{
"match_phrase": {
"province": "Guangdong province"
}
},
{
"range": {
"acreage": {
"gt": 10000
}
}
}
]
}
}
}
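Conceptually, the query is evaluated like this:
- First, keep the documents whose province contains Guangdong province and whose acreage is greater than 10000; 2 records qualify (Hui zhou and Mei zhou):
Figure: filter step 1
- Then keep those whose city contains hui, which leaves a single document, Hui zhou.
- bool with multiple conditions
Find documents whose city contains zhou but not hui, with an acreage of at most 6000; a province containing guangdong is optional and only raises the score:
GET /city/_search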
{
"query": {
"bool": {
"must": [
{"match": {
"city": "zhou"
}}
],
"must_not": [
{"match": {
"city": "hui"
}}
],
"should": [
{"match": {
"province": "guangdong"
}}
],
"filter": [
{"range": {
"acreage": {
"lte": 6000
}
}}
]
}
}
}
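Figure: result of the bool query with multiple conditions
- Nested conditions
minimum_should_match: specifies the number or percentage of should clauses that a returned document must match.
If the bool query contains at least one should clause and no must or filter clauses, the default value is 1. Otherwise, the default value is 0.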
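The default rule just stated fits in a few lines. This Python sketch takes the body of a bool query as a dict and returns the default it implies; default_minimum_should_match is an illustrative helper that mirrors the rule as described here, not an Elasticsearch function.

```python
def default_minimum_should_match(bool_query):
    """Return the default minimum_should_match implied by the clauses present."""
    has_should = bool(bool_query.get("should"))
    has_must_or_filter = bool(bool_query.get("must")) or bool(bool_query.get("filter"))
    # Only a bool query with should clauses and no must/filter clauses
    # requires at least one should clause to match by default
    return 1 if has_should and not has_must_or_filter else 0


only_should = {"should": [{"match": {"province": "guangdong"}}]}
must_and_should = {
    "must": [{"match": {"city": "zhou"}}],
    "should": [{"match": {"province": "guangdong"}}],
}

print(default_minimum_should_match(only_should))      # 1
print(default_minimum_should_match(must_and_should))  # 0
```

In the second case the should clause is optional and only influences the score, which is why setting minimum_should_match explicitly changes which documents are returned.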
For example:
GET /city/_search
{
"query": {
"bool": {
"must": [
{"match": {
"city": "zhou"
}}
],
"should": [
{"range": {
"acreage": {
"gt": 1500
}
}},
{
"range": {
"acreage": {
"gt": 5000
}
}
}
],
"minimum_should_match": 1
}
}
}
Here minimum_should_match means that at least one of the should clauses must match.
The result of this example is:
"hits" : {
"total" : {
"value" : 5,
"relation" : "eq"
},
"max_score" : 2.7046783,
"hits" : [
{
"_index" : "city",
"_type" : "1",
"_id" : "7",
"_score" : 2.7046783,
"_source" : {
"city" : "Guang zhou",
"province" : "Guangdong province, capital",
"acreage" : 7434
}
},
{
"_index" : "city",
"_type" : "1",
"_id" : "10",
"_score" : 2.7046783,
"_source" : {
"city" : "Hui zhou",
"province" : "Guangdong province",
"acreage" : 11347
}
},
{
"_index" : "city",
"_type" : "1",
"_id" : "11",
"_score" : 2.7046783,
"_source" : {
"city" : "Mei zhou",
"province" : "Guangdong province",
"acreage" : 15864
}
},
{
"_index" : "city",
"_type" : "1",
"_id" : "3",
"_score" : 2.5874128,
"_source" : {
"city" : "Zheng zhou, China",
"province" : "Henan province, capital",
"acreage" : 7446
}
},
{
"_index" : "city",
"_type" : "1",
"_id" : "2",
"_score" : 1.7046783,
"_source" : {
"city" : "E zhou",
"province" : "Hu bei province",
"acreage" : 1594
}
}
]
}
The returned cities all have an acreage greater than 1500 or greater than 5000.
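To see how the should clauses shape this result, the Python sketch below replays the query over the 11 cities and counts how many should clauses each hit satisfies. It is a simplified model with illustrative helper names (city_matches, should_matches, run_example), and it does not compute real relevance scores. The counts are at least consistent with the scores shown above, where E zhou satisfies one should clause and scores exactly 1.0 lower than Guang zhou, which satisfies two.

```python
CITIES = {
    1: ("En shi", 24111), 2: ("E zhou", 1594), 3: ("Zheng zhou, China", 7446),
    4: ("Bei jing", 16410), 5: ("Nan jing", 6587), 6: ("Shen zhen", 1997),
    7: ("Guang zhou", 7434), 8: ("Fo shan", 3797), 9: ("Dong guan", 2460),
    10: ("Hui zhou", 11347), 11: ("Mei zhou", 15864),
}


def city_matches(city, word):
    """must clause: the city field contains the given word (case-insensitive)."""
    return word in city.lower().replace(",", " ").split()


def should_matches(acreage, thresholds=(1500, 5000)):
    """Count how many of the range should clauses (acreage > threshold) match."""
    return sum(acreage > t for t in thresholds)


def run_example(docs, minimum_should_match=1):
    """Return {doc id: number of matching should clauses} for the hits."""
    hits = {}
    for doc_id, (city, acreage) in docs.items():
        if not city_matches(city, "zhou"):
            continue                      # must clause not satisfied
        matched = should_matches(acreage)
        if matched >= minimum_should_match:
            hits[doc_id] = matched
    return hits


print(run_example(CITIES))                                   # {2: 1, 3: 2, 7: 2, 10: 2, 11: 2}
print(sorted(run_example(CITIES, minimum_should_match=2)))   # [3, 7, 10, 11]
```

Raising minimum_should_match to 2 would drop E zhou, because its acreage of 1594 satisfies only the first range clause.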