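Overview
Continuing to learn ES with instructor 中华石杉; this is part 18.
Course URL: https://www.roncoo.com/view/55
This picks up from the previous post, 白话Elasticsearch17-match_phrase query 短语匹配搜索 (phrase-match search with match_phrase query).
Official documentation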
https://www.elastic.co/guide/en/elasticsearch/reference/current/full-text-queries.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query-phrase.html
What slop means
The official documentation says:
A phrase query matches terms up to a configurable slop (which defaults to 0) in any order. Transposed terms have a slop of 2.
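So what is slop?
The terms in the query string (the search text) may have to be moved a number of times before they line up with a document. That number of moves is the slop.
- A phrase match with slop is a proximity match (an approximate match).
- If we specify slop, the search terms are allowed to move in order to try to match the doc.
- The search terms may be some distance apart, but the closer together they are, the higher the doc ranks; this is proximity match.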
Example
Work out how many moves a query string needs before it matches a document, then set slop to that number.
Suppose we have this doc:
hello world, java is very good, spark is also very good.
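If we search for java spark with a match_phrase query and the default slop of 0, nothing is found, because match_phrase query looks for java spark as a single unit.
If we specify slop, the terms of java spark are allowed to move to try to match the doc.
Here the slop is 3: in the phrase java spark, spark has to move 3 times before the phrase matches the doc.
slop is not just the number of moves the query string terms happen to need to match a doc. It is the maximum number of moves the terms are allowed to make while trying to match a doc.
With slop set to 3, this works: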
GET /forum/article/_search
{
  "query": {
    "match_phrase": {
      "title": {
        "query": "java spark",
        "slop": 3
      }
    }
  }
}
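This matches the doc above, and the doc is returned as a result.
If slop is set to 2, however, spark in java spark can move at most 2 times, which is not enough to match the doc, so the doc is not returned.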
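To make the counting of moves concrete, here is a minimal Python sketch of the idea. It is a simplified model, not Elasticsearch's implementation: the `tokenize` helper is a hypothetical stand-in for the analyzer (lowercase, split on non-alphanumerics, no stop-word removal), the name `required_slop` is made up for illustration, and the query terms are assumed to be distinct. Each term's position in the doc is shifted back by its offset in the query; the spread of the shifted positions is the number of moves.

```python
import re
from itertools import product


def tokenize(text):
    # Assumption: a rough stand-in for the analyzer (lowercase, split on
    # non-alphanumerics, keep every token).
    return re.findall(r"[a-z0-9]+", text.lower())


def required_slop(query, doc):
    """Smallest slop at which the phrase would match, or None if a term is missing."""
    q_terms = tokenize(query)
    d_terms = tokenize(doc)
    # All doc positions at which each query term occurs.
    candidates = [[i for i, t in enumerate(d_terms) if t == q] for q in q_terms]
    if any(not c for c in candidates):
        return None
    best = None
    for combo in product(*candidates):
        # Shift each doc position back by the term's offset in the query.
        # An exact phrase gives identical shifted values, i.e. a spread of 0.
        shifted = [pos - offset for offset, pos in enumerate(combo)]
        spread = max(shifted) - min(shifted)
        if best is None or spread < best:
            best = spread
    return best


doc = "hello world, java is very good, spark is also very good."
needed = required_slop("java spark", doc)
print(needed)       # 3
print(needed <= 3)  # True: slop 3 is enough
print(needed <= 2)  # False: slop 2 is not
```

Under these assumptions the sketch agrees with the walk-through: 3 moves are needed, so slop 3 matches and slop 2 does not.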
Example 1
Let's verify this against our test data.
GET /forum/article/_search
{
  "query": {
    "match_phrase": {
      "content": {
        "query": "spark data",
        "slop": 3
      }
    }
  }
}
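Analyzing the slop
data has to move 3 times before spark data matches, so setting slop to 3 is enough. Any value larger than 3 also finds the doc, of course. In other words, 3 is the smallest slop that works here: the phrase needs 3 moves, and slop is the maximum number of moves allowed.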
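Example 2
What if we search for data spark instead? Will it still match? The answer: yes, as long as slop is large enough. Reversing the order of the terms costs extra moves; as the official documentation notes, transposed terms have a slop of 2.
Let's analyze it.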
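One way to reason about the reversed query is to count the moves for a two-term phrase directly. The sketch below is an illustration under stated assumptions: `two_term_slop` is a hypothetical helper, and the token positions (spark at 0, data at 4) are made up to be consistent with the 3 moves counted in Example 1, not read from the test data.

```python
def two_term_slop(pos_first, pos_second):
    """Moves needed for a two-term phrase, given the doc positions of the
    first and second query term."""
    # In an exact match the second term sits right after the first,
    # i.e. at pos_first + 1. The distance from there is the number of moves.
    return abs(pos_second - (pos_first + 1))


# Hypothetical token positions in the doc: "spark" at 0, "data" at 4.
spark_pos, data_pos = 0, 4

print(two_term_slop(spark_pos, data_pos))  # query "spark data" -> 3
print(two_term_slop(data_pos, spark_pos))  # query "data spark" -> 5
print(two_term_slop(1, 0))                 # adjacent but swapped -> 2
```

With these positions the reversed query needs 2 more moves than the in-order one, which lines up with the documentation's remark that transposed terms have a slop of 2.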
Example 3
In a search with slop, the closer together the terms are, the higher the relevance score.
GET /forum/article/_search
{
  "query": {
    "match_phrase": {
      "title": {
        "query": "java blog",
        "slop": 5
      }
    }
  }
}
Response:
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0.81487787,
    "hits": [
      {
        "_index": "forum",
        "_type": "article",
        "_id": "2",
        "_score": 0.81487787,
        "_source": {
          "articleID": "KDKE-B-9947-#kL5",
          "userID": 1,
          "hidden": false,
          "postDate": "2017-01-02",
          "tag": [
            "java"
          ],
          "tag_cnt": 1,
          "view_cnt": 50,
          "title": "this is java blog",
          "content": "i think java is the best programming language",
          "sub_title": "learned a lot of course",
          "author_first_name": "Smith",
          "author_last_name": "Williams",
          "new_author_last_name": "Williams",
          "new_author_first_name": "Smith"
        }
      },
      {
        "_index": "forum",
        "_type": "article",
        "_id": "1",
        "_score": 0.31424814,
        "_source": {
          "articleID": "XHDK-A-1293-#fJ3",
          "userID": 1,
          "hidden": false,
          "postDate": "2017-01-01",
          "tag": [
            "java",
            "hadoop"
          ],
          "tag_cnt": 2,
          "view_cnt": 30,
          "title": "this is java and elasticsearch blog",
          "content": "i like to write best elasticsearch article",
          "sub_title": "learning more courses",
          "author_first_name": "Peter",
          "author_last_name": "Smith",
          "new_author_last_name": "Smith",
          "new_author_first_name": "Peter"
        }
      },
      {
        "_index": "forum",
        "_type": "article",
        "_id": "4",
        "_score": 0.31424814,
        "_source": {
          "articleID": "QQPX-R-3956-#aD8",
          "userID": 2,
          "hidden": true,
          "postDate": "2017-01-02",
          "tag": [
            "java",
            "elasticsearch"
          ],
          "tag_cnt": 2,
          "view_cnt": 80,
          "title": "this is java, elasticsearch, hadoop blog",
          "content": "elasticsearch and hadoop are all very good solution, i am a beginner",
          "sub_title": "both of them are good",
          "author_first_name": "Robbin",
          "author_last_name": "Li",
          "new_author_last_name": "Li",
          "new_author_first_name": "Robbin"
        }
      }
    ]
  }
}
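As we can see:
Highest score: _id 2, title "this is java blog", where java and blog are adjacent (_score 0.81487787).
Next: _id 1, title "this is java and elasticsearch blog" (_score 0.31424814).
Last: _id 4, title "this is java, elasticsearch, hadoop blog" (_score 0.31424814, tied with _id 1; in both titles two terms sit between java and blog).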
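The ordering in the response is consistent with how proximity feeds into scoring. As a rough sketch: Lucene's sloppy phrase scoring weights a match by about 1 / (1 + distance), where distance is the number of moves the match needed. The Python below applies that weight to the three titles from the response. The helper names (`tokenize`, `phrase_distance`, `sloppy_weight`) are hypothetical, the tokenizer is a simplified stand-in for the analyzer, and each query term is assumed to occur exactly once per title. The real _score also depends on other BM25 factors such as IDF and field length, so these weights explain the ordering, not the exact numbers.

```python
import re


def tokenize(text):
    # Assumption: lowercase and split on non-alphanumerics.
    return re.findall(r"[a-z0-9]+", text.lower())


def phrase_distance(query, text):
    """Moves needed for the phrase to match; assumes each term occurs once."""
    q_terms = tokenize(query)
    d_terms = tokenize(text)
    # Shift each doc position back by the term's offset in the query.
    shifted = [d_terms.index(term) - offset for offset, term in enumerate(q_terms)]
    return max(shifted) - min(shifted)


def sloppy_weight(distance):
    # Closer matches (smaller distance) get a larger weight.
    return 1.0 / (1 + distance)


# Titles copied from the response, keyed by _id.
titles = {
    "2": "this is java blog",
    "1": "this is java and elasticsearch blog",
    "4": "this is java, elasticsearch, hadoop blog",
}

for doc_id, title in titles.items():
    distance = phrase_distance("java blog", title)
    print(doc_id, distance, round(sloppy_weight(distance), 3))
# 2 0 1.0
# 1 2 0.333
# 4 2 0.333
```

Docs 1 and 4 need the same number of moves, which fits their identical scores, while doc 2 needs none and ranks first.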