Elasticsearch in Plain Language 11: A Deep Dive into Search Techniques, Tuning dis_max Results with the tie_breaker Parameter

2021-08-17 16:53:13

Contents

  • Overview
  • Official documentation
  • Example
    • tie_breaker

Overview

Continuing to learn ES from instructor 中华石杉; this is part 11.

Course URL: https://www.roncoo.com/view/55


Official documentation

https://www.elastic.co/guide/en/elasticsearch/guide/current/_tuning_best_fields_queries.html

https://www.elastic.co/guide/en/elasticsearch/reference/7.2/query-dsl-dis-max-query.html


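Example

The data and the index-building DSL are the same as in the previous post.

This time we use dis_max to search for java beginner. The DSL is as follows: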

GET /forum/article/_search
{
  "query": {
    "dis_max": {
      "queries": [
        {
          "match": {
            "title": "java beginner"
          }
        },
        {
          "match": {
            "content": "java beginner"
          }
        }
      ]
    }
  }
}

Response:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 5,
    "max_score": 1.0341108,
    "hits": [
      {
        "_index": "forum",
        "_type": "article",
        "_id": "3",
        "_score": 1.0341108,
        "_source": {
          "articleID": "JODL-X-1937-#pV7",
          "userID": 2,
          "hidden": false,
          "postDate": "2017-01-01",
          "tag": [
            "hadoop"
          ],
          "tag_cnt": 1,
          "view_cnt": 100,
          "title": "this is elasticsearch blog",
          "content": "i am only an elasticsearch beginner"
        }
      },
      {
        "_index": "forum",
        "_type": "article",
        "_id": "2",
        "_score": 0.93952733,
        "_source": {
          "articleID": "KDKE-B-9947-#kL5",
          "userID": 1,
          "hidden": false,
          "postDate": "2017-01-02",
          "tag": [
            "java"
          ],
          "tag_cnt": 1,
          "view_cnt": 50,
          "title": "this is java blog",
          "content": "i think java is the best programming language"
        }
      },
      {
        "_index": "forum",
        "_type": "article",
        "_id": "4",
        "_score": 0.79423964,
        "_source": {
          "articleID": "QQPX-R-3956-#aD8",
          "userID": 2,
          "hidden": true,
          "postDate": "2017-01-02",
          "tag": [
            "java",
            "elasticsearch"
          ],
          "tag_cnt": 2,
          "view_cnt": 80,
          "title": "this is java, elasticsearch, hadoop blog",
          "content": "elasticsearch and hadoop are all very good solution, i am a beginner"
        }
      },
      {
        "_index": "forum",
        "_type": "article",
        "_id": "5",
        "_score": 0.7116974,
        "_source": {
          "articleID": "DHJK-B-1395-#Ky5",
          "userID": 3,
          "hidden": false,
          "postDate": "2019-05-01",
          "tag": [
            "elasticsearch"
          ],
          "tag_cnt": 1,
          "view_cnt": 10,
          "title": "this is spark blog",
          "content": "spark is best big data solution based on scala ,an programming language similar to java"
        }
      },
      {
        "_index": "forum",
        "_type": "article",
        "_id": "1",
        "_score": 0.4889865,
        "_source": {
          "articleID": "XHDK-A-1293-#fJ3",
          "userID": 1,
          "hidden": false,
          "postDate": "2017-01-01",
          "tag": [
            "java",
            "hadoop"
          ],
          "tag_cnt": 2,
          "view_cnt": 30,
          "title": "this is java and elasticsearch blog",
          "content": "i like to write best elasticsearch article"
        }
      }
    ]
  }
}

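I am not sure why the document with id=3 gets the highest relevance score. If anyone knows, please enlighten me.

Without tie_breaker, dis_max takes only the score of the single best-scoring query and ignores the scores of the other queries entirely.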


tie_breaker

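Use tie_breaker to take the scores of the other queries into account as well.

The tie_breaker parameter works like this: the score of each of the other matching queries is multiplied by tie_breaker, and the results are added to the score of the highest-scoring query. So besides the best score, the other queries' scores also contribute to the final score.

The value of tie_breaker is a decimal between 0 and 1.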
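The rule above can be sketched in a few lines of Python. This only illustrates the arithmetic and is not how Elasticsearch implements it; the function name `dis_max_score` and the sample scores are made up for the example.

```python
def dis_max_score(clause_scores, tie_breaker=0.0):
    """Combine per-clause scores following the rule described above.

    clause_scores: scores of the clauses that matched one document.
    tie_breaker:   weight in [0, 1] applied to every non-best clause.
    """
    if not clause_scores:
        return 0.0
    best = max(clause_scores)
    others = sum(clause_scores) - best  # total of the remaining clauses
    return best + tie_breaker * others


# Hypothetical scores for one document: title clause and content clause.
scores = [1.0, 0.5]

print(dis_max_score(scores))                   # tie_breaker = 0: best clause only
print(dis_max_score(scores, tie_breaker=0.7))  # the other clause counts at 70%
print(dis_max_score(scores, tie_breaker=1.0))  # plain sum of all clauses
```

With tie_breaker at 0 the result is the best clause alone; at 1 every matching clause counts in full. Values in between let the best field dominate while still rewarding documents that match in more than one field.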

GET /forum/article/_search
{
  "query": {
    "dis_max": {
      "queries": [
        {
          "match": {
            "title": "java beginner"
          }
        },
        {
          "match": {
            "content": "java beginner"
          }
        }
      ],
      "tie_breaker": 0.7
    }
  }
}

Response:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 5,
    "max_score": 1.344432,
    "hits": [
      {
        "_index": "forum",
        "_type": "article",
        "_id": "2",
        "_score": 1.344432,
        "_source": {
          "articleID": "KDKE-B-9947-#kL5",
          "userID": 1,
          "hidden": false,
          "postDate": "2017-01-02",
          "tag": [
            "java"
          ],
          "tag_cnt": 1,
          "view_cnt": 50,
          "title": "this is java blog",
          "content": "i think java is the best programming language"
        }
      },
      {
        "_index": "forum",
        "_type": "article",
        "_id": "4",
        "_score": 1.1365302,
        "_source": {
          "articleID": "QQPX-R-3956-#aD8",
          "userID": 2,
          "hidden": true,
          "postDate": "2017-01-02",
          "tag": [
            "java",
            "elasticsearch"
          ],
          "tag_cnt": 2,
          "view_cnt": 80,
          "title": "this is java, elasticsearch, hadoop blog",
          "content": "elasticsearch and hadoop are all very good solution, i am a beginner"
        }
      },
      {
        "_index": "forum",
        "_type": "article",
        "_id": "3",
        "_score": 1.0341108,
        "_source": {
          "articleID": "JODL-X-1937-#pV7",
          "userID": 2,
          "hidden": false,
          "postDate": "2017-01-01",
          "tag": [
            "hadoop"
          ],
          "tag_cnt": 1,
          "view_cnt": 100,
          "title": "this is elasticsearch blog",
          "content": "i am only an elasticsearch beginner"
        }
      },
      {
        "_index": "forum",
        "_type": "article",
        "_id": "5",
        "_score": 0.7116974,
        "_source": {
          "articleID": "DHJK-B-1395-#Ky5",
          "userID": 3,
          "hidden": false,
          "postDate": "2019-05-01",
          "tag": [
            "elasticsearch"
          ],
          "tag_cnt": 1,
          "view_cnt": 10,
          "title": "this is spark blog",
          "content": "spark is best big data solution based on scala ,an programming language similar to java"
        }
      },
      {
        "_index": "forum",
        "_type": "article",
        "_id": "1",
        "_score": 0.4889865,
        "_source": {
          "articleID": "XHDK-A-1293-#fJ3",
          "userID": 1,
          "hidden": false,
          "postDate": "2017-01-01",
          "tag": [
            "java",
            "hadoop"
          ],
          "tag_cnt": 2,
          "view_cnt": 30,
          "title": "this is java and elasticsearch blog",
          "content": "i like to write best elasticsearch article"
        }
      }
    ]
  }
}
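As a sanity check, the two responses can be compared numerically. The sketch below copies the _score values from both responses and, for each document, backs out what the non-best clause must have scored, assuming the final score is the best clause's score plus tie_breaker times the sum of the others. The names are my own, and the per-clause scores are inferred, since neither response shows them directly.

```python
TIE_BREAKER = 0.7

# _score per document id, copied from the two responses above.
without_tb = {"3": 1.0341108, "2": 0.93952733, "4": 0.79423964,
              "5": 0.7116974, "1": 0.4889865}
with_tb = {"2": 1.344432, "4": 1.1365302, "3": 1.0341108,
           "5": 0.7116974, "1": 0.4889865}


def implied_other_score(doc_id):
    """Score the non-best clause(s) must have had, under the assumed rule
    score = best + tie_breaker * others."""
    return (with_tb[doc_id] - without_tb[doc_id]) / TIE_BREAKER


# Document ids ordered by the tie_breaker response, highest score first.
ranking = sorted(with_tb, key=with_tb.get, reverse=True)

for doc_id in ranking:
    print(doc_id, round(implied_other_score(doc_id), 7))
```

Documents 3, 5 and 1 come out at 0, which suggests only one of the two clauses matched them, so their scores did not move. Documents 2 and 4 matched in both title and content, which is why they overtake document 3 once tie_breaker is set. The implied value for document 4 is roughly 0.4889865, the same as document 1's score, which would be consistent with both matching only java in titles of the same length.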
