Elasticsearch 6.x Study Notes: 19. Search Highlighting

2022-05-06 19:15:38

19.1 Highlighting Overview

See the official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/6.1/search-request-highlighting.html

Highlighters enable you to get highlighted snippets from one or more fields in your search results so you can show users where the query matches are. When you request highlights, the response contains an additional highlight element for each search hit that includes the highlighted fields and the highlighted fragments.

Highlighting requires the actual content of a field. If the field is not stored (the mapping does not set store to true), the actual _source is loaded and the relevant field is extracted from _source.

NOTE: The _all field cannot be extracted from _source, so it can only be used for highlighting if it is explicitly stored.

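19.2 Default Highlighting

[Example] Use the default highlighter to get highlights for the title field of each search hit. The request queries the title field and includes a highlight object that names the same field.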

GET website/_search
{
    "query" : {
        "match": { "title": "yum" }
    },
    "highlight" : {
        "fields" : {
            "title" : {}
        }
    }
}

Query result

{
  "took": 27,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.9227539,
    "hits": [
      {
        "_index": "website",
        "_type": "blog",
        "_id": "6",
        "_score": 0.9227539,
        "_source": {
          "title": "CentOS更换国内yum源",
          "author": "程裕强",
          "postdate": "2016-12-30",
          "abstract": "CentOS更换国内yum源",
          "url": "http://url/53946911"
        },
        "highlight": {
          "title": [
            "CentOS更换国内<em>yum</em>源"
          ]
        }
      }
    ]
  }
}
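
Once a response like the one above arrives, a client usually wants just the highlighted fragments. The following is a minimal Python sketch of that step, not part of the official documentation: the helper name collect_highlights and the trimmed sample_response are assumptions made for illustration, and the sample reuses only the fields shown in the result above.

```python
def collect_highlights(response):
    """Map each hit's _id to its highlighted fragments, keyed by field name."""
    collected = {}
    for hit in response.get("hits", {}).get("hits", []):
        # A hit with no highlighted field has no "highlight" key at all,
        # so fall back to an empty dict instead of raising KeyError.
        collected[hit["_id"]] = hit.get("highlight", {})
    return collected


# Trimmed copy of the response shown above (hypothetical variable name).
sample_response = {
    "hits": {
        "total": 1,
        "hits": [
            {
                "_id": "6",
                "_source": {"title": "CentOS更换国内yum源"},
                "highlight": {"title": ["CentOS更换国内<em>yum</em>源"]},
            }
        ],
    }
}

highlights = collect_highlights(sample_response)
print(highlights["6"]["title"][0])
```

Each field maps to a list because a long field can yield several fragments; the title here is short, so the list has one entry.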

19.3 Custom Highlight Tags

  • pre_tags: Use in conjunction with post_tags to define the HTML tags to use for the highlighted text. By default, highlighted text is wrapped in <em> and </em> tags. Specify as an array of strings.
  • post_tags: Use in conjunction with pre_tags to define the HTML tags to use for the highlighted text. By default, highlighted text is wrapped in <em> and </em> tags. Specify as an array of strings.

Custom highlight tags are set with pre_tags and post_tags:

GET website/_search
{
    "query" : {
        "match": { "title": "yum" }
    },
    "highlight" : {
        "fields" : {
            "title" : {
              "pre_tags":["<mark>"],
              "post_tags":["</mark>"]
            }
        }
    }
}

Query result

{
  "took": 6,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.9227539,
    "hits": [
      {
        "_index": "website",
        "_type": "blog",
        "_id": "6",
        "_score": 0.9227539,
        "_source": {
          "title": "CentOS更换国内yum源",
          "author": "程裕强",
          "postdate": "2016-12-30",
          "abstract": "CentOS更换国内yum源",
          "url": "http://url/53946911"
        },
        "highlight": {
          "title": [
            "CentOS更换国内<mark>yum</mark>源"
          ]
        }
      }
    ]
  }
}
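
When the tags vary by caller, the request body can be assembled in code rather than written by hand. The sketch below is only an illustration in Python: build_highlight_request is a hypothetical helper, and it produces the same JSON body as the request above without sending it anywhere.

```python
import json


def build_highlight_request(field, text, pre_tag="<em>", post_tag="</em>"):
    """Return a search body that matches `text` in `field` and highlights it."""
    return {
        "query": {"match": {field: text}},
        "highlight": {
            "fields": {
                field: {
                    # Both options are arrays of strings, as the reference states.
                    "pre_tags": [pre_tag],
                    "post_tags": [post_tag],
                }
            }
        },
    }


body = build_highlight_request("title", "yum", "<mark>", "</mark>")
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted after GET website/_search in the console.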

19.4 Highlighting Multiple Fields

When searching the title field, we may want matching keywords highlighted in the abstract field as well as in title. This requires setting the require_field_match property.

By default, only fields that contain a query match are highlighted. Set require_field_match to false to highlight all fields. Defaults to true.

[Example] Highlight both the title and abstract fields

GET website/_search
{
    "query" : {
        "match": { "title": "yum" }
    },
    "highlight" : {
        "require_field_match":false,
        "fields" : {
            "title" : {},
            "abstract" : {}
        }
    }
}

Query result

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.9227539,
    "hits": [
      {
        "_index": "website",
        "_type": "blog",
        "_id": "6",
        "_score": 0.9227539,
        "_source": {
          "title": "CentOS更换国内yum源",
          "author": "程裕强",
          "postdate": "2016-12-30",
          "abstract": "CentOS更换国内yum源",
          "url": "http://url/53946911"
        },
        "highlight": {
          "abstract": [
            "CentOS更换国内<em>yum</em>源"
          ],
          "title": [
            "CentOS更换国内<em>yum</em>源"
          ]
        }
      }
    ]
  }
}
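
The effect of require_field_match can be pictured with a toy model. The Python sketch below is an assumption-laden illustration, not how Elasticsearch works internally: toy_highlight is a hypothetical function that does plain case-insensitive substring matching, whereas the real highlighters operate on analyzed terms. It only shows which fields are skipped under each setting.

```python
import re


def toy_highlight(source, query_field, term, highlight_fields,
                  require_field_match=True, pre="<em>", post="</em>"):
    """Toy model: wrap occurrences of `term` in the requested fields."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    result = {}
    for field in highlight_fields:
        # With the default setting, a field the query did not target is skipped.
        if require_field_match and field != query_field:
            continue
        text = source.get(field, "")
        if pattern.search(text):
            wrapped = pattern.sub(lambda m: pre + m.group(0) + post, text)
            result[field] = [wrapped]
    return result


source = {"title": "CentOS更换国内yum源", "abstract": "CentOS更换国内yum源"}

only_title = toy_highlight(source, "title", "yum", ["title", "abstract"])
both = toy_highlight(source, "title", "yum", ["title", "abstract"],
                     require_field_match=False)
print(sorted(only_title))  # fields highlighted with the default setting
print(sorted(both))        # fields highlighted with require_field_match=False
```

In this model, as in the result above, abstract is highlighted only when require_field_match is false, because the query itself targets title alone.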

19.5 Highlighting Performance Analysis

Elasticsearch supports three highlighters: unified, plain, and fvh (fast vector highlighter). You can specify the highlighter type you want to use for each field.

(1) Unified highlighter. The unified highlighter uses the Lucene Unified Highlighter. This highlighter breaks the text into sentences and uses the BM25 algorithm to score individual sentences as if they were documents in the corpus. It also supports accurate phrase and multi-term (fuzzy, prefix, regex) highlighting. This is the default highlighter.

(2) Plain highlighter. The plain highlighter uses the standard Lucene highlighter. It attempts to reflect the query matching logic in terms of understanding word importance and any word positioning criteria in phrase queries.

(3) fvh highlighter. The fvh highlighter uses the Lucene Fast Vector highlighter. This highlighter can be used on fields with term_vector set to with_positions_offsets in the mapping. The fast vector highlighter:

  • Can be customized with a boundary_scanner.
  • Requires setting term_vector to with_positions_offsets, which increases the size of the index.
  • Can combine matches from multiple fields into one result. See matched_fields.
  • Can assign different weights to matches at different positions, allowing for things like phrase matches being sorted above term matches when highlighting a Boosting Query that boosts phrase matches over term matches.

[Example] Setting the highlighter type. The type field lets you force a specific highlighter type. The allowed values are unified, plain, and fvh. The following example forces the use of the plain highlighter:

GET website/_search
{
    "query" : {
        "match": { "title": "yum" }
    },
    "highlight" : {
        "fields" : {
            "comment" : {"type" : "plain"}
        }
    }
}
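
Query result. Note that the hit below carries no highlight element: the request asks for highlighting on comment, a field the returned document does not have, while the query matches title. Requesting "title" : {"type" : "plain"} instead would apply the plain highlighter to the field that actually matches.
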
{
  "took": 7,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.9227539,
    "hits": [
      {
        "_index": "website",
        "_type": "blog",
        "_id": "6",
        "_score": 0.9227539,
        "_source": {
          "title": "CentOS更换国内yum源",
          "author": "程裕强",
          "postdate": "2016-12-30",
          "abstract": "CentOS更换国内yum源",
          "url": "http://url/53946911"
        }
      }
    ]
  }
}

[Example] Here is an example of setting the comment field to allow for highlighting using term vectors, which the fvh highlighter requires (this will cause the index to be bigger):

PUT /example
{
  "mappings": {
    "doc" : {
      "properties": {
        "comment" : {
          "type": "text",
          "term_vector" : "with_positions_offsets"
        }
      }
    }
  }
}
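
With a mapping like this in place, a search can request the fvh type for comment. The Python sketch below is one possible way to guard that choice and is not taken from the official documentation: has_fvh_term_vectors and highlight_clause are hypothetical helpers, and the query text "elasticsearch" is an arbitrary placeholder. The idea is to ask for fvh only when the mapping declares with_positions_offsets, and otherwise leave the default highlighter in effect.

```python
def has_fvh_term_vectors(index_body, doc_type, field):
    """True if `field` is mapped with term_vector set to with_positions_offsets."""
    properties = (
        index_body.get("mappings", {}).get(doc_type, {}).get("properties", {})
    )
    return properties.get(field, {}).get("term_vector") == "with_positions_offsets"


def highlight_clause(index_body, doc_type, field):
    """Request fvh only when the mapping allows it; otherwise use the default."""
    if has_fvh_term_vectors(index_body, doc_type, field):
        return {"fields": {field: {"type": "fvh"}}}
    return {"fields": {field: {}}}


# Same body as the PUT /example request above.
example_index = {
    "mappings": {
        "doc": {
            "properties": {
                "comment": {
                    "type": "text",
                    "term_vector": "with_positions_offsets",
                }
            }
        }
    }
}

search_body = {
    "query": {"match": {"comment": "elasticsearch"}},  # placeholder query text
    "highlight": highlight_clause(example_index, "doc", "comment"),
}
print(search_body["highlight"])
```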
