Using Elasticsearch: Customizing Search Result Scores

2021-03-23 09:45:01

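Introduction

Scoring is at the heart of any search engine, Elasticsearch included. Roughly speaking, scoring means finding the documents that satisfy a set of criteria and returning them in order of relevance. Relevance is usually computed with a TF-IDF-like algorithm that tries to find the documents textually most similar to the submitted query. TF-IDF and its relatives (such as BM25) work very well, but sometimes relevance has to be determined by other algorithms or additional scoring heuristics. This is where Elasticsearch's script_score and function_score features become useful. This article shows how to use them.

When you run a full-text search in Elasticsearch, results are sorted by default in descending order of the _score field, which is computed with BM25. To sort by another field, ascending or descending, you can use the sort parameter and pass in the fields and sort order you want. When a simple combination of ascending and descending sorts on a few fields is not enough, you need custom scoring. Elasticsearch provides the function_score DSL for this, so that results can be ordered by a _score you define yourself.

In practice, keep in mind that script_score and function_score are expensive. Compute scores only for a set of documents that has already been filtered.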
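One way to follow this advice is to narrow the candidate set inside the scored query. The sketch below only builds such a request body as a Python dict. The field names name and critic_score come from the sample dataset used later in this article; the threshold of 80, the simplified script, and the helper name filtered_script_score are illustrative assumptions.

```python
import json


def filtered_script_score(text, min_critic_score):
    """Build a script_score request body whose inner query filters first.

    The range filter is a hypothetical example; any cheap filter that
    shrinks the candidate set serves the same purpose.
    """
    return {
        "query": {
            "script_score": {
                "query": {
                    "bool": {
                        # Full-text match produces the base _score.
                        "must": [{"match": {"name": text}}],
                        # Filter clauses exclude documents without scoring them.
                        "filter": [
                            {"range": {"critic_score": {"gte": min_critic_score}}}
                        ],
                    }
                },
                # The script runs only for documents that pass the inner query.
                "script": {
                    "source": "_score * doc['critic_score'].value / 100"
                },
            }
        }
    }


body = filtered_script_score("Final Fantasy", 80)
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted after a GET best_games/_search line in the Kibana console.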

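Custom scoring

Preparing the data

First, download the test data (it needs to be unzipped):

best_games_json_data.zip

Then import the data into Elasticsearch through Kibana.

During the import, choose year as the Time field and specify the matching date format.

Name the index best_games.

Plain query

First, let's see what happens without any score customization.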

GET best_games/_search
{
  "_source": [
    "name",
    "critic_score",
    "user_score"
  ],
  "query": {
    "match": {
      "name": "Final Fantasy"
    }
  }
}

To keep the output short, the query above returns only the name, critic_score, and user_score fields. It searches for all games whose name field contains "Final Fantasy". The result is:

{
  "took" : 7,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 11,
      "relation" : "eq"
    },
    "max_score" : 8.138414,
    "hits" : [
      {
        "_index" : "best_games",
        "_type" : "_doc",
        "_id" : "l7j0BHcB7EwehiwiuHMQ",
        "_score" : 8.138414,
        "_source" : {
          "user_score" : 9,
          "critic_score" : 92,
          "name" : "Final Fantasy VII"
        }
      },
      {
        "_index" : "best_games",
        "_type" : "_doc",
        "_id" : "pbj0BHcB7EwehiwiuHMQ",
        "_score" : 8.138414,
        "_source" : {
          "user_score" : 8,
          "critic_score" : 92,
          "name" : "Final Fantasy X"
        }
      },
      {
        "_index" : "best_games",
        "_type" : "_doc",
        "_id" : "p7j0BHcB7EwehiwiuHMQ",
        "_score" : 8.138414,
        "_source" : {
          "user_score" : 8,
          "critic_score" : 90,
          "name" : "Final Fantasy VIII"
        }
      },
      {
        "_index" : "best_games",
        "_type" : "_doc",
        "_id" : "07j0BHcB7EwehiwiuHMQ",
        "_score" : 8.138414,
        "_source" : {
          "user_score" : 7,
          "critic_score" : 92,
          "name" : "Final Fantasy XII"
        }
      },
      {
        "_index" : "best_games",
        "_type" : "_doc",
        "_id" : "57j0BHcB7EwehiwiuHMQ",
        "_score" : 8.138414,
        "_source" : {
          "user_score" : 7,
          "critic_score" : 83,
          "name" : "Final Fantasy XIII"
        }
      },
      {
        "_index" : "best_games",
        "_type" : "_doc",
        "_id" : "6Lj0BHcB7EwehiwiuHMQ",
        "_score" : 8.138414,
        "_source" : {
          "user_score" : 8,
          "critic_score" : 94,
          "name" : "Final Fantasy IX"
        }
      },
      {
        "_index" : "best_games",
        "_type" : "_doc",
        "_id" : "Hbj0BHcB7EwehiwiuHUR",
        "_score" : 8.138414,
        "_source" : {
          "user_score" : 8,
          "critic_score" : 83,
          "name" : "Final Fantasy Tactics"
        }
      },
      {
        "_index" : "best_games",
        "_type" : "_doc",
        "_id" : "Wbj0BHcB7EwehiwiuHUR",
        "_score" : 8.138414,
        "_source" : {
          "user_score" : 8,
          "critic_score" : 79,
          "name" : "Dissidia: Final Fantasy"
        }
      },
      {
        "_index" : "best_games",
        "_type" : "_doc",
        "_id" : "6rj0BHcB7EwehiwiuHMQ",
        "_score" : 7.260148,
        "_source" : {
          "user_score" : 6,
          "critic_score" : 85,
          "name" : "Final Fantasy X-2"
        }
      },
      {
        "_index" : "best_games",
        "_type" : "_doc",
        "_id" : "9bj0BHcB7EwehiwiuHQR",
        "_score" : 7.260148,
        "_source" : {
          "user_score" : 6,
          "critic_score" : 79,
          "name" : "Final Fantasy XIII-2"
        }
      }
    ]
  }
}

In these results, Final Fantasy VII is listed first with the highest score, although the first eight hits all share the same _score of 8.138414.

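script_score query

Suppose we run a game store and want our own ranking method. All of these results match well, but we may not want to rank on the text match for Final Fantasy alone; we also want to bring user_score and critic_score into the ranking (you could also use just one of them). We want to compute the score like this: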

score = score * (user_score*10 + critic_score)/2/100

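That is, we multiply user_score by 10 to put it on a 100-point scale, add critic_score, then divide by 2 and by 100 to get a weighting coefficient. The final score is this coefficient multiplied by the score from the previous step. After this change, the score no longer reflects full-text relevance alone; it is also tied closely to how users and critics rated the game.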
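As a quick check of the arithmetic, here is a minimal Python sketch of the weighting for one document. The function name weighted_score is made up for illustration; the input values are those of "Final Fantasy VII" in the plain query result.

```python
def weighted_score(query_score, user_score, critic_score):
    """Scale the full-text score by the mean of both ratings on a 0-1 scale."""
    # user_score is on a 10-point scale, so multiply by 10 to match critic_score.
    weight = (user_score * 10 + critic_score) / 2 / 100
    return query_score * weight


# Values taken from the plain query result for "Final Fantasy VII".
ff7 = weighted_score(8.138414, user_score=9, critic_score=92)
print(round(ff7, 4))
```

The weight here is (90 + 92) / 2 / 100 = 0.91, so the result is roughly 7.406.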

Following Elastic's official documentation for script_score, we now run the following search:

GET best_games/_search
{
  "_source": [
    "name",
    "critic_score",
    "user_score"
  ],
  "query": {
    "script_score": {
      "query": {
        "match": {
          "name": "Final Fantasy"
        }
      },
      "script": {
        "source": "_score * (doc['user_score'].value*10 + doc['critic_score'].value)/2/100"
      }
    }
  }
}

The result of this query is:

{
  "took" : 18,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 11,
      "relation" : "eq"
    },
    "max_score" : 7.405957,
    "hits" : [
      {
        "_index" : "best_games",
        "_type" : "_doc",
        "_id" : "l7j0BHcB7EwehiwiuHMQ",
        "_score" : 7.405957,
        "_source" : {
          "user_score" : 9,
          "critic_score" : 92,
          "name" : "Final Fantasy VII"
        }
      },
      {
        "_index" : "best_games",
        "_type" : "_doc",
        "_id" : "6Lj0BHcB7EwehiwiuHMQ",
        "_score" : 7.0804205,
        "_source" : {
          "user_score" : 8,
          "critic_score" : 94,
          "name" : "Final Fantasy IX"
        }
      },
      {
        "_index" : "best_games",
        "_type" : "_doc",
        "_id" : "pbj0BHcB7EwehiwiuHMQ",
        "_score" : 6.9990363,
        "_source" : {
          "user_score" : 8,
          "critic_score" : 92,
          "name" : "Final Fantasy X"
        }
      },
      {
        "_index" : "best_games",
        "_type" : "_doc",
        "_id" : "p7j0BHcB7EwehiwiuHMQ",
        "_score" : 6.917652,
        "_source" : {
          "user_score" : 8,
          "critic_score" : 90,
          "name" : "Final Fantasy VIII"
        }
      },
      {
        "_index" : "best_games",
        "_type" : "_doc",
        "_id" : "Hbj0BHcB7EwehiwiuHUR",
        "_score" : 6.6328077,
        "_source" : {
          "user_score" : 8,
          "critic_score" : 83,
          "name" : "Final Fantasy Tactics"
        }
      },
      {
        "_index" : "best_games",
        "_type" : "_doc",
        "_id" : "07j0BHcB7EwehiwiuHMQ",
        "_score" : 6.592116,
        "_source" : {
          "user_score" : 7,
          "critic_score" : 92,
          "name" : "Final Fantasy XII"
        }
      },
      {
        "_index" : "best_games",
        "_type" : "_doc",
        "_id" : "Wbj0BHcB7EwehiwiuHUR",
        "_score" : 6.4700394,
        "_source" : {
          "user_score" : 8,
          "critic_score" : 79,
          "name" : "Dissidia: Final Fantasy"
        }
      },
      {
        "_index" : "best_games",
        "_type" : "_doc",
        "_id" : "57j0BHcB7EwehiwiuHMQ",
        "_score" : 6.225887,
        "_source" : {
          "user_score" : 7,
          "critic_score" : 83,
          "name" : "Final Fantasy XIII"
        }
      },
      {
        "_index" : "best_games",
        "_type" : "_doc",
        "_id" : "kbj0BHcB7EwehiwiuHQR",
        "_score" : 5.3406744,
        "_source" : {
          "user_score" : 8,
          "critic_score" : 83,
          "name" : "Crisis Core: Final Fantasy VII"
        }
      },
      {
        "_index" : "best_games",
        "_type" : "_doc",
        "_id" : "6rj0BHcB7EwehiwiuHMQ",
        "_score" : 5.2636075,
        "_source" : {
          "user_score" : 6,
          "critic_score" : 85,
          "name" : "Final Fantasy X-2"
        }
      }
    ]
  }
}

The final _score values are now completely different. Final Fantasy VII is still in first place, but second place has changed from Final Fantasy X to Final Fantasy IX.
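The change in order can be reproduced offline. The following sketch re-ranks five of the hits from the plain query with the same formula; the tuples are copied from the results above, and the helper name rescored is an illustrative assumption.

```python
# (name, BM25 score, user_score, critic_score) copied from the plain query result.
hits = [
    ("Final Fantasy VII", 8.138414, 9, 92),
    ("Final Fantasy X", 8.138414, 8, 92),
    ("Final Fantasy VIII", 8.138414, 8, 90),
    ("Final Fantasy XII", 8.138414, 7, 92),
    ("Final Fantasy IX", 8.138414, 8, 94),
]


def rescored(hit):
    """Apply the same weighting as the script to one hit tuple."""
    _, query_score, user_score, critic_score = hit
    return query_score * (user_score * 10 + critic_score) / 2 / 100


# Sort by the new score, highest first, as Elasticsearch does with _score.
ranked = sorted(hits, key=rescored, reverse=True)
names = [hit[0] for hit in ranked]
print(names)
```

Because all five hits have the same BM25 score, the order is decided entirely by the ratings, which is why Final Fantasy IX (critic_score 94) moves ahead of Final Fantasy X (critic_score 92).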

Script computations can call a number of predefined functions, which help speed up the calculation:

  • Saturation
  • Sigmoid
  • Random score function
  • Decay functions for numeric fields
  • Decay functions for geo fields
  • Decay functions for date fields
  • Functions for vector fields

See Elastic's official documentation for a deeper look at these functions.

Java API

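// Imports assume the 7.x Java High Level REST Client.
import java.util.HashMap;
import java.util.Map;
import org.elasticsearch.common.lucene.search.function.FunctionScoreQuery;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.functionscore.FunctionScoreQueryBuilder;
import org.elasticsearch.index.query.functionscore.ScoreFunctionBuilders;
import org.elasticsearch.index.query.functionscore.ScriptScoreFunctionBuilder;
import org.elasticsearch.script.Script;
import org.elasticsearch.script.ScriptType;
import org.elasticsearch.search.builder.SearchSourceBuilder;

// Custom scoring. SCORE, RANK and ADMINLEVEL are String constants defined elsewhere;
// they are expected to match the parameter names used in the script.
String scriptText = "_score * params._score + doc['rank'].value * params.rank + doc['adminlevel'].value * params.adminlevel";
Map<String, Object> params = new HashMap<>();
params.put(SCORE, 1.0f);
params.put(RANK, 0.5f);
params.put(ADMINLEVEL, 0.5 * 5000f);
Script script = new Script(ScriptType.INLINE, "painless", scriptText, params);
ScriptScoreFunctionBuilder scriptScoreFunctionBuilder = ScoreFunctionBuilders.scriptFunction(script);

// boolQueryBuilder is the base query built elsewhere.
FunctionScoreQueryBuilder functionScoreQueryBuilder = QueryBuilders.functionScoreQuery(boolQueryBuilder, scriptScoreFunctionBuilder)
        .scoreMode(FunctionScoreQuery.ScoreMode.SUM);

// Build the search source; requestData carries the paging values.
SearchSourceBuilder builder = new SearchSourceBuilder()
        .query(functionScoreQueryBuilder)
        .from(requestData.getOffset())
        .size(requestData.getLimit())
        .trackScores(true); // still compute _score when a sort is used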

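function_score query

function_score lets you modify the scores of the documents retrieved by a query. This is useful when, for example, a score function is computationally expensive and it is sufficient to compute the score on a filtered set of documents.

To use function_score, you define a query and one or more functions that compute a new score for each document returned by the query.

function_score can be used with just one function, for example: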

GET /_search
{
  "query": {
    "function_score": {
      "query": {
        "match_all": {}
      },
      "boost": "5",
      "random_score": {},
      "boost_mode": "multiply"
    }
  }
}

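Here every document's score is 5 multiplied by the value of random_score, which returns a number between 0 and 1. The resulting score is therefore a number between 0 and 5: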

{
  "took" : 18,
  "timed_out" : false,
  "_shards" : {
    "total" : 23,
    "successful" : 23,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : 4.999982,
    "hits" : [
      {
        "_index" : "kibana_sample_data_flights",
        "_type" : "_doc",
        "_id" : "mKd573YBL1uEtTd-P1s6",
        "_score" : 4.999982,
        "_source" : {
          "FlightNum" : "L0Y1WIQ",
          "DestCountry" : "AE",
          "OriginWeather" : "Rain",
          "OriginCityName" : "Casper",
          "AvgTicketPrice" : 904.0784684352491,
          "DistanceMiles" : 7600.715985854388,
          "FlightDelay" : false,
          "DestWeather" : "Rain",
          "Dest" : "Dubai International Airport",
          "FlightDelayType" : "No Delay",
          "OriginCountry" : "US",
          "dayOfWeek" : 5,
          "DistanceKilometers" : 12232.166667538846,
          "timestamp" : "2021-01-16T07:55:29",
          "DestLocation" : {
            "lat" : "25.25279999",
            "lon" : "55.36439896"
          },
          "DestAirportID" : "DXB",
          "Carrier" : "ES-Air",
          "Cancelled" : false,
          "FlightTimeMin" : 582.4841270256593,
          "Origin" : "Casper-Natrona County International Airport",
          "OriginLocation" : {
            "lat" : "42.90800095",
            "lon" : "-106.4639969"
          },
          "DestRegion" : "SE-BD",
          "OriginAirportID" : "CPR",
          "OriginRegion" : "US-WY",
          "DestCityName" : "Dubai",
          "FlightTimeHour" : 9.708068783760988,
          "FlightDelayMin" : 0
        }
      },
      {
        "_index" : "kibana_sample_data_logs",
        "_type" : "_doc",
        "_id" : "W6cW8XYBL1uEtTd-UoJQ",
        "_score" : 4.9999275,
        "_source" : {
          "agent" : "Mozilla/5.0 (X11; Linux i686) AppleWebKit/534.24 (KHTML, like Gecko) Chrome/11.0.696.50 Safari/534.24",
          "bytes" : 2303,
          "clientip" : "244.6.17.108",
          "extension" : "zip",
          "geo" : {
            "srcdest" : "IN:SK",
            "src" : "IN",
            "dest" : "SK",
            "coordinates" : {
              "lat" : 42.84381889,
              "lon" : -75.56140194
            }
          },
          "host" : "artifacts.elastic.co",
          "index" : "kibana_sample_data_logs",
          "ip" : "244.6.17.108",
          "machine" : {
            "ram" : 10737418240,
            "os" : "win 8"
          },
          "memory" : null,
          "message" : "244.6.17.108 - - [2018-07-25T14:33:13.336Z] \"GET /elasticsearch/elasticsearch-6.3.2.zip HTTP/1.1\" 200 2303 \"-\" \"Mozilla/5.0 (X11; Linux i686) AppleWebKit/534.24 (KHTML, like Gecko) Chrome/11.0.696.50 Safari/534.24\"",
          "phpmemory" : null,
          "referer" : "http://www.elastic-elastic-elastic.com/success/sergey-volkov",
          "request" : "/elasticsearch/elasticsearch-6.3.2.zip",
          "response" : 200,
          "tags" : [
            "error",
            "info"
          ],
          "timestamp" : "2021-01-06T14:33:13.336Z",
          "url" : "https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.3.2.zip",
          "utc_time" : "2021-01-06T14:33:13.336Z",
          "event" : {
            "dataset" : "sample_web_logs"
          }
        }
      },
      {
        "_index" : "kibana_sample_data_flights",
        "_type" : "_doc",
        "_id" : "FKd573YBL1uEtTd-SXW8",
        "_score" : 4.9997835,
        "_source" : {
          "FlightNum" : "GHTL47Q",
          "DestCountry" : "CN",
          "OriginWeather" : "Sunny",
          "OriginCityName" : "Vienna",
          "AvgTicketPrice" : 548.8790560351075,
          "DistanceMiles" : 4599.725604638256,
          "FlightDelay" : true,
          "DestWeather" : "Heavy Fog",
          "Dest" : "Xi'an Xianyang International Airport",
          "FlightDelayType" : "NAS Delay",
          "OriginCountry" : "AT",
          "dayOfWeek" : 5,
          "DistanceKilometers" : 7402.540803470951,
          "timestamp" : "2021-02-06T17:32:53",
          "DestLocation" : {
            "lat" : "34.447102",
            "lon" : "108.751999"
          },
          "DestAirportID" : "XIY",
          "Carrier" : "Logstash Airways",
          "Cancelled" : false,
          "FlightTimeMin" : 555.4435766747617,
          "Origin" : "Vienna International Airport",
          "OriginLocation" : {
            "lat" : "48.11029816",
            "lon" : "16.56970024"
          },
          "DestRegion" : "SE-BD",
          "OriginAirportID" : "VIE",
          "OriginRegion" : "AT-9",
          "DestCityName" : "Xi'an",
          "FlightTimeHour" : 9.257392944579362,
          "FlightDelayMin" : 120
        }
      },
      {
        "_index" : "kibana_sample_data_flights",
        "_type" : "_doc",
        "_id" : "KKd573YBL1uEtTd-QmVw",
        "_score" : 4.999776,
        "_source" : {
          "FlightNum" : "F7A2G89",
          "DestCountry" : "RU",
          "OriginWeather" : "Rain",
          "OriginCityName" : "Naples",
          "AvgTicketPrice" : 652.9709744107492,
          "DistanceMiles" : 1475.4980711052601,
          "FlightDelay" : true,
          "DestWeather" : "Cloudy",
          "Dest" : "Sheremetyevo International Airport",
          "FlightDelayType" : "Carrier Delay",
          "OriginCountry" : "IT",
          "dayOfWeek" : 6,
          "DistanceKilometers" : 2374.583967744824,
          "timestamp" : "2021-01-24T16:06:44",
          "DestLocation" : {
            "lat" : "55.972599",
            "lon" : "37.4146"
          },
          "DestAirportID" : "SVO",
          "Carrier" : "Logstash Airways",
          "Cancelled" : false,
          "FlightTimeMin" : 278.30559784965493,
          "Origin" : "Naples International Airport",
          "OriginLocation" : {
            "lat" : "40.886002",
            "lon" : "14.2908"
          },
          "DestRegion" : "RU-MOS",
          "OriginAirportID" : "NA01",
          "OriginRegion" : "IT-72",
          "DestCityName" : "Moscow",
          "FlightTimeHour" : 4.638426630827582,
          "FlightDelayMin" : 120
        }
      },
      {
        "_index" : "kibana_sample_data_logs",
        "_type" : "_doc",
        "_id" : "t6cW8XYBL1uEtTd-W5QE",
        "_score" : 4.9995403,
        "_source" : {
          "agent" : "Mozilla/5.0 (X11; Linux x86_64; rv:6.0a1) Gecko/20110421 Firefox/6.0a1",
          "bytes" : 3057,
          "clientip" : "28.86.156.19",
          "extension" : "gz",
          "geo" : {
            "srcdest" : "IN:ZA",
            "src" : "IN",
            "dest" : "ZA",
            "coordinates" : {
              "lat" : 48.29997583,
              "lon" : -112.2508711
            }
          },
          "host" : "artifacts.elastic.co",
          "index" : "kibana_sample_data_logs",
          "ip" : "28.86.156.19",
          "machine" : {
            "ram" : 10737418240,
            "os" : "osx"
          },
          "memory" : null,
          "message" : "28.86.156.19 - - [2018-08-15T11:15:47.211Z] \"GET /beats/filebeat/filebeat-6.3.2-linux-x86_64.tar.gz HTTP/1.1\" 200 3057 \"-\" \"Mozilla/5.0 (X11; Linux x86_64; rv:6.0a1) Gecko/20110421 Firefox/6.0a1\"",
          "phpmemory" : null,
          "referer" : "http://twitter.com/success/jack-lousma",
          "request" : "/beats/filebeat/filebeat-6.3.2-linux-x86_64.tar.gz",
          "response" : 200,
          "tags" : [
            "success",
            "info"
          ],
          "timestamp" : "2021-01-27T11:15:47.211Z",
          "url" : "https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-6.3.2-linux-x86_64.tar.gz",
          "utc_time" : "2021-01-27T11:15:47.211Z",
          "event" : {
            "dataset" : "sample_web_logs"
          }
        }
      },
      {
        "_index" : "kibana_sample_data_logs",
        "_type" : "_doc",
        "_id" : "SKcW8XYBL1uEtTd-UH_h",
        "_score" : 4.9994617,
        "_source" : {
          "agent" : "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)",
          "bytes" : 3010,
          "clientip" : "128.77.18.124",
          "extension" : "",
          "geo" : {
            "srcdest" : "ID:TH",
            "src" : "ID",
            "dest" : "TH",
            "coordinates" : {
              "lat" : 38.83615778,
              "lon" : -89.37841111
            }
          },
          "host" : "www.elastic.co",
          "index" : "kibana_sample_data_logs",
          "ip" : "128.77.18.124",
          "machine" : {
            "ram" : 3221225472,
            "os" : "ios"
          },
          "memory" : null,
          "message" : "128.77.18.124 - - [2018-07-22T14:26:06.237Z] \"GET / HTTP/1.1\" 200 3010 \"-\" \"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)\"",
          "phpmemory" : null,
          "referer" : "http://facebook.com/success/sergei-ryazanski",
          "request" : "/",
          "response" : 200,
          "tags" : [
            "success",
            "info"
          ],
          "timestamp" : "2021-01-03T14:26:06.237Z",
          "url" : "https://www.elastic.co/downloads",
          "utc_time" : "2021-01-03T14:26:06.237Z",
          "event" : {
            "dataset" : "sample_web_logs"
          }
        }
      },
      {
        "_index" : "kibana_sample_data_logs",
        "_type" : "_doc",
        "_id" : "PqcW8XYBL1uEtTd-aLFz",
        "_score" : 4.9992332,
        "_source" : {
          "agent" : "Mozilla/5.0 (X11; Linux x86_64; rv:6.0a1) Gecko/20110421 Firefox/6.0a1",
          "bytes" : 2875,
          "clientip" : "217.31.123.207",
          "extension" : "gz",
          "geo" : {
            "srcdest" : "JP:KR",
            "src" : "JP",
            "dest" : "KR",
            "coordinates" : {
              "lat" : 41.8128725,
              "lon" : -85.43906111
            }
          },
          "host" : "artifacts.elastic.co",
          "index" : "kibana_sample_data_logs",
          "ip" : "217.31.123.207",
          "machine" : {
            "ram" : 10737418240,
            "os" : "win xp"
          },
          "memory" : null,
          "message" : "217.31.123.207 - - [2018-09-16T09:02:43.651Z] \"GET /kibana/kibana-6.3.2-linux-x86_64.tar.gz HTTP/1.1\" 200 2875 \"-\" \"Mozilla/5.0 (X11; Linux x86_64; rv:6.0a1) Gecko/20110421 Firefox/6.0a1\"",
          "phpmemory" : null,
          "referer" : "http://facebook.com/success/curt-michel",
          "request" : "/kibana/kibana-6.3.2-linux-x86_64.tar.gz",
          "response" : 200,
          "tags" : [
            "success",
            "info"
          ],
          "timestamp" : "2021-02-28T09:02:43.651Z",
          "url" : "https://artifacts.elastic.co/downloads/kibana/kibana-6.3.2-linux-x86_64.tar.gz",
          "utc_time" : "2021-02-28T09:02:43.651Z",
          "event" : {
            "dataset" : "sample_web_logs"
          }
        }
      },
      {
        "_index" : "kibana_sample_data_flights",
        "_type" : "_doc",
        "_id" : "wKd573YBL1uEtTd-QF-3",
        "_score" : 4.999192,
        "_source" : {
          "FlightNum" : "JE9IYO3",
          "DestCountry" : "GB",
          "OriginWeather" : "Clear",
          "OriginCityName" : "Osaka",
          "AvgTicketPrice" : 450.8601980103516,
          "DistanceMiles" : 5873.817368388099,
          "FlightDelay" : false,
          "DestWeather" : "Clear",
          "Dest" : "Manchester Airport",
          "FlightDelayType" : "No Delay",
          "OriginCountry" : "JP",
          "dayOfWeek" : 2,
          "DistanceKilometers" : 9452.992738911176,
          "timestamp" : "2021-01-20T21:01:45",
          "DestLocation" : {
            "lat" : "53.35369873",
            "lon" : "-2.274950027"
          },
          "DestAirportID" : "MAN",
          "Carrier" : "JetBeats",
          "Cancelled" : false,
          "FlightTimeMin" : 630.1995159274118,
          "Origin" : "Kansai International Airport",
          "OriginLocation" : {
            "lat" : "34.4272995",
            "lon" : "135.2440033"
          },
          "DestRegion" : "GB-ENG",
          "OriginAirportID" : "KIX",
          "OriginRegion" : "SE-BD",
          "DestCityName" : "Manchester",
          "FlightTimeHour" : 10.503325265456862,
          "FlightDelayMin" : 0
        }
      },
      {
        "_index" : "kibana_sample_data_logs",
        "_type" : "_doc",
        "_id" : "0acW8XYBL1uEtTd-X58a",
        "_score" : 4.998841,
        "_source" : {
          "agent" : "Mozilla/5.0 (X11; Linux x86_64; rv:6.0a1) Gecko/20110421 Firefox/6.0a1",
          "bytes" : 6820,
          "clientip" : "167.150.117.40",
          "extension" : "css",
          "geo" : {
            "srcdest" : "IN:CO",
            "src" : "IN",
            "dest" : "CO",
            "coordinates" : {
              "lat" : 38.28977028,
              "lon" : -94.34012694
            }
          },
          "host" : "cdn.elastic-elastic-elastic.org",
          "index" : "kibana_sample_data_logs",
          "ip" : "167.150.117.40",
          "machine" : {
            "ram" : 13958643712,
            "os" : "win 8"
          },
          "memory" : null,
          "message" : "167.150.117.40 - - [2018-08-27T11:12:07.609Z] \"GET /styles/pretty-layout.css HTTP/1.1\" 200 6820 \"-\" \"Mozilla/5.0 (X11; Linux x86_64; rv:6.0a1) Gecko/20110421 Firefox/6.0a1\"",
          "phpmemory" : null,
          "referer" : "http://www.elastic-elastic-elastic.com/success/steven-smith",
          "request" : "/styles/pretty-layout.css",
          "response" : 200,
          "tags" : [
            "warning",
            "security"
          ],
          "timestamp" : "2021-02-08T11:12:07.609Z",
          "url" : "https://cdn.elastic-elastic-elastic.org/styles/pretty-layout.css",
          "utc_time" : "2021-02-08T11:12:07.609Z",
          "event" : {
            "dataset" : "sample_web_logs"
          }
        }
      },
      {
        "_index" : "kibana_sample_data_flights",
        "_type" : "_doc",
        "_id" : "wqd573YBL1uEtTd-Q2Y-",
        "_score" : 4.9988403,
        "_source" : {
          "FlightNum" : "DOP3F40",
          "DestCountry" : "CN",
          "OriginWeather" : "Sunny",
          "OriginCityName" : "Hyderabad",
          "AvgTicketPrice" : 716.3871123084598,
          "DistanceMiles" : 2215.9250825297995,
          "FlightDelay" : false,
          "DestWeather" : "Sunny",
          "Dest" : "Xi'an Xianyang International Airport",
          "FlightDelayType" : "No Delay",
          "OriginCountry" : "IN",
          "dayOfWeek" : 1,
          "DistanceKilometers" : 3566.185736018838,
          "timestamp" : "2021-01-26T22:54:39",
          "DestLocation" : {
            "lat" : "34.447102",
            "lon" : "108.751999"
          },
          "DestAirportID" : "XIY",
          "Carrier" : "Logstash Airways",
          "Cancelled" : false,
          "FlightTimeMin" : 222.88660850117736,
          "Origin" : "Rajiv Gandhi International Airport",
          "OriginLocation" : {
            "lat" : "17.23131752",
            "lon" : "78.42985535"
          },
          "DestRegion" : "SE-BD",
          "OriginAirportID" : "HYD",
          "OriginRegion" : "SE-BD",
          "DestCityName" : "Xi'an",
          "FlightTimeHour" : 3.714776808352956,
          "FlightDelayMin" : 0
        }
      }
    ]
  }
}

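This score has little meaning in itself, but it lets us show different documents each time a page is loaded, instead of a fixed result set in a strictly fixed order.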
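To illustrate the value range only, the sketch below imitates the calculation in Python. It assumes a query score of 1.0 for match_all and uses Python's own random generator as a stand-in; Elasticsearch derives its random_score value differently, so this is not its implementation. The function name random_function_score is hypothetical.

```python
import random


def random_function_score(query_score, boost, rng):
    """Imitate boost * random_score combined with boost_mode "multiply".

    rng.random() stands in for random_score and returns a value in [0, 1).
    """
    return query_score * boost * rng.random()


rng = random.Random(42)  # fixed seed so repeated runs give the same values
# match_all is assumed to give every document a query score of 1.0.
scores = [random_function_score(1.0, 5, rng) for _ in range(1000)]
print(min(scores), max(scores))
```

Every simulated score falls between 0 and 5, in line with the max_score of 4.999982 in the response above.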

We can also use function_score together with script_score:

GET best_games/_search
{
  "_source": [
    "name",
    "critic_score",
    "user_score"
  ],
  "query": {
    "function_score": {
      "query": {
        "match": {
          "name": "Final Fantasy"
        }
      },
      "script_score": {
        "script": "_score * (doc['user_score'].value*10 + doc['critic_score'].value)/2/100"
      }
    }
  }
}

The result is:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 11,
      "relation" : "eq"
    },
    "max_score" : 60.272747,
    "hits" : [
      {
        "_index" : "best_games",
        "_type" : "_doc",
        "_id" : "l7j0BHcB7EwehiwiuHMQ",
        "_score" : 60.272747,
        "_source" : {
          "user_score" : 9,
          "critic_score" : 92,
          "name" : "Final Fantasy VII"
        }
      },
      {
        "_index" : "best_games",
        "_type" : "_doc",
        "_id" : "6Lj0BHcB7EwehiwiuHMQ",
        "_score" : 57.623398,
        "_source" : {
          "user_score" : 8,
          "critic_score" : 94,
          "name" : "Final Fantasy IX"
        }
      },
      {
        "_index" : "best_games",
        "_type" : "_doc",
        "_id" : "pbj0BHcB7EwehiwiuHMQ",
        "_score" : 56.96106,
        "_source" : {
          "user_score" : 8,
          "critic_score" : 92,
          "name" : "Final Fantasy X"
        }
      },
      {
        "_index" : "best_games",
        "_type" : "_doc",
        "_id" : "p7j0BHcB7EwehiwiuHMQ",
        "_score" : 56.29872,
        "_source" : {
          "user_score" : 8,
          "critic_score" : 90,
          "name" : "Final Fantasy VIII"
        }
      },
      {
        "_index" : "best_games",
        "_type" : "_doc",
        "_id" : "Hbj0BHcB7EwehiwiuHUR",
        "_score" : 53.980537,
        "_source" : {
          "user_score" : 8,
          "critic_score" : 83,
          "name" : "Final Fantasy Tactics"
        }
      },
      {
        "_index" : "best_games",
        "_type" : "_doc",
        "_id" : "07j0BHcB7EwehiwiuHMQ",
        "_score" : 53.64937,
        "_source" : {
          "user_score" : 7,
          "critic_score" : 92,
          "name" : "Final Fantasy XII"
        }
      },
      {
        "_index" : "best_games",
        "_type" : "_doc",
        "_id" : "Wbj0BHcB7EwehiwiuHUR",
        "_score" : 52.65586,
        "_source" : {
          "user_score" : 8,
          "critic_score" : 79,
          "name" : "Dissidia: Final Fantasy"
        }
      },
      {
        "_index" : "best_games",
        "_type" : "_doc",
        "_id" : "57j0BHcB7EwehiwiuHMQ",
        "_score" : 50.66885,
        "_source" : {
          "user_score" : 7,
          "critic_score" : 83,
          "name" : "Final Fantasy XIII"
        }
      },
      {
        "_index" : "best_games",
        "_type" : "_doc",
        "_id" : "6rj0BHcB7EwehiwiuHMQ",
        "_score" : 38.21457,
        "_source" : {
          "user_score" : 6,
          "critic_score" : 85,
          "name" : "Final Fantasy X-2"
        }
      },
      {
        "_index" : "best_games",
        "_type" : "_doc",
        "_id" : "9bj0BHcB7EwehiwiuHQR",
        "_score" : 36.633278,
        "_source" : {
          "user_score" : 6,
          "critic_score" : 79,
          "name" : "Final Fantasy XIII-2"
        }
      }
    ]
  }
}

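Attentive readers may have noticed that these scores differ from the earlier script_score results. function_score combines the function result with the query score according to boost_mode, which defaults to multiply, so the script result (which already contains _score) is multiplied by _score once more: for Final Fantasy VII, 8.138414 × 7.405957 ≈ 60.27. The first eight hits are in the same order as before, but because a lower query score is now penalized twice, the last two positions differ.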
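The difference between the two results can be traced with a few lines of Python. This is a worked calculation for "Final Fantasy VII" using the numbers shown in the responses; the variable names are illustrative.

```python
query_score = 8.138414             # BM25 score of "Final Fantasy VII"
weight = (9 * 10 + 92) / 2 / 100   # rating-based weight from the script

# The script itself returns _score * weight in both queries.
script_result = query_score * weight

# script_score query: the script result is the final score.
script_score_final = script_result

# function_score query: the default boost_mode "multiply" multiplies the
# script result by the query score once more.
function_score_final = query_score * script_result

print(round(script_score_final, 3), round(function_score_final, 3))
```

The two values come out at about 7.406 and 60.273, matching the max_score of each response up to rounding.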

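In the script above, the value 10 is hard-coded. Suppose we later want to change it to 20 or some other value and look at the results again. Every change to the script source requires a recompilation, which is inefficient. A better approach is to pass the value as a parameter: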

GET best_games/_search
{
  "_source": [
    "name",
    "critic_score",
    "user_score"
  ],
  "query": {
    "script_score": {
      "query": {
        "match": {
          "name": "Final Fantasy"
        }
      },
      "script": {
        "params": {
          "multiplier": 10
        },
        "source": "_score * (doc['user_score'].value*params.multiplier + doc['critic_score'].value)/2/100"
      }
    }
  }
}

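Compiled scripts are cached for faster execution. If a script has values that vary, it is best to reuse the same script source and supply those values as parameters.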
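The point can be seen when the request body is built programmatically. In this sketch the script source is a constant and only params changes between requests; build_body is a hypothetical helper, and sending the body to a cluster is left out.

```python
# The script source never changes, so a compiled version can be reused.
SOURCE = (
    "_score * (doc['user_score'].value * params.multiplier"
    " + doc['critic_score'].value) / 2 / 100"
)


def build_body(multiplier):
    """Return a script_score request body with the multiplier passed as a param."""
    return {
        "query": {
            "script_score": {
                "query": {"match": {"name": "Final Fantasy"}},
                "script": {
                    "params": {"multiplier": multiplier},
                    "source": SOURCE,
                },
            }
        }
    }


body_10 = build_body(10)
body_20 = build_body(20)

# Same source text, different parameter values.
same_source = (
    body_10["query"]["script_score"]["script"]["source"]
    == body_20["query"]["script_score"]["script"]["source"]
)
print(same_source)
```

Had the multiplier been formatted into the source string instead, each new value would produce a different script text and therefore a new compilation.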

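boost_mode

boost_mode defines how the newly computed function score is combined with the query score.

Mode        Description
multiply    Query score and function score are multiplied (default)
replace     Only the function score is used; the query score is ignored
sum         Query score and function score are added
avg         Average of the query score and the function score
max         Maximum of the query score and the function score
min         Minimum of the query score and the function score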
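The six modes can be written down as a small Python function. This is a sketch of the combination rules as described in the table, not Elasticsearch code; the name combine is made up for illustration.

```python
def combine(query_score, function_score, boost_mode="multiply"):
    """Combine a query score and a function score according to boost_mode."""
    modes = {
        "multiply": lambda q, f: q * f,
        "replace": lambda q, f: f,        # query score is ignored
        "sum": lambda q, f: q + f,
        "avg": lambda q, f: (q + f) / 2,
        "max": max,
        "min": min,
    }
    if boost_mode not in modes:
        raise ValueError(f"unknown boost_mode: {boost_mode}")
    return modes[boost_mode](query_score, function_score)


for mode in ("multiply", "replace", "sum", "avg", "max", "min"):
    print(mode, combine(2.0, 3.0, mode))
```

With a query score of 2.0 and a function score of 3.0, the modes give 6.0, 3.0, 5.0, 2.5, 3.0, and 2.0 respectively.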

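field_value_factor

The field_value_factor function lets you use a field from the document to influence the score. It is similar to using the script_score function, but avoids the overhead of scripting. If it is used on a multi-valued field, only the first value of the field is used in the calculation.

As an example, suppose your documents are indexed with a numeric likes field and you want that field to influence the score. A query that does this looks like: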

GET /_search
{
  "query": {
    "function_score": {
      "field_value_factor": {
        "field": "likes",
        "factor": 1.2,
        "modifier": "sqrt",
        "missing": 1
      }
    }
  }
}

The function_score above computes the score from field_value_factor as follows:

sqrt(1.2 * doc['likes'].value)

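The field_value_factor function has a number of options:

Option      Description
field       Field to be extracted from the document.
factor      Optional factor to multiply the field value by. Defaults to 1.
modifier    Modifier to apply to the field value; one of none, log, log1p, log2p, ln, ln1p, ln2p, square, sqrt, or reciprocal. Defaults to none.
missing     Value to use if the document does not have the field. The modifier and factor are still applied to it, as if it had been read from the document.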
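The options can be summarized in a short Python sketch that computes modifier(factor * value). It assumes that log is the base-10 logarithm, that ln is the natural logarithm, and that the 1p and 2p variants add 1 or 2 before taking the logarithm; raising an error when the field is absent and no missing value is given is also an assumption of this sketch. The names MODIFIERS and field_value_factor are illustrative.

```python
import math

# Each modifier is applied to factor * field value.
MODIFIERS = {
    "none": lambda x: x,
    "log": math.log10,
    "log1p": lambda x: math.log10(x + 1),
    "log2p": lambda x: math.log10(x + 2),
    "ln": math.log,
    "ln1p": lambda x: math.log(x + 1),
    "ln2p": lambda x: math.log(x + 2),
    "square": lambda x: x * x,
    "sqrt": math.sqrt,
    "reciprocal": lambda x: 1 / x,
}


def field_value_factor(value, factor=1.0, modifier="none", missing=None):
    """Return modifier(factor * value), falling back to `missing` if value is None."""
    if value is None:
        if missing is None:
            raise ValueError("field is absent and no 'missing' value was given")
        value = missing
    return MODIFIERS[modifier](factor * value)


# The likes example: sqrt(1.2 * doc['likes'].value) for a document with 9 likes.
print(field_value_factor(9, factor=1.2, modifier="sqrt"))
# A document without the field uses missing=1 instead: sqrt(1.2 * 1).
print(field_value_factor(None, factor=1.2, modifier="sqrt", missing=1))
```

Note that log and ln are undefined at 0, which is one reason the shifted variants such as log1p exist.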

For our example, we can also recompute the score like this:

GET best_games/_search
{
  "_source": [
    "name",
    "critic_score",
    "user_score"
  ],
  "query": {
    "function_score": {
      "query": {
        "match": {
          "name": "Final Fantasy"
        }
      },
      "field_value_factor": {
        "field": "user_score",
        "factor": 1.2,
        "modifier": "none",
        "missing": 1
      }
    }
  }
}

In the example above, we use the user_score field and set its factor to 1.2, which increases the weight of this field. Running the search again gives:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 11,
      "relation" : "eq"
    },
    "max_score" : 87.89488,
    "hits" : [
      {
        "_index" : "best_games",
        "_type" : "_doc",
        "_id" : "l7j0BHcB7EwehiwiuHMQ",
        "_score" : 87.89488,
        "_source" : {
          "user_score" : 9,
          "critic_score" : 92,
          "name" : "Final Fantasy VII"
        }
      },
      {
        "_index" : "best_games",
        "_type" : "_doc",
        "_id" : "pbj0BHcB7EwehiwiuHMQ",
        "_score" : 78.128784,
        "_source" : {
          "user_score" : 8,
          "critic_score" : 92,
          "name" : "Final Fantasy X"
        }
      },
      {
        "_index" : "best_games",
        "_type" : "_doc",
        "_id" : "p7j0BHcB7EwehiwiuHMQ",
        "_score" : 78.128784,
        "_source" : {
          "user_score" : 8,
          "critic_score" : 90,
          "name" : "Final Fantasy VIII"
        }
      },
      {
        "_index" : "best_games",
        "_type" : "_doc",
        "_id" : "6Lj0BHcB7EwehiwiuHMQ",
        "_score" : 78.128784,
        "_source" : {
          "user_score" : 8,
          "critic_score" : 94,
          "name" : "Final Fantasy IX"
        }
      },
      {
        "_index" : "best_games",
        "_type" : "_doc",
        "_id" : "Hbj0BHcB7EwehiwiuHUR",
        "_score" : 78.128784,
        "_source" : {
          "user_score" : 8,
          "critic_score" : 83,
          "name" : "Final Fantasy Tactics"
        }
      },
      {
        "_index" : "best_games",
        "_type" : "_doc",
        "_id" : "Wbj0BHcB7EwehiwiuHUR",
        "_score" : 78.128784,
        "_source" : {
          "user_score" : 8,
          "critic_score" : 79,
          "name" : "Dissidia: Final Fantasy"
        }
      },
      {
        "_index" : "best_games",
        "_type" : "_doc",
        "_id" : "07j0BHcB7EwehiwiuHMQ",
        "_score" : 68.362686,
        "_source" : {
          "user_score" : 7,
          "critic_score" : 92,
          "name" : "Final Fantasy XII"
        }
      },
      {
        "_index" : "best_games",
        "_type" : "_doc",
        "_id" : "57j0BHcB7EwehiwiuHMQ",
        "_score" : 68.362686,
        "_source" : {
          "user_score" : 7,
          "critic_score" : 83,
          "name" : "Final Fantasy XIII"
        }
      },
      {
        "_index" : "best_games",
        "_type" : "_doc",
        "_id" : "kbj0BHcB7EwehiwiuHQR",
        "_score" : 62.908558,
        "_source" : {
          "user_score" : 8,
          "critic_score" : 83,
          "name" : "Crisis Core: Final Fantasy VII"
        }
      },
      {
        "_index" : "best_games",
        "_type" : "_doc",
        "_id" : "6rj0BHcB7EwehiwiuHMQ",
        "_score" : 52.273067,
        "_source" : {
          "user_score" : 6,
          "critic_score" : 85,
          "name" : "Final Fantasy X-2"
        }
      }
    ]
  }
}

We can see that the scores have changed again, and so has the ordering.
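
These numbers can be checked by hand. The sketch below assumes the default boost_mode of multiply and takes 8.138414, the BM25 score of Final Fantasy VII from the plain match query, as input. The helper apply_boost_mode is a hypothetical name used only for this illustration; it lists the boost_mode values that function_score accepts.

```python
def apply_boost_mode(query_score, function_value, boost_mode="multiply"):
    """Hypothetical helper: combine the query score with the function value."""
    combiners = {
        "multiply": lambda q, f: q * f,
        "replace": lambda q, f: f,
        "sum": lambda q, f: q + f,
        "avg": lambda q, f: (q + f) / 2.0,
        "max": max,
        "min": min,
    }
    return combiners[boost_mode](query_score, function_value)


bm25_score = 8.138414        # _score of "Final Fantasy VII" without function_score
function_value = 1.2 * 9     # factor * user_score, modifier "none"
final_score = apply_boost_mode(bm25_score, function_value)
print(round(final_score, 4))  # 87.8949
```

The result agrees with the reported _score of 87.89488 up to single-precision rounding. The same arithmetic with user_score 8 gives about 78.1288, which explains why the titles with a user_score of 8 dropped below Final Fantasy VII.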

functions

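In the example above, every document's score is adjusted by the same function. Sometimes we need to apply different weights to different documents, and functions is a good choice for that. Several functions can be combined, and each function can optionally be applied only when the document matches a given filter query: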

GET /_search
{
  "query": {
    "function_score": {
      "query": {
        "match_all": {}
      },
      "boost": "5",
      "functions": [
        {
          "filter": {
            "match": {
              "test": "bar"
            }
          },
          "random_score": {},
          "weight": 23
        },
        {
          "filter": {
            "match": {
              "test": "cat"
            }
          },
          "weight": 42
        }
      ],
      "max_boost": 42,
      "score_mode": "max",
      "boost_mode": "multiply",
      "min_score": 42
    }
  }
}

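Here boost is 5, so every document is boosted by 5. We also see several functions defined. Each one applies only to the documents that match its filter; when a document matches, its score is multiplied by the corresponding weight. Because score_mode is max, a document that matches more than one filter uses only the largest function value, and max_boost caps that value at 42.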

We can run the following experiment against our own example.

GET best_games/_search
{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "name": "Final Fantasy"
        }
      },
      "boost": "1",
      "functions": [
        {
          "filter": {
            "match": {
              "name": " XIII"
            }
          },
          "weight": 10000000
        }
      ],
      "boost_mode": "multiply"
    }
  }
}

We want to boost every game whose name contains XIII so that it ranks first. We give it a very large weight: 10000000.

The search result is:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 11,
      "relation" : "eq"
    },
    "max_score" : 8.1384144E7,
    "hits" : [
      {
        "_index" : "best_games",
        "_type" : "_doc",
        "_id" : "57j0BHcB7EwehiwiuHMQ",
        "_score" : 8.1384144E7,
        "_source" : {
          "global_sales" : 5.33,
          "year" : 2009,
          "image_url" : "https://www.wired.com/images_blogs/gamelife/2009/09/ffxiii-01.jpg",
          "platform" : "PS3",
          "@timestamp" : "2009-01-01T00:00:00.000+08:00",
          "user_score" : 7,
          "critic_score" : 83,
          "name" : "Final Fantasy XIII",
          "genre" : "Role-Playing",
          "publisher" : "Square Enix",
          "developer" : "Square Enix",
          "id" : "final-fantasy-xiii-ps3-2009"
        }
      },
      {
        "_index" : "best_games",
        "_type" : "_doc",
        "_id" : "9bj0BHcB7EwehiwiuHQR",
        "_score" : 7.260148E7,
        "_source" : {
          "global_sales" : 2.63,
          "year" : 2011,
          "image_url" : "https://i.ytimg.com/vi/tSJH_vhaYUk/maxresdefault.jpg",
          "platform" : "PS3",
          "@timestamp" : "2011-01-01T00:00:00.000+08:00",
          "user_score" : 6,
          "critic_score" : 79,
          "name" : "Final Fantasy XIII-2",
          "genre" : "Role-Playing",
          "publisher" : "Square Enix",
          "developer" : "Square Enix",
          "id" : "final-fantasy-xiii-2-ps3-2011"
        }
      },
      {
        "_index" : "best_games",
        "_type" : "_doc",
        "_id" : "l7j0BHcB7EwehiwiuHMQ",
        "_score" : 8.138414,
        "_source" : {
          "global_sales" : 9.72,
          "year" : 1997,
          "image_url" : "https://r.hswstatic.com/w_907/gif/finalfantasyvii-MAIN.jpg",
          "platform" : "PS",
          "@timestamp" : "1997-01-01T00:00:00.000+08:00",
          "user_score" : 9,
          "critic_score" : 92,
          "name" : "Final Fantasy VII",
          "genre" : "Role-Playing",
          "publisher" : "Sony Computer Entertainment",
          "developer" : "SquareSoft",
          "id" : "final-fantasy-vii-ps-1997"
        }
      },
      {
        "_index" : "best_games",
        "_type" : "_doc",
        "_id" : "pbj0BHcB7EwehiwiuHMQ",
        "_score" : 8.138414,
        "_source" : {
          "global_sales" : 8.05,
          "year" : 2001,
          "image_url" : "https://www.mobygames.com/images/promo/l/192477-final-fantasy-x-screenshot.jpg",
          "platform" : "PS2",
          "@timestamp" : "2001-01-01T00:00:00.000+08:00",
          "user_score" : 8,
          "critic_score" : 92,
          "name" : "Final Fantasy X",
          "genre" : "Role-Playing",
          "publisher" : "Sony Computer Entertainment",
          "developer" : "SquareSoft",
          "id" : "final-fantasy-x-ps2-2001"
        }
      },
      {
        "_index" : "best_games",
        "_type" : "_doc",
        "_id" : "p7j0BHcB7EwehiwiuHMQ",
        "_score" : 8.138414,
        "_source" : {
          "global_sales" : 7.86,
          "year" : 1999,
          "image_url" : "https://gamingheartscollection.files.wordpress.com/2018/02/final-fantasy-8.png?w=585",
          "platform" : "PS",
          "@timestamp" : "1999-01-01T00:00:00.000+08:00",
          "user_score" : 8,
          "critic_score" : 90,
          "name" : "Final Fantasy VIII",
          "genre" : "Role-Playing",
          "publisher" : "SquareSoft",
          "developer" : "SquareSoft",
          "id" : "final-fantasy-viii-ps-1999"
        }
      },
      {
        "_index" : "best_games",
        "_type" : "_doc",
        "_id" : "07j0BHcB7EwehiwiuHMQ",
        "_score" : 8.138414,
        "_source" : {
          "global_sales" : 5.95,
          "year" : 2006,
          "image_url" : "https://m.media-amazon.com/images/M/MV5BM2I4MDMyMDQtNjM2OC00ZWNkLTg0ODQtNzYxZjY0M2QxODQyXkEyXkFqcGdeQXVyNjY5NTM5MjA@._V1_.jpg",
          "platform" : "PS2",
          "@timestamp" : "2006-01-01T00:00:00.000+08:00",
          "user_score" : 7,
          "critic_score" : 92,
          "name" : "Final Fantasy XII",
          "genre" : "Role-Playing",
          "publisher" : "Square Enix",
          "developer" : "Square Enix",
          "id" : "final-fantasy-xii-ps2-2006"
        }
      },
      {
        "_index" : "best_games",
        "_type" : "_doc",
        "_id" : "6Lj0BHcB7EwehiwiuHMQ",
        "_score" : 8.138414,
        "_source" : {
          "global_sales" : 5.3,
          "year" : 2000,
          "image_url" : "http://gamesdatabase.org/Media/SYSTEM/Sony_Playstation/Snap/Thumb/Thumb_Final_Fantasy_IX_-_2000_-_Square_Co.,_Ltd..jpg",
          "platform" : "PS",
          "@timestamp" : "2000-01-01T00:00:00.000+08:00",
          "user_score" : 8,
          "critic_score" : 94,
          "name" : "Final Fantasy IX",
          "genre" : "Role-Playing",
          "publisher" : "SquareSoft",
          "developer" : "SquareSoft",
          "id" : "final-fantasy-ix-ps-2000"
        }
      },
      {
        "_index" : "best_games",
        "_type" : "_doc",
        "_id" : "Hbj0BHcB7EwehiwiuHUR",
        "_score" : 8.138414,
        "_source" : {
          "global_sales" : 2.45,
          "year" : 1997,
          "image_url" : "https://www.thefinalfantasy.com/gallery/screenshots/ff-tactics/dynamic_previews/ff-tactics-screenshot-1_scale_800_700.jpg",
          "platform" : "PS",
          "@timestamp" : "1997-01-01T00:00:00.000+08:00",
          "user_score" : 8,
          "critic_score" : 83,
          "name" : "Final Fantasy Tactics",
          "genre" : "Role-Playing",
          "publisher" : "SquareSoft",
          "developer" : "SquareSoft",
          "id" : "final-fantasy-tactics-ps-1997"
        }
      },
      {
        "_index" : "best_games",
        "_type" : "_doc",
        "_id" : "Wbj0BHcB7EwehiwiuHUR",
        "_score" : 8.138414,
        "_source" : {
          "global_sales" : 2.23,
          "year" : 2008,
          "image_url" : "https://i.ytimg.com/vi/aoTdB0WTXX4/hqdefault.jpg",
          "platform" : "PSP",
          "@timestamp" : "2008-01-01T00:00:00.000+08:00",
          "user_score" : 8,
          "critic_score" : 79,
          "name" : "Dissidia: Final Fantasy",
          "genre" : "Fighting",
          "publisher" : "Square Enix",
          "developer" : "Square Enix",
          "id" : "dissidia-final-fantasy-psp-2008"
        }
      },
      {
        "_index" : "best_games",
        "_type" : "_doc",
        "_id" : "6rj0BHcB7EwehiwiuHMQ",
        "_score" : 7.260148,
        "_source" : {
          "global_sales" : 5.29,
          "year" : 2003,
          "image_url" : "https://upload.wikimedia.org/wikipedia/en/thumb/6/6c/FFX-2_box.jpg/220px-FFX-2_box.jpg",
          "platform" : "PS2",
          "@timestamp" : "2003-01-01T00:00:00.000+08:00",
          "user_score" : 6,
          "critic_score" : 85,
          "name" : "Final Fantasy X-2",
          "genre" : "Role-Playing",
          "publisher" : "Electronic Arts",
          "developer" : "SquareSoft",
          "id" : "final-fantasy-x-2-ps2-2003"
        }
      }
    ]
  }
}

We can see that in this search Final Fantasy XIII now ranks first, followed by Final Fantasy XIII-2, whose name also matches the filter.
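
The boosted scores can be reproduced with a small Python sketch. It is a simplified model, not the Elasticsearch implementation: the helpers tokens, combine and function_score are hypothetical names, the regular-expression tokenizer only approximates how a text field is analyzed, and the weighted avg score_mode is left out. The input scores are the BM25 values from the plain match query.

```python
import math
import re


def tokens(text):
    """Rough stand-in for text analysis: lowercase alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def combine(values, score_mode="multiply"):
    """Combine the values of the functions whose filter matched."""
    if not values:
        return 1.0  # assumption: no matching function leaves the score unchanged
    modes = {
        "multiply": math.prod,
        "sum": sum,
        "max": max,
        "min": min,
        "first": lambda v: v[0],
    }
    return modes[score_mode](values)


def function_score(query_score, doc, functions,
                   score_mode="multiply", max_boost=float("inf")):
    """Apply filtered weights, cap with max_boost, use boost_mode multiply."""
    matched = [weight for matches, weight in functions if matches(doc)]
    return query_score * min(combine(matched, score_mode), max_boost)


# One function: weight 10000000 for documents whose name contains the term "xiii".
functions = [(lambda doc: "xiii" in tokens(doc["name"]), 10_000_000)]

hits = [
    ("Final Fantasy XIII", 8.138414),
    ("Final Fantasy XIII-2", 7.260148),
    ("Final Fantasy VII", 8.138414),
]
rescored = {name: function_score(score, {"name": name}, functions)
            for name, score in hits}
for name, score in rescored.items():
    print(name, score)
```

Final Fantasy XIII-2 is boosted as well because its name is split into the terms xiii and 2, while Final Fantasy VII keeps its original score since no function matches it.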

Decay functions

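Function scoring can be used not only to modify the default Elasticsearch scoring algorithm but also to replace it entirely. A good example is a "trending" search, which shows the items within a topic that are quickly gaining popularity.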

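Such a score cannot be based on simple metrics such as "likes" or "views"; it has to be adjusted continuously relative to the current time. A video that gets 1000 views in 1 hour is generally considered "hotter" than a video that gets 10000 views in 24 hours. Elasticsearch ships with several decay functions that make this kind of problem easy to solve.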

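We now use gauss as an example of how to use a decay function. The shape of the curve is controlled by four parameters: origin, scale, offset and decay.

origin is the point where the function value is highest (1.0), and scale is the distance from origin (beyond offset) at which the value has dropped to decay; together they define the range that the curve covers. If we want a list of trending videos to cover a whole day, it is best to set origin to the current timestamp and scale to 24 hours.

offset keeps the curve completely flat at the start. Setting it to 1h, for example, removes any penalty for the most recent videos, so all videos from the last hour are unaffected.

Finally, decay determines how severely a document is demoted according to its distance from origin: it is the value the function returns at a distance of scale (plus offset). The default is 0.5; smaller values make the curve steeper and the effect more pronounced.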
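
For reference, the Gaussian decay can be written as below, following the definition given in the Elasticsearch function_score documentation. The symbols S (the resulting multiplier), v (the document's field value) and σ are notation introduced here for illustration.

```latex
\[
S(v) = \exp\!\left(-\frac{\max\bigl(0,\ |v - \mathrm{origin}| - \mathrm{offset}\bigr)^{2}}{2\sigma^{2}}\right),
\qquad
\sigma^{2} = -\frac{\mathrm{scale}^{2}}{2\,\ln(\mathrm{decay})}
\]
```

Substituting |v - origin| - offset = scale gives S(v) = decay, and any v within offset of origin gives S(v) = 1.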

Let's again use our best_games index as an example:

GET best_games/_search
{
  "_source": [
    "name",
    "critic_score",
    "user_score"
  ],
  "query": {
    "function_score": {
      "query": {
        "match": {
          "name": "Final Fantasy"
        }
      },
      "functions": [
        {
          "gauss": {
            "@timestamp": {
              "origin": "2016-01-01T00:00:00",
              "scale": "365d",
              "offset": "0h",
              "decay": 0.1
            }
          }
        }
      ],
      "boost_mode": "multiply"
    }
  }
}

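The query above uses 2016-01-01 as the origin. Because offset is 0h, the decay starts right at the origin: a document whose @timestamp is 365 days away from the origin has its score multiplied by 0.1, and the multiplier shrinks rapidly beyond that distance. In other words, documents more than a year away from the origin matter very little to us.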

Running the query again gives:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 11,
      "relation" : "eq"
    },
    "max_score" : 6.67425E-25,
    "hits" : [
      {
        "_index" : "best_games",
        "_type" : "_doc",
        "_id" : "9bj0BHcB7EwehiwiuHQR",
        "_score" : 6.67425E-25,
        "_source" : {
          "user_score" : 6,
          "critic_score" : 79,
          "name" : "Final Fantasy XIII-2"
        }
      },
      {
        "_index" : "best_games",
        "_type" : "_doc",
        "_id" : "l7j0BHcB7EwehiwiuHMQ",
        "_score" : 0.0,
        "_source" : {
          "user_score" : 9,
          "critic_score" : 92,
          "name" : "Final Fantasy VII"
        }
      },
      {
        "_index" : "best_games",
        "_type" : "_doc",
        "_id" : "pbj0BHcB7EwehiwiuHMQ",
        "_score" : 0.0,
        "_source" : {
          "user_score" : 8,
          "critic_score" : 92,
          "name" : "Final Fantasy X"
        }
      },
      {
        "_index" : "best_games",
        "_type" : "_doc",
        "_id" : "p7j0BHcB7EwehiwiuHMQ",
        "_score" : 0.0,
        "_source" : {
          "user_score" : 8,
          "critic_score" : 90,
          "name" : "Final Fantasy VIII"
        }
      },
      {
        "_index" : "best_games",
        "_type" : "_doc",
        "_id" : "07j0BHcB7EwehiwiuHMQ",
        "_score" : 0.0,
        "_source" : {
          "user_score" : 7,
          "critic_score" : 92,
          "name" : "Final Fantasy XII"
        }
      },
      {
        "_index" : "best_games",
        "_type" : "_doc",
        "_id" : "57j0BHcB7EwehiwiuHMQ",
        "_score" : 0.0,
        "_source" : {
          "user_score" : 7,
          "critic_score" : 83,
          "name" : "Final Fantasy XIII"
        }
      },
      {
        "_index" : "best_games",
        "_type" : "_doc",
        "_id" : "6Lj0BHcB7EwehiwiuHMQ",
        "_score" : 0.0,
        "_source" : {
          "user_score" : 8,
          "critic_score" : 94,
          "name" : "Final Fantasy IX"
        }
      },
      {
        "_index" : "best_games",
        "_type" : "_doc",
        "_id" : "6rj0BHcB7EwehiwiuHMQ",
        "_score" : 0.0,
        "_source" : {
          "user_score" : 6,
          "critic_score" : 85,
          "name" : "Final Fantasy X-2"
        }
      },
      {
        "_index" : "best_games",
        "_type" : "_doc",
        "_id" : "kbj0BHcB7EwehiwiuHQR",
        "_score" : 0.0,
        "_source" : {
          "user_score" : 8,
          "critic_score" : 83,
          "name" : "Crisis Core: Final Fantasy VII"
        }
      },
      {
        "_index" : "best_games",
        "_type" : "_doc",
        "_id" : "Hbj0BHcB7EwehiwiuHUR",
        "_score" : 0.0,
        "_source" : {
          "user_score" : 8,
          "critic_score" : 83,
          "name" : "Final Fantasy Tactics"
        }
      }
    ]
  }
}

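This time the result shows that Final Fantasy XIII-2 is the highest-scoring document. Released in 2011, it is the game closest to the origin date; the older games listed decay all the way to a score of 0.0.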
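
The tiny scores can be reproduced outside Elasticsearch. The Python sketch below rests on several assumptions: the origin string is interpreted as UTC, the documents' @timestamp values carry a +08:00 offset, the input scores are the BM25 values from the plain match query (7.260148 for Final Fantasy XIII-2, 8.138414 for Final Fantasy XIII), and the final score is stored as a 32-bit float. The helper names gauss_decay, to_float32 and days_from_origin are hypothetical.

```python
import math
import struct
from datetime import datetime, timedelta, timezone


def gauss_decay(distance, scale, offset=0.0, decay=0.5):
    """Gaussian decay multiplier; equals `decay` at distance == offset + scale."""
    d = max(0.0, abs(distance) - offset)
    return math.exp(math.log(decay) * (d / scale) ** 2)


def to_float32(x):
    """Round a Python float to single precision (tiny values underflow to 0.0)."""
    return struct.unpack("f", struct.pack("f", x))[0]


origin = datetime(2016, 1, 1, tzinfo=timezone.utc)   # assumed to be UTC
tz8 = timezone(timedelta(hours=8))                    # assumed document offset
scale_days = 365.0


def days_from_origin(ts):
    return abs((origin - ts).total_seconds()) / 86400.0


xiii_2 = datetime(2011, 1, 1, tzinfo=tz8)  # Final Fantasy XIII-2
xiii = datetime(2009, 1, 1, tzinfo=tz8)    # Final Fantasy XIII

score_xiii_2 = 7.260148 * gauss_decay(days_from_origin(xiii_2), scale_days, decay=0.1)
score_xiii = 8.138414 * gauss_decay(days_from_origin(xiii), scale_days, decay=0.1)

print(score_xiii_2)            # on the order of 1e-25
print(to_float32(score_xiii))  # too small for single precision
```

Under these assumptions the first value lands very close to the reported 6.67425E-25, and the second is far below the smallest positive single-precision number, which is consistent with the 0.0 scores in the result. For a practical ranking, a scale closer to the age range of the data (or a larger decay) would keep the scores distinguishable.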

Java API

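// Imports assume the Elasticsearch 7.x Java high-level REST client.
import org.elasticsearch.common.lucene.search.function.CombineFunction;
import org.elasticsearch.common.lucene.search.function.FieldValueFactorFunction;
import org.elasticsearch.common.lucene.search.function.FunctionScoreQuery;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.functionscore.FieldValueFactorFunctionBuilder;
import org.elasticsearch.index.query.functionscore.FunctionScoreQueryBuilder;
import org.elasticsearch.index.query.functionscore.ScoreFunctionBuilders;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.rescore.QueryRescorerBuilder;

// boolQueryBuilder, limit and offset are assumed to be defined by the calling code.
// Variant 1: wrap boolQueryBuilder in a function_score query with score_mode sum.
// field_value_factor expects a numeric field; "name" is kept from the original snippet.
FunctionScoreQueryBuilder query = QueryBuilders.functionScoreQuery(boolQueryBuilder,
        ScoreFunctionBuilders.fieldValueFactorFunction("name")
                .modifier(FieldValueFactorFunction.Modifier.NONE)
                .factor(2f)).scoreMode(FunctionScoreQuery.ScoreMode.SUM);

// Variant 2: build the function separately and apply it as a rescorer.
FieldValueFactorFunctionBuilder fieldQuery = new FieldValueFactorFunctionBuilder("name");
fieldQuery.setWeight(0.035f).missing(1).factor(2f);
fieldQuery.modifier(FieldValueFactorFunction.Modifier.NONE);
// Final score = _score + extra score
FunctionScoreQueryBuilder functionScoreQueryBuilder = QueryBuilders
        .functionScoreQuery(fieldQuery)
        .boostMode(CombineFunction.SUM).maxBoost(5f);

// Rescore only the top 500 hits of each shard with the function_score query.
QueryRescorerBuilder rescorerBuilder = new QueryRescorerBuilder(functionScoreQueryBuilder);
rescorerBuilder.windowSize(500);

// Page the results, run the main query, then apply the rescorer.
SearchSourceBuilder builder = new SearchSourceBuilder();
builder.size(limit).from(offset);
builder.query(boolQueryBuilder);
builder.addRescorer(rescorerBuilder);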

References:

【1】https://www.elastic.co/blog/found-function-scoring

【2】https://medium.com/horrible-hacks/customizing-scores-in-elasticsearch-for-product-recommendations-9e0d02ce1dbd

【3】https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html#function-field-value-factor

【4】https://juejin.im/post/5df8f465518825123751c089

【5】https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-script-score-query.html
