A note on inaccurate Elasticsearch aggregation results

2023-11-23 14:44:16

Problem

After running a query with a terms aggregation (agg), the values returned in the buckets were all the same.

The code used was:

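import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.bucket.terms.TermsAggregationBuilder;
import org.elasticsearch.search.builder.SearchSourceBuilder;

// Outer terms aggregation on equipCode, with a nested terms aggregation on receivedTimeStamp
TermsAggregationBuilder terms1 = AggregationBuilders.terms("brands_max_num").field("equipCode");
TermsAggregationBuilder terms2 = AggregationBuilders.terms("timeCount").field("receivedTimeStamp");
SearchSourceBuilder querySourceBuilder = new SearchSourceBuilder().aggregation(terms1.subAggregation(terms2)).size(0);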

The same behavior can usually be reproduced in Kibana:

GET ecc_bc_20211017/_search
{
  "query": {
    "match": {
      "gpsId": "31854137"
    }
  },
  "size": 0,
  "aggs": {
    "brands_max_num": {
      "terms": {
        "field": "equipCode"
      },
      "aggs": {
        "timeCount": {
          "terms": {
            "field": "receivedTimeStamp"
          }
        }
      }
    }
  }
}

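Root cause analysis

By default, a terms aggregation returns only 10 buckets (its size parameter defaults to 10), ordered by document count. When the field has more than 10 distinct values, which is common with large data volumes, the remaining terms are left out and the result looks inaccurate.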
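To make the truncation concrete, here is a minimal plain-Java sketch that mimics how a terms aggregation keeps only the top `size` buckets. It does not call Elasticsearch: the class `TermsSizeSketch`, the method `topTerms`, and the sample values are hypothetical names introduced for illustration, and the ordering (document count descending, then term ascending) is a simplification of the default behavior on a single shard.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TermsSizeSketch {

    /** Simulated aggregation result: the kept buckets and the number of documents left out. */
    public static final class Result {
        public final Map<String, Long> buckets;
        public final long sumOtherDocCount;

        Result(Map<String, Long> buckets, long sumOtherDocCount) {
            this.buckets = buckets;
            this.sumOtherDocCount = sumOtherDocCount;
        }
    }

    /** Counts documents per term and keeps only the top `size` terms (hypothetical helper). */
    public static Result topTerms(List<String> values, int size) {
        // Count how many documents carry each term
        Map<String, Long> counts = new HashMap<>();
        for (String v : values) {
            counts.merge(v, 1L, Long::sum);
        }

        // Order by document count descending, then by term ascending for ties
        List<Map.Entry<String, Long>> sorted = new ArrayList<>(counts.entrySet());
        sorted.sort((a, b) -> {
            int byCount = Long.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });

        // Keep only the first `size` buckets; everything else is dropped
        Map<String, Long> kept = new LinkedHashMap<>();
        long keptDocs = 0;
        for (int i = 0; i < Math.min(size, sorted.size()); i++) {
            kept.put(sorted.get(i).getKey(), sorted.get(i).getValue());
            keptDocs += sorted.get(i).getValue();
        }
        return new Result(kept, values.size() - keptDocs);
    }

    public static void main(String[] args) {
        // 25 distinct values, one document each (made-up sample data)
        List<String> timestamps = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            timestamps.add("t" + i);
        }

        Result r10 = topTerms(timestamps, 10);
        System.out.println(r10.buckets.size() + " buckets, " + r10.sumOtherDocCount + " docs left out");
        // prints "10 buckets, 15 docs left out"

        Result r100 = topTerms(timestamps, 100);
        System.out.println(r100.buckets.size() + " buckets, " + r100.sumOtherDocCount + " docs left out");
        // prints "25 buckets, 0 docs left out"
    }
}
```

In a real response, the `sum_other_doc_count` field of a terms aggregation plays the role of `sumOtherDocCount` here: a non-zero value is a hint that some terms did not make it into the returned buckets.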

Solution

Increase the number of buckets by setting size on the terms aggregation:

GET ecc_bc_20211017/_search
{
  "query": {
    "match": {
      "gpsId": "31854137"
    }
  },
  "size": 0,
  "aggs": {
    "brands_max_num": {
      "terms": {
        "field": "equipCode"
      },
      "aggs": {
        "timeCount": {
          "terms": {
            "field": "receivedTimeStamp",
            "size": 100
          }
        }
      }
    }
  }
}

In Java, specify size when building the TermsAggregationBuilder:

// Return up to 100 buckets instead of the default 10
TermsAggregationBuilder terms2 = AggregationBuilders.terms("timeCount").field("receivedTimeStamp").size(100);
