Paginating Data in Elasticsearch

2022-03-29 14:44:20

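When it comes to pagination in Elasticsearch, the first approach that comes to mind is probably the MySQL-style one: pass a starting offset and a page size. Elasticsearch does provide a similar mechanism through the from and size parameters, as in the following code: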

	@Test
	public void searchFromSize() throws IOException {
		SearchRequest searchRequest = new SearchRequest("sub_bank1120");
		SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
		searchSourceBuilder.query(QueryBuilders.matchAllQuery());
		// 10 hits per page
		searchSourceBuilder.size(10);
		// from is a zero-based offset: skip the first 10 hits, i.e. return the second page
		searchSourceBuilder.from(10);
		searchRequest.source(searchSourceBuilder);
		SearchResponse searchResponse = highLevelClient.search(searchRequest, RequestOptions.DEFAULT);
		SearchHit[] searchHits = searchResponse.getHits().getHits();
		for (SearchHit s : searchHits) {
			System.out.println(s.getSourceAsString());
		}
	}

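However, this approach has a serious limitation: from and size cannot be too large. Their sum must not exceed index.max_result_window (10000 by default); otherwise the request fails with an org.elasticsearch.client.ResponseException:

Result window is too large, from + size must be less than or equal to: [10000] but was [11010]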

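Why does index.max_result_window limit the search depth? Because deep paging consumes a lot of memory. For example, with from set to 10000 and size set to 10, every shard has to collect its top 10010 hits in sort order and return the sort values and document IDs of all of them to the coordinating node. The coordinating node then merges the 10010 entries received from each shard, sorts them again, discards the first 10000 and returns only the 10 hits that make up the requested page. If you need to page this deep, use scroll or search after instead.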
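The cost of deep paging can be illustrated with a small in-memory simulation. This is only a sketch in plain Java and makes no Elasticsearch calls: the class `DeepPagingSketch` and all of its methods are hypothetical names, documents are reduced to integer sort keys, and the round-robin distribution over shards is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Plain-Java simulation of from/size paging across shards (hypothetical names, no Elasticsearch involved).
class DeepPagingSketch {

    // Spread document keys 0..docCount-1 over shardCount shards, round robin.
    static List<List<Integer>> roundRobinShards(int docCount, int shardCount) {
        List<List<Integer>> shards = new ArrayList<>();
        for (int i = 0; i < shardCount; i++) {
            shards.add(new ArrayList<>());
        }
        for (int doc = 0; doc < docCount; doc++) {
            shards.get(doc % shardCount).add(doc);
        }
        return shards;
    }

    // What one shard hands to the coordinator: its first (from + size) keys in sort order.
    static List<Integer> shardCandidates(List<Integer> shardDocs, int from, int size) {
        List<Integer> sorted = new ArrayList<>(shardDocs);
        Collections.sort(sorted);
        int limit = Math.min(from + size, sorted.size());
        return new ArrayList<>(sorted.subList(0, limit));
    }

    // Everything the coordinator has to hold and sort for a single request.
    static List<Integer> mergedCandidates(List<List<Integer>> shards, int from, int size) {
        List<Integer> merged = new ArrayList<>();
        for (List<Integer> shard : shards) {
            merged.addAll(shardCandidates(shard, from, size));
        }
        Collections.sort(merged);
        return merged;
    }

    // The page actually returned: skip `from` entries, keep the next `size`.
    static List<Integer> page(List<List<Integer>> shards, int from, int size) {
        List<Integer> merged = mergedCandidates(shards, from, size);
        int start = Math.min(from, merged.size());
        int end = Math.min(from + size, merged.size());
        return new ArrayList<>(merged.subList(start, end));
    }

    public static void main(String[] args) {
        List<List<Integer>> shards = roundRobinShards(60, 3);
        System.out.println("candidates held: " + mergedCandidates(shards, 10, 5).size());
        System.out.println("page returned:   " + page(shards, 10, 5));
    }
}
```

With 60 documents on 3 shards, from = 10 and size = 5, the coordinator in this simulation holds 45 entries in order to return 5. The number of entries grows with shards × (from + size), which is why the window is capped.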

For sample code that paginates with scroll, see: https://www.elastic.co/guide/en/elasticsearch/client/java-rest/6.8/java-rest-high-search-scroll.html

For search after, see the following example:

	/**
	 * search after
	 * @throws IOException
	 */
	@Test
	public void searchAfter() throws IOException {
		SearchRequest searchRequest = new SearchRequest("sub_bank1031");
		SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
		searchSourceBuilder.query(QueryBuilders.matchQuery("cityId", "511000"));
		searchSourceBuilder.size(2);
		// id is dynamically mapped as text, and an analyzed field cannot be used for sorting,
		// so sort on the keyword sub-field of id instead
		searchSourceBuilder.sort(new FieldSortBuilder("id.keyword").order(SortOrder.ASC));
		searchRequest.source(searchSourceBuilder);
		SearchResponse searchResponse = highLevelClient.search(searchRequest, RequestOptions.DEFAULT);
		SearchHit[] searchHits = searchResponse.getHits().getHits();
		if (searchHits.length > 0) {
			// print the first page
			for (SearchHit s : searchHits) {
				System.out.println(s.getSourceAsString());
			}
			// the sort value (id) of the last hit on this page is the cursor for the next page;
			// SearchHit.getSortValues() on the last hit exposes the same value
			JSONObject json = JSON.parseObject(searchHits[searchHits.length - 1].getSourceAsString());
			String id = json.getString("id");
			searchSourceBuilder.searchAfter(new Object[]{id});
			searchRequest.source(searchSourceBuilder);
			searchResponse = highLevelClient.search(searchRequest, RequestOptions.DEFAULT);
			// print the second page, read from the new response
			searchHits = searchResponse.getHits().getHits();
			for (SearchHit s : searchHits) {
				System.out.println(s.getSourceAsString());
			}
		}
	}

[Figure: partial screenshot of the index mapping]

PS:

Compared with scroll, search after is simpler to use, and it is stateless.
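To make the "stateless" point concrete, here is a minimal sketch of a search after style loop in plain Java. The only thing carried from one request to the next is the sort value of the last hit, and it is held by the caller. All names are hypothetical: `fetchPage` stands in for a call such as highLevelClient.search with searchAfter set, and `inMemoryPage` merely simulates it over a sorted list. The sketch also assumes the sort key is unique per document; with duplicate sort values, hits could be skipped at page boundaries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

// Cursor-style paging loop in the spirit of search after (hypothetical names, no Elasticsearch involved).
class SearchAfterLoopSketch {

    // fetchPage: (cursor, size) -> next page of sort keys strictly after cursor; a null cursor means "first page".
    static List<String> fetchAll(BiFunction<String, Integer, List<String>> fetchPage, int size) {
        List<String> all = new ArrayList<>();
        String cursor = null;
        while (true) {
            List<String> page = fetchPage.apply(cursor, size);
            if (page.isEmpty()) {
                break; // an empty page means there is nothing left after the cursor
            }
            all.addAll(page);
            // The last sort key of this page is the only state needed for the next request.
            cursor = page.get(page.size() - 1);
        }
        return all;
    }

    // In-memory stand-in for the search call: sortedIds must already be in ascending order.
    static List<String> inMemoryPage(List<String> sortedIds, String cursor, int size) {
        List<String> page = new ArrayList<>();
        for (String id : sortedIds) {
            if (cursor == null || id.compareTo(cursor) > 0) {
                page.add(id);
                if (page.size() == size) {
                    break;
                }
            }
        }
        return page;
    }
}
```

Because the cursor is just a value held by the caller, nothing has to be kept open or cleaned up on the server between pages, in contrast to a scroll context.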
