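Introduction
Parent-child documents can be thought of as a kind of join query, somewhat like a JOIN in MySQL: documents are related to each other through a field. The main difference from nested documents is that the parent and the child are each independent documents, whereas nested objects are all stored inside the same document. This is illustrated in the figure below:
Building a parent-child index
Create the settings: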
PUT /test_doctor
{
"settings": {
"number_of_shards": 1,
"analysis": {
"analyzer": {
"index_ansj_analyzer": {
"type": "custom",
"tokenizer": "index_ansj",
"filter": [
"my_synonym",
"asciifolding"
]
},
"comma": {
"type": "pattern",
"pattern": ","
},
"shingle_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"shingle_filter"
]
}
},
"filter": {
"my_synonym": {
"type": "synonym",
"synonyms_path": "analysis/synonym.txt"
},
"shingle_filter": {
"type": "shingle",
"min_shingle_size": 2,
"max_shingle_size": 2,
"output_unigrams": false
}
}
}
}
}
Create the mapping:
PUT /test_doctor/_mapping/_doc
{
"_doc": {
"properties": {
"date": {
"type": "date"
},
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"comment": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"age": {
"type": "long"
},
"body": {
"type": "text",
"analyzer":"index_ansj_analyzer",
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"title": {
"type": "text",
"analyzer":"index_ansj_analyzer",
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"relation": { # relation is just an ordinary field name
"type": "join",
"relations": { # relations defines the set of possible relationships in the index; each entry maps a parent name to a child name
"question": "answer"
}
}
}
}
}
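Note: question and answer are user-defined relation names.
This creates an index named test_doctor in which relation is the field used for the join: its type is join, and relations declares question as the parent and answer as the child. To give one parent several child types, change the value to an array: "question": ["answer", "comment"]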
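The one-child and several-children forms of the join field can be sketched as a small helper that builds the mapping fragment as a plain map. This is only an illustration using the JDK collections; the class and method names are hypothetical and not part of any Elasticsearch API.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class JoinMappingSketch {
    /** Builds the mapping fragment for a join field with one parent and one or more children. */
    static Map<String, Object> joinField(String parent, List<String> children) {
        Map<String, Object> relations = new LinkedHashMap<>();
        // A single child is written as a plain string, several children as an array.
        relations.put(parent, children.size() == 1 ? children.get(0) : children);

        Map<String, Object> field = new LinkedHashMap<>();
        field.put("type", "join");
        field.put("relations", relations);
        return field;
    }

    public static void main(String[] args) {
        // One parent, one child: "question": "answer"
        System.out.println(joinField("question", List.of("answer")));
        // One parent, two children: "question": ["answer", "comment"]
        System.out.println(joinField("question", List.of("answer", "comment")));
    }
}
```

The resulting map has the same shape as the relation entry in the mapping above, so it could be placed under properties when the mapping is built programmatically.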
Inserting data
Insert the parent documents; the relation field defined in the mapping above must be set to question.
PUT test_doctor/_doc/1
{
"title":"这是一篇文章",
"body":"这是一篇文章,从哪里说起呢? ... ...",
"relation":"question" # relation is an ordinary field; the value question marks this as a parent document
}
PUT test_doctor/_doc/2
{
"title":"这是一篇小说",
"body":"这是一篇小说,从哪里说起呢? ... ...",
"relation":"question" # relation is an ordinary field; the value question marks this as a parent document
}
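Note that this can also be written as "relation":{"name":"question"}
Insert the child documents. The routing parameter in the request URL must be set to the parent's ID, which stores the child on the same shard as its parent, and the relation field must give both the child relation name and the parent ID.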
PUT test_doctor/_doc/3?routing=1
{
"name":"张三",
"comment":"写的不错",
"age":28,
"date":"2020-05-04",
"relation":{ # relation is an ordinary field; the name answer marks this as a child document
"name":"answer",
"parent":1
}
}
PUT test_doctor/_doc/4?routing=1
{
"name":"李四",
"comment":"写的很好",
"age":20,
"date":"2020-05-04",
"relation":{ # relation is an ordinary field; the name answer marks this as a child document
"name":"answer",
"parent":1
}
}
PUT test_doctor/_doc/5?routing=2
{
"name":"王五",
"comment":"这是一篇非常棒的小说",
"age":31,
"date":"2020-05-01",
"relation":{ # relation is an ordinary field; the name answer marks this as a child document
"name":"answer",
"parent":2
}
}
PUT test_doctor/_doc/6?routing=2
{
"name":"小六",
"comment":"这是一篇非常棒的小说",
"age":31,
"date":"2020-05-01",
"relation":{ # relation is an ordinary field; the name answer marks this as a child document
"name":"answer",
"parent":2
}
}
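Why the routing value has to be the parent's ID can be sketched with a simplified shard-selection function. The sketch assumes the shard is picked as hash(routing) modulo the number of primary shards, with the document's own ID used when no routing is given. String.hashCode() is only a stand-in for the real routing hash, and the shard count of 5 is hypothetical (the example index above has a single shard).

```java
class RoutingSketch {
    /** Simplified shard selection; String.hashCode() stands in for the real routing hash. */
    static int shardFor(String routing, int numberOfShards) {
        return Math.floorMod(routing.hashCode(), numberOfShards);
    }

    /** A document without an explicit routing value is routed by its own id. */
    static int shardForDocument(String id, String routing, int numberOfShards) {
        return shardFor(routing != null ? routing : id, numberOfShards);
    }

    public static void main(String[] args) {
        int shards = 5; // hypothetical shard count for illustration
        System.out.println("parent 1            -> shard " + shardForDocument("1", null, shards));
        System.out.println("child 3, routing=1  -> shard " + shardForDocument("3", "1", shards));
        System.out.println("child 3, no routing -> shard " + shardForDocument("3", null, shards));
    }
}
```

With routing=1 the child is placed by the same value as its parent and therefore lands on the parent's shard. Routed by its own ID it can end up on a different shard, where the join could not be resolved.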
Parent document:
Map<String, Object> drugMap = Maps.newHashMap();
drugMap.put("id", "2");
drugMap.put("title", "这是一篇小说");
drugMap.put("body", "这是一篇小说,从哪里说起呢? ... ...");
drugMap.put("relation", "question"); // fixed form: marks the document as a parent
Child document:
Map<String, Object> maps = Maps.newHashMap();
maps.put("name", "answer"); // fixed form: the child relation name
maps.put("parent", "2"); // the ID of the parent document this child is bound to
Map<String, Object> doctorTeamMap = Maps.newHashMap();
doctorTeamMap.put("id", "6");
doctorTeamMap.put("name", "小六");
doctorTeamMap.put("comment", "这是一篇非常棒的小说");
doctorTeamMap.put("age", "31");
doctorTeamMap.put("date", "2020-05-01");
doctorTeamMap.put("relation", maps); // fixed form: the join field holds the name/parent object
Java implementation:
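/**
* Adds a document to the BulkProcessor for batched indexing.
* @param indexName index name
* @param jsonString the document source as a map of field names to values
* @param id document ID
*/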
public boolean addIndexBulk(String indexName, Map<String, Object> jsonString, String id) {
IndexRequest request = new IndexRequest(indexName, "_doc", id);
request.source(jsonString, XContentType.JSON);
dataBulkProcessor.add(request);
return true;
}
/**
* Same as above, but with a routing value (the parent ID for child documents).
*/
public boolean addIndexBulk(String indexName, Map<String, Object> jsonString, String id, String routing) {
IndexRequest request = new IndexRequest(indexName, "_doc", id);
request.source(jsonString, XContentType.JSON);
request.routing(routing);
dataBulkProcessor.add(request);
return true;
}
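Choosing between the two overloads above depends on whether the document is a child. A minimal sketch, assuming a single parent/child level and the source maps built earlier, reads the routing value from the join field itself. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class RoutingFromSourceSketch {
    /**
     * Returns the parent id stored in the join field, or null for a parent document.
     * The join field is either a plain string ("question") or an object with "name" and "parent".
     */
    static String routingOf(Map<String, Object> source, String joinField) {
        Object relation = source.get(joinField);
        if (relation instanceof Map) {
            Object parent = ((Map<?, ?>) relation).get("parent");
            return parent == null ? null : String.valueOf(parent);
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, Object> parentDoc = new LinkedHashMap<>();
        parentDoc.put("relation", "question");

        Map<String, Object> join = new LinkedHashMap<>();
        join.put("name", "answer");
        join.put("parent", "2");
        Map<String, Object> childDoc = new LinkedHashMap<>();
        childDoc.put("relation", join);

        System.out.println(routingOf(parentDoc, "relation")); // no routing needed for a parent
        System.out.println(routingOf(childDoc, "relation"));  // routing value for the child
    }
}
```

A caller could use the three-argument addIndexBulk when the result is null and the four-argument variant otherwise.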
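Querying data
Querying the relation field
Elasticsearch automatically generates an extra field that represents the relationship, named <join field>#<parent name>; for the mapping above that is relation#question.
It can be queried as follows: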
POST test_doctor/_search
{
"script_fields": {
"parent": {
"script": {
"source": "doc['relation#question']"
}
}
}
}
Response (this sample output comes from a different data set than the six documents indexed above):
{
"took" : 124,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 7,
"max_score" : 1.0,
"hits" : [
{
"_index" : "test_doctor",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"fields" : {
"parent" : [
"1"
]
}
},
{
"_index" : "test_doctor",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"_routing" : "1",
"fields" : {
"parent" : [
"1"
]
}
},
{
"_index" : "test_doctor",
"_type" : "_doc",
"_id" : "3",
"_score" : 1.0,
"_routing" : "1",
"fields" : {
"parent" : [
"1"
]
}
},
{
"_index" : "test_doctor",
"_type" : "_doc",
"_id" : "4",
"_score" : 1.0,
"_routing" : "1",
"fields" : {
"parent" : [
"1"
]
}
},
{
"_index" : "test_doctor",
"_type" : "_doc",
"_id" : "5",
"_score" : 1.0,
"fields" : {
"parent" : [
"5"
]
}
},
{
"_index" : "test_doctor",
"_type" : "_doc",
"_id" : "6",
"_score" : 1.0,
"_routing" : "5",
"fields" : {
"parent" : [
"5"
]
}
},
{
"_index" : "test_doctor",
"_type" : "_doc",
"_id" : "7",
"_score" : 1.0,
"_routing" : "1",
"fields" : {
"parent" : [
"1"
]
}
}
]
}
}
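A hit that has a _routing field is a child document, and its parent field holds the ID of its parent document. A hit without _routing is a parent document, and its parent field holds its own ID.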
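The rule of thumb above can be sketched as a small classifier over hits represented as plain maps. This is only an illustration: the hit helper is hypothetical, and the rule assumes that parent documents are indexed without a custom routing value.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class HitClassifierSketch {
    /** Rule of thumb from the text: a hit carrying _routing is a child document. */
    static boolean isChild(Map<String, Object> hit) {
        return hit.containsKey("_routing");
    }

    /** Groups child ids under the parent id found in fields.parent[0]. */
    static Map<String, List<String>> childrenByParent(List<Map<String, Object>> hits) {
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (Map<String, Object> hit : hits) {
            if (!isChild(hit)) {
                continue;
            }
            Map<?, ?> fields = (Map<?, ?>) hit.get("fields");
            List<?> parent = (List<?>) fields.get("parent");
            String parentId = String.valueOf(parent.get(0));
            grouped.computeIfAbsent(parentId, k -> new ArrayList<>())
                   .add(String.valueOf(hit.get("_id")));
        }
        return grouped;
    }

    /** Hypothetical helper that builds a hit shaped like the entries in the response. */
    static Map<String, Object> hit(String id, String routing, String parent) {
        Map<String, Object> h = new LinkedHashMap<>();
        h.put("_id", id);
        if (routing != null) {
            h.put("_routing", routing);
        }
        h.put("fields", Map.of("parent", List.of(parent)));
        return h;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> hits = List.of(
                hit("1", null, "1"), hit("3", "1", "1"), hit("4", "1", "1"),
                hit("5", null, "5"), hit("6", "5", "5"));
        System.out.println(childrenByParent(hits));
    }
}
```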
Querying child documents with parent_id
Pass the parent document's ID to a parent_id query:
POST test_doctor/_search
{
"query": {
"parent_id": {
"type": "answer",
"id": "5"
}
}
}
Java API:
// child relation name
String child_type = "answer";
// parent document ID
String id = "5";
// parent_id query
ParentIdQueryBuilder parentIdQueryBuilder = new ParentIdQueryBuilder(child_type, id);
builder.query(parentIdQueryBuilder);
builder.from(0);
builder.size(10);
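A child document can be fetched by its ID only when routing is supplied (without routing it is not found):
// childId is the child document's own _id; it is not defined in the original snippet and is assumed here
GetRequest getRequest = new GetRequest(indexName, "_doc", childId);
// the routing value must be the parent ID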
getRequest.routing(id);
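Querying by child documents with has_child
has_child finds parent documents based on the content of their children; its type is the child relation name given when the child documents were indexed.
It returns the parent documents that have matching children. This is an expensive query and should be used sparingly. Its standard form is: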
POST test_doctor/_search
{
"query": {
"has_child": {
"type": "answer",
"query": {
"match": {
"name": "张三"
}
},
"inner_hits": {} # also return the matching child documents with each parent
}
}
}
POST test_doctor/_search
{
"query": {
"has_child" : {
"type" : "answer",
"query" : {
"match_all" : {}
},
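"max_children": 10, // optional: a parent is returned only if at most this many of its children match the query
"min_children": 2, // optional: a parent is returned only if at least this many of its children match the query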
"score_mode" : "min"
}
}
}
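The effect of min_children and max_children can be sketched in memory: both bounds are applied to the number of matching children per parent, and they decide whether the parent is returned at all. The helper below is a hypothetical illustration, not Elasticsearch code. Its input counts mirror the sample data above, where parents 1 and 2 each have two children.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ChildCountFilterSketch {
    /** Keeps the parents whose number of matching children lies in [minChildren, maxChildren]. */
    static List<String> parentsWithChildCount(Map<String, Integer> matchingChildrenPerParent,
                                              int minChildren, int maxChildren) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : matchingChildrenPerParent.entrySet()) {
            int count = entry.getValue();
            if (count >= minChildren && count <= maxChildren) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("1", 2); // children 3 and 4
        counts.put("2", 2); // children 5 and 6
        System.out.println(parentsWithChildCount(counts, 2, 10)); // both parents qualify
        System.out.println(parentsWithChildCount(counts, 3, 10)); // no parent has 3 matching children
    }
}
```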
To also filter on fields of the parent document, add a post_filter:
POST test_doctor/_search
{
"query": {
"has_child": {
"type": "answer",
"query": {
"match": {
"name": "张三"
}
},
"inner_hits": {}
}
},
"post_filter": {
"bool": {
"must": [
{
"term": {
"title": {
"value": "文章",
"boost": 1
}
}
}
]
}
}
}
Java API:
// query applied to the child documents
QueryBuilder matchQuery = QueryBuilders.termQuery("name", "张三");
// how the scores of matching children are combined into the parent's score
ScoreMode scoreMode = ScoreMode.Total;
HasChildQueryBuilder childQueryBuilder = new HasChildQueryBuilder("answer", matchQuery, scoreMode);
childQueryBuilder.innerHit(new InnerHitBuilder());
builder.query(childQueryBuilder);
builder.postFilter(boolQueryBuilder);
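How a score mode turns the scores of the matching children into one parent score can be sketched as follows. The Mode enum here is a hypothetical stand-in defined only for this illustration: SUM corresponds to ScoreMode.Total in the Java API and to "sum" in the JSON query, and NONE ignores child scores and gives the parent a score of 0.

```java
import java.util.List;

class ScoreModeSketch {
    /** Hypothetical stand-in for the score modes accepted by has_child. */
    enum Mode { NONE, AVG, MAX, MIN, SUM }

    /** Combines the scores of the matching children into the parent's score. */
    static double parentScore(List<Double> childScores, Mode mode) {
        if (mode == Mode.NONE || childScores.isEmpty()) {
            return 0.0;
        }
        double sum = 0.0;
        double max = Double.NEGATIVE_INFINITY;
        double min = Double.POSITIVE_INFINITY;
        for (double score : childScores) {
            sum += score;
            max = Math.max(max, score);
            min = Math.min(min, score);
        }
        switch (mode) {
            case AVG:
                return sum / childScores.size();
            case MAX:
                return max;
            case MIN:
                return min;
            default:
                return sum; // SUM
        }
    }

    public static void main(String[] args) {
        List<Double> scores = List.of(1.0, 3.0);
        for (Mode mode : Mode.values()) {
            System.out.println(mode + " -> " + parentScore(scores, mode));
        }
    }
}
```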
Querying by parent documents with has_parent
has_parent finds child documents based on the content of their parent.
POST test_doctor/_search
{
"query": {
"has_parent": {
"parent_type":"question",
"query": {
"match": {
"title": "这是一篇文章"
}
}
}
}
}
Java API:
// whether the parent's score is carried over to the matching children
boolean score = true;
HasParentQueryBuilder hasParentQueryBuilder = new HasParentQueryBuilder("question", boolQueryBuilder, score);
builder.query(hasParentQueryBuilder);
builder.postFilter(QueryBuilders.termQuery("indextype", "answer")); // filter applied to the child documents
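What has_parent returns can be sketched in memory: given the set of parents that match the inner query, the result is every child bound to one of them. The helper below is a hypothetical illustration over the sample data, and it assumes for the sake of the example that only parent 1 matches the inner query.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class HasParentSketch {
    /** Returns the ids of the children whose parent is among the matching parents. */
    static List<String> childrenOf(Map<String, String> parentIdByChildId, Set<String> matchingParentIds) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> entry : parentIdByChildId.entrySet()) {
            if (matchingParentIds.contains(entry.getValue())) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Child id -> parent id, as indexed in the sample data.
        Map<String, String> parentIdByChildId = new LinkedHashMap<>();
        parentIdByChildId.put("3", "1");
        parentIdByChildId.put("4", "1");
        parentIdByChildId.put("5", "2");
        parentIdByChildId.put("6", "2");

        // Assumption for illustration: only parent 1 matches the inner query.
        System.out.println(childrenOf(parentIdByChildId, Set.of("1")));
    }
}
```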