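Elasticsearch 7.2 Parent-Child Documents

I wrote this article to help readers understand parent-child documents in Elasticsearch 7.2. I had hoped to get up to speed quickly from blog posts found through Baidu, but was badly disappointed: none of the index-creation syntax they showed works in 7.2, so I decided to read the official documentation carefully.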

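Syntax for creating parent-child documents

First, how to create the parent-child mapping. The syntax clearly differs from the "_parent" approach found online: later Elasticsearch versions replaced _parent with the join field type.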

PUT my_index
{
  "mappings": {
    "properties": {
      "my_join_field": {
        "type": "join",
        "relations": {
          "question": "answer"
        }
      }
    }
  }
}

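This request creates an index named my_index. my_join_field is a field of type join, and relations declares question as the parent and answer as the child.
To give one parent several child types, use an array instead: "question": ["answer", "comment"]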
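
As a rough illustration of the two forms of relations, the following Python sketch builds the index-creation body as a plain dictionary and prints it as JSON. The helper name build_join_mapping is hypothetical and exists only for this example; the sketch sends nothing to Elasticsearch.

```python
import json


def build_join_mapping(field, relations):
    """Return an index-creation body containing a single join field.

    `relations` maps a parent relation name to either one child name
    (a string) or several child names (a list of strings).
    """
    return {
        "mappings": {
            "properties": {
                field: {"type": "join", "relations": relations}
            }
        }
    }


# One parent (question) with two child types (answer, comment).
body = build_join_mapping("my_join_field", {"question": ["answer", "comment"]})
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted as the body of a PUT my_index request.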

Inserting data

A parent document is indexed as follows; the name key of the join field marks it as a question

PUT my_index/_doc/1?refresh
{
  "text": "This is a question",
  "my_join_field": {
    "name": "question"
  }
}

For a parent document you can also omit the name key and give the relation name directly as the field value

PUT my_index/_doc/1?refresh
{
  "text": "This is a question",
  "my_join_field": "question"
}

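Inserting child documents

A child document is indexed as follows. Note that routing is set to the parent document's ID; for an ordinary document the routing value defaults to the document's own ID.
Here name is answer, which marks the document as a child, and parent holds the ID of its parent.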

PUT /my_index/_doc/3?routing=1
{
  "text": "This is an answer",
  "my_join_field": {
    "name": "answer",
    "parent": "1"
  }
}
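
To see why the child needs the parent's ID as its routing value, here is a small Python sketch of hash-based shard selection. It is only a simplified model: the shard count of 5 is an assumption, and zlib.crc32 is a stand-in, since Elasticsearch uses its own hash function and a slightly more involved formula.

```python
import zlib


def shard_for(routing, num_primary_shards):
    """Pick a shard from a routing value (simplified model).

    zlib.crc32 is a stand-in hash for illustration only; it is not the
    hash function Elasticsearch uses internally.
    """
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


NUM_SHARDS = 5  # assumed number of primary shards

parent_shard = shard_for("1", NUM_SHARDS)      # parent 1: routing defaults to its own id
child_shard = shard_for("1", NUM_SHARDS)       # child 3 indexed with ?routing=1
child_by_own_id = shard_for("3", NUM_SHARDS)   # child 3 if it were routed by its own id

print("parent shard:", parent_shard)
print("child shard with routing=1:", child_shard)
print("child shard routed by own id:", child_by_own_id)
```

With the same routing value the parent and child always map to the same shard; routed by its own ID, the child could end up on a different one.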

Querying child documents with parent_id

Pass the child relation name and the parent document's ID to the parent_id query

GET my_index/_search
{
  "query": {
    "parent_id": {
      "type": "answer",
      "id": "1"
    }
  }
}
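
The matching rule of this query can be pictured with an in-memory Python sketch. It is not how Elasticsearch executes the query; it only shows which documents qualify. Documents 2 and 4 are hypothetical additions so that there is something to filter out.

```python
# Documents 1 and 3 mirror the examples above; 2 and 4 are hypothetical.
docs = [
    {"_id": "1", "my_join_field": {"name": "question"}},
    {"_id": "2", "my_join_field": {"name": "question"}},
    {"_id": "3", "my_join_field": {"name": "answer", "parent": "1"}},
    {"_id": "4", "my_join_field": {"name": "answer", "parent": "2"}},
]


def parent_id_matches(documents, child_type, parent_id):
    """Return ids of documents of relation `child_type` whose parent is `parent_id`."""
    return [
        d["_id"]
        for d in documents
        if d["my_join_field"].get("name") == child_type
        and d["my_join_field"].get("parent") == parent_id
    ]


matches = parent_id_matches(docs, "answer", "1")
print(matches)
```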

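Performance and limitations of parent-child documents

Parent-child documents are mainly suited to one-to-many entity relationships; in most other cases, denormalizing the data into a single document performs better.

Parent-child documents have the following restrictions: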

  • Only one join field mapping is allowed per index.
  • Parent and child documents must be indexed on the same shard. This means that the same routing value needs to be provided when getting, deleting, or updating a child document (the value used for its parent, rather than the default of the child's own ID).
  • An element can have multiple children but only one parent.
  • It is possible to add a new relation to an existing join field.
  • It is also possible to add a child to an existing element but only if the element is already a parent.
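
The "multiple children but only one parent" rule can be checked against a relations mapping before it is sent. The Python sketch below is one way to do that; check_single_parent is a hypothetical helper for this example and is not Elasticsearch's own validation.

```python
def check_single_parent(relations):
    """Map every child relation name to its parent.

    Raises ValueError if a name is listed as the child of two parents.
    """
    parent_of = {}
    for parent, children in relations.items():
        if isinstance(children, str):
            children = [children]  # a single child may be given as a string
        for child in children:
            if child in parent_of:
                raise ValueError(
                    f"{child!r} has two parents: {parent_of[child]!r} and {parent!r}"
                )
            parent_of[child] = parent
    return parent_of


# question has two children; answer is itself the parent of vote.
ok = check_single_parent({"question": ["answer", "comment"], "answer": "vote"})
print(ok)
```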

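Summary

Elasticsearch implements joins through parent-child documents, but an index can have only one join field (that one field may declare several relations).

The relation field

For each parent/child relation, Elasticsearch automatically creates an extra field named after the join field, followed by # and the parent relation name; here that is my_join_field#question.
It can be read with a script field as follows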

POST my_index/_search
{
  "script_fields": {
    "parent": {
      "script": {
        "source": "doc['my_join_field#question']"
      }
    }
  }
}

Part of the response:

{
  "_index" : "my_index",
  "_type" : "_doc",
  "_id" : "8",
  "_score" : 1.0,
  "fields" : {
    "parent" : [
      "8"
    ]
  }
},
{
  "_index" : "my_index",
  "_type" : "_doc",
  "_id" : "4",
  "_score" : 1.0,
  "_routing" : "10",
  "fields" : {
    "parent" : [
      "10"
    ]
  }
}

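In this test data, a hit with a _routing field is a child document and its parent value is the parent's ID; a hit without _routing is a parent document and its parent value is its own ID.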
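
That reading of the response can be written as a small Python sketch. The hits are trimmed copies of the two shown above, and the rule relies on an assumption that holds for this test data only: parents were indexed without a custom routing value.

```python
def classify_hit(hit):
    """Return (kind, parent_value) for one search hit.

    Assumes parents were indexed without ?routing=..., so only children
    carry a _routing field in the response.
    """
    parent_value = hit["fields"]["parent"][0]
    kind = "child" if "_routing" in hit else "parent"
    return kind, parent_value


hits = [
    {"_id": "8", "fields": {"parent": ["8"]}},
    {"_id": "4", "_routing": "10", "fields": {"parent": ["10"]}},
]

for hit in hits:
    print(hit["_id"], classify_hit(hit))
```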

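Global ordinals

Join queries on parent-child documents are accelerated by a data structure called global ordinals. By default global ordinals are built eagerly, which avoids a long delay on the first query or aggregation. They must, however, be rebuilt after any change to a shard, and the more parent IDs a shard holds, the longer the rebuild takes. With eager loading the rebuild happens as part of refresh, which can add significant time to each refresh.
If the join field is rarely used, eager loading can be turned off in the join field's mapping: "eager_global_ordinals": false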
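
For reference, this is where the setting sits in the mapping, shown as a Python dictionary printed as JSON. It is a sketch that reuses the question/answer relation from earlier and sends nothing to Elasticsearch.

```python
import json

# Join field mapping with eager loading of global ordinals turned off.
mapping = {
    "mappings": {
        "properties": {
            "my_join_field": {
                "type": "join",
                "relations": {"question": "answer"},
                "eager_global_ordinals": False,
            }
        }
    }
}

print(json.dumps(mapping, indent=2))
```

With the setting off, global ordinals are built when a query first needs them, so the cost moves from refresh to that query.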

Monitoring global ordinals

# Per index (the # in the field name is percent-encoded as %23 so it is not treated as a URL fragment)
curl -X GET "localhost:9200/_stats/fielddata?human&fields=my_join_field%23question&pretty"
# Per index on every node
curl -X GET "localhost:9200/_nodes/stats/indices/fielddata?human&fields=my_join_field%23question&pretty"
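
A literal # in a URL starts the fragment, which HTTP clients generally do not send to the server, so the field name is safest percent-encoded. The Python sketch below builds the per-index URL with the standard library; fielddata_stats_url is a hypothetical helper for this example.

```python
from urllib.parse import quote


def fielddata_stats_url(host, join_field, parent_relation):
    """Build the per-index fielddata stats URL for a join relation field."""
    # safe="" makes quote() encode '#' (as %23) along with every other reserved character.
    field = quote(f"{join_field}#{parent_relation}", safe="")
    return f"http://{host}/_stats/fielddata?human&fields={field}&pretty"


url = fielddata_stats_url("localhost:9200", "my_join_field", "question")
print(url)
```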

Multi-level relations: several children plus grandchildren

Consider the following structure

   question
    /    \
   /      \
comment  answer
           |
           |
          vote

Creating the index

PUT my_index
{
  "mappings": {
    "properties": {
      "my_join_field": {
        "type": "join",
        "relations": {
          "question": ["answer", "comment"],
          "answer": "vote"
        }
      }
    }
  }
}

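Inserting a grandchild

Note that routing and parent take different values here. routing is the ID of the grandparent (the question document, 1), so that all three generations land on the same shard, while parent is the ID of the direct parent (the answer document, 2).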

PUT my_index/_doc/3?routing=1&refresh
{
  "text": "This is a vote",
  "my_join_field": {
    "name": "vote",
    "parent": "2"
  }
}
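
Put another way, the routing value of any document in the hierarchy is the ID of its topmost ancestor. The Python sketch below walks a child-to-parent map to find it. The map is an assumption for illustration: document 2 is taken to be an answer whose parent is question 1, and routing_for is a hypothetical helper.

```python
def routing_for(doc_id, parent_of):
    """Follow parent links upward and return the id of the topmost ancestor."""
    current = doc_id
    seen = set()
    while current in parent_of:
        if current in seen:
            raise ValueError(f"cycle detected at document {current!r}")
        seen.add(current)
        current = parent_of[current]
    return current


# child id -> direct parent id (assumed: answer 2 belongs to question 1,
# vote 3 belongs to answer 2).
parent_of = {"2": "1", "3": "2"}

for doc_id in ("1", "2", "3"):
    print(doc_id, "-> routing", routing_for(doc_id, parent_of))
```

All three documents resolve to routing value 1, which matches the routing=1 used for the vote above.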

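The has_child query

The has_child query returns parent documents that have child documents matching a given query. Because it performs a join it is expensive, so use it sparingly. Its general form is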

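GET my_index/_search
{
  "query": {
    "has_child" : {
      "type" : "child",
      "query" : {
        "match_all" : {}
      },
      "max_children": 10,
      "min_children": 2,
      "score_mode" : "min"
    }
  }
}

type is the name of the child relation (a placeholder here; it would be answer for the mapping used earlier).
max_children is optional: a parent is returned only if at most this many of its children match the query.
min_children is optional: a parent is returned only if at least this many of its children match the query.
score_mode controls how the scores of matching children affect the parent's score (none, avg, sum, max or min; the default is none).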
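
The effect of min_children and max_children is easiest to see on a small data set. The following Python sketch imitates the filtering in memory. It is not how Elasticsearch evaluates the query, it ignores scoring, and all documents in it are hypothetical.

```python
# Hypothetical data: question 1 has two answers, question 2 has one.
docs = [
    {"_id": "1", "my_join_field": {"name": "question"}},
    {"_id": "2", "my_join_field": {"name": "question"}},
    {"_id": "3", "my_join_field": {"name": "answer", "parent": "1"}},
    {"_id": "4", "my_join_field": {"name": "answer", "parent": "1"}},
    {"_id": "5", "my_join_field": {"name": "answer", "parent": "2"}},
]


def has_child(documents, child_type, matches, min_children=1, max_children=None):
    """Return sorted ids of parents whose count of matching children is in range."""
    counts = {}
    for d in documents:
        join = d["my_join_field"]
        if join.get("name") == child_type and matches(d):
            counts[join["parent"]] = counts.get(join["parent"], 0) + 1
    return sorted(
        pid
        for pid, n in counts.items()
        if n >= min_children and (max_children is None or n <= max_children)
    )


def match_all(doc):
    """Stand-in for the match_all inner query: every child matches."""
    return True


print(has_child(docs, "answer", match_all))
print(has_child(docs, "answer", match_all, min_children=2))
print(has_child(docs, "answer", match_all, max_children=1))
```

Here min_children=2 keeps only question 1, and max_children=1 keeps only question 2.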

Test code

Part of the test code follows

DELETE my_index

PUT /my_index?pretty
{
  "mappings": {
    "properties": {
      "my_join_field": {
        "type": "join",
        "relations": {
          "question": "answer"
        }
      }
    }
  }
}


# Index the parents
PUT /my_index/_doc/8?refresh&pretty
{
  "text": "This is a question",
  "my_join_field": {
    "name": "question"
  }
}

PUT /my_index/_doc/10?refresh&pretty
{
  "text": "This is a new question",
  "my_join_field": {
    "name": "question"
  }
}

PUT /my_index/_doc/12?refresh&pretty
{
  "text": "This is a new question",
  "my_join_field": {
    "name": "question"
  }
}

# Index the children
PUT /my_index/_doc/3?routing=8&refresh&pretty
{
  "text": "This is an answer",
  "my_join_field": {
    "name": "answer",
    "parent": "8"
  }
}


PUT /my_index/_doc/4?routing=10&refresh&pretty
{
  "text": "This is another answer",
  "my_join_field": {
    "name": "answer",
    "parent": "10"
  }
}

# Query child documents with parent_id
GET my_index/_search
{
  "query": {
    "parent_id": {
      "type": "answer",
      "id": "8"
    }
  }
}

# Query the relation field
POST my_index/_search
{
  "script_fields": {
    "parent": {
      "script": {
        "source": "doc['my_join_field#question']"
      }
    }
  }
}

Author: 紫夜
Link: https://greedypirate.github.io/2019/09/27/ElasticSearch7-2-父子文档/
Copyright Notice: All articles in this blog are licensed under CC BY-NC-SA 4.0 unless stating additionally.