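Introduction
The Head plugin shows inconsistent document counts for an index
A large number of documents in the index are in the deleted state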
{
"_shards":{
"total":12,
"successful":12,
"failed":0
},
"_all":{
"primaries":{
"docs":{
"count":94830,
"deleted":13143
},
"store":{
"size_in_bytes":1838486334,
"throttle_time_in_millis":0
},
"refresh":{
"total":8678,
"total_time_in_millis":54815
},
"flush":{
"total":1831,
"total_time_in_millis":103795
},
"query_cache":{
"memory_size_in_bytes":428848,
"total_count":770522,
"hit_count":271827,
"miss_count":498695,
"cache_size":211,
"cache_count":279,
"evictions":68
},
"segments":{
"count":96,
"memory_in_bytes":2944151,
"terms_memory_in_bytes":2510183,
"stored_fields_memory_in_bytes":118080,
"term_vectors_memory_in_bytes":194392,
"norms_memory_in_bytes":62912,
"doc_values_memory_in_bytes":58584,
"index_writer_memory_in_bytes":0,
"index_writer_max_memory_in_bytes":3072000,
"version_map_memory_in_bytes":0,
"fixed_bit_set_memory_in_bytes":0
},
"translog":{
"operations":1,
"size_in_bytes":303
},
"recovery":{
"current_as_source":0,
"current_as_target":0,
"throttle_time_in_millis":269174
}
}
}
}
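To put the two docs counters above in proportion, here is a minimal Python sketch that computes the share of deleted documents from a _stats response. The function name deleted_ratio is hypothetical, and the dictionary is a trimmed copy of the response above containing only the two fields used.

```python
def deleted_ratio(stats: dict) -> float:
    """Share of deleted docs among all docs still stored in primary shards."""
    docs = stats["_all"]["primaries"]["docs"]
    total = docs["count"] + docs["deleted"]
    # Guard against an empty index to avoid dividing by zero.
    return docs["deleted"] / total if total else 0.0


# Trimmed copy of the _stats response above (only the fields used here).
stats = {"_all": {"primaries": {"docs": {"count": 94830, "deleted": 13143}}}}
ratio = deleted_ratio(stats)
print(f"{ratio:.1%} of the documents held in primary shards are marked deleted")
```

A ratio like this is only a rough indicator; it says nothing about how the deleted documents are spread across segments.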
What deleted really means
- _id is the document's unique ID
- _version is the document's version number
PUT test/_doc/1
{
"counter" : 1,
"province" : "北京"
}
Result:
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 10,
"max_score" : 1.0,
"hits" : [
{
"_index" : "test",
"_type" : "_doc",
"_id" : "1",
"_version" : 1, # version number
"_score" : 1.0,
"_source" : {
"counter" : 1,
"province" : "北京"
}
}
]
}
}
At this point every document's _version is 1. Now suppose we run the same request again to update the document whose id is 1:
PUT test/_doc/1
{
"counter" : 1,
"province" : "北京"
}
Result:
{
"_index" : "test",
"_type" : "_doc",
"_id" : "1",
"_version" : 2, # the version number is now 2
"result" : "updated",
"_shards" : {
"total" : 2,
"successful" : 2,
"failed" : 0
},
"_seq_no" : 2,
"_primary_term" : 1
}
GET test/_stats
# "count" : 0, "deleted" : 1
Now run a delete and look at the resulting version:
DELETE test/_doc/1
{
"_index" : "test",
"_type" : "_doc",
"_id" : "1",
"_version" : 3,
"result" : "deleted",
"_shards" : {
"total" : 2,
"successful" : 2,
"failed" : 0
},
"_seq_no" : 7,
"_primary_term" : 1
}
# "count" : 0, "deleted" : 2
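From this we can draw a preliminary conclusion: update and delete operations build on the original document and increment its version number by 1, once per operation. At the same time, the old version of the document is marked as deleted.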
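The bookkeeping described above can be sketched as a toy Python model. This is only an illustration of the observed behaviour (version + 1 per write, old copies counted as deleted), not how Elasticsearch is implemented; the class ToyIndex and all of its members are hypothetical names.

```python
class ToyIndex:
    """Toy model of per-document versioning; not Elasticsearch's real engine."""

    def __init__(self):
        self.versions = {}  # _id -> last assigned _version
        self.live = set()   # _ids that currently have a visible document
        self.deleted = 0    # copies marked deleted but not yet merged away

    def put(self, doc_id):
        # Indexing over an existing _id marks the old copy deleted and adds a new one.
        if doc_id in self.live:
            self.deleted += 1
            result = "updated"
        else:
            result = "created"
        self.live.add(doc_id)
        self.versions[doc_id] = self.versions.get(doc_id, 0) + 1
        return {"_version": self.versions[doc_id], "result": result}

    def delete(self, doc_id):
        # Simplification: deleting a missing document changes nothing in this model.
        if doc_id not in self.live:
            return {"result": "not_found"}
        self.live.remove(doc_id)
        self.deleted += 1
        self.versions[doc_id] += 1
        return {"_version": self.versions[doc_id], "result": "deleted"}

    def stats(self):
        return {"count": len(self.live), "deleted": self.deleted}


index = ToyIndex()
first = index.put("1")       # create
second = index.put("1")      # update = old copy marked deleted + new copy added
removed = index.delete("1")  # delete = live copy marked deleted
print(first, second, removed, index.stats())
```

The model tracks a single document only, so its counters mirror the PUT, PUT, DELETE sequence shown above rather than a whole index.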
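What deleting a document really does
Deleting a document is a logical delete, not a physical one. After a delete request, Elasticsearch does not remove the document from disk right away; it marks the document as deleted (_version is incremented by 1 and "result" is reported as "deleted"). The most visible symptom is the frequently asked question "Why doesn't disk usage drop after I delete documents?" As more data is indexed, Elasticsearch cleans up documents marked as deleted in the background.
To actually remove them from disk, you need a segment merge, for example:
POST test/_forcemerge?only_expunge_deletes
The only_expunge_deletes parameter of the force merge means that only documents marked as deleted are purged.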
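As a rough illustration of why space is reclaimed only at merge time, the following Python sketch rewrites just the segments that contain deleted documents. It is a toy model under simplifying assumptions: segments are plain lists, the function name expunge_deletes is hypothetical, and the thresholds a real merge policy applies are ignored.

```python
def expunge_deletes(segments):
    """Rewrite only the segments that contain deleted docs, keeping live docs."""
    merged = []
    for segment in segments:
        if any(not doc["live"] for doc in segment):
            survivors = [doc for doc in segment if doc["live"]]
            if survivors:  # a segment with no live docs left is dropped entirely
                merged.append(survivors)
        else:
            merged.append(segment)  # no deletes: the segment is left untouched
    return merged


segments = [
    [{"_id": "1", "live": False}, {"_id": "2", "live": True}],
    [{"_id": "3", "live": True}],
    [{"_id": "1", "live": False}],
]
after = expunge_deletes(segments)
stored_before = sum(len(segment) for segment in segments)
stored_after = sum(len(segment) for segment in after)
print(stored_before, "stored docs before,", stored_after, "after")
```

Until such a rewrite happens, the deleted copies stay in their segments, which is why disk usage does not drop immediately after a delete.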
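What updating a document really does
An update is really delete + add.
In Lucene, the core engine of Elasticsearch, inserting or updating a document has the same cost: in Lucene and Elasticsearch, to update means to replace.
What looks like an update is in fact this: Elasticsearch marks the old document as deleted and adds a brand-new document. As with deletion, the old document can no longer be accessed, but it is not physically removed right away; that only happens once a segment merge runs, whether triggered manually or on a schedule.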
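The delete + add behaviour can be sketched with an append-only store in Python. This is a simplified illustration, not Lucene's actual data structures; AppendOnlyStore and its members are hypothetical names.

```python
class AppendOnlyStore:
    """Toy append-only store: an update never rewrites an existing entry."""

    def __init__(self):
        self.entries = []  # every copy ever written, in write order
        self.latest = {}   # _id -> position of the live copy in entries

    def index(self, doc_id, source):
        old = self.latest.get(doc_id)
        if old is not None:
            self.entries[old]["live"] = False  # "delete": only flag the old copy
        self.entries.append({"_id": doc_id, "_source": source, "live": True})  # "add"
        self.latest[doc_id] = len(self.entries) - 1

    def get(self, doc_id):
        pos = self.latest.get(doc_id)
        return None if pos is None else self.entries[pos]["_source"]


store = AppendOnlyStore()
store.index("1", {"counter": 1, "province": "北京"})
store.index("1", {"counter": 2, "province": "北京"})
print(store.get("1"), len(store.entries))
```

Reads see only the newest copy, while the flagged old copy keeps occupying space until something rewrites the store.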
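What deleting an index really does
Deleting an index is a physical delete. Unlike deleting documents, deleting an index removes its shards, mappings, and data. It is more direct, faster, and more brutal: once the index is deleted, all data belonging to it is removed straight from disk.
Deleting an index involves two steps:
- The cluster state is updated
- The shards are deleted from disk
To delete an index: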
DELETE test