Getting Started with Elasticsearch: Document CRUD and Aggregations

What Is Elasticsearch

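Elasticsearch is a distributed, high-performance, highly available, and scalable search engine and analytics system.

Lucene is one of the most advanced and powerful search libraries, but developing directly against it is complicated: the API is complex (even simple features take a lot of Java code), and you need a deep understanding of its internals (the various index structures). Lucene is also a single-machine library: it runs on one server and can handle only as much data as that one server can handle.

Elasticsearch is built on Lucene, hides that complexity, and exposes an easy-to-use RESTful API and a Java API (plus client APIs for other languages).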

Concepts in Elasticsearch

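An index is comparable to a database, a type to a table, and a document to a row: index > type > document.

Near real-time: after data is written to ES, it takes about 1 second before it can be searched; queries themselves return within seconds.

Shards: the capacity of a single server is limited, so ES scales horizontally by splitting an index into shards. By default (in the 5.x version used here), creating an index sets up 5 primary shards and 5 replica shards, 10 shards in total. Primary shards let queries and analysis be spread across machines, which improves parallelism and throughput. Replica shards serve as backups when a primary shard fails, and multiple replicas also raise the throughput and performance of search operations.

A primary shard and its replica cannot be placed on the same node, so a cluster needs at least 2 machines for every shard to be allocated.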
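The placement rule above can be made concrete with a small model. This is a simplified sketch in Python, not how the Elasticsearch allocator is implemented: the function name `allocate` and its parameters are hypothetical, and the model applies only the one rule stated here, that two copies of the same shard never share a node.

```python
def allocate(num_nodes, primaries, replicas_per_primary):
    """Count how many shard copies can be placed under the one-copy-per-node rule."""
    copies_per_shard = 1 + replicas_per_primary          # the primary plus its replicas
    placed_per_shard = min(copies_per_shard, num_nodes)  # at most one copy of a shard per node
    total = primaries * copies_per_shard
    active = primaries * placed_per_shard
    return {
        "total": total,
        "active": active,
        "unassigned": total - active,
        "active_percent": 100.0 * active / total,
    }


# 5 primaries with 1 replica each, first on one node and then on two
print(allocate(1, 5, 1))
print(allocate(2, 5, 1))
```

In this model a single node can hold only the primaries, so half of the copies stay unassigned; with two nodes every copy can be placed.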

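Installing Elasticsearch

This walkthrough uses the Windows build, for learning purposes; a Linux install will follow later. Download Elasticsearch and download Kibana.

Elasticsearch: download, unzip, run elasticsearch.bat. Kibana: download, unzip, run kibana.bat.

Verifying that Elasticsearch started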

//http://localhost:9200/ 
{ 
"name" : "Hrqe2n8", 
"cluster_name" : "elasticsearch", 
"cluster_uuid" : "1VuvfPBNQFq9DKw-GqCGIQ", 
"version" : { 
"number" : "5.2.0", 
"build_hash" : "24e05b9", 
"build_date" : "2017-01-24T19:52:35.800Z", 
"build_snapshot" : false, 
"lucene_version" : "6.4.0" 
}, 
"tagline" : "You Know, for Search" 
} 
name: the node name
cluster_name: the cluster name (the default cluster name is elasticsearch)
version.number: 5.2.0, the ES version

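Basic Elasticsearch Features

ES provides a set of APIs, called the cat API, for inspecting all kinds of data in ES. Checking the health of the cluster:

// GET /_cat/health?v
// ?v prints the column headers
epoch      timestamp cluster       status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1566183046 10:50:46  elasticsearch yellow          1         1      1   1    0    0        1             0                  -                 50.0%
The index that Kibana creates for itself has 1 primary shard and 1 replica shard. There is only one node right now, so only the primary shard has been allocated and started; the replica shard has no second machine to start on and stays unassigned, hence the 50%.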

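How do you read the health of a cluster at a glance? There are three states: green, yellow, and red.

green: every index has all of its primary shards and replica shards active.

yellow: every index has all of its primary shards active, but some replica shards are not active and are unavailable.

red: not every index has all of its primary shards active, so part of the data of some indices is missing.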
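The three states can be restated as a small decision function. This is an illustrative Python sketch of the rules as described here; the function name `cluster_health` and the input shape (per-index counts of total and active shards) are assumptions made for the example, not an Elasticsearch API.

```python
def cluster_health(indices):
    """Classify health from per-index shard counts, following the rules above."""
    primaries_ok = all(i["active_primaries"] == i["primaries"] for i in indices)
    replicas_ok = all(i["active_replicas"] == i["replicas"] for i in indices)
    if not primaries_ok:
        return "red"      # some primary shard is not active
    if not replicas_ok:
        return "yellow"   # all primaries active, some replica is not
    return "green"        # every primary and replica is active


# The .kibana index on a single node: its replica cannot be assigned
kibana = {"primaries": 1, "active_primaries": 1, "replicas": 1, "active_replicas": 0}
print(cluster_health([kibana]))
```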

Listing indices

// GET _cat/indices?v
// ?v prints the column headers
health status index   uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   .kibana vBdlyz6TR5mFhFUaFmHZrw   1   1          1            0      3.1kb          3.1kb

Creating an index

// Create an index. The name must be lowercase; otherwise the request fails with invalid_index_name_exception
PUT /second_index?pretty
// Check with GET _cat/indices?v
health status index        uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   second_index KYTB0egzQ-WN2yIz4WMIBw   5   1          0            0       650b           650b
yellow open   .kibana      vBdlyz6TR5mFhFUaFmHZrw   1   1          1            0      3.1kb          3.1kb
yellow open   first_index  GLPkvksbSVyhAj1wDh2pcg   5   1          0            0       650b           650b

Deleting an index

DELETE /second_index?pretty
// Check with GET _cat/indices?v
health status index       uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   .kibana     vBdlyz6TR5mFhFUaFmHZrw   1   1          1            0      3.1kb          3.1kb
yellow open   first_index GLPkvksbSVyhAj1wDh2pcg   5   1          0            0       650b           650b

Adding a document

// Format:
PUT /index/type/id
{JSON document}
PUT /tea/product/1
{ 
"name":"wu long cha", 
"price":500.00, 
"desc":"this is good tea" 
} 
// Response
{ 
"_index": "tea", 
"_type": "product", 
"_id": "1", 
"_version": 1, 
"result": "created", 
"_shards": { 
"total": 2, 
"successful": 1, 
"failed": 0 
}, 
"created": true 
} 
ES creates the index and the type automatically; neither has to be created in advance.
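Outside of Kibana, the same request can be sent by any HTTP client. Below is a minimal sketch using only the Python standard library. It assumes a node listening on http://localhost:9200; the helper names `build_index_request` and `send` are hypothetical. The request is only built and printed here, so the snippet runs without a server.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: a local single-node instance


def build_index_request(index, doc_type, doc_id, document):
    """Build the PUT /index/type/id request without sending it."""
    url = f"{ES_URL}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="PUT",
        headers={"Content-Type": "application/json"},
    )


def send(request):
    """Send the request and decode the JSON response (needs a running node)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


req = build_index_request(
    "tea", "product", 1,
    {"name": "wu long cha", "price": 500.00, "desc": "this is good tea"},
)
print(req.get_method(), req.full_url)
```

Calling `send(req)` against a running node should return a JSON body like the response shown above.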

Retrieving a document

// Format: GET /index/type/id
GET /tea/product/1
// Response
{ 
"_index": "tea", 
"_type": "product", 
"_id": "1", 
"_version": 1, 
"found": true, 
"_source": { 
"name": "wu long cha", 
"price": 500, 
"desc": "this is good tea" 
} 
}

Updating a document

// Full replacement: the same request as adding a document
PUT /tea/product/1 
{ 
"name":"wu long cha", 
"price":500.00, 
"desc":"this is good tea" 
} 
// Response: _version has increased and created is now false
{ 
"_index": "tea", 
"_type": "product", 
"_id": "1", 
"_version": 2, 
"result": "updated", 
"_shards": { 
"total": 2, 
"successful": 1, 
"failed": 0 
}, 
"created": false 
} 
// Drawback of full replacement: the request must carry every field, even to change just one
// Suppose desc is left out
PUT /tea/product/1 
{ 
"name":"wu long cha", 
"price":500.00 
} 
// Response: the stored document is replaced as a whole, so desc is now gone
{ 
"_index": "tea", 
"_type": "product", 
"_id": "1", 
"_version": 4, 
"result": "updated", 
"_shards": { 
"total": 2, 
"successful": 1, 
"failed": 0 
}, 
"created": false 
} 
// Partial update
// Change the price to 1000.01
POST /tea/product/1/_update 
{ 
"doc":{ 
"price":1000.01 
} 
} 
// Check with GET /tea/product/1
{ 
"_index": "tea", 
"_type": "product", 
"_id": "1", 
"_version": 5, 
"found": true, 
"_source": { 
"name": "wu long cha", 
"price": 1000.01 
} 
} 
// Add desc back
POST /tea/product/1/_update 
{ 
"doc":{ 
"desc":"this is a very good tea" 
} 
} 
// Check with GET /tea/product/1
{ 
"_index": "tea", 
"_type": "product", 
"_id": "1", 
"_version": 6, 
"found": true, 
"_source": { 
"name": "wu long cha", 
"price": 1000.01, 
"desc": "this is a very good tea" 
} 
}
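The difference between the two update styles can be modeled on plain dictionaries. This Python sketch only illustrates the semantics seen in the responses above; `replace_document` and `partial_update` are hypothetical helper names, and the merge is shallow, which is enough for flat documents like this one.

```python
def replace_document(stored, new_doc):
    """PUT /index/type/id: the new body replaces the stored document entirely."""
    return dict(new_doc)


def partial_update(stored, partial_doc):
    """POST .../_update with "doc": listed fields are merged into the stored document."""
    merged = dict(stored)
    merged.update(partial_doc)
    return merged


doc = {"name": "wu long cha", "price": 500.00, "desc": "this is good tea"}

# Full replacement without desc: the field disappears
doc = replace_document(doc, {"name": "wu long cha", "price": 500.00})
print(doc)

# Partial updates: change one field, then add desc back
doc = partial_update(doc, {"price": 1000.01})
doc = partial_update(doc, {"desc": "this is a very good tea"})
print(doc)
```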

Deleting a document

DELETE /tea/product/1
// Response
{ 
"found": true, 
"_index": "tea", 
"_type": "product", 
"_id": "1", 
"_version": 7, 
"result": "deleted", 
"_shards": { 
"total": 2, 
"successful": 1, 
"failed": 0 
} 
} 
// Check
GET /tea/product/1
// Response
{ 
"_index": "tea", 
"_type": "product", 
"_id": "1", 
"found": false 
}

Core Elasticsearch Queries

query string search

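// Search all products
GET /tea/product/_search
// Response fields:
// took: how many milliseconds the search took
// timed_out: whether the search timed out (it did not here)
// _shards: the index is split into 5 shards, so the search request is sent to every primary shard (or to one of its replica shards instead)
// hits.total: the number of matching documents, 3 here
// hits.max_score: a score is the relevance of a document to the search; the more relevant the document, the higher its score
// hits.hits: the full data of the documents that matched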
{ 
"took": 11, 
"timed_out": false, 
"_shards": { 
"total": 5, 
"successful": 5, 
"failed": 0 
}, 
"hits": { 
"total": 3, 
"max_score": 1, 
"hits": [ 
{ 
"_index": "tea", 
"_type": "product", 
"_id": "2", 
"_score": 1, 
"_source": { 
"name": "red tea", 
"price": 2000, 
"desc": "this is good tea" 
} 
}, 
{...}, 
{...} 
] 
} 
} 
// Search for teas whose name contains red, sorted by price in descending order
GET /tea/product/_search?q=name:red&sort=price:desc 
{ 
"took": 38, 
"timed_out": false, 
"_shards": { 
"total": 5, 
"successful": 5, 
"failed": 0 
}, 
"hits": { 
"total": 1, 
"max_score": null, 
"hits": [ 
{ 
"_index": "tea", 
"_type": "product", 
"_id": "2", 
"_score": null, 
"_source": { 
"name": "red tea", 
"price": 2000, 
"desc": "this is good tea" 
}, 
"sort": [ 
2000 
] 
} 
] 
} 
} 
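// Here the parameters are appended to the URL, which makes complex queries hard to build, so this form is rarely used in production.
// It is mostly used for ad hoc queries from the command line, for example with curl, to send a quick request and fetch what you need.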
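To show how such a URL is put together, here is a small Python sketch that assembles the query string from parameters. The helper name `search_url` is hypothetical; `urlencode` comes from the standard library, and `safe=":"` keeps the colons readable instead of percent-encoding them.

```python
from urllib.parse import urlencode


def search_url(index, doc_type, **params):
    """Build the path and query string for a query string search."""
    base = f"/{index}/{doc_type}/_search"
    if not params:
        return base
    return f"{base}?{urlencode(params, safe=':')}"


print(search_url("tea", "product"))
print(search_url("tea", "product", q="name:red", sort="price:desc"))
```

Every extra condition becomes another URL parameter, which is why this style gets unwieldy for complex queries.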

query DSL

DSL stands for Domain Specific Language. The query goes in the HTTP request body and is written as JSON, which is convenient and can express all kinds of complex queries; it is far more powerful than query string search.

// Search all teas
GET /tea/product/_search 
{ 
"query":{ 
"match_all": {} 
} 
} 
// Search for teas whose name contains red, sorted by price in descending order
GET /tea/product/_search 
{ 
"query": { 
"match": { 
"name": "red" 
} 
}, 
"sort":[ 
{"price":"desc"} 
] 
} 
// Pagination
GET /tea/product/_search 
{ 
"query": { 
"match_all": {} 
} 
, "from": 0, 
"size": 1 
} 
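// from is the offset of the first hit and starts at 0; with size 1, this returns the first document
// from 1, size 2: start at the 2nd document and return 2 documents
// Return only some fields, like select name, price from xxx in MySQL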
GET /tea/product/_search 
{ 
"query": { 
"match_all": {} 
}, 
"_source": ["name","price"] 
}
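The from/size pair is an offset and a window length. The Python sketch below, with hypothetical helper names, converts a 1-based page number into from/size and applies the same window to a local list to show which hits it selects.

```python
def page_to_from_size(page, page_size):
    """Translate a 1-based page number into the from/size pair of the query DSL."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be at least 1")
    return {"from": (page - 1) * page_size, "size": page_size}


def apply_window(hits, window):
    """Select the hits that a given from/size window would return."""
    start = window["from"]
    return hits[start:start + window["size"]]


names = ["red tea", "pu er", "wu long tea", "green tea"]
print(page_to_from_size(2, 2))
print(apply_window(names, {"from": 1, "size": 2}))
```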

query filter

// Search for teas whose name contains tea and whose price is between 2000 and 3000, inclusive (gte 2000, lte 3000)
GET /tea/product/_search 
{ 
"query": { 
"bool": { 
"must": [{ 
"match": { 
"name": "tea" 
} 
}], 
"filter": { 
"range": { 
"price": { 
"gte": 2000, 
"lte": 3000 
} 
} 
} 
} 
} 
}
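In a bool query, the must clause decides relevance while the filter clause is a plain yes/no test that does not affect the score. The Python sketch below mimics only the yes/no part on the four products from this post's test data; the helper names are hypothetical and no scoring is modeled.

```python
PRODUCTS = [
    {"name": "red tea", "price": 2000},
    {"name": "pu er", "price": 10000},
    {"name": "wu long tea", "price": 1000},
    {"name": "green tea", "price": 4000},
]


def name_contains(product, word):
    """Stand-in for match on name: the word must be one of the name's terms."""
    return word in product["name"].lower().split()


def price_between(product, gte, lte):
    """Stand-in for the range filter with inclusive bounds (gte, lte)."""
    return gte <= product["price"] <= lte


def search(products, word, gte, lte):
    return [p["name"] for p in products
            if name_contains(p, word) and price_between(p, gte, lte)]


print(search(PRODUCTS, "tea", 2000, 3000))
```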

full-text search

GET /tea/product/_search 
{ 
"query": { 
"match": { 
"desc": "good tea" 
} 
} 
} 
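// The desc field is split into terms when it is indexed, and an inverted index is built from them
// The query text (good tea) is split into two terms, good and tea
// Compare the scores of red tea and pu er below
// Response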
{ 
"took": 8, 
"timed_out": false, 
"_shards": { 
"total": 5, 
"successful": 5, 
"failed": 0 
}, 
"hits": { 
"total": 4, 
"max_score": 0.87546873, 
"hits": [ 
{ 
"_index": "tea", 
"_type": "product", 
"_id": "2", 
"_score": 0.87546873, 
"_source": { 
"name": "red tea", 
"price": 2000, 
"desc": "this is good tea" 
} 
}, 
{...}, 
{...}, 
{ 
"_index": "tea", 
"_type": "product", 
"_id": "4", 
"_score": 0.25069216, 
"_source": { 
"name": "pu er", 
"price": 10000, 
"desc": "this is good good" 
} 
} 
] 
} 
}
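A toy inverted index shows why all four documents match. This Python sketch is a simplification: it uses the desc values from this post's test data, splits on whitespace instead of running a real analyzer, and does not compute relevance scores. The function names are hypothetical.

```python
from collections import defaultdict

DESCS = {
    "1": "this is good tea",
    "2": "this is good tea",
    "3": "this is good tea",
    "4": "this is good good",
}


def analyze(text):
    """Very rough stand-in for an analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


def match(index, query):
    """A document matches if it contains at least one of the query terms."""
    hits = set()
    for term in analyze(query):
        hits |= index.get(term, set())
    return hits


index = build_inverted_index(DESCS)
print(sorted(index["tea"]))
print(sorted(match(index, "good tea")))
```

Document 4 is still a hit because it contains good, but it contains only one of the two query terms, which is consistent with it being ranked last in the response above.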

phrase search

// Phrase search: the field must contain good tea as a whole phrase, with the words adjacent and in order
// The result of the query below does not include pu er
GET /tea/product/_search 
{ 
"query": { 
"match_phrase": { 
"desc": "good tea" 
} 
} 
}
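The extra condition of a phrase query, that the terms appear next to each other and in order, can be sketched as a sliding-window comparison. This is an illustrative Python model with a whitespace tokenizer, not how Elasticsearch evaluates phrases internally; the function names are hypothetical.

```python
def analyze(text):
    """Rough stand-in for an analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def match_phrase(text, phrase):
    """True if the phrase terms occur consecutively and in order in the text."""
    tokens = analyze(text)
    wanted = analyze(phrase)
    n = len(wanted)
    return any(tokens[i:i + n] == wanted for i in range(len(tokens) - n + 1))


print(match_phrase("this is good tea", "good tea"))
print(match_phrase("this is good good", "good tea"))
```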

highlight search

// Highlight the matched terms in the name field
GET /tea/product/_search 
{ 
"query": { 
"match": { 
"name": "tea" 
} 
}, 
"highlight": { 
"fields": { 
"name": {} 
} 
} 
}

Aggregations

Test data

GET /tea/product/_search
// Response
{ 
"took": 1, 
"timed_out": false, 
"_shards": { 
"total": 5, 
"successful": 5, 
"failed": 0 
}, 
"hits": { 
"total": 4, 
"max_score": 1, 
"hits": [ 
{ 
"_index": "tea", 
"_type": "product", 
"_id": "2", 
"_score": 1, 
"_source": { 
"name": "red tea", 
"price": 2000, 
"desc": "this is good tea", 
"tags": [ 
"yangsheng" 
] 
} 
}, 
{ 
"_index": "tea", 
"_type": "product", 
"_id": "4", 
"_score": 1, 
"_source": { 
"name": "pu er", 
"price": 10000, 
"desc": "this is good good", 
"tags": [ 
"meiyan yangsheng" 
] 
} 
}, 
{ 
"_index": "tea", 
"_type": "product", 
"_id": "1", 
"_score": 1, 
"_source": { 
"name": "wu long tea", 
"price": 1000, 
"desc": "this is good tea", 
"tags": [ 
"yangsheng", 
"meiyan" 
] 
} 
}, 
{ 
"_index": "tea", 
"_type": "product", 
"_id": "3", 
"_score": 1, 
"_source": { 
"name": "green tea", 
"price": 4000, 
"desc": "this is good tea", 
"tags": [ 
"meiyan" 
] 
} 
} 
] 
} 
}

Counting the products under each tag

GET /tea/product/_search 
{ 
"size": 0, 
"aggs": { 
"group_by_tags": { 
"terms": { 
"field": "tags" 
} 
} 
} 
} 
// This fails with:
"type": "illegal_argument_exception", 
"reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [tags] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory." 
// Fix: enable fielddata on the tags field, so that the inverted index can be un-inverted and loaded into memory
PUT tea/_mapping/product 
{ 
"properties": { 
"tags":{ 
"type": "text", 
"fielddata": true 
} 
} 
} 
// Run the aggregation again; response
{ 
"took": 26, 
"timed_out": false, 
"_shards": { 
"total": 5, 
"successful": 5, 
"failed": 0 
}, 
"hits": { 
"total": 4, 
"max_score": 0, 
"hits": [] 
}, 
"aggregations": { 
"group_by_tags": { 
"doc_count_error_upper_bound": 0, 
"sum_other_doc_count": 0, 
"buckets": [ 
{ 
"key": "meiyan", 
"doc_count": 3 
}, 
{ 
"key": "yangsheng", 
"doc_count": 3 
} 
] 
} 
} 
}
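The bucket counts can be reproduced locally, which also explains why both tags reach 3. The Python sketch below is an approximation of a terms aggregation over an analyzed text field: tag values are split on whitespace, so "meiyan yangsheng" contributes to both buckets, and each document counts once per term. The function names are hypothetical.

```python
from collections import Counter

PRODUCTS = [
    {"name": "red tea", "price": 2000, "tags": ["yangsheng"]},
    {"name": "pu er", "price": 10000, "tags": ["meiyan yangsheng"]},
    {"name": "wu long tea", "price": 1000, "tags": ["yangsheng", "meiyan"]},
    {"name": "green tea", "price": 4000, "tags": ["meiyan"]},
]


def tag_terms(product):
    """Distinct terms of the tags field after whitespace tokenization."""
    return {term for tag in product["tags"] for term in tag.lower().split()}


def count_by_tag(products):
    """doc_count per term: one count per document that contains the term."""
    counts = Counter()
    for product in products:
        counts.update(tag_terms(product))
    return dict(counts)


print(count_by_tag(PRODUCTS))
```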

Counting the products under each tag for teas whose name contains pu er

GET /tea/product/_search 
{ 
"size": 0, 
"query": { 
"match": { 
"name": "pu er" 
} 
} 
, "aggs": { 
"group_by_tags": { 
"terms": { 
"field": "tags" 
} 
} 
} 
} 
// Response
{ 
"took": 8, 
"timed_out": false, 
"_shards": { 
"total": 5, 
"successful": 5, 
"failed": 0 
}, 
"hits": { 
"total": 1, 
"max_score": 0, 
"hits": [] 
}, 
"aggregations": { 
"group_by_tags": { 
"doc_count_error_upper_bound": 0, 
"sum_other_doc_count": 0, 
"buckets": [ 
{ 
"key": "meiyan", 
"doc_count": 1 
}, 
{ 
"key": "yangsheng", 
"doc_count": 1 
} 
] 
} 
} 
}

Computing the average price of the teas under each tag

GET /tea/product/_search 
{ 
"size": 0, 
"aggs": { 
"group_by_tags": { 
"terms": { 
"field": "tags" 
}, 
"aggs":{ 
"avg_price":{ 
"avg":{ 
"field":"price" 
} 
} 
} 
} 
} 
} 
// Response
{ 
"took": 21, 
"timed_out": false, 
"_shards": { 
"total": 5, 
"successful": 5, 
"failed": 0 
}, 
"hits": { 
"total": 4, 
"max_score": 0, 
"hits": [] 
}, 
"aggregations": { 
"group_by_tags": { 
"doc_count_error_upper_bound": 0, 
"sum_other_doc_count": 0, 
"buckets": [ 
{ 
"key": "meiyan", 
"doc_count": 3, 
"avg_price": { 
"value": 5000 
} 
}, 
{ 
"key": "yangsheng", 
"doc_count": 3, 
"avg_price": { 
"value": 4333.333333333333 
} 
} 
] 
} 
} 
}
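The averages in this response can be checked by hand or with a few lines of Python. The sketch below groups the test data by tag term (whitespace-tokenized, as an approximation of the analyzed text field) and averages the prices in each group; the function name is hypothetical.

```python
from collections import defaultdict

PRODUCTS = [
    {"name": "red tea", "price": 2000, "tags": ["yangsheng"]},
    {"name": "pu er", "price": 10000, "tags": ["meiyan yangsheng"]},
    {"name": "wu long tea", "price": 1000, "tags": ["yangsheng", "meiyan"]},
    {"name": "green tea", "price": 4000, "tags": ["meiyan"]},
]


def avg_price_by_tag(products):
    """Bucket documents by tag term, then average the price inside each bucket."""
    prices = defaultdict(list)
    for product in products:
        terms = {t for tag in product["tags"] for t in tag.lower().split()}
        for term in terms:
            prices[term].append(product["price"])
    return {term: sum(values) / len(values) for term, values in prices.items()}


print(avg_price_by_tag(PRODUCTS))
```

The meiyan bucket averages 10000, 1000, and 4000; the yangsheng bucket averages 2000, 10000, and 1000.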

Computing the average price of the teas under each tag, sorted by average price

GET /tea/product/_search 
{ 
"size": 0, 
"aggs": { 
"group_by_tags": { 
"terms": { 
"field": "tags","order": { 
"avg_price": "asc" 
} 
}, 
"aggs":{ 
"avg_price":{ 
"avg":{ 
"field":"price" 
} 
} 
} 
} 
} 
} 
// Response
{ 
"took": 6, 
"timed_out": false, 
"_shards": { 
"total": 5, 
"successful": 5, 
"failed": 0 
}, 
"hits": { 
"total": 4, 
"max_score": 0, 
"hits": [] 
}, 
"aggregations": { 
"group_by_tags": { 
"doc_count_error_upper_bound": 0, 
"sum_other_doc_count": 0, 
"buckets": [ 
{ 
"key": "yangsheng", 
"doc_count": 3, 
"avg_price": { 
"value": 4333.333333333333 
} 
}, 
{ 
"key": "meiyan", 
"doc_count": 3, 
"avg_price": { 
"value": 5000 
} 
} 
] 
} 
} 
}

Counting the products in each price range

GET /tea/product/_search 
{ 
"size": 0, 
"aggs": { 
"group_by_price": { 
"range": { 
"field": "price", 
"ranges": [ 
{ 
"from": 1000, 
"to": 2000 
}, 
{ 
"from": 2000, 
"to": 3000 
}, 
{ 
"from": 3000, 
"to": 4000 
}, 
{ 
"from": 4000, 
"to": 5000 
} 
] 
} 
} 
} 
} 
// Response
// Note: each range includes from and excludes to (from <= price < to)
{ 
"took": 1, 
"timed_out": false, 
"_shards": { 
"total": 5, 
"successful": 5, 
"failed": 0 
}, 
"hits": { 
"total": 4, 
"max_score": 0, 
"hits": [] 
}, 
"aggregations": { 
"group_by_price": { 
"buckets": [ 
{ 
"key": "1000.0-2000.0", 
"from": 1000, 
"to": 2000, 
"doc_count": 1 
}, 
{ 
"key": "2000.0-3000.0", 
"from": 2000, 
"to": 3000, 
"doc_count": 1 
}, 
{ 
"key": "3000.0-4000.0", 
"from": 3000, 
"to": 4000, 
"doc_count": 0 
}, 
{ 
"key": "4000.0-5000.0", 
"from": 4000, 
"to": 5000, 
"doc_count": 1 
} 
] 
} 
} 
}
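The half-open behavior of the ranges is easy to verify with the four test prices. This Python sketch uses a hypothetical helper, `range_buckets`, that counts prices with from <= price < to and labels the buckets in the same style as the response.

```python
def range_buckets(prices, ranges):
    """Count prices per range; each range includes its lower bound and excludes its upper bound."""
    return {
        f"{float(lo)}-{float(hi)}": sum(1 for p in prices if lo <= p < hi)
        for lo, hi in ranges
    }


PRICES = [2000, 10000, 1000, 4000]
RANGES = [(1000, 2000), (2000, 3000), (3000, 4000), (4000, 5000)]
print(range_buckets(PRICES, RANGES))
```

The price 2000 lands in the 2000-3000 bucket rather than 1000-2000, 4000 lands in 4000-5000, and 10000 falls outside every range.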

Grouping by price range, then grouping each range by tag, then computing the average price of each group

GET /tea/product/_search 
{ 
"size": 0, 
"aggs": { 
"group_by_price": { 
"range": { 
"field": "price", 
"ranges": [ 
{ 
"from": 1000, 
"to": 2000 
}, 
{ 
"from": 2000, 
"to": 3000 
}, 
{ 
"from": 3000, 
"to": 4000 
}, 
{ 
"from": 4000, 
"to": 5000 
} 
] 
} 
, "aggs": { 
"group_by_tags": { 
"terms": { 
"field": "tags" 
} 
, "aggs": { 
"avg_price": { 
"avg": { 
"field": "price" 
} 
} 
} 
} 
} 
} 
} 
} 
// Response
{ 
"took": 16, 
"timed_out": false, 
"_shards": { 
"total": 5, 
"successful": 5, 
"failed": 0 
}, 
"hits": { 
"total": 4, 
"max_score": 0, 
"hits": [] 
}, 
"aggregations": { 
"group_by_price": { 
"buckets": [ 
{ 
"key": "1000.0-2000.0", 
"from": 1000, 
"to": 2000, 
"doc_count": 1, 
"group_by_tags": { 
"doc_count_error_upper_bound": 0, 
"sum_other_doc_count": 0, 
"buckets": [ 
{ 
"key": "meiyan", 
"doc_count": 1, 
"avg_price": { 
"value": 1000 
} 
}, 
{ 
"key": "yangsheng", 
"doc_count": 1, 
"avg_price": { 
"value": 1000 
} 
} 
] 
} 
}, 
{ 
"key": "2000.0-3000.0", 
"from": 2000, 
"to": 3000, 
"doc_count": 1, 
"group_by_tags": { 
"doc_count_error_upper_bound": 0, 
"sum_other_doc_count": 0, 
"buckets": [ 
{ 
"key": "yangsheng", 
"doc_count": 1, 
"avg_price": { 
"value": 2000 
} 
} 
] 
} 
}, 
{ 
"key": "3000.0-4000.0", 
"from": 3000, 
"to": 4000, 
"doc_count": 0, 
"group_by_tags": { 
"doc_count_error_upper_bound": 0, 
"sum_other_doc_count": 0, 
"buckets": [] 
} 
}, 
{ 
"key": "4000.0-5000.0", 
"from": 4000, 
"to": 5000, 
"doc_count": 1, 
"group_by_tags": { 
"doc_count_error_upper_bound": 0, 
"sum_other_doc_count": 0, 
"buckets": [ 
{ 
"key": "meiyan", 
"doc_count": 1, 
"avg_price": { 
"value": 4000 
} 
} 
] 
} 
} 
] 
} 
} 
}
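Nested aggregations are the same grouping steps applied inside one another. The Python sketch below reproduces the three levels on the test data under the same simplifying assumptions as before (half-open ranges, tags split on whitespace); the function name `avg_by_range_and_tag` is hypothetical.

```python
from collections import defaultdict

PRODUCTS = [
    {"name": "red tea", "price": 2000, "tags": ["yangsheng"]},
    {"name": "pu er", "price": 10000, "tags": ["meiyan yangsheng"]},
    {"name": "wu long tea", "price": 1000, "tags": ["yangsheng", "meiyan"]},
    {"name": "green tea", "price": 4000, "tags": ["meiyan"]},
]
RANGES = [(1000, 2000), (2000, 3000), (3000, 4000), (4000, 5000)]


def avg_by_range_and_tag(products, ranges):
    """Level 1: price range. Level 2: tag term. Level 3: average price."""
    result = {}
    for lo, hi in ranges:
        by_tag = defaultdict(list)
        for product in products:
            if lo <= product["price"] < hi:
                terms = {t for tag in product["tags"] for t in tag.lower().split()}
                for term in terms:
                    by_tag[term].append(product["price"])
        result[(lo, hi)] = {tag: sum(v) / len(v) for tag, v in by_tag.items()}
    return result


for bounds, buckets in avg_by_range_and_tag(PRODUCTS, RANGES).items():
    print(bounds, buckets)
```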

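Query Syntax

Differences between match and term

A text field is analyzed (split into terms); a keyword field is not.

A match query analyzes its query text; a term query does not.

match on a text field: at least one term of the analyzed query must exactly equal one of the terms produced from the field value.

term on a text field: the query is not analyzed, so the query value must exactly equal one of the terms that ES produced from the field value.

match on a keyword field: the query and the field value must be identical. Does match analyze its query text? Yes, but with the analyzer of the target field; a keyword field is not analyzed, so a match query against it does not split the query text either.

term on a keyword field: the query and the field value must be identical.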
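The four combinations can be modeled in a few lines. This Python sketch is a simplification: the text analyzer is approximated by lowercasing and splitting on whitespace, a keyword field keeps the whole value as a single term, and all function names are hypothetical.

```python
def indexed_terms(value, field_type):
    """Terms stored for a field value: analyzed for text, kept whole for keyword."""
    if field_type == "text":
        return value.lower().split()
    return [value]


def term_query(value, field_type, query):
    """term: the query is used as-is and must equal one stored term."""
    return query in indexed_terms(value, field_type)


def match_query(value, field_type, query):
    """match: the query is analyzed like the field, then any query term may hit."""
    stored = indexed_terms(value, field_type)
    return any(term in stored for term in indexed_terms(query, field_type))


value = "Red Tea"
print(match_query(value, "text", "red tea"))     # analyzed query hits analyzed field
print(term_query(value, "text", "Red Tea"))      # whole value is not a stored term
print(term_query(value, "text", "red"))          # a single stored term
print(match_query(value, "keyword", "red"))      # keyword needs the identical value
print(term_query(value, "keyword", "Red Tea"))   # identical value
```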
