Elasticsearch Basics: Document CRUD and Aggregations

2022-08-12 20:11:08

What is Elasticsearch

Elasticsearch is a distributed, high-performance, highly available, and scalable search engine and analytics system.

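Lucene is a very advanced and powerful search library. Developing directly against Lucene is complicated, however: its API is complex (even simple features take a lot of Java code), and it requires a deep understanding of the underlying principles (the various index structures). Lucene is also a single-machine library: it runs on one server and can handle at most the data volume that one server can handle.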

Elasticsearch is built on Lucene and hides that complexity behind a simple, easy-to-use RESTful API and a Java API (APIs for other languages are available as well).

Concepts in Elasticsearch

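Index: comparable to a database.
Type: comparable to a table.
Document: comparable to a row.
Hierarchy: index > type > document.
Near real-time: there is a delay of about 1 second between indexing data into ES and that data becoming searchable; queries themselves return within seconds.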

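Shard: a single server has limited capacity, so ES scales horizontally by splitting an index into shards. In the 5.x version used here, creating an index sets up 5 primary shards and 5 replica shards by default, 10 shards in total. Primary shards let queries and analysis be spread across machines, which increases parallelism and throughput. Replica shards serve as backups when a primary shard fails, and multiple replicas can also raise search throughput and performance.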

A primary shard and its own replica shard cannot be allocated to the same node, so a cluster needs at least 2 nodes for every shard to be assigned.

Installing Elasticsearch

This walkthrough installs the Windows version for learning purposes; a Linux installation will follow later. Download Elasticsearch and download Kibana.

Download -> unzip -> run elasticsearch.bat
Download -> unzip -> run kibana.bat

Verify that Elasticsearch started successfully

//http://localhost:9200/ 
{ 
"name" : "Hrqe2n8", 
"cluster_name" : "elasticsearch", 
"cluster_uuid" : "1VuvfPBNQFq9DKw-GqCGIQ", 
"version" : { 
"number" : "5.2.0", 
"build_hash" : "24e05b9", 
"build_date" : "2017-01-24T19:52:35.800Z", 
"build_snapshot" : false, 
"lucene_version" : "6.4.0" 
}, 
"tagline" : "You Know, for Search" 
} 
// name: the node name
// cluster_name: the cluster name (the default cluster name is elasticsearch)
// version.number: 5.2.0, the ES version

Basic Elasticsearch operations

ES provides a set of APIs, the cat API, for inspecting all kinds of data in the cluster. Check the cluster health:

//GET /_cat/health?v 
// ?v shows the column headers
epoch timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent 
1566183046 10:50:46 elasticsearch yellow 1 1 1 1 0 0 1 0 - 50.0% 
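// The index that Kibana creates for itself has 1 primary shard and 1 replica shard. With only one node, only the primary shard is allocated and started; the replica shard has no second machine to start on and stays unassigned, hence 50%.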

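How do you quickly read the cluster's health status: green, yellow, or red?
green: every index has all of its primary shards and replica shards active.
yellow: every index has all of its primary shards active, but some replica shards are not active and are unavailable.
red: not every index has all of its primary shards active; some indices are missing data.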
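
The three statuses above can be expressed as a small decision rule. The following is only a sketch in plain Python, not how Elasticsearch computes health internally; the dictionary keys `primaries_active` and `replicas_active` are hypothetical names chosen for illustration.

```python
def cluster_health(indices):
    """Return 'green', 'yellow', or 'red' for a list of indices.

    Each index is a dict holding two lists of booleans, one entry per shard:
    'primaries_active' and 'replicas_active' (hypothetical field names).
    """
    all_primaries = all(all(i["primaries_active"]) for i in indices)
    all_replicas = all(all(i["replicas_active"]) for i in indices)
    if not all_primaries:
        return "red"      # at least one primary shard is not active
    if not all_replicas:
        return "yellow"   # primaries are fine, some replica is not active
    return "green"        # every primary and replica shard is active


# The single-node case described above: .kibana has 1 active primary
# and 1 replica that cannot be assigned anywhere.
single_node = [{"primaries_active": [True], "replicas_active": [False]}]
print(cluster_health(single_node))
```

With a second node the replica could be assigned, both lists would be all `True`, and the function would report green.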

List indices

//GET _cat/indices?v 
// ?v shows the column headers
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size 
yellow open .kibana vBdlyz6TR5mFhFUaFmHZrw 1 1 1 0 3.1kb 3.1kb 

Create an index

// Create an index. The name must be lowercase; otherwise ES returns invalid_index_name_exception
PUT /second_index?pretty 
// Check with GET _cat/indices?v
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size 
yellow open second_index KYTB0egzQ-WN2yIz4WMIBw 5 1 0 0 650b 650b 
yellow open .kibana vBdlyz6TR5mFhFUaFmHZrw 1 1 1 0 3.1kb 3.1kb 
yellow open first_index GLPkvksbSVyhAj1wDh2pcg 5 1 0 0 650b 650b 

Delete an index

DELETE /second_index?pretty 
// Check again with GET _cat/indices?v
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size 
yellow open .kibana vBdlyz6TR5mFhFUaFmHZrw 1 1 1 0 3.1kb 3.1kb 
yellow open first_index GLPkvksbSVyhAj1wDh2pcg 5 1 0 0 650b 650b 

Add a document

// Format:
PUT /index/type/id
{JSON document}
PUT /tea/product/1
{ 
"name":"wu long cha", 
"price":500.00, 
"desc":"this is good tea" 
} 
// Response
{ 
"_index": "tea", 
"_type": "product", 
"_id": "1", 
"_version": 1, 
"result": "created", 
"_shards": { 
"total": 2, 
"successful": 1, 
"failed": 0 
}, 
"created": true 
} 
// ES creates the index and type automatically; they do not need to be created beforehand
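
The same request can also be prepared from a script. The sketch below uses only the Python standard library and merely builds the request object; the base URL `http://localhost:9200` assumes the local node started earlier, and `build_index_request` is a hypothetical helper name, not part of any Elasticsearch client.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: the local node from the install step


def build_index_request(index, doc_type, doc_id, document):
    """Build (but do not send) a PUT /index/type/id request with a JSON body."""
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url=f"{ES_URL}/{index}/{doc_type}/{doc_id}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = build_index_request(
    "tea", "product", 1,
    {"name": "wu long cha", "price": 500.00, "desc": "this is good tea"},
)
print(request.get_method(), request.full_url)

# To send it against a running node:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Keeping request construction separate from sending makes the URL and body easy to inspect before anything reaches the cluster.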

Get a document

// Format: GET /index/type/id
GET /tea/product/1
// Response
{ 
"_index": "tea", 
"_type": "product", 
"_id": "1", 
"_version": 1, 
"found": true, 
"_source": { 
"name": "wu long cha", 
"price": 500, 
"desc": "this is good tea" 
} 
} 

Update a document

// Full replacement: the same request as adding a document
PUT /tea/product/1 
{ 
"name":"wu long cha", 
"price":500.00, 
"desc":"this is good tea" 
} 
// Response: _version has changed and created is now false
{ 
"_index": "tea", 
"_type": "product", 
"_id": "1", 
"_version": 2, 
"result": "updated", 
"_shards": { 
"total": 2, 
"successful": 1, 
"failed": 0 
}, 
"created": false 
} 
// Drawback of full replacement: the request must carry every field, even when only one of them is being changed
// For example, leave out desc:
PUT /tea/product/1 
{ 
"name":"wu long cha", 
"price":500.00 
} 
// Response: the document was replaced as a whole, so desc is now gone
{ 
"_index": "tea", 
"_type": "product", 
"_id": "1", 
"_version": 4, 
"result": "updated", 
"_shards": { 
"total": 2, 
"successful": 1, 
"failed": 0 
}, 
"created": false 
} 
// Partial update
// Change the price to 1000.01
POST /tea/product/1/_update 
{ 
"doc":{ 
"price":1000.01 
} 
} 
// Check with GET /tea/product/1
{ 
"_index": "tea", 
"_type": "product", 
"_id": "1", 
"_version": 5, 
"found": true, 
"_source": { 
"name": "wu long cha", 
"price": 1000.01 
} 
} 
// Add desc back
POST /tea/product/1/_update 
{ 
"doc":{ 
"desc":"this is a very good tea" 
} 
} 
// Check with GET /tea/product/1
{ 
"_index": "tea", 
"_type": "product", 
"_id": "1", 
"_version": 6, 
"found": true, 
"_source": { 
"name": "wu long cha", 
"price": 1000.01, 
"desc": "this is a very good tea" 
} 
} 
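
The difference between the two update styles can be mimicked with ordinary dictionaries. This is a simplified model rather than Elasticsearch code: `store`, `replace_document`, and `partial_update` are hypothetical names, versions are ignored, and the merge here is shallow.

```python
def replace_document(store, doc_id, new_source):
    """Full replacement (PUT): the old source is discarded entirely."""
    store[doc_id] = dict(new_source)


def partial_update(store, doc_id, partial_doc):
    """Partial update (_update with "doc"): fields are merged into the old source."""
    store[doc_id] = {**store[doc_id], **partial_doc}


store = {}
replace_document(store, "1", {"name": "wu long cha", "price": 500.00, "desc": "this is good tea"})
replace_document(store, "1", {"name": "wu long cha", "price": 500.00})  # desc is lost here
partial_update(store, "1", {"price": 1000.01})                          # only price changes
partial_update(store, "1", {"desc": "this is a very good tea"})         # desc is added back
print(store["1"])
```

The final dictionary has the same three fields as the last GET response above.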

Delete a document

DELETE /tea/product/1
// Response
{ 
"found": true, 
"_index": "tea", 
"_type": "product", 
"_id": "1", 
"_version": 7, 
"result": "deleted", 
"_shards": { 
"total": 2, 
"successful": 1, 
"failed": 0 
} 
} 
// Check
GET /tea/product/1 
// Response
{ 
"_index": "tea", 
"_type": "product", 
"_id": "1", 
"found": false 
} 

Core Elasticsearch queries

query string search

// Search all products
GET /tea/product/_search 
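// Response fields:
// took: how many milliseconds the search took
// timed_out: whether the search timed out (here it did not)
// _shards: the index is split into 5 shards, so a search request is sent to every primary shard (or to one of its replica shards instead)
// hits.total: the number of matching documents, here 3
// hits.max_score: a score measures how relevant a document is to the search; the more relevant the match, the higher the score
// hits.hits: the detailed data of the matching documents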
{ 
"took": 11, 
"timed_out": false, 
"_shards": { 
"total": 5, 
"successful": 5, 
"failed": 0 
}, 
"hits": { 
"total": 3, 
"max_score": 1, 
"hits": [ 
{ 
"_index": "tea", 
"_type": "product", 
"_id": "2", 
"_score": 1, 
"_source": { 
"name": "red tea", 
"price": 2000, 
"desc": "this is good tea" 
} 
}, 
{...}, 
{...} 
] 
} 
} 
// Search for teas whose name contains red, sorted by price in descending order
GET /tea/product/_search?q=name:red&sort=price:desc 
{ 
"took": 38, 
"timed_out": false, 
"_shards": { 
"total": 5, 
"successful": 5, 
"failed": 0 
}, 
"hits": { 
"total": 1, 
"max_score": null, 
"hits": [ 
{ 
"_index": "tea", 
"_type": "product", 
"_id": "2", 
"_score": null, 
"_source": { 
"name": "red tea", 
"price": 2000, 
"desc": "this is good tea" 
}, 
"sort": [ 
2000 
] 
} 
] 
} 
} 
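// This style puts the parameters in the URL, which makes complex queries hard to build, so it is rarely used in production.
// It is mostly used for ad hoc queries from the command line, for example with curl, to fire off a quick request and look something up.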
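
To illustrate how the parameters end up in the URL, the path for such a query string search can be assembled with the Python standard library. This is a minimal sketch; `query_string_search_path` is a hypothetical helper name.

```python
from urllib.parse import urlencode


def query_string_search_path(index, doc_type, **params):
    """Build the path and query string of a query string search."""
    # safe=":" keeps the field:value colons readable instead of encoding them as %3A
    return f"/{index}/{doc_type}/_search?" + urlencode(params, safe=":")


path = query_string_search_path("tea", "product", q="name:red", sort="price:desc")
print(path)
```

Every additional condition has to be squeezed into the `q` parameter in the same way, which is why this style becomes awkward for complex queries.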

query DSL

DSL stands for Domain Specific Language. The query is sent in the HTTP request body as JSON, which makes it convenient to build all kinds of complex queries; it is considerably more powerful than query string search.

// Search all teas
GET /tea/product/_search 
{ 
"query":{ 
"match_all": {} 
} 
} 
// Search for teas whose name contains red, sorted by price in descending order
GET /tea/product/_search 
{ 
"query": { 
"match": { 
"name": "red" 
} 
}, 
"sort":[ 
{"price":"desc"} 
] 
} 
// Pagination
GET /tea/product/_search 
{ 
"query": { 
"match_all": {} 
} 
, "from": 0, 
"size": 1 
} 
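// from is the offset of the first hit and starts at 0; size is the page size, so this returns the first hit
// from 1, size 2: start at the second hit and return 2 hits
// Return only some fields, like SELECT name, price FROM xxx in MySQL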
GET /tea/product/_search 
{ 
"query": { 
"match_all": {} 
}, 
"_source": ["name","price"] 
} 
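
For the from/size paging shown above, applications usually think in page numbers rather than offsets. A minimal sketch of the conversion, assuming 1-based page numbers; `page_to_from_size` and `paged_match_all` are hypothetical helper names.

```python
def page_to_from_size(page, page_size):
    """Translate a 1-based page number into the from/size pair of a search body."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    return {"from": (page - 1) * page_size, "size": page_size}


def paged_match_all(page, page_size):
    """Build a match_all search body for the given page."""
    body = {"query": {"match_all": {}}}
    body.update(page_to_from_size(page, page_size))
    return body


print(paged_match_all(1, 1))  # first page, one hit per page
print(paged_match_all(3, 2))  # third page, two hits per page
```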

query filter

// Search for teas whose name contains tea and whose price is between 2000 and 3000 (gte 2000, lte 3000)
GET /tea/product/_search 
{ 
"query": { 
"bool": { 
"must": [{ 
"match": { 
"name": "tea" 
} 
}], 
"filter": { 
"range": { 
"price": { 
"gte": 2000, 
"lte": 3000 
} 
} 
} 
} 
} 
} 

full-text search

GET /tea/product/_search 
{ 
"query": { 
"match": { 
"desc": "good tea" 
} 
} 
} 
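// The desc field is tokenized at index time to build the inverted index
// The query text for desc is tokenized into two terms, good and tea
// Note the scores of red tea and pu er below
// Response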
{ 
"took": 8, 
"timed_out": false, 
"_shards": { 
"total": 5, 
"successful": 5, 
"failed": 0 
}, 
"hits": { 
"total": 4, 
"max_score": 0.87546873, 
"hits": [ 
{ 
"_index": "tea", 
"_type": "product", 
"_id": "2", 
"_score": 0.87546873, 
"_source": { 
"name": "red tea", 
"price": 2000, 
"desc": "this is good tea" 
} 
}, 
{...}, 
{...}, 
{ 
"_index": "tea", 
"_type": "product", 
"_id": "4", 
"_score": 0.25069216, 
"_source": { 
"name": "pu er", 
"price": 10000, 
"desc": "this is good good" 
} 
} 
] 
} 
} 
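
Why all four documents match, including the pu er entry that never mentions tea, becomes clearer with a toy inverted index. The sketch below is plain Python and not Elasticsearch's implementation: lowercasing plus whitespace splitting stands in for the real analyzer, and relevance scoring is left out entirely.

```python
from collections import defaultdict

# desc values of the four tea documents, keyed by _id
docs = {
    "1": "this is good tea",
    "2": "this is good tea",
    "3": "this is good tea",
    "4": "this is good good",
}


def analyze(text):
    """Stand-in analyzer (assumption): lowercase, then split on whitespace."""
    return text.lower().split()


def build_inverted_index(documents):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in analyze(text):
            index[token].add(doc_id)
    return index


def match(index, query):
    """Return ids of documents containing at least one token of the query."""
    hits = set()
    for token in analyze(query):
        hits |= index.get(token, set())
    return hits


index = build_inverted_index(docs)
print(sorted(index["tea"]))
print(sorted(match(index, "good tea")))
```

Document 4 is found through the token good alone, which is consistent with its lower score in the response above.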

phrase search

// Phrase search: the field must contain good tea as a whole phrase; the two words cannot be separated
// The query below does not return pu er
GET /tea/product/_search 
{ 
"query": { 
"match_phrase": { 
"desc": "good tea" 
} 
} 
} 
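
The phrase requirement can be sketched as a check for consecutive tokens. This is a simplified illustration in plain Python under the assumption of a lowercase, whitespace-splitting analyzer; `contains_phrase` is a hypothetical name, and the real engine works with token positions stored in the index.

```python
def analyze(text):
    """Stand-in analyzer (assumption): lowercase, then split on whitespace."""
    return text.lower().split()


def contains_phrase(text, phrase):
    """True if the tokens of phrase occur consecutively, in order, in text."""
    tokens = analyze(text)
    wanted = analyze(phrase)
    n = len(wanted)
    return any(tokens[i:i + n] == wanted for i in range(len(tokens) - n + 1))


print(contains_phrase("this is good tea", "good tea"))   # red tea's desc
print(contains_phrase("this is good good", "good tea"))  # pu er's desc
```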

highlight search

// Highlighting
GET /tea/product/_search 
{ 
"query": { 
"match": { 
"name": "tea" 
} 
}, 
"highlight": { 
"fields": { 
"name": {} 
} 
} 
} 

Aggregations

Test data

GET /tea/product/_search 
// Response
{ 
"took": 1, 
"timed_out": false, 
"_shards": { 
"total": 5, 
"successful": 5, 
"failed": 0 
}, 
"hits": { 
"total": 4, 
"max_score": 1, 
"hits": [ 
{ 
"_index": "tea", 
"_type": "product", 
"_id": "2", 
"_score": 1, 
"_source": { 
"name": "red tea", 
"price": 2000, 
"desc": "this is good tea", 
"tags": [ 
"yangsheng" 
] 
} 
}, 
{ 
"_index": "tea", 
"_type": "product", 
"_id": "4", 
"_score": 1, 
"_source": { 
"name": "pu er", 
"price": 10000, 
"desc": "this is good good", 
"tags": [ 
"meiyan yangsheng" 
] 
} 
}, 
{ 
"_index": "tea", 
"_type": "product", 
"_id": "1", 
"_score": 1, 
"_source": { 
"name": "wu long tea", 
"price": 1000, 
"desc": "this is good tea", 
"tags": [ 
"yangsheng", 
"meiyan" 
] 
} 
}, 
{ 
"_index": "tea", 
"_type": "product", 
"_id": "3", 
"_score": 1, 
"_source": { 
"name": "green tea", 
"price": 4000, 
"desc": "this is good tea", 
"tags": [ 
"meiyan" 
] 
} 
} 
] 
} 
} 

Count the products under each tag

GET /tea/product/_search 
{ 
"size": 0, 
"aggs": { 
"group_by_tags": { 
"terms": { 
"field": "tags" 
} 
} 
} 
} 
// This fails with an error:
"type": "illegal_argument_exception", 
"reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [tags] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory." 
// Fix: update the mapping of tags to enable fielddata, so that the inverted index can be uninverted into memory
PUT tea/_mapping/product 
{ 
"properties": { 
"tags":{ 
"type": "text", 
"fielddata": true 
} 
} 
} 
// Response after running the aggregation again
{ 
"took": 26, 
"timed_out": false, 
"_shards": { 
"total": 5, 
"successful": 5, 
"failed": 0 
}, 
"hits": { 
"total": 4, 
"max_score": 0, 
"hits": [] 
}, 
"aggregations": { 
"group_by_tags": { 
"doc_count_error_upper_bound": 0, 
"sum_other_doc_count": 0, 
"buckets": [ 
{ 
"key": "meiyan", 
"doc_count": 3 
}, 
{ 
"key": "yangsheng", 
"doc_count": 3 
} 
] 
} 
} 
} 
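
The bucket counts can be reproduced by hand from the test data. The sketch below is plain Python, not Elasticsearch; it assumes that the text field tags is analyzed into lowercase, whitespace-separated terms, which would explain why the single tag "meiyan yangsheng" of document 4 is counted in both buckets.

```python
from collections import Counter

# price and tags of the four test documents, keyed by _id
products = [
    {"id": "1", "price": 1000, "tags": ["yangsheng", "meiyan"]},
    {"id": "2", "price": 2000, "tags": ["yangsheng"]},
    {"id": "3", "price": 4000, "tags": ["meiyan"]},
    {"id": "4", "price": 10000, "tags": ["meiyan yangsheng"]},
]


def tag_terms(product):
    """Distinct terms of the analyzed tags field of one document (assumed analyzer)."""
    return {term for tag in product["tags"] for term in tag.lower().split()}


def count_by_tag(items):
    """Count how many documents fall into each tag bucket."""
    counts = Counter()
    for product in items:
        counts.update(tag_terms(product))
    return dict(counts)


print(count_by_tag(products))
```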

Count the products under each tag for teas whose name contains pu er

GET /tea/product/_search 
{ 
"size": 0, 
"query": { 
"match": { 
"name": "pu er" 
} 
} 
, "aggs": { 
"group_by_tags": { 
"terms": { 
"field": "tags" 
} 
} 
} 
} 
// Response
{ 
"took": 8, 
"timed_out": false, 
"_shards": { 
"total": 5, 
"successful": 5, 
"failed": 0 
}, 
"hits": { 
"total": 1, 
"max_score": 0, 
"hits": [] 
}, 
"aggregations": { 
"group_by_tags": { 
"doc_count_error_upper_bound": 0, 
"sum_other_doc_count": 0, 
"buckets": [ 
{ 
"key": "meiyan", 
"doc_count": 1 
}, 
{ 
"key": "yangsheng", 
"doc_count": 1 
} 
] 
} 
} 
} 

Compute the average price of the teas under each tag

GET /tea/product/_search 
{ 
"size": 0, 
"aggs": { 
"group_by_tags": { 
"terms": { 
"field": "tags" 
}, 
"aggs":{ 
"avg_price":{ 
"avg":{ 
"field":"price" 
} 
} 
} 
} 
} 
} 
// Response
{ 
"took": 21, 
"timed_out": false, 
"_shards": { 
"total": 5, 
"successful": 5, 
"failed": 0 
}, 
"hits": { 
"total": 4, 
"max_score": 0, 
"hits": [] 
}, 
"aggregations": { 
"group_by_tags": { 
"doc_count_error_upper_bound": 0, 
"sum_other_doc_count": 0, 
"buckets": [ 
{ 
"key": "meiyan", 
"doc_count": 3, 
"avg_price": { 
"value": 5000 
} 
}, 
{ 
"key": "yangsheng", 
"doc_count": 3, 
"avg_price": { 
"value": 4333.333333333333 
} 
} 
] 
} 
} 
} 
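
The averages in this response can be checked against the test data. This is a plain-Python sketch of the same grouping, not Elasticsearch code; it again assumes that tags is split into whitespace-separated terms, and `avg_price_by_tag` is a hypothetical name.

```python
from collections import defaultdict

products = [
    {"id": "1", "price": 1000, "tags": ["yangsheng", "meiyan"]},
    {"id": "2", "price": 2000, "tags": ["yangsheng"]},
    {"id": "3", "price": 4000, "tags": ["meiyan"]},
    {"id": "4", "price": 10000, "tags": ["meiyan yangsheng"]},
]


def avg_price_by_tag(items):
    """Group documents by tag term, then average the price within each group."""
    prices = defaultdict(list)
    for product in items:
        terms = {term for tag in product["tags"] for term in tag.split()}
        for term in terms:
            prices[term].append(product["price"])
    return {term: sum(values) / len(values) for term, values in prices.items()}


averages = avg_price_by_tag(products)
print(averages["meiyan"], averages["yangsheng"])
```

The meiyan bucket averages 1000, 4000, and 10000; the yangsheng bucket averages 1000, 2000, and 10000.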

Compute the average price of the teas under each tag, sorted by average price

GET /tea/product/_search 
{ 
"size": 0, 
"aggs": { 
"group_by_tags": { 
"terms": { 
"field": "tags","order": { 
"avg_price": "asc" 
} 
}, 
"aggs":{ 
"avg_price":{ 
"avg":{ 
"field":"price" 
} 
} 
} 
} 
} 
} 
// Response
{ 
"took": 6, 
"timed_out": false, 
"_shards": { 
"total": 5, 
"successful": 5, 
"failed": 0 
}, 
"hits": { 
"total": 4, 
"max_score": 0, 
"hits": [] 
}, 
"aggregations": { 
"group_by_tags": { 
"doc_count_error_upper_bound": 0, 
"sum_other_doc_count": 0, 
"buckets": [ 
{ 
"key": "yangsheng", 
"doc_count": 3, 
"avg_price": { 
"value": 4333.333333333333 
} 
}, 
{ 
"key": "meiyan", 
"doc_count": 3, 
"avg_price": { 
"value": 5000 
} 
} 
] 
} 
} 
} 

Count the products in each price range

GET /tea/product/_search 
{ 
"size": 0, 
"aggs": { 
"group_by_price": { 
"range": { 
"field": "price", 
"ranges": [ 
{ 
"from": 1000, 
"to": 2000 
}, 
{ 
"from": 2000, 
"to": 3000 
}, 
{ 
"from": 3000, 
"to": 4000 
}, 
{ 
"from": 4000, 
"to": 5000 
} 
] 
} 
} 
} 
} 
// Response
// Observation: each range includes from and excludes to (from <= price < to)
{ 
"took": 1, 
"timed_out": false, 
"_shards": { 
"total": 5, 
"successful": 5, 
"failed": 0 
}, 
"hits": { 
"total": 4, 
"max_score": 0, 
"hits": [] 
}, 
"aggregations": { 
"group_by_price": { 
"buckets": [ 
{ 
"key": "1000.0-2000.0", 
"from": 1000, 
"to": 2000, 
"doc_count": 1 
}, 
{ 
"key": "2000.0-3000.0", 
"from": 2000, 
"to": 3000, 
"doc_count": 1 
}, 
{ 
"key": "3000.0-4000.0", 
"from": 3000, 
"to": 4000, 
"doc_count": 0 
}, 
{ 
"key": "4000.0-5000.0", 
"from": 4000, 
"to": 5000, 
"doc_count": 1 
} 
] 
} 
} 
} 
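
The half-open behaviour of the ranges can be verified with a few lines of plain Python over the four test prices. This is only a sketch of the counting rule; `range_buckets` is a hypothetical name.

```python
def range_buckets(prices, ranges):
    """Count prices per range, including the lower bound and excluding the upper one."""
    return [
        {
            "key": f"{float(low)}-{float(high)}",
            "doc_count": sum(low <= price < high for price in prices),
        }
        for low, high in ranges
    ]


prices = [1000, 2000, 4000, 10000]  # prices of the four test documents
ranges = [(1000, 2000), (2000, 3000), (3000, 4000), (4000, 5000)]
for bucket in range_buckets(prices, ranges):
    print(bucket)
```

The price 4000 lands in the 4000 to 5000 bucket rather than in 3000 to 4000, and 10000 falls outside every range.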

Group by the given price ranges, then group by tag within each range, and finally compute the average price of each group

GET /tea/product/_search 
{ 
"size": 0, 
"aggs": { 
"group_by_price": { 
"range": { 
"field": "price", 
"ranges": [ 
{ 
"from": 1000, 
"to": 2000 
}, 
{ 
"from": 2000, 
"to": 3000 
}, 
{ 
"from": 3000, 
"to": 4000 
}, 
{ 
"from": 4000, 
"to": 5000 
} 
] 
} 
, "aggs": { 
"group_by_tags": { 
"terms": { 
"field": "tags" 
} 
, "aggs": { 
"avg_price": { 
"avg": { 
"field": "price" 
} 
} 
} 
} 
} 
} 
} 
} 
// Response
{ 
"took": 16, 
"timed_out": false, 
"_shards": { 
"total": 5, 
"successful": 5, 
"failed": 0 
}, 
"hits": { 
"total": 4, 
"max_score": 0, 
"hits": [] 
}, 
"aggregations": { 
"group_by_price": { 
"buckets": [ 
{ 
"key": "1000.0-2000.0", 
"from": 1000, 
"to": 2000, 
"doc_count": 1, 
"group_by_tags": { 
"doc_count_error_upper_bound": 0, 
"sum_other_doc_count": 0, 
"buckets": [ 
{ 
"key": "meiyan", 
"doc_count": 1, 
"avg_price": { 
"value": 1000 
} 
}, 
{ 
"key": "yangsheng", 
"doc_count": 1, 
"avg_price": { 
"value": 1000 
} 
} 
] 
} 
}, 
{ 
"key": "2000.0-3000.0", 
"from": 2000, 
"to": 3000, 
"doc_count": 1, 
"group_by_tags": { 
"doc_count_error_upper_bound": 0, 
"sum_other_doc_count": 0, 
"buckets": [ 
{ 
"key": "yangsheng", 
"doc_count": 1, 
"avg_price": { 
"value": 2000 
} 
} 
] 
} 
}, 
{ 
"key": "3000.0-4000.0", 
"from": 3000, 
"to": 4000, 
"doc_count": 0, 
"group_by_tags": { 
"doc_count_error_upper_bound": 0, 
"sum_other_doc_count": 0, 
"buckets": [] 
} 
}, 
{ 
"key": "4000.0-5000.0", 
"from": 4000, 
"to": 5000, 
"doc_count": 1, 
"group_by_tags": { 
"doc_count_error_upper_bound": 0, 
"sum_other_doc_count": 0, 
"buckets": [ 
{ 
"key": "meiyan", 
"doc_count": 1, 
"avg_price": { 
"value": 4000 
} 
} 
] 
} 
} 
] 
} 
} 
} 

Query syntax

Difference between match and term

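A text field is analyzed (tokenized); a keyword field is not.

A match query analyzes the query text; a term query does not.

match query on a text field: at least one token of the analyzed query must exactly equal a token of the analyzed text.

term query on a text field: term does not analyze the query, so the query value must exactly equal one of the tokens ES stored after analyzing the text.

match query on a keyword field: the two must match exactly (regardless of analysis). Does match analyze the query? Yes, but when the field itself is of type keyword, that is, not analyzed, match does not analyze the query either.

term query on a keyword field: the two must match exactly.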
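
These rules can be played through with a toy model. The sketch below is plain Python and only an approximation under stated assumptions: lowercasing plus whitespace splitting stands in for the analyzer of a text field, a keyword field is modelled as a single stored value, and `term_query` and `match_query` are hypothetical function names.

```python
def analyze(text):
    """Stand-in analyzer for text fields (assumption): lowercase, split on whitespace."""
    return text.lower().split()


def term_query(indexed_terms, value):
    """term: the query value is not analyzed and is compared as-is with the indexed terms."""
    return value in indexed_terms


def match_query(indexed_terms, value):
    """match on a text field: analyze the query, hit if any query token is indexed."""
    return any(token in indexed_terms for token in analyze(value))


text_field = analyze("red tea")  # text: indexed as the tokens "red" and "tea"
keyword_field = ["red tea"]      # keyword: indexed as one unanalyzed value

print(match_query(text_field, "red tea"))    # a query token equals an indexed token
print(term_query(text_field, "red"))         # "red" is one of the indexed tokens
print(term_query(text_field, "red tea"))     # no single token equals "red tea"
print(term_query(keyword_field, "red tea"))  # exact value match
print(term_query(keyword_field, "red"))      # partial value does not match
```

Because match does not analyze the query against a keyword field, it behaves like `term_query` there in this model.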
