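What is Elasticsearch
Elasticsearch is a distributed, high-performance, highly available, and scalable search engine and analytics system.
Lucene is a very advanced and powerful search library, but developing directly against it is hard: the API is complex (even simple features take a lot of Java code), and you need a deep understanding of its internals (the various index structures). Lucene is also a single-machine library: it runs on one server and can only handle as much data as that server can handle.
Elasticsearch is built on Lucene. It hides that complexity and offers an easy-to-use RESTful API and a Java API (plus APIs for other languages).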
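Elasticsearch concepts
Index: comparable to a database in a relational system.
Type: comparable to a table.
Document: comparable to a row.
Hierarchy: index > type > document.
Near real-time: there is a delay of about 1 second between indexing data into ES and being able to search it; queries themselves return within seconds.
Shard: a single server has limited capacity, so ES scales horizontally by splitting an index into shards. In the version used here (5.2.0), creating an index sets up 5 primary shards and 5 replica shards by default (1 replica per primary), 10 shards in total. Primary shards let queries and analytics be spread across machines, which increases parallelism and throughput. Replica shards act as backups when a primary shard fails, and multiple replicas also improve search throughput and performance.
A primary shard and its own replica shard cannot be on the same node, so the minimum setup for a fully allocated index is a 2-node cluster.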
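The shard arithmetic above can be sketched in a few lines of Python. This is a simplified model, not the Elasticsearch allocator: the function name shard_allocation is hypothetical, and the only rule it assumes is that copies of the same shard must sit on different nodes.

```python
def shard_allocation(primaries, replicas_per_primary, nodes):
    """Count total and assignable shard copies under a simplified model.

    Assumption: the only placement rule is that the copies of one shard
    (a primary and its replicas) must each sit on a different node.
    """
    copies_per_shard = 1 + replicas_per_primary
    total = primaries * copies_per_shard
    # With N nodes, at most N copies of any one shard can be placed.
    assigned = primaries * min(copies_per_shard, nodes)
    return {"total": total, "assigned": assigned, "unassigned": total - assigned}


# Defaults described above: 5 primaries with 1 replica each.
single_node = shard_allocation(primaries=5, replicas_per_primary=1, nodes=1)
two_nodes = shard_allocation(primaries=5, replicas_per_primary=1, nodes=2)
print(single_node)  # {'total': 10, 'assigned': 5, 'unassigned': 5}
print(two_nodes)    # {'total': 10, 'assigned': 10, 'unassigned': 0}
```

On one node only the primaries can be placed; a second node is enough to place every replica as well.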
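Install Elasticsearch
This walkthrough installs the Windows version for learning purposes; a Linux installation comes later.
Download Elasticsearch: download, unzip, run elasticsearch.bat.
Download Kibana: download, unzip, run kibana.bat.
Verify that Elasticsearch started successfully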
// http://localhost:9200/
{
"name" : "Hrqe2n8",
"cluster_name" : "elasticsearch",
"cluster_uuid" : "1VuvfPBNQFq9DKw-GqCGIQ",
"version" : {
"number" : "5.2.0",
"build_hash" : "24e05b9",
"build_date" : "2017-01-24T19:52:35.800Z",
"build_snapshot" : false,
"lucene_version" : "6.4.0"
},
"tagline" : "You Know, for Search"
}
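name: the node name
cluster_name: the cluster name (the default is elasticsearch)
version.number: the ES version, 5.2.0 here
Basic Elasticsearch features
ES provides a set of APIs called the cat API for inspecting all kinds of information in ES.
Check cluster health
GET /_cat/health?v
// ?v prints the header row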
epoch timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1566183046 10:50:46 elasticsearch yellow 1 1 1 1 0 0 1 0 - 50.0%
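The index that Kibana creates for itself has 1 primary shard and 1 replica shard. With only one node, the primary shard is allocated and started, but the replica shard has no second node to run on and stays unassigned, so only 50% of the shards are active.
How do you read cluster health quickly from green, yellow, and red?
green: every index has all of its primary and replica shards active.
yellow: every index has all of its primary shards active, but some replica shards are not active and are unavailable.
red: not every primary shard is active, so part of the data in the affected indices is missing.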
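The three colors can be expressed as a small decision rule. The sketch below is only an illustration of that rule; the function cluster_health and the dictionary keys are hypothetical names, not part of any Elasticsearch client.

```python
def cluster_health(indices):
    """Classify health from per-index shard counts (simplified model).

    Each entry is a dict with hypothetical keys: primaries,
    active_primaries, replicas, active_replicas.
    """
    all_primaries = all(i["active_primaries"] == i["primaries"] for i in indices)
    all_replicas = all(i["active_replicas"] == i["replicas"] for i in indices)
    if not all_primaries:
        return "red"      # at least one primary shard is not active
    if not all_replicas:
        return "yellow"   # primaries are fine, some replicas are not active
    return "green"


# The .kibana index on a single node: primary active, replica unassigned.
kibana = {"primaries": 1, "active_primaries": 1, "replicas": 1, "active_replicas": 0}
print(cluster_health([kibana]))  # yellow
```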
List indices
GET /_cat/indices?v
// ?v prints the header row
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open .kibana vBdlyz6TR5mFhFUaFmHZrw 1 1 1 0 3.1kb 3.1kb
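When the cat output needs to be checked from a script, the header row makes it easy to turn each line into a dictionary. A minimal sketch, assuming every column has a value (a naive whitespace split breaks on empty columns); parse_cat_table is a hypothetical helper, and the sample text is the output shown above.

```python
def parse_cat_table(text):
    """Turn `?v` cat output into a list of {column: value} dicts."""
    lines = [line for line in text.strip().splitlines() if line.strip()]
    header = lines[0].split()
    # Assumption: no column is empty, so a plain whitespace split lines up.
    return [dict(zip(header, line.split())) for line in lines[1:]]


sample = """
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open .kibana vBdlyz6TR5mFhFUaFmHZrw 1 1 1 0 3.1kb 3.1kb
"""
rows = parse_cat_table(sample)
print(rows[0]["index"], rows[0]["health"])  # .kibana yellow
```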
Create an index
// Index names must be lowercase; otherwise the request fails with invalid_index_name_exception
PUT /second_index?pretty
// Check with GET /_cat/indices?v
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open second_index KYTB0egzQ-WN2yIz4WMIBw 5 1 0 0 650b 650b
yellow open .kibana vBdlyz6TR5mFhFUaFmHZrw 1 1 1 0 3.1kb 3.1kb
yellow open first_index GLPkvksbSVyhAj1wDh2pcg 5 1 0 0 650b 650b
Delete an index
DELETE /second_index?pretty
// Check again with GET /_cat/indices?v
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open .kibana vBdlyz6TR5mFhFUaFmHZrw 1 1 1 0 3.1kb 3.1kb
yellow open first_index GLPkvksbSVyhAj1wDh2pcg 5 1 0 0 650b 650b
Index a document
// Format:
PUT /index/type/id
{JSON document}
PUT /tea/product/1
{
"name":"wu long cha",
"price":500.00,
"desc":"this is good tea"
}
// Response
{
"_index": "tea",
"_type": "product",
"_id": "1",
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"created": true
}
ES creates the index and type automatically; they do not need to exist beforehand.
Get a document
// Format: GET /index/type/id
GET /tea/product/1
{
"_index": "tea",
"_type": "product",
"_id": "1",
"_version": 1,
"found": true,
"_source": {
"name": "wu long cha",
"price": 500,
"desc": "this is good tea"
}
}
Update a document
// Full replacement: the same request as indexing a new document
PUT /tea/product/1
{
"name":"wu long cha",
"price":500.00,
"desc":"this is good tea"
}
Response: _version has increased and created is now false
{
"_index": "tea",
"_type": "product",
"_id": "1",
"_version": 2,
"result": "updated",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"created": false
}
The drawback of full replacement is that you must send every field just to change one of them.
Suppose desc is left out:
PUT /tea/product/1
{
"name":"wu long cha",
"price":500.00
}
Response (the whole document was replaced, so desc is now gone):
{
"_index": "tea",
"_type": "product",
"_id": "1",
"_version": 4,
"result": "updated",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"created": false
}
Partial update
// Change the price to 1000.01
POST /tea/product/1/_update
{
"doc":{
"price":1000.01
}
}
Check with GET /tea/product/1:
{
"_index": "tea",
"_type": "product",
"_id": "1",
"_version": 5,
"found": true,
"_source": {
"name": "wu long cha",
"price": 1000.01
}
}
// Add desc back
POST /tea/product/1/_update
{
"doc":{
"desc":"this is a very good tea"
}
}
Check with GET /tea/product/1:
{
"_index": "tea",
"_type": "product",
"_id": "1",
"_version": 6,
"found": true,
"_source": {
"name": "wu long cha",
"price": 1000.01,
"desc": "this is a very good tea"
}
}
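The difference between full replacement and partial update can be modeled with a dictionary. The class below is a hypothetical in-memory stand-in, not how Elasticsearch stores documents, and it merges only the top level of the document, which is enough for the flat documents used here.

```python
class TinyDocStore:
    """In-memory stand-in for one index; not the Elasticsearch implementation."""

    def __init__(self):
        self._docs = {}      # id -> current _source
        self._versions = {}  # id -> current _version

    def _bump(self, doc_id):
        self._versions[doc_id] = self._versions.get(doc_id, 0) + 1
        return self._versions[doc_id]

    def put(self, doc_id, source):
        """Full replacement: the stored document becomes exactly `source`."""
        created = doc_id not in self._docs
        self._docs[doc_id] = dict(source)
        return {"_id": doc_id, "_version": self._bump(doc_id),
                "result": "created" if created else "updated"}

    def update(self, doc_id, partial):
        """Partial update: merge `partial` into the stored document (top level only)."""
        self._docs[doc_id] = {**self._docs[doc_id], **partial}
        return {"_id": doc_id, "_version": self._bump(doc_id), "result": "updated"}

    def get(self, doc_id):
        return self._docs.get(doc_id)


store = TinyDocStore()
first = store.put("1", {"name": "wu long cha", "price": 500.00, "desc": "this is good tea"})
store.put("1", {"name": "wu long cha", "price": 500.00})   # desc is dropped
store.update("1", {"price": 1000.01})                      # other fields survive
last = store.update("1", {"desc": "this is a very good tea"})
print(store.get("1"))
```

Every write bumps the version, but only put discards fields that were not sent.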
Delete a document
DELETE /tea/product/1
Response
{
"found": true,
"_index": "tea",
"_type": "product",
"_id": "1",
"_version": 7,
"result": "deleted",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
}
}
Check
GET /tea/product/1
Response
{
"_index": "tea",
"_type": "product",
"_id": "1",
"found": false
}
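Core Elasticsearch queries
query string search
// Search all products
GET /tea/product/_search
Response
took: how many milliseconds the search took
timed_out: whether the search timed out (it did not here)
_shards: the data is split into 5 shards, so the search request goes to every primary shard (or to one of its replica shards instead)
hits.total: the number of matching documents, 3 here
hits.max_score: a score measures how relevant a document is to the search; the more relevant the document, the higher its score
hits.hits: the full data of the matching documents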
{
"took": 11,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 1,
"hits": [
{
"_index": "tea",
"_type": "product",
"_id": "2",
"_score": 1,
"_source": {
"name": "red tea",
"price": 2000,
"desc": "this is good tea"
}
},
{...},
{...}
]
}
}
// Search for teas whose name contains red, sorted by price in descending order
GET /tea/product/_search?q=name:red&sort=price:desc
{
"took": 38,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": null,
"hits": [
{
"_index": "tea",
"_type": "product",
"_id": "2",
"_score": null,
"_source": {
"name": "red tea",
"price": 2000,
"desc": "this is good tea"
},
"sort": [
2000
]
}
]
}
}
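// With this approach the parameters are appended to the URL, which makes complex queries hard to build, so it is rarely used in production.
// It is mostly used for ad hoc queries from the command line, for example with curl, to fire off a quick request and look something up.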
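For quick ad hoc lookups, the query string can be assembled with the standard library instead of by hand. A minimal sketch; query_string_search_path is a hypothetical helper, and it only builds the path, it does not send the request.

```python
from urllib.parse import urlencode


def query_string_search_path(index, doc_type, **params):
    """Build a query string search path such as /tea/product/_search?q=name:red."""
    # Keep ":" readable, since the q and sort values use field:value syntax.
    return f"/{index}/{doc_type}/_search?{urlencode(params, safe=':')}"


path = query_string_search_path("tea", "product", q="name:red", sort="price:desc")
print(path)  # /tea/product/_search?q=name:red&sort=price:desc
```

Anything beyond a couple of parameters quickly becomes unwieldy in this form, which is the limitation noted above.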
query DSL
DSL: Domain Specific Language. The query goes in the HTTP request body as JSON, which is convenient, can express all kinds of complex queries, and is far more powerful than query string search.
// Search all teas
GET /tea/product/_search
{
"query":{
"match_all": {}
}
}
// Search for teas whose name contains red, sorted by price in descending order
GET /tea/product/_search
{
"query": {
"match": {
"name": "red"
}
},
"sort":[
{"price":"desc"}
]
}
// Pagination
GET /tea/product/_search
{
"query": {
"match_all": {}
},
"from": 0,
"size": 1
}
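// from is the offset of the first hit, starting at 0; with from 0 and size 1, each page holds 1 hit and this returns the first one
// from 1, size 2: start at the 2nd hit and return 2 hits
// Return only some fields, like SELECT name, price FROM xxx in MySQL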
GET /tea/product/_search
{
"query": {
"match_all": {}
},
"_source": ["name","price"]
}
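Since from is an offset rather than a page number, a small helper can do the conversion when building the request body. This is a sketch under that assumption; paged_search_body and its 1-based page parameter are hypothetical, while from, size, and _source are the body keys used above.

```python
import json


def paged_search_body(page, size, fields=None):
    """Build a match_all body; page is 1-based, `from` is the 0-based offset."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be at least 1")
    body = {"query": {"match_all": {}}, "from": (page - 1) * size, "size": size}
    if fields is not None:
        body["_source"] = list(fields)  # return only these fields
    return body


body = paged_search_body(page=2, size=2, fields=["name", "price"])
print(json.dumps(body))
# {"query": {"match_all": {}}, "from": 2, "size": 2, "_source": ["name", "price"]}
```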
query filter
// Search for teas whose name contains tea and whose price is between 2000 and 3000 (both inclusive)
GET /tea/product/_search
{
"query": {
"bool": {
"must": [{
"match": {
"name": "tea"
}
}],
"filter": {
"range": {
"price": {
"gte": 2000,
"lte": 3000
}
}
}
}
}
}
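To see which documents such a bool query keeps, the two clauses can be evaluated locally. The sketch below assumes the four products listed in the test data of the aggregation section and a whitespace tokenizer; bool_search is a hypothetical name, and no scoring is modeled.

```python
PRODUCTS = [
    {"_id": "1", "name": "wu long tea", "price": 1000},
    {"_id": "2", "name": "red tea", "price": 2000},
    {"_id": "3", "name": "green tea", "price": 4000},
    {"_id": "4", "name": "pu er", "price": 10000},
]


def bool_search(docs, name_term, gte, lte):
    """Keep documents that satisfy both the must clause and the range filter."""
    hits = []
    for doc in docs:
        matches_name = name_term in doc["name"].lower().split()  # must: match on name
        in_range = gte <= doc["price"] <= lte                    # filter: gte / lte
        if matches_name and in_range:
            hits.append(doc["_id"])
    return hits


print(bool_search(PRODUCTS, "tea", gte=2000, lte=3000))  # ['2']
```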
full-text search
GET /tea/product/_search
{
"query": {
"match": {
"desc": "good tea"
}
}
}
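The desc field is split into terms at index time to build the inverted index.
The query text for desc is split the same way, into the two terms good and tea.
// Note the scores of red tea and pu er below
Response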
{
"took": 8,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 4,
"max_score": 0.87546873,
"hits": [
{
"_index": "tea",
"_type": "product",
"_id": "2",
"_score": 0.87546873,
"_source": {
"name": "red tea",
"price": 2000,
"desc": "this is good tea"
}
},
{...},
{...},
{
"_index": "tea",
"_type": "product",
"_id": "4",
"_score": 0.25069216,
"_source": {
"name": "pu er",
"price": 10000,
"desc": "this is good good"
}
}
]
}
}
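The inverted index behind this result can be sketched with a dictionary from term to document ids. This is a toy model: lowercasing plus whitespace splitting stands in for the real analyzer, the function names are hypothetical, and the count of matched terms is only a rough stand-in for the real relevance score.

```python
from collections import defaultdict

DESCS = {
    "1": "this is good tea",
    "2": "this is good tea",
    "3": "this is good tea",
    "4": "this is good good",
}


def analyze(text):
    """Toy analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def build_inverted_index(docs):
    index = defaultdict(set)  # term -> ids of documents containing it
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


def match(index, query):
    """Return {doc_id: number of distinct query terms found in the document}."""
    matched = defaultdict(int)
    for term in set(analyze(query)):
        for doc_id in index.get(term, ()):
            matched[doc_id] += 1
    return dict(matched)


index = build_inverted_index(DESCS)
print(sorted(match(index, "good tea").items()))
# [('1', 2), ('2', 2), ('3', 2), ('4', 1)]
```

Document 4 still matches because it contains good, but it matches only one of the two terms, which is in line with its lower score in the response.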
phrase search
// Phrase search: the document must contain good tea as a whole phrase; the words cannot be separated
// The results of the query below do not include pu er
GET /tea/product/_search
{
"query": {
"match_phrase": {
"desc": "good tea"
}
}
}
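Phrase matching additionally requires the terms to appear next to each other and in order. A minimal sketch with a whitespace tokenizer as the assumed analyzer; phrase_match is a hypothetical name, and the descriptions are the ones from the test data.

```python
DESCS = {
    "1": "this is good tea",
    "2": "this is good tea",
    "3": "this is good tea",
    "4": "this is good good",
}


def analyze(text):
    """Toy analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def phrase_match(text, phrase):
    """True if the phrase terms occur consecutively and in order in the text."""
    tokens, wanted = analyze(text), analyze(phrase)
    n = len(wanted)
    return n > 0 and any(tokens[i:i + n] == wanted for i in range(len(tokens) - n + 1))


hits = [doc_id for doc_id, text in DESCS.items() if phrase_match(text, "good tea")]
print(hits)  # ['1', '2', '3']
```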
highlight search
// Highlighting
GET /tea/product/_search
{
"query": {
"match": {
"name": "tea"
}
},
"highlight": {
"fields": {
"name": {}
}
}
}
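The effect of highlighting is that each hit carries a copy of the field with the matched terms wrapped in tags, <em> by default. The sketch below imitates that on a single string; highlight is a hypothetical helper with a whitespace tokenizer, not the Elasticsearch highlighter.

```python
def highlight(text, query, pre="<em>", post="</em>"):
    """Wrap every word of `text` that matches a query term in pre/post tags."""
    terms = set(query.lower().split())
    return " ".join(
        f"{pre}{word}{post}" if word.lower() in terms else word
        for word in text.split()
    )


print(highlight("red tea", "tea"))  # red <em>tea</em>
```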
Aggregations
Test data
GET /tea/product/_search
Response
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 4,
"max_score": 1,
"hits": [
{
"_index": "tea",
"_type": "product",
"_id": "2",
"_score": 1,
"_source": {
"name": "red tea",
"price": 2000,
"desc": "this is good tea",
"tags": [
"yangsheng"
]
}
},
{
"_index": "tea",
"_type": "product",
"_id": "4",
"_score": 1,
"_source": {
"name": "pu er",
"price": 10000,
"desc": "this is good good",
"tags": [
"meiyan yangsheng"
]
}
},
{
"_index": "tea",
"_type": "product",
"_id": "1",
"_score": 1,
"_source": {
"name": "wu long tea",
"price": 1000,
"desc": "this is good tea",
"tags": [
"yangsheng",
"meiyan"
]
}
},
{
"_index": "tea",
"_type": "product",
"_id": "3",
"_score": 1,
"_source": {
"name": "green tea",
"price": 4000,
"desc": "this is good tea",
"tags": [
"meiyan"
]
}
}
]
}
}
Count the products under each tag
GET /tea/product/_search
{
"size": 0,
"aggs": {
"group_by_tags": {
"terms": {
"field": "tags"
}
}
}
}
This fails with an error:
"type": "illegal_argument_exception",
"reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [tags] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory."
Fix
Enable fielddata on the tags field so that ES builds the in-memory structure by uninverting the inverted index:
PUT /tea/_mapping/product
{
"properties": {
"tags":{
"type": "text",
"fielddata": true
}
}
}
Response after rerunning the aggregation
{
"took": 26,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 4,
"max_score": 0,
"hits": []
},
"aggregations": {
"group_by_tags": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "meiyan",
"doc_count": 3
},
{
"key": "yangsheng",
"doc_count": 3
}
]
}
}
}
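The bucket counts can be reproduced locally, which also explains why document 4 counts toward both tags: with fielddata on a text field the aggregation works on analyzed terms, so "meiyan yangsheng" contributes two. A sketch with a whitespace tokenizer as the assumed analyzer; terms_aggregation is a hypothetical name.

```python
from collections import Counter

TAGS = {
    "1": ["yangsheng", "meiyan"],
    "2": ["yangsheng"],
    "3": ["meiyan"],
    "4": ["meiyan yangsheng"],
}


def terms_aggregation(tags_by_doc):
    """Count documents per analyzed term, largest buckets first."""
    counts = Counter()
    for values in tags_by_doc.values():
        # An analyzed text field is split into terms, so one string can yield several.
        terms = {term for value in values for term in value.lower().split()}
        counts.update(terms)  # each document counts once per term
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


print(terms_aggregation(TAGS))  # [('meiyan', 3), ('yangsheng', 3)]
```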
For teas whose name matches pu er, count the products under each tag
GET /tea/product/_search
{
"size": 0,
"query": {
"match": {
"name": "pu er"
}
},
"aggs": {
"group_by_tags": {
"terms": {
"field": "tags"
}
}
}
}
Response
{
"took": 8,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0,
"hits": []
},
"aggregations": {
"group_by_tags": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "meiyan",
"doc_count": 1
},
{
"key": "yangsheng",
"doc_count": 1
}
]
}
}
}
Compute the average price of the teas under each tag
GET /tea/product/_search
{
"size": 0,
"aggs": {
"group_by_tags": {
"terms": {
"field": "tags"
},
"aggs":{
"avg_price":{
"avg":{
"field":"price"
}
}
}
}
}
}
Response
{
"took": 21,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 4,
"max_score": 0,
"hits": []
},
"aggregations": {
"group_by_tags": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "meiyan",
"doc_count": 3,
"avg_price": {
"value": 5000
}
},
{
"key": "yangsheng",
"doc_count": 3,
"avg_price": {
"value": 4333.333333333333
}
}
]
}
}
}
Compute the average price of the teas under each tag, sorted by average price
GET /tea/product/_search
{
"size": 0,
"aggs": {
"group_by_tags": {
"terms": {
"field": "tags","order": {
"avg_price": "asc"
}
},
"aggs":{
"avg_price":{
"avg":{
"field":"price"
}
}
}
}
}
}
Response
{
"took": 6,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 4,
"max_score": 0,
"hits": []
},
"aggregations": {
"group_by_tags": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "yangsheng",
"doc_count": 3,
"avg_price": {
"value": 4333.333333333333
}
},
{
"key": "meiyan",
"doc_count": 3,
"avg_price": {
"value": 5000
}
}
]
}
}
}
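The nested avg metric and the ordering can be checked by hand with a few lines of Python. A sketch over the test data, again assuming a whitespace tokenizer for the tags field; avg_price_by_tag is a hypothetical name.

```python
from collections import defaultdict

PRODUCTS = [
    {"price": 1000, "tags": ["yangsheng", "meiyan"]},
    {"price": 2000, "tags": ["yangsheng"]},
    {"price": 4000, "tags": ["meiyan"]},
    {"price": 10000, "tags": ["meiyan yangsheng"]},
]


def avg_price_by_tag(products):
    """Group prices by analyzed tag term, then sort buckets by average price."""
    prices = defaultdict(list)
    for product in products:
        terms = {term for tag in product["tags"] for term in tag.split()}
        for term in terms:
            prices[term].append(product["price"])
    buckets = [
        {"key": term, "doc_count": len(vals), "avg_price": sum(vals) / len(vals)}
        for term, vals in prices.items()
    ]
    # Mirrors "order": {"avg_price": "asc"} in the request above.
    return sorted(buckets, key=lambda bucket: bucket["avg_price"])


buckets = avg_price_by_tag(PRODUCTS)
for bucket in buckets:
    print(bucket)
```

The yangsheng bucket comes first because its average, 13000 / 3, is below the meiyan average of 5000.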
Count the products in each price range
GET /tea/product/_search
{
"size": 0,
"aggs": {
"group_by_price": {
"range": {
"field": "price",
"ranges": [
{
"from": 1000,
"to": 2000
},
{
"from": 2000,
"to": 3000
},
{
"from": 3000,
"to": 4000
},
{
"from": 4000,
"to": 5000
}
]
}
}
}
}
Response
// Each bucket includes from and excludes to: from <= price < to
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 4,
"max_score": 0,
"hits": []
},
"aggregations": {
"group_by_price": {
"buckets": [
{
"key": "1000.0-2000.0",
"from": 1000,
"to": 2000,
"doc_count": 1
},
{
"key": "2000.0-3000.0",
"from": 2000,
"to": 3000,
"doc_count": 1
},
{
"key": "3000.0-4000.0",
"from": 3000,
"to": 4000,
"doc_count": 0
},
{
"key": "4000.0-5000.0",
"from": 4000,
"to": 5000,
"doc_count": 1
}
]
}
}
}
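The half-open bucket boundaries are easy to verify locally against the four prices in the test data. A minimal sketch; range_aggregation is a hypothetical name.

```python
PRICES = [1000, 2000, 4000, 10000]
RANGES = [(1000, 2000), (2000, 3000), (3000, 4000), (4000, 5000)]


def range_aggregation(values, ranges):
    """Count values per bucket, including `from` and excluding `to`."""
    return [
        {"from": lo, "to": hi, "doc_count": sum(1 for v in values if lo <= v < hi)}
        for lo, hi in ranges
    ]


counts = [bucket["doc_count"] for bucket in range_aggregation(PRICES, RANGES)]
print(counts)  # [1, 1, 0, 1]
```

The price 2000 lands in the 2000-3000 bucket rather than 1000-2000, and 10000 falls outside every bucket.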
Group by the given price ranges, then group by tag within each range, and finally compute the average price of each group
GET /tea/product/_search
{
"size": 0,
"aggs": {
"group_by_price": {
"range": {
"field": "price",
"ranges": [
{
"from": 1000,
"to": 2000
},
{
"from": 2000,
"to": 3000
},
{
"from": 3000,
"to": 4000
},
{
"from": 4000,
"to": 5000
}
]
},
"aggs": {
"group_by_tags": {
"terms": {
"field": "tags"
},
"aggs": {
"avg_price": {
"avg": {
"field": "price"
}
}
}
}
}
}
}
}
Response
{
"took": 16,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 4,
"max_score": 0,
"hits": []
},
"aggregations": {
"group_by_price": {
"buckets": [
{
"key": "1000.0-2000.0",
"from": 1000,
"to": 2000,
"doc_count": 1,
"group_by_tags": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "meiyan",
"doc_count": 1,
"avg_price": {
"value": 1000
}
},
{
"key": "yangsheng",
"doc_count": 1,
"avg_price": {
"value": 1000
}
}
]
}
},
{
"key": "2000.0-3000.0",
"from": 2000,
"to": 3000,
"doc_count": 1,
"group_by_tags": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "yangsheng",
"doc_count": 1,
"avg_price": {
"value": 2000
}
}
]
}
},
{
"key": "3000.0-4000.0",
"from": 3000,
"to": 4000,
"doc_count": 0,
"group_by_tags": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": []
}
},
{
"key": "4000.0-5000.0",
"from": 4000,
"to": 5000,
"doc_count": 1,
"group_by_tags": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "meiyan",
"doc_count": 1,
"avg_price": {
"value": 4000
}
}
]
}
}
]
}
}
}
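Query syntax
Differences between match and term
A text field is analyzed (tokenized); a keyword field is not.
A match query analyzes its input; a term query does not.
match on a text field: at least one term from the analyzed query must exactly equal one term from the analyzed field.
term on a text field: the query is not analyzed, so the search keyword must exactly equal one of the terms ES produced when analyzing the field.
match on a keyword field: the two must match exactly. Does match analyze the query here? Normally yes, but when the field itself is keyword (not analyzed), match does not analyze the query either.
term on a keyword field: the two must match exactly.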
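The four combinations can be tried out with a toy model. The sketch below assumes that lowercasing plus whitespace splitting stands in for the analyzer of a text field; the function names are hypothetical and nothing here calls Elasticsearch.

```python
def analyze(text):
    """Toy analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def indexed_terms(value, field_type):
    # text fields are analyzed into terms; keyword fields keep the exact value.
    return set(analyze(value)) if field_type == "text" else {value}


def term_query(value, field_type, query):
    """term: the query is compared as-is against the indexed terms."""
    return query in indexed_terms(value, field_type)


def match_query(value, field_type, query):
    """match: the query is analyzed first, unless the field is keyword."""
    query_terms = analyze(query) if field_type == "text" else [query]
    terms = indexed_terms(value, field_type)
    return any(term in terms for term in query_terms)


print(match_query("red tea", "text", "Red Tea"))     # True
print(term_query("red tea", "text", "red tea"))      # False
print(term_query("red tea", "text", "red"))          # True
print(match_query("red tea", "keyword", "red"))      # False
print(term_query("red tea", "keyword", "red tea"))   # True
```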