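Overview
Continuing to learn Elasticsearch (ES) with instructor 中华石杉; this is part 56 of the series.
Course URL: https://www.roncoo.com/view/55
Official documentation
In short, the path_hierarchy tokenizer tokenizes data that has a multi-level hierarchy, such as file system paths.
Path Hierarchy Tokenizer: see the official Elasticsearch reference
Path Hierarchy Tokenizer Examples: see the official Elasticsearch reference
Example
Setup: create an index that models file system data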
PUT /filesystem
{
"settings": {
"analysis": {
"analyzer": {
"paths": {
"tokenizer": "path_hierarchy"
}
}
}
}
}
Test the path_hierarchy tokenizer
POST filesystem/_analyze
{
"tokenizer": "path_hierarchy",
"text": "/home/elasticsearch/image"
}
Response:
{
"tokens": [
{
"token": "/home",
"start_offset": 0,
"end_offset": 5,
"type": "word",
"position": 0
},
{
"token": "/home/elasticsearch",
"start_offset": 0,
"end_offset": 19,
"type": "word",
"position": 0
},
{
"token": "/home/elasticsearch/image",
"start_offset": 0,
"end_offset": 25,
"type": "word",
"position": 0
}
]
}
The path_hierarchy tokenizer emits one token per level of the path: /a/b/c/d is split into /a, /a/b, /a/b/c, and /a/b/c/d.
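The prefix expansion described above can be sketched in plain Python. This is only an approximation of the tokenizer's default behavior (delimiter `/`, no `reverse` or `skip` options), and the function name `path_hierarchy_tokens` is hypothetical, not part of any Elasticsearch client.

```python
def path_hierarchy_tokens(path, delimiter="/"):
    """Return every prefix of `path` that ends at a delimiter boundary.

    Approximates the default path_hierarchy tokenizer; edge cases such as
    trailing delimiters may differ from what Elasticsearch actually emits.
    """
    tokens = []
    # Start searching at index 1 so a leading "/" stays attached to the first token.
    end = path.find(delimiter, 1)
    while end != -1:
        tokens.append(path[:end])
        end = path.find(delimiter, end + 1)
    if path:
        tokens.append(path)  # the full path is always the last token
    return tokens


print(path_hierarchy_tokens("/home/elasticsearch/image"))
# ['/home', '/home/elasticsearch', '/home/elasticsearch/image']
```

The printed list matches the three tokens in the `_analyze` response shown earlier.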
Requirement 1: find files whose contents include ES and that are located in the /workspace/projects/helloworld directory
Define the field types explicitly and index a sample document
# Define the field types
PUT /filesystem/_mapping/file
{
"properties": {
"name": {
"type": "keyword"
},
"path": {
"type": "keyword",
"fields": {
"tree": {
"type": "text",
"analyzer": "paths"
}
}
}
}
}
# View the mapping
GET /filesystem/_mapping
# Index a document
PUT /filesystem/file/1
{
"name": "README.txt",
"path": "/workspace/projects/helloworld",
"contents": "小工匠跟石杉老师学习ES"
}
Query DSL for this requirement:
# File search requirement: find files whose contents include ES and that are located in the /workspace/projects/helloworld directory
GET /filesystem/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"contents": "ES"
}
}
],
"filter": {
"term": {
"path": "/workspace/projects/helloworld"
}
}
}
}
}
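Response: document 1 (README.txt) is returned, because its path is exactly /workspace/projects/helloworld and its contents match ES.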
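Why the `term` filter on the `keyword` field only finds an exact directory can be illustrated with a small simulation. This is a sketch under the assumption that a `keyword` field is indexed as one unanalyzed term; `DOCS` and `term_filter` are hypothetical names used only for illustration.

```python
# Only document 1 has been indexed at this point in the walkthrough.
DOCS = {1: "/workspace/projects/helloworld"}


def term_filter(docs, value):
    """Mimic a term query on a keyword field: the whole string must be equal."""
    return sorted(doc_id for doc_id, path in docs.items() if path == value)


print(term_filter(DOCS, "/workspace/projects/helloworld"))  # [1]
print(term_filter(DOCS, "/workspace"))                      # []
```

The second call returns nothing: a parent directory never equals the full path, which is why the `path.tree` sub-field is needed for subtree searches.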
Requirement 2: find all files under the /workspace directory whose contents include ES
Index a few more documents
PUT /filesystem/file/2
{
"name": "README.txt",
"path": "/workspace/projects",
"contents": "小工匠跟石杉老师学习ES"
}
PUT /filesystem/file/3
{
"name": "README.txt",
"path": "/workspace/xxxxx",
"contents": "小工匠跟石杉老师学习ES"
}
PUT /filesystem/file/4
{
"name": "README.txt",
"path": "/home/artisan",
"contents": "小工匠跟石杉老师学习ES"
}
PUT /filesystem/file/5
{
"name": "README.txt",
"path": "/workspace",
"contents": "小工匠跟石杉老师学习ES"
}
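Query DSL for this requirement: filter on "path.tree": "/workspace". The path.tree sub-field stores every ancestor directory of each path as a separate token, and the term query is not analyzed, so it matches every document whose path.tree tokens include /workspace.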
GET filesystem/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"contents": "ES"
}
}
],
"filter": {
"term": {
"path.tree": "/workspace"
}
}
}
}
}
Response:
{
"took": 8,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 4,
"max_score": 0.2876821,
"hits": [
{
"_index": "filesystem",
"_type": "file",
"_id": "5",
"_score": 0.2876821,
"_source": {
"name": "README.txt",
"path": "/workspace",
"contents": "小工匠跟石杉老师学习ES"
}
},
{
"_index": "filesystem",
"_type": "file",
"_id": "1",
"_score": 0.2876821,
"_source": {
"name": "README.txt",
"path": "/workspace/projects/helloworld",
"contents": "小工匠跟石杉老师学习ES"
}
},
{
"_index": "filesystem",
"_type": "file",
"_id": "3",
"_score": 0.2876821,
"_source": {
"name": "README.txt",
"path": "/workspace/xxxxx",
"contents": "小工匠跟石杉老师学习ES"
}
},
{
"_index": "filesystem",
"_type": "file",
"_id": "2",
"_score": 0.18232156,
"_source": {
"name": "README.txt",
"path": "/workspace/projects",
"contents": "小工匠跟石杉老师学习ES"
}
}
]
}
}
The document with id=4 (path /home/artisan) does not meet the requirement and is not returned, as expected.
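The result can be cross-checked with a small simulation of the `path.tree` filter over the five sample paths. This is a sketch, not how Elasticsearch evaluates the query internally; it assumes absolute paths with `/` as the delimiter, and `PATHS`, `prefixes`, and `subtree_filter` are hypothetical names.

```python
# The five paths indexed in this walkthrough, keyed by document id.
PATHS = {
    1: "/workspace/projects/helloworld",
    2: "/workspace/projects",
    3: "/workspace/xxxxx",
    4: "/home/artisan",
    5: "/workspace",
}


def prefixes(path, delimiter="/"):
    """Return the set of ancestor directories of an absolute path, including itself."""
    parts = path.strip(delimiter).split(delimiter)
    return {delimiter + delimiter.join(parts[:i]) for i in range(1, len(parts) + 1)}


def subtree_filter(paths, directory):
    """Mimic a term query on path.tree: `directory` must equal one indexed token."""
    return sorted(doc_id for doc_id, p in paths.items() if directory in prefixes(p))


print(subtree_filter(PATHS, "/workspace"))           # [1, 2, 3, 5]
print(subtree_filter(PATHS, "/workspace/projects"))  # [1, 2]
```

Because the comparison is against whole tokens rather than string prefixes, a value such as `/work` matches nothing in this simulation even though `/workspace` starts with it.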