Guide to Killing Tasks in an ES Cluster

2023-11-22 16:32:56

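Scenario: the cluster is under heavy load, and slow query tasks need to be killed so that they do not bring the cluster down.

Simulating a slow query task

1. Run a wildcard query across multiple indices in the cluster (the query takes tens of seconds to complete)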

time curl -s -uelastic:password -XPOST -H "Content-Type:application/json" '127.0.0.1:9200/ss-*/_search?pretty' -d '
{
  "query":{
      "wildcard":{
        "address":"*w*"
      }
  },
  "size": 65535,
  "aggs": {
    "name_counts": {
      "terms": {
        "field": "name"
      }
    },
    "age_counts": {
      "terms": {
        "field": "age"
      }
    },
    "gender_counts": {
      "terms": {
        "field": "gender"
      }
    },
    "job_counts": {
      "terms": {
        "field": "job"
      }
    }
  },
  "highlight": {
    "fields": {
      "name": {}
    }
  }
}' | grep "failed"

2. List the cluster's search tasks

curl -uelastic:password '127.0.0.1:9200/_cat/tasks?actions=indices:data/read/*&v'

indices:data/read/* matches the action field of all read tasks; to list all write tasks, use indices:data/write/* instead.
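The read/write switch described above can be wrapped in a small helper. This is only a sketch: the function name tasks_url and the ES_HOST variable are hypothetical, and only the endpoint and the two action patterns come from this article.

```shell
# Hypothetical helper: build the _cat/tasks URL for read or write tasks.
# ES_HOST is an assumed variable; it defaults to the address used in this article.
ES_HOST="${ES_HOST:-127.0.0.1:9200}"

tasks_url() {
  case "$1" in
    read|write) ;;   # the two task classes named in the text
    *) echo "usage: tasks_url read|write" >&2; return 1 ;;
  esac
  printf '%s/_cat/tasks?actions=indices:data/%s/*&v\n' "$ES_HOST" "$1"
}

# Example call against a reachable cluster (not run here):
#   curl -uelastic:password "$(tasks_url write)"
```

Quoting the result, as in the commented call, keeps the shell from expanding the * and & characters in the URL.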

Query result: [figure: output of the _cat/tasks request]

3. Cancel the specified task

POST _tasks/parent_task_id/_cancel
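In the request above, parent_task_id stands for a real task ID of the form node_id:number, for example rBo6PZ6LRjCI2x03Tv38JA:2550697. A minimal curl-based sketch of the same call follows; the function names is_task_id and cancel_task and the ES_HOST variable are hypothetical, and the credentials are the ones used elsewhere in this article.

```shell
# Hypothetical wrapper around POST _tasks/<task_id>/_cancel.
ES_HOST="${ES_HOST:-127.0.0.1:9200}"   # assumed variable, article's address by default

# Succeeds only if the argument looks like "<node_id>:<number>".
is_task_id() {
  printf '%s\n' "$1" | grep -Eq '^[A-Za-z0-9_-]+:[0-9]+$'
}

cancel_task() {
  if ! is_task_id "$1"; then
    echo "not a task id: $1" >&2
    return 1
  fi
  # Only reached for well-formed ids; requires a reachable cluster.
  curl -s -uelastic:password -XPOST "$ES_HOST/_tasks/$1/_cancel?pretty"
}

# Example (not run here):
#   cancel_task rBo6PZ6LRjCI2x03Tv38JA:2550697
```

The format check is there so that a literal placeholder such as parent_task_id is rejected locally instead of being sent to the cluster.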

4. List the cluster's tasks again

curl -uelastic:password '127.0.0.1:9200/_cat/tasks?actions=indices:data/read/*&v'

The cancelled search task no longer appears in the response, as expected.
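This check can also be scripted rather than read off the listing by eye. The sketch below assumes the _cat/tasks output is piped in on stdin; the function name task_gone is hypothetical.

```shell
# Hypothetical check: read `_cat/tasks` output on stdin and succeed only if
# the given task id no longer appears, either as a task or as the parent of one.
# Note: this is a fixed-string substring match on each line.
task_gone() {
  ! grep -Fq -- "$1"
}

# Example (requires a reachable cluster, not run here):
#   curl -s -uelastic:password '127.0.0.1:9200/_cat/tasks?actions=indices:data/read/*&v' \
#     | task_gone rBo6PZ6LRjCI2x03Tv38JA:2550697 && echo "task no longer listed"
```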

5. Get the query details from a task

[root@VM-29-26-tencentos ~]# curl -uelastic:password '127.0.0.1:9200/_tasks?actions=*search&detailed&pretty'
{
  "nodes" : {
// 25 lines omitted here
      "tasks" : {
          "rBo6PZ6LRjCI2x03Tv38JA:2550697" : {
          "node" : "rBo6PZ6LRjCI2x03Tv38JA",
          "id" : 2550697,
          "type" : "transport",
          "action" : "indices:data/read/search",
          "description" : "indices[ss-202307251649,...,ss-2023.07.25-000023], types[], search_type[QUERY_THEN_FETCH], source[{"size":65535,"query":{"wildcard":{"address":{"wildcard":"*w*","boost":1.0}}},"aggregations":{"name_counts":{"terms":{"field":"name","size":10,"min_doc_count":1,"shard_min_doc_count":0,"show_term_doc_count_error":false,"order":[{"_count":"desc"},{"_key":"asc"}]}},"age_counts":{"terms":{"field":"age","size":10,"min_doc_count":1,"shard_min_doc_count":0,"show_term_doc_count_error":false,"order":[{"_count":"desc"},{"_key":"asc"}]}},"gender_counts":{"terms":{"field":"gender","size":10,"min_doc_count":1,"shard_min_doc_count":0,"show_term_doc_count_error":false,"order":[{"_count":"desc"},{"_key":"asc"}]}},"job_counts":{"terms":{"field":"job","size":10,"min_doc_count":1,"shard_min_doc_count":0,"show_term_doc_count_error":false,"order":[{"_count":"desc"},{"_key":"asc"}]}}},"highlight":{"fields":{"name":{}}}}]",
          "start_time_in_millis" : 1690454404472,
          "running_time_in_nanos" : 20935488310,
          "cancellable" : true,
          "cancelled" : false,
          "headers" : { }
        }
      }
    }
  }
}
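The running_time_in_nanos field in this output is what identifies a slow task. As a rough sketch, the filter below takes pairs of task ID and running time in nanoseconds and prints the IDs that exceed a limit in seconds. The function name slow_task_ids is hypothetical, and the second sample row is made up for contrast.

```shell
# Hypothetical filter: read "<task_id> <running_time_in_nanos>" pairs on stdin
# and print the ids of tasks that have run longer than the given number of seconds.
# Lines whose second column is not a plain integer (e.g. a header row) are skipped.
slow_task_ids() {
  awk -v limit_s="$1" '($2 ~ /^[0-9]+$/) && ($2 / 1000000000 > limit_s) { print $1 }'
}

# Sample input: the first pair is the task shown above; the second is invented.
printf '%s\n' \
  'rBo6PZ6LRjCI2x03Tv38JA:2550697 20935488310' \
  'rBo6PZ6LRjCI2x03Tv38JA:2550701 1200000' |
  slow_task_ids 10
# prints: rBo6PZ6LRjCI2x03Tv38JA:2550697

# One possible source of such pairs, assuming your version exposes the
# running_time_ns column (check with _cat/tasks?help before relying on it):
#   curl -s -uelastic:password \
#     '127.0.0.1:9200/_cat/tasks?actions=indices:data/read/*&h=task_id,running_time_ns'
```

Only the first task is printed because it has been running for about 20.9 seconds, while the invented one has run for 1.2 milliseconds.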

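Tencent Cloud ES compared with self-managed community ES

Tencent Cloud additionally provides the ability to actively cancel a query when it times out.

ES on Tencent Cloud

timeout parameter

The kernel optimizes the timeout parameter: a task that has not completed after the specified time is cancelled automatically, which reduces pressure on the cluster.

Self-managed community ES

timeout parameter

If a query task has not completed after the specified time, the connection is closed, but the task keeps running in the background.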

GET /index_name/_search
{
  "timeout":"10s",
  "query":{
      "wildcard":{
        "address":"wu*"
      }
  }
}

Tencent Cloud ES kernel optimizations

Elasticsearch Service kernel release notes - ES kernel enhancements - Documentation Center - Tencent Cloud
