Digging into Kubernetes Autoscaling: Extending Monitoring Metrics for Richer Elasticity

2023-09-11 11:04:07

Introduction

In the previous article on the HPA, we looked at how the HPA is implemented and used CPU metrics to scale a Deployment elastically. For our purposes, though, the HPA's core metrics are fairly basic and cannot satisfy business-specific scaling needs. In this article we explore extending the HPA with custom metrics to enrich its scaling capabilities. Before we begin, we need to understand two components: Metrics Server and Prometheus Adapter.

What is Metrics Server?

Metrics Server is an open-source implementation of the metrics API (metrics.k8s.io), created and maintained by a Kubernetes SIG. Its main purpose is to help the Kubernetes Horizontal Pod Autoscaler scale application workloads up or down in response to external factors such as heavy HTTP traffic. Metrics Server provides the cluster's core metrics: it collects measurements from the kubelet, cAdvisor, and similar sources, and serves them to consumers such as kube-scheduler, the HPA, and other controllers. The CPU and memory metrics in the previous article came from Metrics Server. The following overview shows how HPA (Horizontal Pod Autoscaling) works together with Metrics Server:

Overview: deploy Metrics Server into your Kubernetes cluster. Metrics Server scrapes kubelet metrics from each worker node, collecting CPU and memory usage data for every application workload, and exposes those metrics through the Kubernetes API server. The Horizontal Pod Autoscaler fetches the CPU and memory usage metrics via the API server, then decides, based on the observed values and the target thresholds, when to scale the application's Deployment pods out or in.
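
Before going further, a quick sanity check is useful. Assuming metrics-server (or another metrics.k8s.io provider) is already installed in the cluster, the core metrics API can be queried directly:

# kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes" | jq .    # raw per-node metrics
# kubectl top pods -n kube-system    # per-pod CPU/memory usage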

What is Prometheus Adapter?

Prometheus Adapter is similar to Metrics Server, but it primarily extends the cluster with a new external metrics API (external.metrics.k8s.io), which the HPA controller uses to fetch these metrics. The external metrics API is built on the existing custom metrics API: unless explicitly stated otherwise, everything in the custom metrics API design concerning semantics, implementation, and design decisions also applies to the external metrics API. A custom metrics adapter is usually expected to provide both the custom metrics API and the external metrics API, but this is not mandatory; the two APIs can be implemented and used independently.
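
Under the hood, both APIs are wired into the Kubernetes apiserver through the API aggregation layer: the adapter registers an APIService object that tells the apiserver to delegate a metrics API group to the adapter's Service. The sketch below only illustrates the shape of such a registration; kube-prometheus ships its own version in prometheusAdapter-apiService.yaml, and the service name, namespace, and TLS setting here are assumptions:

apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.external.metrics.k8s.io
spec:
  group: external.metrics.k8s.io    # the API group being delegated
  version: v1beta1
  service:    # the Service the apiserver forwards requests to
    name: prometheus-adapter
    namespace: monitoring
  groupPriorityMinimum: 100
  versionPriority: 100
  insecureSkipTLSVerify: true    # demo only; use a proper caBundle in production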

Now that we understand how custom metrics are plugged in, suppose your business needs to scale a service based on the application's QPS. Let's try it out!

Best Practices

Prometheus Adapter ships as a component of the kube-prometheus project, so you can deploy kube-prometheus into your Kubernetes cluster. Mine is already deployed, and the detailed deployment steps are out of scope here, but you can get a sense of what gets installed by looking at the relevant YAML files:

#mkdir kube-prometheus 
#cd kube-prometheus 
#git clone https://github.com/coreos/kube-prometheus
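
If you still need to install it, the upstream kube-prometheus README describes a two-step apply: the CRDs and namespace first, then everything else. Roughly:

# cd kube-prometheus
# kubectl apply --server-side -f manifests/setup
# kubectl apply -f manifests/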

Inspect the Prometheus Adapter YAML files

#cd kube-prometheus/manifests/
#mkdir prometheus-adapter
#mv prometheusAdapter*.yaml prometheus-adapter
#cd prometheus-adapter/
#ll
-rw-r--r-- 1 root root  483 Aug  6 18:23 prometheusAdapter-apiService.yaml
-rw-r--r-- 1 root root  577 Aug  6 18:23 prometheusAdapter-clusterRoleAggregatedMetricsReader.yaml
-rw-r--r-- 1 root root  495 Aug  6 18:23 prometheusAdapter-clusterRoleBindingDelegator.yaml
-rw-r--r-- 1 root root  472 Aug  6 18:23 prometheusAdapter-clusterRoleBinding.yaml
-rw-r--r-- 1 root root  379 Aug  6 18:23 prometheusAdapter-clusterRoleServerResources.yaml
-rw-r--r-- 1 root root  410 Aug  6 18:23 prometheusAdapter-clusterRole.yaml
-rw-r--r-- 1 root root 2205 Aug  6 18:23 prometheusAdapter-configMap.yaml
-rw-r--r-- 1 root root 3286 Aug  6 18:23 prometheusAdapter-deployment.yaml
-rw-r--r-- 1 root root  565 Aug  6 18:23 prometheusAdapter-networkPolicy.yaml
-rw-r--r-- 1 root root  502 Aug  6 18:23 prometheusAdapter-podDisruptionBudget.yaml
-rw-r--r-- 1 root root  516 Aug  6 18:23 prometheusAdapter-roleBindingAuthReader.yaml
-rw-r--r-- 1 root root  324 Aug  6 18:23 prometheusAdapter-serviceAccount.yaml
-rw-r--r-- 1 root root  907 Aug  6 18:23 prometheusAdapter-serviceMonitor.yaml
-rw-r--r-- 1 root root  502 Aug  6 18:23 prometheusAdapter-service.yaml

Verify the installation after deploying kube-prometheus

# kubectl api-versions| grep metrics
external.metrics.k8s.io/v1beta1  # the API group our custom metric will be served through
metrics.k8s.io/v1beta1
# kubectl get pods -n monitoring -o wide | grep prometheus-adapter
prometheus-adapter-667b85b9ff-dx858   1/1     Running   0          86d   172.28.7.26   cn-shanghai.10.0.54.97   <none>           <none>
# kubectl get --raw "/apis/external.metrics.k8s.io" | jq .
{
  "kind": "APIGroup",
  "apiVersion": "v1",
  "name": "external.metrics.k8s.io",
  "versions": [
    {
      "groupVersion": "external.metrics.k8s.io/v1beta1",
      "version": "v1beta1"
    }
  ],
  "preferredVersion": {
    "groupVersion": "external.metrics.k8s.io/v1beta1",
    "version": "v1beta1"
  }
}

If you see output like the above, the installation is healthy. Next, modify the prometheus-adapter ConfigMap to expose the metric we want to scale on, QPS; in effect the adapter runs a PromQL query and transforms the result into the metrics API format. The fields are explained below the example, along with how to reload the adapter.

# kubectl get cm -n monitoring prometheus-adapter -o yaml
apiVersion: v1
data:
  config.yaml: |-
    rules:
    externalRules:
    - metricsQuery: <<.Series>>{<<.LabelMatchers>>}
      name:
        as: istio_request_qps
        matches: service_request_qps
      resources:
        namespaced: false
        overrides:
          namespace:
            resource: namespace
      seriesQuery: service_request_qps

  • seriesQuery: the PromQL series to request from Prometheus.
  • metricsQuery: the aggregation applied to the data returned by seriesQuery; here the template <<.Series>>{<<.LabelMatchers>>} expands to, for example, service_request_qps{namespace="test"} when the metric is queried for the test namespace.
  • resources: maps labels on the Prometheus data to Kubernetes API resources; a resource here means an api-resource in the cluster, such as Pod, Namespace, or Node (list them with kubectl api-resources -o wide). Each key corresponds to a LabelName in the Prometheus data, so confirm the metric actually carries that LabelName.
  • name: renames the Prometheus metric, via regex matching, to a more readable metric name; here service_request_qps is exposed as istio_request_qps.
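
Note that prometheus-adapter does not watch its ConfigMap for changes, so after editing it you generally need to restart the adapter for the new rules to take effect (the deployment and namespace names below assume the kube-prometheus defaults):

# kubectl rollout restart deployment prometheus-adapter -n monitoring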

Verify that the metric can be fetched (adjust the namespace and metric name to match your environment):

kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/test/istio_request_qps" | jq .
{
  "kind": "ExternalMetricValueList",
  "apiVersion": "external.metrics.k8s.io/v1beta1",
  "metadata": {},
  "items": [
    {
      "metricName": "istio_request_qps",      #metric名称
      "metricLabels": {
        "__name__": "service_request_qps",    
        "app": "test-deploymet",              #metric标签
        "instance": "172.28.4.19:80",
        "job": "istio-exporter",
        "namespace": "test",
        "type": "istio"
      },
      "timestamp": "2023-08-06T10:44:39Z",
      "value": "30m"
    },

If the metric can be fetched like this, the finish line is in sight. Note that the value field uses Kubernetes quantity notation: 30m is milli-units, i.e. 0.03 requests per second. Next, configure the HPA:

# kubectl get hpa -n test -o yaml
apiVersion: v1
items:
- apiVersion: autoscaling/v2
  kind: HorizontalPodAutoscaler
  metadata:
    name: test-hpa
    namespace: test
  spec:
    maxReplicas: 3    # maximum replica count
    metrics:
    - external:
        metric:
          name: istio_request_qps      # must match the metric name exposed above
          selector:
            matchLabels:
              app: test-deployment     # must match the metric labels above
        target:
          averageValue: "30"    # the metric threshold
          type: AverageValue    # how the threshold is interpreted
      type: External
    minReplicas: 1     # minimum replica count
    scaleTargetRef:
      apiVersion: apps/v1
      kind: Deployment
      name: test-deployment    # the Deployment to scale
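
For an AverageValue target on an external metric, the controller effectively computes desiredReplicas = ceil(metricValue / averageValue), clamped between minReplicas and maxReplicas; with the values above, a sustained QPS of 90 would drive the Deployment to ceil(90 / 30) = 3 replicas. While testing, you can inspect the HPA's decisions and events with:

# kubectl describe hpa test-hpa -n test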

Run a load test with a benchmarking tool and watch the HPA status change:

# ab -c 100 -n 100000 http://172.28.4.19:80/
This is ApacheBench, Version 2.3 <$Revision: 1430300 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking skywall-uat.snowballtech.com (be patient)
Completed 10000 requests
Completed 20000 requests
Completed 30000 requests
Completed 40000 requests

# kubectl get hpa -n test -w
NAME       REFERENCE                    TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
test-hpa   Deployment/test-deployment   130m/30 (avg)   1         3         1          3m58s
test-hpa   Deployment/test-deployment   120m/30 (avg)   1         3         1          4m
test-hpa   Deployment/test-deployment   100m/30 (avg)   1         3         1          4m15s
test-hpa   Deployment/test-deployment   81980m/30 (avg)   1         3         1          4m30s
test-hpa   Deployment/test-deployment   78884m/30 (avg)   1         3         3          4m45s     # scaled out to the maximum replica count
test-hpa   Deployment/test-deployment   133027m/30 (avg)   1         3         3          5m1s
test-hpa   Deployment/test-deployment   190407m/30 (avg)   1         3         3          5m16s
test-hpa   Deployment/test-deployment   308950m/30 (avg)   1         3         3          5m31s
test-hpa   Deployment/test-deployment   165094m/30 (avg)   1         3         3          5m46s
test-hpa   Deployment/test-deployment   113644m/30 (avg)   1         3         3          6m1s
test-hpa   Deployment/test-deployment   60874m/30 (avg)    1         3         3          6m16s
test-hpa   Deployment/test-deployment   9477m/30 (avg)     1         3         3          6m31s
test-hpa   Deployment/test-deployment   2094m/30 (avg)     1         3         3          6m46s
test-hpa   Deployment/test-deployment   144m/30 (avg)      1         3         3          7m1s
test-hpa   Deployment/test-deployment   50m/30 (avg)       1         3         3          7m16s
test-hpa   Deployment/test-deployment   57m/30 (avg)       1         3         3          7m31s
test-hpa   Deployment/test-deployment   40m/30 (avg)       1         3         3          7m46s
test-hpa   Deployment/test-deployment   27m/30 (avg)       1         3         3          8m1s
test-hpa   Deployment/test-deployment   24m/30 (avg)       1         3         3          8m16s
test-hpa   Deployment/test-deployment   27m/30 (avg)       1         3         3          8m31s
test-hpa   Deployment/test-deployment   34m/30 (avg)       1         3         3          8m46s
test-hpa   Deployment/test-deployment   17m/30 (avg)       1         3         3          9m1s
test-hpa   Deployment/test-deployment   7m/30 (avg)        1         3         3          9m16s
test-hpa   Deployment/test-deployment   10m/30 (avg)       1         3         1          9m31s    # scaled back to the minimum replica count
test-hpa   Deployment/test-deployment   7m/30 (avg)        1         3         1          9m46s
test-hpa   Deployment/test-deployment   10m/30 (avg)       1         3         1          10m
test-hpa   Deployment/test-deployment   7m/30 (avg)        1         3         1          10m
test-hpa   Deployment/test-deployment   7m/30 (avg)        1         3         1          10m
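
Notice that the metric fell below the target at around 6m31s, but replicas only dropped back to 1 at 9m31s: the HPA deliberately delays scale-in through its scale-down stabilization window (300 seconds by default; cluster operators can configure it differently) to avoid flapping. If you need faster scale-in, autoscaling/v2 lets you tune this per HPA by adding a behavior block under the HPA spec (the value here is illustrative):

behavior:
  scaleDown:
    stabilizationWindowSeconds: 60    # act on 60s of low readings instead of the default 300s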

That wraps up the test: we integrated a custom metric and achieved autoscaling based on it.

Conclusion

In this article, we deepened the HPA's awareness of the business through custom metrics. By configuring Prometheus Adapter, we demonstrated how to create a custom metric and configure the HPA to use it for more precise autoscaling. Did this spark any ideas? I hope you can apply these techniques in your own Kubernetes clusters and experience the benefits of efficient, elastic autoscaling driven by custom metrics!
