In the previous article on HPA, we learned how HPA works: by monitoring a service's CPU metrics, it elastically scales a Deployment. For many of us, though, the core HPA metrics are fairly basic and cannot express business-specific scaling needs. In this article we explore extending HPA with custom metrics to enrich its elasticity capabilities. Before we start, we need to understand two components: Metrics Server and Prometheus Adapter.
What is Metrics Server?
Metrics Server is an open-source implementation of the metrics API (metrics.k8s.io), created and maintained by a Kubernetes SIG. Its main purpose is to help the Kubernetes Horizontal Pod Autoscaler scale application workloads up or down in response to demand (for example, a surge of HTTP traffic). Metrics Server supplies the cluster's core metrics: it gathers measurements from the kubelet, cAdvisor, and so on, and serves them to kube-scheduler, the HPA, and other controllers. The CPU and memory metrics in the previous article were obtained through Metrics Server. The following overview shows how the HPA (Horizontal Pod Autoscaling) works together with Metrics Server:
Overview:
1. Deploy Metrics Server to your Kubernetes cluster.
2. Metrics Server scrapes kubelet metrics from every worker node, collecting CPU and memory usage data for each application workload.
3. Metrics Server exposes these CPU and memory usage metrics through the Kubernetes API server.
4. The Horizontal Pod Autoscaler fetches the metrics via the Kubernetes API server and, based on the observed values and the target thresholds, decides when to scale the application's Deployment pods out or in.
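To make this flow concrete, here is a minimal sketch of the kind of HPA covered in the previous article, scaling on CPU utilization served by Metrics Server through metrics.k8s.io (the Deployment name test-deployment, the namespace, and the 80% target are illustrative assumptions, not part of this setup):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: cpu-hpa
  namespace: test
spec:
  minReplicas: 1
  maxReplicas: 3
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: test-deployment
  metrics:
  - type: Resource              # Resource metrics are served by Metrics Server via metrics.k8s.io
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80  # scale out when average CPU utilization across pods exceeds 80%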
What is Prometheus Adapter?
Prometheus Adapter is similar to Metrics Server, but it extends the API surface with a new external metrics API (external.metrics.k8s.io), which the HPA controller uses to fetch these metrics. It builds on the existing custom metrics API: unless explicitly stated otherwise, everything in the custom metrics API design regarding semantics, implementation, and design decisions also applies to the external metrics API. An adapter is generally expected to serve both the custom metrics API and the external metrics API, but this is not required; the two APIs can be implemented and used independently.
Now that we understand how custom metric ingestion works, suppose your business needs to scale a service based on its QPS. Let's give it a try!
Best practices
Prometheus Adapter ships as a component of the kube-prometheus project, so you can deploy kube-prometheus to your Kubernetes cluster. My cluster already has it deployed, and the detailed installation steps are beyond the scope of this article, but you can see what gets installed by reading the relevant YAML files:
#mkdir kube-prometheus
#cd kube-prometheus
#git clone https://github.com/coreos/kube-prometheus
Inspect the Prometheus Adapter YAML files:
#cd kube-prometheus/manifests/
#mkdir prometheus-adapter
#mv prometheusAdapter*.yaml prometheus-adapter
#cd prometheus-adapter/
#ll
-rw-r--r-- 1 root root 483 Aug 6 18:23 prometheusAdapter-apiService.yaml
-rw-r--r-- 1 root root 577 Aug 6 18:23 prometheusAdapter-clusterRoleAggregatedMetricsReader.yaml
-rw-r--r-- 1 root root 495 Aug 6 18:23 prometheusAdapter-clusterRoleBindingDelegator.yaml
-rw-r--r-- 1 root root 472 Aug 6 18:23 prometheusAdapter-clusterRoleBinding.yaml
-rw-r--r-- 1 root root 379 Aug 6 18:23 prometheusAdapter-clusterRoleServerResources.yaml
-rw-r--r-- 1 root root 410 Aug 6 18:23 prometheusAdapter-clusterRole.yaml
-rw-r--r-- 1 root root 2205 Aug 6 18:23 prometheusAdapter-configMap.yaml
-rw-r--r-- 1 root root 3286 Aug 6 18:23 prometheusAdapter-deployment.yaml
-rw-r--r-- 1 root root 565 Aug 6 18:23 prometheusAdapter-networkPolicy.yaml
-rw-r--r-- 1 root root 502 Aug 6 18:23 prometheusAdapter-podDisruptionBudget.yaml
-rw-r--r-- 1 root root 516 Aug 6 18:23 prometheusAdapter-roleBindingAuthReader.yaml
-rw-r--r-- 1 root root 324 Aug 6 18:23 prometheusAdapter-serviceAccount.yaml
-rw-r--r-- 1 root root 907 Aug 6 18:23 prometheusAdapter-serviceMonitor.yaml
-rw-r--r-- 1 root root 502 Aug 6 18:23 prometheusAdapter-service.yaml
Verify after installing kube-prometheus:
# kubectl api-versions | grep metrics
external.metrics.k8s.io/v1beta1   # the external metrics API our custom metric will be served through
metrics.k8s.io/v1beta1
# kubectl get pods -n monitoring -o wide | grep prometheus-adapter
prometheus-adapter-667b85b9ff-dx858 1/1 Running 0 86d 172.28.7.26 cn-shanghai.10.0.54.97 <none> <none>
# kubectl get --raw "/apis/external.metrics.k8s.io" | jq .
{
  "kind": "APIGroup",
  "apiVersion": "v1",
  "name": "external.metrics.k8s.io",
  "versions": [
    {
      "groupVersion": "external.metrics.k8s.io/v1beta1",
      "version": "v1beta1"
    }
  ],
  "preferredVersion": {
    "groupVersion": "external.metrics.k8s.io/v1beta1",
    "version": "v1beta1"
  }
}
If the output looks like the above, the installation is healthy. Next, modify the prometheus-adapter ConfigMap to expose the QPS metric we want to scale on; under the hood, each rule is simply a transformation applied to the result of a PromQL query.
# kubectl get cm -n monitoring prometheus-adapter -o yaml
apiVersion: v1
data:
  config.yaml: |-
    rules:
    externalRules:
    - metricsQuery: <<.Series>>{<<.LabelMatchers>>}
      name:
        as: istio_request_qps
        matches: service_request_qps
      resources:
        namespaced: false
        overrides:
          namespace:
            resource: namespace
      seriesQuery: service_request_qps
- seriesQuery: the PromQL selector, i.e., which Prometheus series to fetch.
- metricsQuery: the aggregation applied to the data selected by seriesQuery. Here it is a plain pass-through; a rate()-style aggregation is sketched after this list.
- resources: maps labels on the Prometheus data to Kubernetes API resources, such as Pod, Namespace, and Node (run kubectl api-resources -o wide to list them). The key here corresponds to a LabelName in the Prometheus data, so make sure the Prometheus metric actually carries that label.
- name: renames the Prometheus metric to a more readable name via regex matching; here service_request_qps is exposed as istio_request_qps.
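The pass-through metricsQuery above works because service_request_qps is already a gauge expressed in requests per second. If your service only exposes a cumulative counter, say http_requests_total, a rule along these lines (a sketch; the series name and the 2m rate window are assumptions, not part of this setup) lets the adapter derive a per-second rate and rename it on the fly:

externalRules:
- seriesQuery: 'http_requests_total{namespace!=""}'
  # turn the cumulative counter into a per-second rate over a 2-minute window
  metricsQuery: sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)
  name:
    matches: ^(.*)_total$
    as: ${1}_per_second          # exposes http_requests_per_second
  resources:
    overrides:
      namespace:
        resource: namespace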
Verify that the metric can be fetched (adjust the namespace and metric name to your environment):
# kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/test/istio_request_qps" | jq .
{
  "kind": "ExternalMetricValueList",
  "apiVersion": "external.metrics.k8s.io/v1beta1",
  "metadata": {},
  "items": [
    {
      "metricName": "istio_request_qps",   # metric name
      "metricLabels": {
        "__name__": "service_request_qps",
        "app": "test-deployment",          # metric labels
        "instance": "172.28.4.19:80",
        "job": "istio-exporter",
        "namespace": "test",
        "type": "istio"
      },
      "timestamp": "2023-08-06T10:44:39Z",
      "value": "30m"
    },
If the metric comes back, the light at the end of the tunnel is in sight. Note that the value is a Kubernetes quantity, so 30m means 0.03 requests per second. Next, configure the HPA:
# kubectl get hpa -n test -o yaml
apiVersion: v1
items:
- apiVersion: autoscaling/v2
  kind: HorizontalPodAutoscaler
  metadata:
    name: test-hpa
    namespace: test
  spec:
    maxReplicas: 3                   # maximum replica count
    metrics:
    - external:
        metric:
          name: istio_request_qps    # must match the metric name above
          selector:
            matchLabels:
              app: test-deployment   # must match the metric labels above
        target:
          averageValue: "30"         # metric threshold
          type: AverageValue         # type of threshold
      type: External
    minReplicas: 1                   # minimum replica count
    scaleTargetRef:
      apiVersion: apps/v1
      kind: Deployment
      name: test-deployment          # name of the Deployment to scale
Run a load test with a benchmarking tool and watch the HPA status change:
# ab -c 100 -n 100000 http://172.28.4.19:80/
This is ApacheBench, Version 2.3 <$Revision: 1430300 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking skywall-uat.snowballtech.com (be patient)
Completed 10000 requests
Completed 20000 requests
Completed 30000 requests
Completed 40000 requests
# kubectl get hpa -n test -w
NAME       REFERENCE                    TARGETS            MINPODS   MAXPODS   REPLICAS   AGE
test-hpa   Deployment/test-deployment   130m/30 (avg)      1         3         1          3m58s
test-hpa   Deployment/test-deployment   120m/30 (avg)      1         3         1          4m
test-hpa   Deployment/test-deployment   100m/30 (avg)      1         3         1          4m15s
test-hpa   Deployment/test-deployment   81980m/30 (avg)    1         3         1          4m30s
test-hpa   Deployment/test-deployment   78884m/30 (avg)    1         3         3          4m45s   # scaled out to the maximum replica count
test-hpa   Deployment/test-deployment   133027m/30 (avg)   1         3         3          5m1s
test-hpa   Deployment/test-deployment   190407m/30 (avg)   1         3         3          5m16s
test-hpa   Deployment/test-deployment   308950m/30 (avg)   1         3         3          5m31s
test-hpa   Deployment/test-deployment   165094m/30 (avg)   1         3         3          5m46s
test-hpa   Deployment/test-deployment   113644m/30 (avg)   1         3         3          6m1s
test-hpa   Deployment/test-deployment   60874m/30 (avg)    1         3         3          6m16s
test-hpa   Deployment/test-deployment   9477m/30 (avg)     1         3         3          6m31s
test-hpa   Deployment/test-deployment   2094m/30 (avg)     1         3         3          6m46s
test-hpa   Deployment/test-deployment   144m/30 (avg)      1         3         3          7m1s
test-hpa   Deployment/test-deployment   50m/30 (avg)       1         3         3          7m16s
test-hpa   Deployment/test-deployment   57m/30 (avg)       1         3         3          7m31s
test-hpa   Deployment/test-deployment   40m/30 (avg)       1         3         3          7m46s
test-hpa   Deployment/test-deployment   27m/30 (avg)       1         3         3          8m1s
test-hpa   Deployment/test-deployment   24m/30 (avg)       1         3         3          8m16s
test-hpa   Deployment/test-deployment   27m/30 (avg)       1         3         3          8m31s
test-hpa   Deployment/test-deployment   34m/30 (avg)       1         3         3          8m46s
test-hpa   Deployment/test-deployment   17m/30 (avg)       1         3         3          9m1s
test-hpa   Deployment/test-deployment   7m/30 (avg)        1         3         3          9m16s
test-hpa   Deployment/test-deployment   10m/30 (avg)       1         3         1          9m31s   # scaled back in to the minimum replica count
test-hpa   Deployment/test-deployment   7m/30 (avg)        1         3         1          9m46s
test-hpa   Deployment/test-deployment   10m/30 (avg)       1         3         1          10m
test-hpa   Deployment/test-deployment   7m/30 (avg)        1         3         1          10m
test-hpa   Deployment/test-deployment   7m/30 (avg)        1         3         1          10m
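Notice that the replica count stayed at 3 for a while after the metric fell below the target. The HPA applies a scale-down stabilization window (300 seconds by default) so that brief dips do not cause flapping. If that default does not suit your workload, autoscaling/v2 lets you tune it through spec.behavior; a minimal sketch (the 120-second window and one-pod-per-minute policy are example values, to be merged into the HPA spec above):

  behavior:
    scaleDown:
      stabilizationWindowSeconds: 120   # only scale in after 2 minutes of consistently low metrics (default: 300)
      policies:
      - type: Pods
        value: 1                        # remove at most one pod...
        periodSeconds: 60               # ...per 60-second period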
That completes the test: we wired a custom metric into the cluster and verified that the Deployment scales on it.
Conclusion
In this article we used custom metrics to make the HPA far more aware of the business. By configuring Prometheus Adapter, we demonstrated how to expose a custom metric and point the HPA at it for more precise autoscaling. Hopefully this gives you some ideas. Try applying these techniques in your own Kubernetes clusters, and enjoy efficient, elastic autoscaling driven by metrics of your own choosing!