概要:主要通过webhook向企业微信告警,运用多sink与多configmap实现多微信群告警,并@对应人。
离线事件告警
kube-eventer是由阿里开源的k8s离线事件收集器,开源地址
https://github.com/AliyunContainerService/kube-eventer/blob/master/docs/en/webhook-sink.md
在Kubernetes中,事件分为两种,一种是Warning事件,表示产生这个事件的状态转换是在非预期的状态之间产生的;另外一种是Normal事件,表示期望到达的状态,和目前达到的状态是一致的。
我们以NPD的event来讲解。事件影响节点的临时性问题,但是它是对于系统诊断是有意义的。NPD就是利用kubernetes的上报机制,通过检测系统的日志(例如centos中journal),把错误的信息上报到kuberntes的node上。这些日志(例如内核日志)中噪音信息太多,NPD会提取其中有价值的信息,可以将这些信息生成离线事件。这样我就可以得到node上的时间,及时进行处理。
一个标准的Kubernetes事件有如下几个重要的属性,通过这些属性可以更好地诊断和告警问题。
Namespace:产生事件的对象所在的命名空间。
Kind:绑定事件的对象的类型,例如:Node、Pod、Namespace、Componenet等等。
Timestamp:事件产生的时间等等。
Reason:产生这个事件的原因。Message: 事件的具体描述。
image.png
目前的sinks支持大致如下:
Sink Name | Description |
---|---|
dingtalk | sink to dingtalk bot |
sls | sink to alibaba cloud sls service |
elasticsearch | sink to elasticsearch |
honeycomb | sink to honeycomb |
influxdb | sink to influxdb |
kafka | sink to kafka |
mysql | sink to mysql database |
sink to wechat |
今天主要带来webhook的开挂技巧。首先看支持的参数:
level
- Level of event (optional. default: Warning. Options: Warning and Normal)namespaces
- Namespaces to filter (optional. default: all namespaces,use commas to separate multi namespaces, namespace filter doesn't support regexp)kinds
- Kinds to filter (optional. default: all kinds,use commas to separate multi kinds. Options: Node,Pod and so on.)reason
- Reason to filter (optional. default: empty, Regexp pattern support). You can use multi reason fields in query.method
- Method to send request (optional. default: GET)header
- Header in request (optional. default: empty). You can use multi header field in query.custom_body_configmap
- The configmap name of request body template. You can use Template to customize request body. (optional.)custom_body_configmap_namespace
- The configmap namespace of request body template.
如果每个项目namespace与负责人是一一对应的,就可以根据configmap与sink关联起来。变更上线部署是最容易出现事件的时候,通过事件是可以快速的发现上线的镜像tag错误,镜像配置错误等问题。
首先configmap,通过custom_body_configmap的值来选择不同的配置文件。可以简单修饰一下,使其变得更加清晰。
添加加Cluster:name可以知道是哪个集群的event。
添加加"mentioned_list":["wangqin","@all"]可以@对应的负责人。
代码语言:javascript复制---
apiVersion: v1
data:
content: >-
{"msgtype": "text","text": {"content": "Cluster:namenEventType:{{ .Type }}nEventNamespace:{{ .InvolvedObject.Namespace }}nEventKind:{{ .InvolvedObject.Kind }}nEventObject:{{ .InvolvedObject.Name }}nEventReason:{{ .Reason }}nEventTime:{{ .LastTimestamp }}nEventMessage:{{ .Message }}","mentioned_list":["wangqing","@all"]}}
kind: ConfigMap
metadata:
name: custom-webhook-body
namespace: nameapce
命令部分的技巧
sink是一个数组,可以加很多条。
主要说明用webhook向企业微信的的通知。注意reason是可以支持正则表达式的。通过configmap就一起完成了k8s机器的事件告警。
代码语言:javascript复制command:
- "/kube-eventer"
- "--source=kubernetes:https://kubernetes.default"
## .e.g,dingtalk sink demo
- --sink=webhook:https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxxxxxx&level=Warning&reason=[^Unhealthy]&namespaces=xxxx&header=Content-Type=application/json&custom_body_configmap=custom-webhook-body0&custom_body_configmap_namespace=xxxx&method=POST
- --sink=webhook:https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxxxxxx&level=Warning&reason=BackOff&namespaces=xxxx&header=Content-Type=application/json&custom_body_configmap=custom-webhook-body1&custom_body_configmap_namespace=xxxx&method=POST
- --sink=webhook:https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxxxxxx&level=Warning&reason=Failed&namespaces=xxxx&header=Content-Type=application/json&custom_body_configmap=custom-webhook-body2&custom_body_configmap_namespace=xxxxx&method=POST
案列:
创建一个企业微信群的机器人。比如:https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxxxxxx。
代码语言:javascript复制apiVersion: apps/v1
kind: Deployment
metadata:
labels:
name: kube-eventer
name: kube-eventer
namespace: namespace
spec:
replicas: 1
selector:
matchLabels:
app: kube-eventer
template:
metadata:
labels:
app: kube-eventer
annotations:
scheduler.alpha.kubernetes.io/critical-pod: ''
spec:
dnsPolicy: ClusterFirstWithHostNet
serviceAccount: kube-eventer
containers:
- image: registry.aliyuncs.com/acs/kube-eventer-amd64:v1.2.0-484d9cd-aliyun
name: kube-eventer
command:
- "/kube-eventer"
- "--source=kubernetes:https://kubernetes.default"
## .e.g,dingtalk sink demo
- --sink=webhook:https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxxxxxx&level=Warning&reason=[^Unhealthy]&namespaces=xxxx&header=Content-Type=application/json&custom_body_configmap=custom-webhook-body0&custom_body_configmap_namespace=xxxx&method=POST
#- --sink=webhook:https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxxxxxx&level=Warning&reason=BackOff&namespaces=xxxx&header=Content-Type=application/json&custom_body_configmap=custom-webhook-body1&custom_body_configmap_namespace=xxxx&method=POST
#- --sink=webhook:https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxxxxxx&level=Warning&reason=Failed&namespaces=xxxx&header=Content-Type=application/json&custom_body_configmap=custom-webhook-body2&custom_body_configmap_namespace=xxxxx&method=POST
env:
# If TZ is assigned, set the TZ value as the time zone
- name: TZ
value: "Asia/Shanghai"
volumeMounts:
- name: localtime
mountPath: /etc/localtime
readOnly: true
- name: zoneinfo
mountPath: /usr/share/zoneinfo
readOnly: true
resources:
requests:
cpu: 200m
memory: 100Mi
limits:
cpu: 500m
memory: 250Mi
volumes:
- name: localtime
hostPath:
path: /etc/localtime
- name: zoneinfo
hostPath:
path: /usr/share/zoneinfo
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: kube-eventer
rules:
- apiGroups:
- ""
resources:
- events
- configmaps
verbs:
- get
- list
- watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: kube-eventer
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: kube-eventer
subjects:
- kind: ServiceAccount
name: kube-eventer
namespace: namespace
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: kube-eventer
namespace: namespace
---
apiVersion: v1
data:
content: >-
{"msgtype": "text","text": {"content": "Cluster:namenEventType:{{ .Type }}nEventNamespace:{{ .InvolvedObject.Namespace }}nEventKind:{{ .InvolvedObject.Kind }}nEventObject:{{ .InvolvedObject.Name }}nEventReason:{{ .Reason }}nEventTime:{{ .LastTimestamp }}nEventMessage:{{ .Message }}","mentioned_list":["wangqing","@all"]}}
kind: ConfigMap
metadata:
name: custom-webhook-body
namespace: nameapce
这样就可以完成向谁告警,谁进行处理的简单分配。有了事件告警,可以及时发现服务问题与集群问题并进行修复。