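Preface
In production, routine Prometheus monitoring is not enough: we also need visibility into Kubernetes events. This post walks through how to use kube-eventer, a tool open-sourced by Alibaba, for that purpose.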
NPD
Main node-level factors that affect Kubernetes stability
Hardware errors
- CPU failure
- Memory failure
- Disk failure
Kernel problems
- kernel deadlock
- corrupted file systems
- unresponsive runtime daemons
Docker problems
- unresponsive runtime daemons (the Docker daemon stops responding)
- Docker image errors (Docker file system errors)
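Node problem reporting mechanisms
Kubernetes supports two reporting mechanisms:
- NodeCondition: a permanent problem that prevents pods from running on the node. The condition is only reset after the node is rebooted.
- Event: a temporary problem that affects the node only briefly, but is still useful for diagnosing the system.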
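To check which conditions are currently reported on a node, you can inspect the node object. Below is a minimal Python sketch, assuming the JSON printed by `kubectl get node <name> -o json` has been loaded into a dict. The function name `unhealthy_conditions` and the sample data are illustrative assumptions, not part of any tool.

```python
def unhealthy_conditions(node: dict) -> list:
    """Return the types of node conditions that signal a problem.

    For the built-in "Ready" condition, any status other than "True" is bad.
    For every other condition (for example MemoryPressure, or KernelDeadlock
    when NPD is installed), a status of "True" means the problem is present.
    """
    bad = []
    for cond in node.get("status", {}).get("conditions", []):
        if cond["type"] == "Ready":
            if cond["status"] != "True":
                bad.append(cond["type"])
        elif cond["status"] == "True":
            bad.append(cond["type"])
    return bad


# Hypothetical sample, shaped like `kubectl get node <name> -o json` output.
SAMPLE_NODE = {
    "status": {
        "conditions": [
            {"type": "KernelDeadlock", "status": "True", "reason": "DockerHung"},
            {"type": "MemoryPressure", "status": "False", "reason": "KubeletHasSufficientMemory"},
            {"type": "Ready", "status": "True", "reason": "KubeletReady"},
        ]
    }
}

print(unhealthy_conditions(SAMPLE_NODE))  # prints ['KernelDeadlock']
```

Temporary problems do not show up here; they appear as events on the node, for example in the output of `kubectl describe node <name>`.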
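Node Problem Detector (NPD)
NPD builds on these Kubernetes reporting mechanisms: it inspects system logs (for example the journal on CentOS) and reports the errors it finds on the corresponding Kubernetes node.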
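The idea of turning log lines into reports can be sketched in a few lines of Python. This is an illustration of the approach only, not NPD's actual implementation. The rule set is hypothetical and loosely modeled on the kind of pattern rules NPD's log monitors use, where a "temporary" rule yields an Event and a "permanent" rule sets a NodeCondition.

```python
import re
from typing import NamedTuple, Optional, Tuple


class Rule(NamedTuple):
    kind: str     # "temporary" -> Event, "permanent" -> NodeCondition
    reason: str   # short identifier attached to the report
    pattern: str  # regular expression matched against each log line


# Hypothetical rules for illustration.
RULES = [
    Rule("temporary", "OOMKilling", r"Killed process \d+ \(.+\)"),
    Rule("permanent", "DockerHung",
         r"task docker:\w+ blocked for more than \w+ seconds\."),
]


def classify(line: str) -> Optional[Tuple[str, str]]:
    """Return (mechanism, reason) for the first rule matching the log line,
    or None when no rule matches."""
    for rule in RULES:
        if re.search(rule.pattern, line):
            mechanism = "Event" if rule.kind == "temporary" else "NodeCondition"
            return mechanism, rule.reason
    return None


print(classify("kernel: Killed process 1234 (java) total-vm:100kB"))
print(classify("kernel: task docker:7 blocked for more than 120 seconds."))
print(classify("systemd: Started Session 1 of user root."))
```

The first line is classified as an Event, the second as a NodeCondition, and the third matches no rule.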
Installation
git clone https://github.com/vipdocker/npd-centos.git
cd npd-centos
chmod +x init-configmap.sh
./init-configmap.sh
kubectl create -f npd.yaml
eventer
DingTalk alerts
# extensions/v1beta1 is no longer served for Deployments since Kubernetes 1.16
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kube-eventer
  namespace: kube-system
spec:
  replicas: 1
  # apps/v1 requires a selector that matches the pod template labels
  selector:
    matchLabels:
      k8s-app: kube-eventer
  template:
    metadata:
      labels:
        task: monitoring
        k8s-app: kube-eventer
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      serviceAccount: admin
      containers:
      - name: kube-eventer
        image: registry.cn-hangzhou.aliyuncs.com/acs/kube-eventer-amd64:v1.1.0-c93a835-aliyun
        imagePullPolicy: IfNotPresent
        command:
        - /kube-eventer
        - --source=kubernetes:https://kubernetes.default
        # level is optional and takes Normal or Warning (default Warning)
        - --sink=dingtalk:[your_webhook_url]&label=[your_cluster_id]&level=[Normal or Warning]
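The DingTalk sink value is easy to get wrong because the extra options are appended with `&` directly after the webhook URL. A DingTalk robot webhook URL already carries an `access_token` query parameter, so the options simply continue its query string. Here is a small Python sketch that assembles the flag; the helper name `dingtalk_sink` and the token `TOKEN` are placeholders for illustration.

```python
def dingtalk_sink(webhook_url: str, label: str, level: str = "Warning") -> str:
    """Build the --sink flag for the DingTalk sink.

    webhook_url is expected to already contain its own query string
    (for example ...?access_token=...), so label and level are appended
    with '&' exactly as in the manifest.
    """
    if level not in ("Normal", "Warning"):
        raise ValueError("level must be 'Normal' or 'Warning'")
    return f"--sink=dingtalk:{webhook_url}&label={label}&level={level}"


# TOKEN and the cluster label are placeholders, not real values.
print(dingtalk_sink(
    "https://oapi.dingtalk.com/robot/send?access_token=TOKEN",
    label="my-cluster",
))
```

Leaving `level` at its default of `Warning` keeps the alert channel quiet; `Normal` also forwards routine events and is likely to be noisy.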
SLS alerts
# extensions/v1beta1 is no longer served for Deployments since Kubernetes 1.16
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kube-eventer
  namespace: kube-system
spec:
  replicas: 1
  # apps/v1 requires a selector that matches the pod template labels
  selector:
    matchLabels:
      k8s-app: kube-eventer
  template:
    metadata:
      labels:
        task: monitoring
        k8s-app: kube-eventer
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      serviceAccount: admin
      containers:
      - name: kube-eventer
        image: registry.cn-hangzhou.aliyuncs.com/acs/kube-eventer-amd64:v1.1.0-c93a835-aliyun
        imagePullPolicy: IfNotPresent
        command:
        - /kube-eventer
        - --source=kubernetes:https://kubernetes.default
        - --sink=sls:https://sls.aliyuncs.com?logStore=[your_logstore]&project=[your_project]
Storing data in Elasticsearch
# extensions/v1beta1 is no longer served for Deployments since Kubernetes 1.16
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kube-eventer
  namespace: kube-system
spec:
  replicas: 1
  # apps/v1 requires a selector that matches the pod template labels
  selector:
    matchLabels:
      k8s-app: kube-eventer
  template:
    metadata:
      labels:
        task: monitoring
        k8s-app: kube-eventer
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      serviceAccount: admin
      containers:
      - name: kube-eventer
        image: registry.cn-hangzhou.aliyuncs.com/acs/kube-eventer-amd64:v1.1.0-c93a835-aliyun
        imagePullPolicy: IfNotPresent
        command:
        - /kube-eventer
        - --source=kubernetes:https://kubernetes.default
        - --sink=elasticsearch:http://10.16.16.13:9200?sniff=false&ver=6
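All of the sink flags above share one shape: a sink type, an endpoint, and a query string of options. The Python sketch below takes the Elasticsearch flag apart to make that structure visible. It only illustrates how to read the flag and is not kube-eventer's own parser; the function name `parse_sink` is an assumption. As far as I understand the kube-eventer documentation, `sniff` toggles Elasticsearch node sniffing and `ver` selects the Elasticsearch major version.

```python
from urllib.parse import parse_qs


def parse_sink(flag: str):
    """Split '--sink=<type>:<endpoint>?<key=value&...>' into its three parts."""
    prefix = "--sink="
    value = flag[len(prefix):] if flag.startswith(prefix) else flag
    # The first ':' separates the sink type from the endpoint URL.
    sink_type, _, rest = value.partition(":")
    # The first '?' separates the endpoint from the options.
    endpoint, _, query = rest.partition("?")
    options = {key: values[0] for key, values in parse_qs(query).items()}
    return sink_type, endpoint, options


print(parse_sink("--sink=elasticsearch:http://10.16.16.13:9200?sniff=false&ver=6"))
# prints ('elasticsearch', 'http://10.16.16.13:9200', {'sniff': 'false', 'ver': '6'})
```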
Webhook
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    name: kube-eventer
  name: kube-eventer
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kube-eventer
  template:
    metadata:
      labels:
        app: kube-eventer
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      dnsPolicy: ClusterFirstWithHostNet
      serviceAccount: admin
      containers:
      - image: registry.aliyuncs.com/acs/kube-eventer-amd64:v1.1.0-252b712-aliyun
        name: kube-eventer
        command:
        - "/kube-eventer"
        - "--source=kubernetes:https://kubernetes.default"
        # Several sinks can be listed; each event is sent to all of them
        - "--sink=elasticsearch:http://10.16.xx.xx:9200?sniff=false&ver=6&index=vnet_prod"
        # custom_body_configmap names the ConfigMap that holds the request body template
        - "--sink=webhook:http://xxxxxxx/api/v1/mk8s/cluster/events/alert/?&level=Warning&label=xxxxxxxx&header=Content-Type=application/json&custom_body_configmap=custom-webhook-body&custom_body_configmap_namespace=kube-system&method=POST"
        env:
        - name: TZ
          value: "Asia/Shanghai"
        volumeMounts:
        - name: localtime
          mountPath: /etc/localtime
          readOnly: true
        - name: zoneinfo
          mountPath: /usr/share/zoneinfo
          readOnly: true
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
          limits:
            cpu: 500m
            memory: 250Mi
      volumes:
      - name: localtime
        hostPath:
          path: /etc/localtime
      - name: zoneinfo
        hostPath:
          path: /usr/share/zoneinfo
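On the receiving side, the webhook sink needs an HTTP endpoint that accepts the POST requests. A minimal receiver could look like the Python sketch below, using only the standard library. It assumes the body template in the `custom-webhook-body` ConfigMap renders valid JSON. The field names in the demo payload (`EventType`, `EventReason`) are hypothetical, since the real fields depend entirely on your own template.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen

RECEIVED = []  # events seen so far (in memory, for illustration only)


class EventHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        raw = self.rfile.read(length)
        try:
            event = json.loads(raw)
        except json.JSONDecodeError:
            self.send_response(400)  # body was not valid JSON
            self.end_headers()
            return
        RECEIVED.append(event)
        self.send_response(200)
        self.end_headers()

    def log_message(self, fmt, *args):
        pass  # silence the default per-request logging


def start_server(port: int = 0) -> HTTPServer:
    """Start the receiver on localhost in a background thread.
    Port 0 lets the operating system pick a free port."""
    server = HTTPServer(("127.0.0.1", port), EventHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def post_event(server: HTTPServer, payload: dict) -> int:
    """Send one JSON event to the receiver and return the HTTP status."""
    url = f"http://127.0.0.1:{server.server_address[1]}/api/v1/mk8s/cluster/events/alert/"
    req = Request(url, data=json.dumps(payload).encode(),
                  headers={"Content-Type": "application/json"}, method="POST")
    with urlopen(req) as resp:
        return resp.status


if __name__ == "__main__":
    server = start_server()
    # Hypothetical payload; real fields come from your body template.
    print(post_event(server, {"EventType": "Warning", "EventReason": "BackOff"}))
    print(RECEIVED)
    server.shutdown()
```

A real receiver would forward the event to an alerting system rather than keep it in a list, and would typically run behind authentication.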