A Handy Tool for Monitoring k8s Events

2020-06-29 15:55:20

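Preface

In production, besides the usual Prometheus monitoring, we also need to keep an eye on k8s events. This post walks through how to use kube-eventer, a tool open-sourced by Alibaba.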

NPD

Main node-level factors that affect Kubernetes stability

Hardware errors

  • CPU failure
  • Memory failure
  • Disk failure

Kernel problems

  • kernel deadlock
  • corrupted file systems
  • unresponsive runtime daemons (system background processes stop responding)

Docker problems

  • unresponsive runtime daemons (the Docker daemon stops responding)
  • docker image error (Docker file system errors)

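Node problem reporting mechanisms

Kubernetes supports two reporting mechanisms:

  • NodeCondition: a permanent problem that makes the node unable to run pods. The condition is reset only after the node reboots.
  • Event: a temporary problem that affects the node only briefly but is still useful for system diagnosis.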
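
To make the difference concrete, the fragment below sketches how each mechanism might surface in the API. The condition type `KernelDeadlock`, the reasons `DockerHung` and `OOMKilling`, and the source `kernel-monitor` are assumed from NPD's default kernel monitor rules. The node name, timestamps, and messages are hypothetical.

```yaml
# Illustrative only: node name, timestamps and messages are hypothetical.
# 1) NodeCondition: stored on the Node object under status.conditions
#    (visible with `kubectl get node <node> -o yaml`).
apiVersion: v1
kind: Node
metadata:
  name: node-1
status:
  conditions:
    - type: KernelDeadlock        # assumed NPD condition type
      status: "True"
      reason: DockerHung          # assumed NPD permanent-rule reason
      message: "task docker:1234 blocked for more than 120 seconds."
      lastHeartbeatTime: "2020-06-29T07:55:20Z"
      lastTransitionTime: "2020-06-29T07:50:00Z"
---
# 2) Event: a separate object that points at the node through involvedObject.
apiVersion: v1
kind: Event
metadata:
  name: node-1.oom-example
  namespace: default
type: Warning
reason: OOMKilling                # assumed NPD temporary-rule reason
message: "Kill process 1234 (java) score 900 or sacrifice child"
involvedObject:
  kind: Node
  name: node-1
source:
  component: kernel-monitor
  host: node-1
count: 1
```

A condition stays on the node until it is reset, whereas events like the second object are the stream that kube-eventer forwards to its sinks.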

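Node Problem Detector (NPD)

NPD builds on these Kubernetes reporting mechanisms: it watches system logs (for example, the journal on CentOS) and reports the problems it finds to the corresponding Kubernetes node.

Installation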

git clone https://github.com/vipdocker/npd-centos.git
cd npd-centos
chmod +x init-configmap.sh
./init-configmap.sh
kubectl create -f npd.yaml

eventer

DingTalk alerts

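apiVersion: apps/v1  # extensions/v1beta1 no longer serves Deployment since Kubernetes 1.16
kind: Deployment
metadata:
  name: kube-eventer
  namespace: kube-system
spec:
  replicas: 1
  selector:  # required by apps/v1; must match the pod template labels below
    matchLabels:
      k8s-app: kube-eventer
  template: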
    metadata:
      labels:
        task: monitoring
        k8s-app: kube-eventer
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      serviceAccount: admin
      containers:
      - name: kube-eventer
        image: registry.cn-hangzhou.aliyuncs.com/acs/kube-eventer-amd64:v1.1.0-c93a835-aliyun
        imagePullPolicy: IfNotPresent
        command:
        - /kube-eventer
        - --source=kubernetes:https://kubernetes.default
        # Fill in the webhook URL and cluster label; level is optional (Normal or Warning, default Warning)
        - --sink=dingtalk:[your_webhook_url]&label=[your_cluster_id]&level=[Normal or Warning]
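
The manifests in this post run under a ServiceAccount named `admin`, which is not defined anywhere in the post. If it does not already exist in the cluster, a minimal sketch could look like the following. The role name `kube-eventer` and the rule set are assumptions (read-only access to events, plus ConfigMaps for sinks that load a template from one), not something the original setup confirms.

```yaml
# Sketch only: the account name `admin` matches the manifests in this post;
# the role name and the rules below are assumptions.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: admin
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kube-eventer
rules:
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["configmaps"]   # only needed when a sink reads a ConfigMap
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kube-eventer
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-eventer
subjects:
  - kind: ServiceAccount
    name: admin
    namespace: kube-system
```

Scoping the account to read-only rules like these is safer than binding it to a cluster-wide admin role, since kube-eventer only needs to watch events.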

SLS alerts

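apiVersion: apps/v1  # extensions/v1beta1 no longer serves Deployment since Kubernetes 1.16
kind: Deployment
metadata:
  name: kube-eventer
  namespace: kube-system
spec:
  replicas: 1
  selector:  # required by apps/v1; must match the pod template labels below
    matchLabels:
      k8s-app: kube-eventer
  template: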
    metadata:
      labels:
        task: monitoring
        k8s-app: kube-eventer
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      serviceAccount: admin
      containers:
      - name: kube-eventer
        image: registry.cn-hangzhou.aliyuncs.com/acs/kube-eventer-amd64:v1.1.0-c93a835-aliyun
        imagePullPolicy: IfNotPresent
        command:
        - /kube-eventer
        - --source=kubernetes:https://kubernetes.default
        - --sink=sls:https://sls.aliyuncs.com?logStore=[your_logstore]&project=[your_project]

Storing data in es

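apiVersion: apps/v1  # extensions/v1beta1 no longer serves Deployment since Kubernetes 1.16
kind: Deployment
metadata:
  name: kube-eventer
  namespace: kube-system
spec:
  replicas: 1
  selector:  # required by apps/v1; must match the pod template labels below
    matchLabels:
      k8s-app: kube-eventer
  template: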
    metadata:
      labels:
        task: monitoring
        k8s-app: kube-eventer
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      serviceAccount: admin
      containers:
      - name: kube-eventer
        image: registry.cn-hangzhou.aliyuncs.com/acs/kube-eventer-amd64:v1.1.0-c93a835-aliyun
        imagePullPolicy: IfNotPresent
        command:
        - /kube-eventer
        - --source=kubernetes:https://kubernetes.default
        - --sink=elasticsearch:http://10.16.16.13:9200?sniff=false&ver=6

webhook

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    name: kube-eventer
  name: kube-eventer
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kube-eventer
  template:
    metadata:
      labels:
        app: kube-eventer
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      dnsPolicy: ClusterFirstWithHostNet
      serviceAccount: admin
      containers:
        - image: registry.aliyuncs.com/acs/kube-eventer-amd64:v1.1.0-252b712-aliyun
          name: kube-eventer
          command:
            - "/kube-eventer"
            - "--source=kubernetes:https://kubernetes.default"
            - --sink=elasticsearch:http://10.16.xx.xx:9200?sniff=false&ver=6&index=vnet_prod
            - --sink=webhook:http://xxxxxxx/api/v1/mk8s/cluster/events/alert/?&level=Warning&label=xxxxxxxx&header=Content-Type=application/json&custom_body_configmap=custom-webhook-body&custom_body_configmap_namespace=kube-system&method=POST
          env:
          - name: TZ
            value: "Asia/Shanghai"
          volumeMounts:
            - name: localtime
              mountPath: /etc/localtime
              readOnly: true
            - name: zoneinfo
              mountPath: /usr/share/zoneinfo
              readOnly: true
          resources:
            requests:
              cpu: 100m
              memory: 100Mi
            limits:
              cpu: 500m
              memory: 250Mi
      volumes:
        - name: localtime
          hostPath:
            path: /etc/localtime
        - name: zoneinfo
          hostPath:
            path: /usr/share/zoneinfo
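
The webhook sink above refers to a ConfigMap named `custom-webhook-body` in `kube-system` through `custom_body_configmap`, but the ConfigMap itself is not shown. A sketch of what it could look like follows, assuming kube-eventer reads the body template from a `content` key and renders it as a Go template over the event object. The JSON field names are hypothetical and should be adjusted to whatever the receiving API expects.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: custom-webhook-body      # must match custom_body_configmap in the sink URL
  namespace: kube-system         # must match custom_body_configmap_namespace
data:
  # Assumed key name: `content`. Values in {{ }} are fields of the Kubernetes event.
  # The JSON keys (EventType, EventKind, ...) are hypothetical.
  content: >-
    {"EventType": "{{ .Type }}", "EventKind": "{{ .InvolvedObject.Kind }}",
    "EventReason": "{{ .Reason }}", "EventTime": "{{ .LastTimestamp }}",
    "EventMessage": "{{ .Message }}"}
```

The ConfigMap should exist before the Deployment starts, otherwise the sink has no template to render.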

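Results

DingTalk

[Screenshot of the DingTalk alert; image not preserved]

es

[Screenshot of the events stored in es; image not preserved]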
