Kube-Prometheus Cluster Installation Tutorial

2024-07-26 11:33:52

1. Version requirements

| k8s cluster version | kube-prometheus version | Deployment mode |
| --- | --- | --- |
| v1.18 | <=v0.6.0 | Single-node centralized deployment |

2. Minimal installation notes

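| Service | Keep deployed? | Replicas | Workload type |
| --- | --- | --- | --- |
| alertmanager-main | | 1 | statefulset |
| kube-state-metrics | | 1 | deployment |
| node-exporter | | 1 | daemonset |
| prometheus-adapter | | 1 | deployment |
| prometheus-operator | | 1 | deployment |
| grafana | | 1 | deployment |
| prometheus-k8s | | 1 | statefulset |
| blackbox-exporter | | | deployment |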
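
The single-replica layout in the table can be illustrated for the two operator-managed components, whose replica counts are set on their custom resources rather than on the StatefulSets themselves. The following is a minimal sketch, assuming the stock kube-prometheus file names (prometheus-prometheus.yaml, alertmanager-alertmanager.yaml) and object names (k8s, main). Only the relevant field is shown, so it should be merged into the existing manifests rather than applied as-is.

```yaml
# prometheus-prometheus.yaml (excerpt; assumed stock file name)
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s
  namespace: monitoring
spec:
  replicas: 1   # results in a single prometheus-k8s-0 pod
  # ...all other fields stay as shipped
---
# alertmanager-alertmanager.yaml (excerpt; assumed stock file name)
apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  name: main
  namespace: monitoring
spec:
  replicas: 1   # results in a single alertmanager-main-0 pod
  # ...all other fields stay as shipped
```

The remaining Deployments (grafana, kube-state-metrics and so on) carry an ordinary `spec.replicas` field in their own manifests.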

3. Alerting module configuration (alertmanager-secret.yaml)

apiVersion: v1
data: {}
kind: Secret
metadata:
  name: alertmanager-main
  namespace: monitoring
stringData:
  alertmanager.yaml: |-
    "global":
      "resolve_timeout": "5m"
    "inhibit_rules":
    - "equal":
      - "namespace"
      - "alertname"
      "source_match":
        "severity": "critical"
      "target_match_re":
        "severity": "warning|info"
    - "equal":
      - "namespace"
      - "alertname"
      "source_match":
        "severity": "warning"
      "target_match_re":
        "severity": "info"
    "receivers":
    - "name": "simplecloud"
      "webhook_configs":
      - "url": "http://xxx:8554/notifications"
        "http_config": 
          "bearer_token": "xxx"
    - "name": "Watchdog"
    - "name": "Critical"
    "route":
      "group_by":
      - "namespace"
      "group_interval": "5m"
      "group_wait": "30s"
      "receiver": "xxx"
      "repeat_interval": "12h"
      "routes":
      - "match":
          "alertname": "Watchdog"
        "receiver": "Watchdog"
      - "match":
          "severity": "critical"
        "repeat_interval": "1h"
        "receiver": "Critical"
      - "match":
          "severity": "warning"
        "repeat_interval": "1d"
      - "match":
          "severity": "info"
        "repeat_interval": "7d"
type: Opaque

4. Alert rule configuration (prometheus-rules.yaml)

  - name: Pod status abnormal
    rules:
    - alert: Pod status abnormal
      annotations:
        description: The pod {{ $labels.pod }} in namespace {{ $labels.namespace }}
          was unavailable.
        summary: Pod {{ $labels.pod }} in namespace {{ $labels.namespace }} is unavailable.
      expr: min_over_time(sum by (namespace, pod, phase) (kube_pod_status_phase{phase=~"Pending|Unknown|Failed"})[5m:1m])
        > 0
      for: 2m
      labels:
        severity: critical
  - name: Deployment available replicas abnormal
    rules:
    - alert: Workload available replica count abnormal
      annotations:
        description: The pods of {{ $labels.deployment }} are unavailable.
        summary: The status of {{ $labels.deployment }} pods is abnormal.
      expr: kube_deployment_spec_replicas{} != kube_deployment_status_replicas_available{}
      for: 2m
      labels:
        severity: critical
  - name: Pod failed to start
    rules:
    - alert: Pod restarted more than 3 times within 5 minutes
      annotations:
        description: The Pod {{ $labels.namespace }}/{{ $labels.pod }} has failed
          to start.
        summary: Pod {{ $labels.namespace }}/{{ $labels.pod }} failed to start
      expr: sum_over_time(increase(kube_pod_container_status_restarts_total{}[1m])[5m:1m])
        > 3
      for: 5m
      labels:
        severity: critical
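
The groups above are a fragment: they only take effect once nested under `spec.groups` of a PrometheusRule object that the Prometheus instance selects. The following is a minimal sketch of that wrapper, assuming the default kube-prometheus rule selector labels (`prometheus: k8s`, `role: alert-rules`). The object, group and alert names here are hypothetical, and only one rule is repeated for brevity.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    # Assumed to match the ruleSelector of the Prometheus custom resource.
    prometheus: k8s
    role: alert-rules
  name: custom-alert-rules        # hypothetical object name
  namespace: monitoring
spec:
  groups:
  - name: deployment-replicas     # hypothetical group name
    rules:
    - alert: DeploymentReplicasMismatch   # hypothetical alert name
      annotations:
        summary: The status of {{ $labels.deployment }} pods is abnormal.
      expr: kube_deployment_spec_replicas{} != kube_deployment_status_replicas_available{}
      for: 2m
      labels:
        severity: critical
```

If the labels do not match the selector, the operator is expected to ignore the object and the rules never load, which is worth checking first when an alert does not appear.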

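For more customized alert rules, you can refer to Alibaba Cloud's alerting configuration. Hyperlinks to other vendors get blocked here, so if you need it, send me a private message below the article.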

5. Custom label configuration for common k8s metrics

Add the following configuration snippet to every xxx-serviceMonitor.yaml in the original manifests:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    k8s-app: prometheus
  name: prometheus
  namespace: monitoring
spec:
  endpoints:
  - interval: 30s
    port: web
    metricRelabelings:
    - sourceLabels: []
      targetLabel: env
      replacement: 'test'
    - sourceLabels: []
      targetLabel: cluster
      replacement: 'South China 1b test'
    - replacement: k8s-test
      sourceLabels: []
      targetLabel: type
    - replacement: huanan1b-sc-test
      sourceLabels: []
      targetLabel: from
    - replacement: prometheus-k8s-0
      sourceLabels: []
      targetLabel: prometheus_replica
  selector:
    matchLabels:
      prometheus: k8s

6. Custom label configuration for cAdvisor metrics

remote_write:
  - url: "http://remote-write-service:9090/api/v1/write"
    write_relabel_configs:
      - source_labels: ["__name__"]
        regex: "my_metric|another_metric|yet_another_metric"
        action: keep
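
In kube-prometheus, cAdvisor metrics are normally scraped through the kubelet ServiceMonitor, so the static-label relabeling from section 5 could be attached to its `/metrics/cadvisor` endpoint. The following is a minimal sketch, assuming the stock layout of prometheus-serviceMonitorKubelet.yaml. Field values may differ between releases, and the label values are placeholders.

```yaml
# prometheus-serviceMonitorKubelet.yaml (excerpt; only the cAdvisor endpoint is shown)
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    k8s-app: kubelet
  name: kubelet
  namespace: monitoring
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    honorLabels: true
    interval: 30s
    path: /metrics/cadvisor
    port: https-metrics
    scheme: https
    tlsConfig:
      insecureSkipVerify: true
    metricRelabelings:
    # Static labels added to every cAdvisor sample (placeholder values).
    - sourceLabels: []
      targetLabel: env
      replacement: test
    - sourceLabels: []
      targetLabel: cluster
      replacement: example-cluster
  jobLabel: k8s-app
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      k8s-app: kubelet
```

Any `metricRelabelings` entries already present on that endpoint in the shipped manifest should be kept alongside the new ones.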
