Runtime environment
OS: CentOS 7; Prometheus server version: prometheus-2.22.0; Alertmanager version: alertmanager-0.21.0.linux-amd64
Required ports
Prometheus server: 9090; Alertmanager: 9093
Software deployment
GitHub: Alertmanager-0.21.0
Deploying Alertmanager
[root@prometheus software]# tar -xzf alertmanager-0.21.0.linux-amd64.tar.gz
[root@prometheus software]# cp -rf alertmanager-0.21.0.linux-amd64 /usr/local/alertmanager
[root@prometheus software]# cd /usr/local/alertmanager/
[root@prometheus alertmanager]# ls
alertmanager alertmanager.yml alertmanager.yml.bak amtool data email.tmpl LICENSE NOTICE
[root@prometheus alertmanager]# vim alertmanager.yml
global:
  resolve_timeout: 5m
  smtp_from: 'im@lian.st'
  smtp_smarthost: 'smtp.<address>:465'
  smtp_auth_username: '<sender username>'
  smtp_auth_password: '<sender password>'
  smtp_require_tls: false
  smtp_hello: 'lian.st'
#templates: # custom email template
#- '/usr/local/alertmanager/email.tmpl'
route:
  group_by: ['alertname']
  group_wait: 5s
  group_interval: 5s
  repeat_interval: 5m
  receiver: 'email'
receivers:
- name: 'email'
  email_configs:
  - to: '9763307@qq.com' # recipient address
    #html: '{{ template "email.to.html" . }}'
    send_resolved: true
inhibit_rules:
- source_match:
    severity: 'critical'
  target_match:
    severity: 'warning'
  equal: ['alertname', 'dev', 'instance']
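Before starting the service, it may be worth validating the file with the amtool binary that ships in the same directory (it appears in the ls output above). A minimal sketch, assuming the paths used in this post; the AM_DIR and CONFIG variable names are only illustrative:

```shell
# Paths follow the layout used in this post.
AM_DIR=/usr/local/alertmanager
CONFIG="$AM_DIR/alertmanager.yml"

if [ -x "$AM_DIR/amtool" ]; then
    # check-config parses the file and exits non-zero if it is invalid.
    "$AM_DIR/amtool" check-config "$CONFIG"
else
    echo "amtool not found under $AM_DIR, skipping validation" >&2
fi
```

An indentation mistake in alertmanager.yml is then caught here rather than as a failed service start.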
Running as a systemd service
[root@prometheus alertmanager]# vim /usr/lib/systemd/system/alertmanager.service
[Unit]
Description=alertmanager.server
After=network.target
[Service]
Type=simple
ExecStart=/usr/local/alertmanager/alertmanager --config.file=/usr/local/alertmanager/alertmanager.yml
Restart=on-failure
[Install]
WantedBy=multi-user.target
[root@prometheus alertmanager]# systemctl daemon-reload
[root@prometheus alertmanager]# systemctl enable alertmanager.service
[root@prometheus alertmanager]# systemctl start alertmanager.service
Checking the process and port
[root@prometheus alertmanager]# ps aux | grep alertmanage
root 1517645 0.0 1.7 723836 32048 ? Ssl Nov02 0:33 /usr/local/alertmanager/alertmanager --config.file=/usr/local/alertmanager/alertmanager.yml
root 1698866 0.0 0.0 12112 1088 pts/0 R 10:01 0:00 grep --color=auto alertmanage
[root@prometheus alertmanager]# ss -anptu | grep 9093
tcp LISTEN 0 128 *:9093 *:* users:(("alertmanager",pid=1517645,fd=9))
Testing access
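Besides opening the web UI on port 9093 in a browser, access can also be checked from the command line through Alertmanager's health endpoint. A minimal sketch, assuming Alertmanager listens on localhost; AM_URL is an illustrative variable name:

```shell
# Assumption: Alertmanager runs on this host on its default port.
AM_URL=http://localhost:9093

# -f makes curl exit non-zero on HTTP errors; -sS hides the progress bar but keeps error messages.
curl -fsS "$AM_URL/-/healthy" || echo "Alertmanager is not reachable at $AM_URL" >&2
```

When checking from another machine, replace localhost with the server's address and make sure the firewall allows port 9093.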
Defining alerting rules
[root@prometheus alertmanager]# cd /usr/local/prometheus/rules/
groups:
- name: node-up
  rules:
  - alert: node-up
    expr: up{job="node"} == 0
    for: 5s
    labels:
      severity: 1
      team: node
    annotations:
      summary: "{{ $labels.instance }} has stopped running!"
      description: "{{ $labels.instance }} was detected to have stopped abnormally! Please pay close attention!!!"
[root@prometheus rules]# systemctl restart prometheus_server
[root@prometheus rules]# systemctl restart alertmanager
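For the rule to reach Alertmanager, prometheus.yml also has to load the rule file and point at the Alertmanager instance. The post does not show that part, so the fragment below is only a sketch: the localhost target and the .yml glob are assumptions and should be adjusted to the actual setup.

```yaml
# prometheus.yml (fragment)
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 'localhost:9093'   # assumption: Alertmanager runs on the same host

rule_files:
  - '/usr/local/prometheus/rules/*.yml'   # assumption: rule files use a .yml extension
```

Running `promtool check config` on prometheus.yml and `promtool check rules` on the rule file before the restart should catch syntax errors in either file.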
Testing the alert
Based on the alerting rule above, we stop node_exporter on one machine to trigger the alert.
[root@pwd ~]# systemctl stop node_exporter
[root@pwd ~]# systemctl status node_exporter
● node_exporter.service - node_exporter
Loaded: loaded (/usr/lib/systemd/system/node_exporter.service; disabled; vendor preset: disabled)
Active: inactive (dead)
Nov 02 03:28:21 pwd.lian.st node_exporter[1295]: level=info ts=2020-11-02T08:28:21.980Z caller=node_exporter.go:112 collector=timex
Nov 02 03:28:21 pwd.lian.st node_exporter[1295]: level=info ts=2020-11-02T08:28:21.980Z caller=node_exporter.go:112 collector=udp_queues
Nov 02 03:28:21 pwd.lian.st node_exporter[1295]: level=info ts=2020-11-02T08:28:21.980Z caller=node_exporter.go:112 collector=uname
Nov 02 03:28:21 pwd.lian.st node_exporter[1295]: level=info ts=2020-11-02T08:28:21.980Z caller=node_exporter.go:112 collector=vmstat
Nov 02 03:28:21 pwd.lian.st node_exporter[1295]: level=info ts=2020-11-02T08:28:21.980Z caller=node_exporter.go:112 collector=xfs
Nov 02 03:28:21 pwd.lian.st node_exporter[1295]: level=info ts=2020-11-02T08:28:21.980Z caller=node_exporter.go:112 collector=zfs
Nov 02 03:28:21 pwd.lian.st node_exporter[1295]: level=info ts=2020-11-02T08:28:21.980Z caller=node_exporter.go:191 msg="Listening...s=:9100
Nov 02 03:28:21 pwd.lian.st node_exporter[1295]: level=info ts=2020-11-02T08:28:21.980Z caller=tls_config.go:170 msg="TLS is disab...2=false
Nov 02 21:07:42 pwd.lian.st systemd[1]: Stopping node_exporter...
Nov 02 21:07:42 pwd.lian.st systemd[1]: Stopped node_exporter.
Hint: Some lines were ellipsized, use -l to show in full.
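Once the exporter is stopped, the alert can be followed through both components over their HTTP APIs before the email arrives. A rough sketch, assuming both services run on localhost with the ports listed at the top of this post; PROM_URL and AM_URL are illustrative names:

```shell
# Assumption: Prometheus and Alertmanager both run on this host.
PROM_URL=http://localhost:9090
AM_URL=http://localhost:9093

# Alerts as seen by Prometheus: "pending" until the rule's "for" duration elapses, then "firing".
curl -fsS "$PROM_URL/api/v1/alerts" || echo "Prometheus is not reachable at $PROM_URL" >&2

# Alerts that Prometheus has already delivered to Alertmanager.
curl -fsS "$AM_URL/api/v2/alerts" || echo "Alertmanager is not reachable at $AM_URL" >&2
```

If the alert shows up in the first response but not in the second, the alerting section of prometheus.yml is the first place to look.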
Alert email
Recovery email
TODO
Alert channels: DingTalk, WeChat, SMS, Telegram; alerting rules: common rules (Linux server, nginx, Apache, MySQL, Redis, JVM)