1. Introduction
Prometheus is a systems monitoring and alerting toolkit that former Google engineers began developing as open-source software at SoundCloud in 2012. Since then, many companies and organizations have adopted Prometheus as their monitoring and alerting tool. Prometheus has a very active developer and user community, and it is now a standalone open-source project maintained independently of any company. To underline this, Prometheus joined the CNCF in May 2016 as its second hosted project, after Kubernetes.
2. Features
Prometheus's main features:
- a multi-dimensional data model with time series data identified by metric name and key/value pairs
- PromQL, a flexible query language to leverage this dimensionality
- no reliance on distributed storage; single server nodes are autonomous
- time series collection happens via a pull model over HTTP
- pushing time series is supported via an intermediary gateway
- targets are discovered via service discovery or static configuration
- multiple modes of graphing and dashboarding support
3. Components
- Prometheus server: scrapes and stores time series data
- client libraries: for instrumenting application code
- push gateway: for supporting short-lived jobs
- special-purpose exporters: for services such as HAProxy, StatsD, Graphite, etc.
- alertmanager: handles alerts
- various support tools
The official architecture diagram can be found in the Prometheus documentation.
4. Environment and Background
Architecture overview: the target environment is Kubernetes (K8S), and each K8S environment runs its own Prometheus cluster. An external Prometheus scrapes those clusters via federation and writes the data to remote storage, a remote TSDB: M3DB. An external Grafana queries the Prometheus datasource. Alertmanager is deployed as a highly available two-node pair using the Gossip protocol, and Pushgateway receives exporter data pushed from endpoint nodes.
Note: no architecture is perfect. This one has an obvious bottleneck at the external Prometheus: when it runs short on resources, data collection degrades, and when the M3DB Coordinator runs short on I/O, reads and writes can back up. One possible optimization is to deploy the in-cluster Prometheus via the Operator and use the K8S sidecar pattern to write Prometheus data into Thanos instead, which would greatly reduce the impact of a single point of failure; Thanos also offers richer features (see the Thanos website). Since Thanos does not currently support Aliyun OSS, we are not adopting that approach for now.
5. Deploying Prometheus in K8S
Since our company uses Rancher, the in-cluster Prometheus deployment is not described in detail; it can be installed from the Rancher catalog or from the official manifests. Note that the Prometheus port can be exposed via NodePort or Ingress. Here we assume the domains prom-01.domain.com and prom-02.domain.com (they are used later for external Prometheus federation). For security, access to the port/domain should be protected with basic auth or similar (we use a Traefik v2.0 basic-auth middleware), as sketched below.
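A minimal sketch of generating credentials and verifying the protected endpoint; the user name, password, and domains below are placeholders, and the exact middleware wiring depends on your Traefik setup:

```bash
# Generate an htpasswd entry (from httpd-tools) for the Traefik basicAuth middleware
# (promuser / S3cret! are placeholder credentials)
htpasswd -nb promuser 'S3cret!'

# Verify that federation answers behind basic auth
curl -u promuser:'S3cret!' \
  'https://prom-01.domain.com/federate?match[]={job=~"kubernetes-.*"}' | head
```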
6. Deploying the External Prometheus
The external Prometheus is deployed with docker-compose.
System environment:
- IP: 172.16.18.6
- OS: CentOS 7.4
docker images:
- prometheus server: prom/prometheus:v2.14.0
- alertmanager: prom/alertmanager:v0.19.0
- pushgateway: prom/pushgateway:v1.0.0
- grafana: grafana/grafana:6.4.4
docker-compose.yml reference:
```yaml
version: "3"
services:
  prom:
    image: prom/prometheus:v2.14.0
    hostname: prom.domain.com
    container_name: prometheus
    restart: always
    volumes:
      - /opt/prometheus.yml:/etc/prometheus/prometheus.yml
      - /opt/rules.yml:/etc/prometheus/rules.yml
      - /opt/rules:/etc/prometheus/rules
      - /opt/prometheus:/prometheus
    command:
      # Retention is a command-line flag, not an environment variable
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=7d'   # keep local TSDB data for 7 days
    ports:
      - 9090:9090
  alertmanager01:
    image: prom/alertmanager:v0.19.0
    hostname: alert1.domain.com
    container_name: alertmanager_01
    restart: always
    volumes:
      - /opt/alertmanager.yml:/etc/alertmanager/config.yml
    command:
      - '--web.listen-address=:9093'
      - '--cluster.listen-address=0.0.0.0:8001'   # enable gossip clustering
      - '--config.file=/etc/alertmanager/config.yml'
    ports:
      - 9093:9093
      - 8001:8001
  alertmanager02:
    image: prom/alertmanager:v0.19.0
    hostname: alert2.domain.com
    container_name: alertmanager_02
    restart: always
    depends_on:
      - alertmanager01
    volumes:
      - /opt/alertmanager.yml:/etc/alertmanager/config.yml
    command:
      - '--web.listen-address=:9094'
      - '--cluster.listen-address=0.0.0.0:8002'
      - '--cluster.peer=172.16.18.6:8001'   # join the gossip cluster via alertmanager01
      - '--config.file=/etc/alertmanager/config.yml'
    ports:
      - 9094:9094
      - 8002:8002
  pushgateway:
    image: prom/pushgateway:v1.0.0
    container_name: pushgateway
    restart: always
    ports:
      - 9091:9091
  grafana:
    image: grafana/grafana:6.4.4
    hostname: grafana.domain.com
    container_name: grafana
    restart: always
    volumes:
      - /opt/grafana-storage:/var/lib/grafana
    ports:
      - 3000:3000
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=xxxxxx
      - GF_SMTP_ENABLED=true
      - GF_SMTP_HOST=smtp.qiye.aliyun.com:465
      - GF_SMTP_USER=xxxxxxx
      - GF_SMTP_PASSWORD=xxxxxx
      - GF_SMTP_FROM_ADDRESS=xxxxxxxx
      - GF_SERVER_ROOT_URL=http://grafana.domain.com
```
The individual configuration files are attached below.
```yaml
# prometheus.yml
global:                        # global settings
  scrape_interval: 60s         # default scrape interval (here used for scraping the pushgateway)
  evaluation_interval: 30s     # how often rules are evaluated (the default is 1m)
  external_labels:
    cid: '1'
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['172.16.18.6:9093','172.16.18.6:9094']   # the two Alertmanager gossip peers
rule_files:
  - /etc/prometheus/rules.yml
  - /etc/prometheus/rules/*.rules
remote_write:
  - url: "http://172.16.10.12:7201/api/v1/prom/remote/write"   # M3DB remote write
    queue_config:
      batch_send_deadline: 60s
      capacity: 40000
      max_backoff: 600ms
      max_samples_per_send: 8000
      max_shards: 10
      min_backoff: 50ms
      min_shards: 6
    remote_timeout: 30s
    write_relabel_configs:
      # Relabel rules run in order. Avoid stacking `keep` rules here: each
      # `keep` discards every series that does NOT match its regex, so two
      # consecutive keeps with disjoint prefixes (e.g. rpc_.* then jvm_.*)
      # would drop all series. Families that should reach remote storage
      # (such as rpc_.* and jvm_.*) are simply left out of the drop list,
      # and the duplicated drops are merged into a single rule below.
      - source_labels: [__name__]
        regex: (go_|http_|prometheus_|scrape_|net_|crd|kube_|etcd_|coredns_|apiserver_|admission_|DiscoveryController_|container_).*
        action: drop
      - source_labels: ["kubernetes_name"]
        regex: prometheus-node-exporter
        action: drop
      - source_labels: ["job"]
        regex: kubernetes-apiservers
        action: drop
remote_read:
  - url: "http://172.16.7.172:7201/api/v1/prom/remote/read"   # M3DB remote read
    read_recent: true
scrape_configs:
  # Consul-based service discovery
  # - job_name: 'consul-prometheus'
  #   metrics_path: /metrics
  #   scheme: http
  #   consul_sd_configs:
  #     - server: '172.16.18.6:8500'
  #       scheme: http
  #       services: ['ops']
  #       refresh_interval: 1m
  # File-based service discovery
  - job_name: 'file_ds'
    file_sd_configs:
      - refresh_interval: 30s
        files:
          - /prometheus/*.json
  # - job_name: 'm3db'
  #   static_configs:
  #     - targets: ['172.16.10.12:7203']
  - job_name: 'federate'
    scrape_interval: 15s
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job=~"kubernetes-.*"}'
    static_configs:
      - targets:
          - 'prom-01.domain.com'
          - 'prom-02.domain.com'   # the K8S Prometheus domains (or ip:port)
    basic_auth:
      username: xxxx
      password: xxxxxxx
    # metric_relabel_configs (not relabel_configs) is required to filter on
    # __name__, which only exists after the scrape.
    metric_relabel_configs:
      - source_labels: [__name__]
        regex: (http_|prometheus_|scrape_|go_).*
        action: drop
```
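Before (re)starting, the configuration can be sanity-checked with promtool, which ships in the same image; a minimal sketch using the host paths mounted above:

```bash
# Validate prometheus.yml (and the rule files it references) without starting the server
docker run --rm \
  -v /opt/prometheus.yml:/etc/prometheus/prometheus.yml \
  -v /opt/rules.yml:/etc/prometheus/rules.yml \
  -v /opt/rules:/etc/prometheus/rules \
  --entrypoint promtool prom/prometheus:v2.14.0 \
  check config /etc/prometheus/prometheus.yml
```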
```yaml
# alertmanager.yml
# Global settings
global:
  resolve_timeout: 5m   # how long to wait before marking an alert resolved (default 5m)
  smtp_smarthost: 'smtp.qq.com:587'
  smtp_from: 'xxxxxxx@qq.com'
  smtp_auth_username: 'xxxxxxxxx@qq.com'
  smtp_auth_password: 'xxxxxxxxxx'
  smtp_require_tls: true
# The routing tree
route:
  group_by: ['alertname']   # labels used to group alerts
  group_wait: 30s           # how long to wait for additional alerts in a new group
  group_interval: 1m        # interval between notifications for the same group
  repeat_interval: 1h       # how long to wait before re-sending a notification
  receiver: 'bz'            # default receiver; must match a name under receivers
  routes:
    - receiver: bz
      match_re:             # regex match (plain `match` would be an exact string match)
        severity: red|yellow   # matches the labels set in rules.yml
# Alert receivers
receivers:
  - name: 'bz'
    email_configs:
      - to: "xiayun@domain.com"
        send_resolved: true
    webhook_configs:
      - send_resolved: true
        url: http://172.16.18.6:8060/dingtalk/webhook1/send
# An inhibition rule mutes alerts matching one set of matchers while an alert
# matching another set is firing; both alerts must agree on the `equal` labels.
inhibit_rules:
  - source_match:
      alertname: InstanceDown
      severity: red
    target_match:
      severity: yellow
    equal: ['instance']
```
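The Alertmanager configuration can be validated with amtool from the same image; a minimal sketch:

```bash
docker run --rm \
  -v /opt/alertmanager.yml:/etc/alertmanager/config.yml \
  --entrypoint amtool prom/alertmanager:v0.19.0 \
  check-config /etc/alertmanager/config.yml
```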
```yaml
# rules.yml
groups:
  - name: hostStatsAlert
    rules:
      ##### server pod down
      - alert: InstanceDown
        expr: up{job=~"prometheus"} != 1
        for: 1m
        labels:
          severity: red
          warn: high
          apps: prometheus
        annotations:
          summary: "Instance {{$labels.instance}} down"
          description: "{{$labels.instance}} of job {{$labels.job}} has been down for more than 1 minute."
      - alert: CPULoad5High
        expr: node_load5 > 10
        for: 1m
        labels:
          severity: yellow
        annotations:
          summary: "Instance {{$labels.instance}} CPU load-5m high"
          description: "{{$labels.instance}} of job {{$labels.job}} CPU load-5m was greater than 10 for more than 1 minute (current value: {{ $value }})."
      - alert: FilesystemFree
        expr: node_filesystem_free_bytes{fstype!~"nsfs|rootfs|selinuxfs|autofs|rpc_pipefs|tmpfs|udev|none|devpts|sysfs|debugfs|fuse.*"} / node_filesystem_size_bytes{fstype!~"nsfs|rootfs|selinuxfs|autofs|rpc_pipefs|tmpfs|udev|none|devpts|sysfs|debugfs|fuse.*"} < 0.05
        for: 1m
        labels:
          severity: yellow
        annotations:
          summary: "Instance {{$labels.instance}} has less than 5% filesystem space free"
          description: "{{$labels.instance}} of job {{$labels.job}} filesystem usage is above 95% (current value: {{ $value }})."
  - name: k8s-prom
    rules:
      - alert: K8sPrometheusDown
        expr: up{job=~"prometheus"} != 1
        for: 1m
        labels:
          severity: red
          warn: high
          apps: prometheus
        annotations:
          summary: "Prometheus {{$labels.instance}} down"
          description: "{{$labels.instance}} of job {{$labels.job}} has been down for more than 1 minute."
      - alert: K8sNodeDown
        expr: up{job=~"kubernetes-nodes"} != 1
        for: 1m
        labels:
          severity: red
          warn: high
          apps: node
        annotations:
          summary: "K8s node {{$labels.instance}} down"
          description: "{{$labels.instance}} of job {{$labels.job}} has been down for more than 1 minute."
```
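Rule files can be checked the same way; a minimal sketch:

```bash
docker run --rm \
  -v /opt/rules.yml:/etc/prometheus/rules.yml \
  --entrypoint promtool prom/prometheus:v2.14.0 \
  check rules /etc/prometheus/rules.yml
```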
Install Docker
```bash
# Install dependencies
yum install -y yum-utils device-mapper-persistent-data lvm2
# Add the Docker package repository
yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
# Install Docker CE
yum install docker-ce -y
# Start the daemon
systemctl start docker
# Enable start on boot
systemctl enable docker
# Check Docker info
docker info
```
Install docker-compose
```bash
curl -L https://github.com/docker/compose/releases/download/1.23.2/docker-compose-`uname -s`-`uname -m` -o /usr/local/bin/docker-compose
chmod +x /usr/local/bin/docker-compose
```
Start and stop
```bash
# Run in the directory containing docker-compose.yml
docker-compose up -d    # start
docker-compose down     # stop
docker-compose restart  # restart
```
Since Prometheus uses remote storage, do not start it yet; wait until the M3DB deployment below is complete.
7. M3DB Cluster Deployment
M3 features
M3 provides several features as discrete components, which make it an ideal platform for time series data at scale:
- M3DB, a distributed time series database that provides scalable storage for time series data and a reverse index.
- M3Coordinator, a sidecar process that allows M3DB to act as long-term storage for Prometheus.
- M3Query, a distributed query engine with native support for PromQL and Graphite (M3QL coming soon).
- M3Aggregator, an aggregation tier that runs as a dedicated metrics aggregator/downsampler, allowing metrics to be stored at different resolutions with various retention policies.
Why M3DB
Before settling on M3DB we tried TimescaleDB and InfluxDB. TimescaleDB depends on PostgreSQL (which we are not familiar with), and InfluxDB's sharding/clustering is a paid feature, so after weighing the options we chose M3DB. M3DB has only recently been open-sourced and its documentation is genuinely sparse, but compared with other TSDBs its data compression ratio is quite good.
Cluster deployment
M3DB cluster management is built on top of etcd, so an etcd cluster is required; the official docs cover this in detail.
Environment:
- 172.16.7.170 node1
- 172.16.7.171 node2
- 172.16.7.172 node3
- 172.16.10.12 coordinator
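The etcd and M3DB configs below refer to the nodes by hostname, so node1 through node3 (and node4 for the coordinator) must resolve on every host. Assuming no DNS entries exist, a minimal /etc/hosts sketch using the addresses above:

```bash
# Append to /etc/hosts on every node
cat << 'EOF' >> /etc/hosts
172.16.7.170 node1
172.16.7.171 node2
172.16.7.172 node3
172.16.10.12 node4
EOF
```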
Deploy the etcd cluster
```bash
yum install etcd -y

# etcd config file /etc/etcd/etcd.conf
ETCD_DATA_DIR="/etcd-data"
ETCD_LISTEN_PEER_URLS="http://0.0.0.0:2380"
ETCD_LISTEN_CLIENT_URLS="http://0.0.0.0:2379"
ETCD_NAME="node1"   # node2, node3 on the other nodes
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://node1:2380"
ETCD_ADVERTISE_CLIENT_URLS="http://node1:2379"
ETCD_INITIAL_CLUSTER="node1=http://node1:2380,node2=http://node2:2380,node3=http://node3:2380"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"
ETCD_INITIAL_CLUSTER_STATE="new"
```
Start etcd on each node in turn: `systemctl start etcd`
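Once all three nodes are up, cluster health can be verified from any node; a minimal check using the v2 client that the CentOS etcd package ships:

```bash
etcdctl --endpoints=http://node1:2379,http://node2:2379,http://node3:2379 cluster-health
```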
Deploy the M3DB cluster
```bash
mkdir -p /opt/m3db /etcd-data/m3db/cache
cat << EOF > /opt/m3db/m3dbnode.yml
coordinator:
  listenAddress:
    type: "config"
    value: "0.0.0.0:7201"        # coordinator API port
  local:
    namespaces:
      - namespace: default       # namespace the data is written into
        type: unaggregated       # data type
        retention: 720h          # data retention
  logging:
    level: error
  metrics:                       # the coordinator's own metrics
    scope:
      prefix: "coordinator"
    prometheus:
      handlerPath: /metrics
      listenAddress: 0.0.0.0:7203   # until https://github.com/m3db/m3/issues/682 is resolved
    sanitization: prometheus
    samplingRate: 1.0
    extended: none
  limits:
    maxComputedDatapoints: 10000
  tagOptions:
    # Configuration setting for generating metric IDs from tags.
    idScheme: quoted             # required
db:
  logging:
    level: error
  metrics:
    prometheus:
      handlerPath: /metrics
    sanitization: prometheus
    samplingRate: 1.0
    extended: detailed
  listenAddress: 0.0.0.0:9000
  clusterListenAddress: 0.0.0.0:9001
  httpNodeListenAddress: 0.0.0.0:9002
  httpClusterListenAddress: 0.0.0.0:9003
  debugListenAddress: 0.0.0.0:9004
  hostID:                        # set a custom host ID via this config file
    resolver: config
    value: node1                 # node1 here; node2, node3, node4 (coordinator) on the other hosts
  client:
    writeConsistencyLevel: majority         # write consistency level
    readConsistencyLevel: unstrict_majority
  gcPercentage: 100
  writeNewSeriesAsync: true
  writeNewSeriesLimitPerSecond: 1048576
  writeNewSeriesBackoffDuration: 2ms
  bootstrap:
    bootstrappers:               # bootstrap order
      - filesystem
      - commitlog
      - peers
      - uninitialized_topology
    commitlog:
      returnUnfulfilledForCorruptCommitLogFiles: false
  cache:
    series:
      policy: lru
    postingsList:
      size: 262144
  commitlog:
    flushMaxBytes: 524288
    flushEvery: 1s
    queue:
      calculationType: fixed
      size: 2097152
  fs:
    filePathPrefix: /etcd-data/m3db   # m3dbnode data directory
  config:
    service:
      env: default_env
      zone: embedded
      service: m3db              # service name, comparable to a service in Consul
      cacheDir: /etcd-data/m3db/cache
      etcdClusters:
        - zone: embedded
          endpoints:
            - node1:2379
            - node2:2379
            - node3:2379
EOF
```
Start it on each node in turn:
```bash
docker run -d \
  -v /opt/m3db/m3dbnode.yml:/etc/m3dbnode/m3dbnode.yml \
  -v /etcd-data/m3db:/etcd-data/m3db \
  -p 7201:7201 -p 7203:7203 -p 9000:9000 -p 9001:9001 \
  -p 9002:9002 -p 9003:9003 -p 9004:9004 \
  --name m3db quay.io/m3db/m3dbnode:latest
```
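Each node's liveness can then be probed on the HTTP node port; a minimal check (the /health path is the one used in the M3 walkthroughs):

```bash
# Returns a small JSON payload with the node's status, including bootstrap state
curl -sSf http://localhost:9002/health
```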
Initialization
placement init
```bash
curl -sSf -X POST localhost:7201/api/v1/placement/init -d '
{
  "num_shards": 1024,
  "replication_factor": 3,
  "instances": [
    {
      "id": "node1",
      "isolation_group": "node1",
      "zone": "embedded",
      "weight": 100,
      "endpoint": "172.16.7.170:9000",
      "hostname": "172.16.7.170",
      "port": 9000
    },
    {
      "id": "node2",
      "isolation_group": "node2",
      "zone": "embedded",
      "weight": 100,
      "endpoint": "172.16.7.171:9000",
      "hostname": "172.16.7.171",
      "port": 9000
    },
    {
      "id": "node3",
      "isolation_group": "node3",
      "zone": "embedded",
      "weight": 100,
      "endpoint": "172.16.7.172:9000",
      "hostname": "172.16.7.172",
      "port": 9000
    },
    {
      "id": "node4",
      "isolation_group": "node4",
      "zone": "embedded",
      "weight": 99,
      "endpoint": "172.16.10.12:9000",
      "hostname": "172.16.10.12",
      "port": 9000
    }
  ]
}'
```
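To verify, the placement can be read back; each instance's shards should eventually report as AVAILABLE. A minimal check:

```bash
curl -sSf localhost:7201/api/v1/placement | python -m json.tool
```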
namespace init
```bash
curl -X POST localhost:7201/api/v1/namespace -d '
{
  "name": "default",
  "options": {
    "bootstrapEnabled": true,
    "flushEnabled": true,
    "writesToCommitLog": true,
    "cleanupEnabled": true,
    "snapshotEnabled": true,
    "repairEnabled": false,
    "retentionOptions": {
      "retentionPeriodDuration": "720h",
      "blockSizeDuration": "12h",
      "bufferFutureDuration": "1h",
      "bufferPastDuration": "1h",
      "blockDataExpiry": true,
      "blockDataExpiryAfterNotAccessPeriodDuration": "5m"
    },
    "indexOptions": {
      "enabled": true,
      "blockSizeDuration": "12h"
    }
  }
}'
```
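Similarly, the namespace registry can be read back to confirm that the default namespace was created:

```bash
curl -sSf localhost:7201/api/v1/namespace | python -m json.tool
```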
Regarding etcd fault tolerance: a three-node cluster like this one can tolerate the failure of one node; the failure of two or more nodes makes the cluster unavailable.
8. Prometheus Remote Write/Read
Now start the external Prometheus node: `docker-compose up -d`
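Once samples start flowing, the remote-write path can be verified end to end by querying the coordinator's Prometheus-compatible query API directly; a minimal check against the coordinator address used in the config above:

```bash
# Instant query against the M3 coordinator; should return the `up` series
# that the external Prometheus has been writing
curl -sSf 'http://172.16.10.12:7201/api/v1/query?query=up' | python -m json.tool
```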
That's it: the deployment is complete.
If you are interested in Prometheus monitoring for Spring Cloud, see the earlier articles on our WeChat official account.