Reference: https://github.com/prometheus-operator/kube-prometheus#quickstart
Deploying Prometheus
```shell
kubectl create ns monitoring
git clone https://github.com/prometheus-operator/kube-prometheus.git
cd kube-prometheus
kubectl create -f manifests/setup
until kubectl get servicemonitors --all-namespaces ; do date; sleep 1; echo ""; done
kubectl create -f manifests/
```
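The `until` loop above simply polls until the ServiceMonitor CRD exists. Newer kube-prometheus READMEs instead apply the setup manifests with `kubectl apply --server-side` and block on the CRDs' `Established` condition with `kubectl wait`. The function below is a sketch of that variant; the name `install_kube_prometheus` and the timeout value are my own illustrative choices, not part of the upstream instructions.

```shell
# Sketch: the quickstart flow using `kubectl wait` instead of a polling loop.
# Run it from the root of the kube-prometheus checkout.
# install_kube_prometheus and --timeout=120s are illustrative choices.
install_kube_prometheus() {
  # CRDs and the namespace; --server-side avoids the size limit of the
  # client-side last-applied annotation on the large CRDs
  kubectl apply --server-side -f manifests/setup || return 1
  # block until the CRDs report Established (replaces the `until` loop)
  kubectl wait --for condition=Established --all CustomResourceDefinition \
    --namespace=monitoring --timeout=120s || return 1
  # Prometheus, Alertmanager, Grafana, exporters, ServiceMonitors, ...
  kubectl apply -f manifests/
}
```

Because every step uses `apply`, re-running the function against an existing installation is harmless.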
With just these few steps, Prometheus is up and running:
```
[root@app01 kube-prometheus]# kubectl get pods -n monitoring
NAME                                   READY   STATUS    RESTARTS   AGE
alertmanager-main-0                    2/2     Running   0          25h
alertmanager-main-1                    2/2     Running   0          25h
alertmanager-main-2                    2/2     Running   0          25h
grafana-85c89999cb-7wpd8               1/1     Running   0          45h
kube-state-metrics-6b7567c4c7-26ws7    3/3     Running   0          45h
node-exporter-4cw8g                    2/2     Running   0          45h
node-exporter-glj7r                    2/2     Running   0          45h
prometheus-adapter-b8d458474-8zh5w     1/1     Running   0          45h
prometheus-k8s-0                       3/3     Running   1          23h
prometheus-k8s-1                       3/3     Running   1          23h
prometheus-operator-56b8d8db89-kfwvz   2/2     Running   0          45h
```
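Instead of eyeballing this list, the READY and STATUS columns can be checked mechanically. The sketch below is a hypothetical helper (`check_pods` is my own name for it) that reads plain `kubectl get pods` output on stdin and flags every pod that is not fully Ready (n/n) or not Running.

```shell
# Flag pods whose READY column is not n/n or whose STATUS is not Running.
# Input: the plain-text output of `kubectl get pods` (header line included).
# Exit status: 0 if every pod is healthy, 1 otherwise.
check_pods() {
  awk '
    NR == 1 { next }                     # skip the header line
    {
      split($2, r, "/")                  # READY column, e.g. "2/3"
      if (r[1] != r[2] || $3 != "Running") {
        print "NOT READY: " $1 " (" $2 ", " $3 ")"
        bad = 1
      }
    }
    END { exit bad }                     # bad is 0 if never set
  '
}

# Usage (needs cluster access):
#   kubectl get pods -n monitoring | check_pods && echo "all pods ready"
```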
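However, Prometheus deployed this way still has several shortcomings and cannot be used in production as-is. The specific problems are:
1. Data is not persisted (Grafana, Prometheus)
2. All Services use ClusterIP, which makes operations and management inconvenient
3. The default alerting setup is inconvenient
4. Adding monitoring targets is also inconvenient
In later posts I will break these problems down one by one.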
Adding monitoring targets
0. Open kube-prometheus/manifests/prometheus-prometheus.yaml with vim and add the following under spec:
```yaml
  additionalScrapeConfigs:
    name: additional-scrape-configs   # must match the secret created in step 2
    key: prometheus-additional.yaml
```
For the exact position and surrounding context, see the figure below:
Apply the change to the cluster:
```shell
kubectl apply -f kube-prometheus/manifests/prometheus-prometheus.yaml -n monitoring
```
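For reference, additionalScrapeConfigs sits directly under spec of the Prometheus custom resource, which kube-prometheus names k8s (hence the prometheus-k8s-* pods). As an alternative to editing the manifest, the same change can be applied as a merge patch to the live object. This is a sketch: the file name prometheus-patch.yaml and the function name are hypothetical, and if you go this way it is sensible to add the same block to prometheus-prometheus.yaml so the file and the cluster do not drift apart.

```shell
# Hypothetical patch file: additionalScrapeConfigs nested under spec.
cat > prometheus-patch.yaml <<'EOF'
spec:
  additionalScrapeConfigs:
    name: additional-scrape-configs
    key: prometheus-additional.yaml
EOF

# Apply it to the Prometheus object "k8s"; custom resources need a JSON
# merge patch (--type merge), and -p accepts the YAML content directly.
patch_prometheus() {
  kubectl patch prometheus k8s -n monitoring --type merge \
    -p "$(cat prometheus-patch.yaml)"
}
```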
1. Write the targets to be added into the prometheus-additional.yaml file:
```yaml
# vim prometheus-additional.yaml
- job_name: linuxbase
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  static_configs:
  - targets:
    - 172.10.0.23:9100
```
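Hand-editing this file for every new host is part of why adding targets feels inconvenient. A small generator at least makes it repeatable. The sketch below (gen_additional_config is a hypothetical name) prints the same linuxbase job as above with one target per host argument, assuming every host runs node_exporter on its default port 9100.

```shell
# Print a prometheus-additional.yaml scraping node_exporter (:9100)
# on every host given as an argument.
gen_additional_config() {
  cat <<'EOF'
- job_name: linuxbase
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  static_configs:
  - targets:
EOF
  for host in "$@"; do
    printf '    - %s:9100\n' "$host"
  done
}

# Usage:
#   gen_additional_config 172.10.0.23 > prometheus-additional.yaml
```

After regenerating the file, steps 2 and 3 below still have to be repeated so the secret is updated.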
2. Create the new secret (rendered to a local manifest file first):
```shell
kubectl create secret generic additional-scrape-configs -n monitoring --from-file=prometheus-additional.yaml --dry-run=client -o yaml > additional-scrape-configs.yaml
```
3. Apply the secret so that Prometheus picks it up:
```shell
kubectl apply -f additional-scrape-configs.yaml -n monitoring
```
4. Wait a moment, then check the Prometheus target list (Status → Targets in the web UI); alternatively, inspect Prometheus's secrets.
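For the secrets check in step 4: prometheus-operator renders the final configuration, including the additional scrape configs, into a gzipped key of a secret named after the Prometheus object. The sketch below decodes it. The secret name prometheus-k8s and the key prometheus.yaml.gz are what the operator versions I know use, so treat both as assumptions for your version; the function name is hypothetical.

```shell
# Print the Prometheus configuration generated by prometheus-operator.
# ASSUMPTION: secret "prometheus-k8s", key "prometheus.yaml.gz"
# (gzip-compressed; base64-encoded as all secret data is in the API).
show_generated_config() {
  kubectl get secret prometheus-k8s -n monitoring \
    -o jsonpath='{.data.prometheus\.yaml\.gz}' | base64 -d | gunzip
}

# Usage:
#   show_generated_config | grep -A 8 'job_name: linuxbase'
```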
Exposing the services
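For this simple demo I expose them with LoadBalancer Services. A better approach is to expose them through an Ingress with a domain name, since Grafana is usually made available to developers as well.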
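For comparison, the Ingress route might look like the sketch below; the rest of this section sticks with LoadBalancer Services. Assumptions: Kubernetes 1.19+ (for networking.k8s.io/v1), an ingress-nginx controller registered under the IngressClass nginx, the placeholder domain grafana.example.com, and the grafana ClusterIP Service on port 3000 that kube-prometheus creates (the same port lb-grafana.yaml targets).

```shell
# Write an Ingress manifest for Grafana.
# ASSUMPTIONS: IngressClass "nginx"; grafana.example.com is a placeholder.
cat > ingress-grafana.yaml <<'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana
  namespace: monitoring
spec:
  ingressClassName: nginx
  rules:
  - host: grafana.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: grafana        # ClusterIP Service from kube-prometheus
            port:
              number: 3000
EOF

# Then: kubectl apply -f ingress-grafana.yaml
```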
lb-grafana.yaml
```yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/alicloud-loadbalancer-force-override-listeners: "true"
  labels:
    app: grafana
  name: grafana-lb
  namespace: monitoring
spec:
  externalTrafficPolicy: Cluster
  ports:
  - name: http
    port: 3000
    protocol: TCP
    targetPort: 3000
  selector:
    app: grafana
  sessionAffinity: None
  type: LoadBalancer
```
lb-alertmanager.yaml
```yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/alicloud-loadbalancer-force-override-listeners: "true"
  labels:
    alertmanager: main
  name: alertmanager-main-lb
  namespace: monitoring
spec:
  externalTrafficPolicy: Cluster
  ports:
  - name: web
    port: 9093
    protocol: TCP
    targetPort: 9093
  selector:
    alertmanager: main
    app: alertmanager
  type: LoadBalancer
```
lb-prometheus.yaml
```yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/alicloud-loadbalancer-force-override-listeners: "true"
  labels:
    prometheus: k8s
  name: prometheus-k8s-lb
  namespace: monitoring
spec:
  externalTrafficPolicy: Cluster
  ports:
  - name: web
    port: 9090
    protocol: TCP
    targetPort: 9090
  selector:
    app: prometheus
    prometheus: k8s
  type: LoadBalancer
```
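Once the three files are saved, applying them and reading back the external address the cloud provider assigns can be scripted. The function names below are hypothetical. The jsonpath reads .status.loadBalancer.ingress[0].ip, which is where Kubernetes records an IP-based load balancer address; some providers report a hostname field instead, and the value stays empty while the load balancer is still being provisioned.

```shell
# Apply the three LoadBalancer Services defined above.
apply_lb_services() {
  for f in lb-grafana.yaml lb-alertmanager.yaml lb-prometheus.yaml; do
    kubectl apply -f "$f" || return 1
  done
}

# Print the external IP of a Service in "monitoring" (empty while pending).
lb_address() {
  kubectl get svc "$1" -n monitoring \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
}

# Usage:
#   apply_lb_services
#   echo "Grafana: http://$(lb_address grafana-lb):3000"
```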
Persistent data storage
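Both Grafana and Prometheus need persistent storage for their data.
Grafana needs it because you may install third-party plugins.
For Prometheus it goes without saying: persistent storage is a must.
Since we deploy on the cloud, we can simply use the storage services the provider offers (NFS, OSS, or aliyun-disk all work).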
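For Prometheus, prometheus-operator can provision the volumes itself through spec.storage.volumeClaimTemplate on the Prometheus resource. Below is a sketch as a merge patch, assuming an Alibaba Cloud ACK cluster with a disk StorageClass named alicloud-disk-ssd; the 100Gi size, the 15d retention and the function name are placeholders. Data already held in the pods' current emptyDir volumes is not migrated.

```shell
# Hypothetical storage patch for the Prometheus object "k8s".
# ASSUMPTIONS: StorageClass "alicloud-disk-ssd"; 100Gi and 15d are placeholders.
cat > prometheus-storage-patch.yaml <<'EOF'
spec:
  retention: 15d
  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: alicloud-disk-ssd
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 100Gi
EOF

# Custom resources take a JSON merge patch; -p accepts YAML content.
patch_prometheus_storage() {
  kubectl patch prometheus k8s -n monitoring --type merge \
    -p "$(cat prometheus-storage-patch.yaml)"
}
```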