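Building an Enterprise-Grade Monitoring Platform (Part 13): Prometheus Server Configuration in Detail

Prometheus Server startup flags

Looking up flags

All Prometheus command-line flags can be listed with ./prometheus -h.

Flag reference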

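--version #Show version information.
--config.file="xxxxxxxxx" #Path to the configuration file.
--web.listen-address="0.0.0.0:9090" #Address to listen on for the UI, API, and telemetry.
--web.config.file="" #Path to a configuration file that enables TLS or authentication.
--web.read-timeout=5m #Maximum duration before a request read times out.
--web.max-connections=512 #Maximum number of simultaneous connections; defaults to 512.
--web.external-url=<URL> #URL under which Prometheus is externally reachable (e.g. when Prometheus is served through a reverse proxy). Used to generate relative and absolute links back to Prometheus itself. If the URL has a path portion, it is used as the prefix for all HTTP endpoints served by Prometheus. If omitted, the relevant URL components are derived automatically.
--web.route-prefix=<path> #Prefix for the internal routes of web endpoints. Defaults to the path of --web.external-url.
--web.user-assets=<path> #Path to a static asset directory, available at /user.
--web.enable-lifecycle #Enable shutdown and reload via HTTP request.
--web.enable-admin-api #Enable the API endpoints for admin control actions.
--web.console.templates="consoles" #Path to the console template directory, available at /consoles.
--web.console.libraries="console_libraries" #Path to the console library directory.
--web.page-title="Prometheus Time Series Collection and Processing Server" #Document title of the Prometheus instance.
--web.cors.origin=".*" #Regex for the CORS origin. It is fully anchored. Example: 'https?://(domain1|domain2)\.com'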

--storage.tsdb.path="data/" #Base path for metrics storage.
--storage.tsdb.min-block-duration=2h #Minimum duration of a data block before it is persisted.
--storage.tsdb.max-block-duration=<duration> #Maximum duration a compacted block may span (defaults to 10% of the retention period).
--storage.tsdb.retention=STORAGE.TSDB.RETENTION #How long to retain samples. This flag is deprecated; use "storage.tsdb.retention.time" instead.
--storage.tsdb.retention.time=STORAGE.TSDB.RETENTION.TIME #How long to retain samples. When set, this flag overrides "storage.tsdb.retention". If neither this flag nor "storage.tsdb.retention" nor "storage.tsdb.retention.size" is set, the retention time defaults to 15d. Supported units: y, w, d, h, m, s, ms.
--storage.tsdb.retention.size=STORAGE.TSDB.RETENTION.SIZE #[Experimental] Maximum number of bytes of blocks that can be stored. A unit is required; supported units: B, KB, MB, GB, TB, PB, EB. Example: "512MB". This flag is experimental and may change in future releases.
--storage.tsdb.no-lockfile #Do not create a lockfile in the data directory.

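--storage.tsdb.allow-overlapping-blocks #Allow overlapping blocks, which in turn enables vertical compaction and vertical query merge.
--storage.tsdb.wal-compression #Compress the TSDB WAL.
--storage.remote.flush-deadline=<duration> #How long to wait for samples to be flushed on shutdown or configuration reload.
--storage.remote.read-sample-limit=5e7 #Maximum overall number of samples returned through the remote read interface in a single query. 0 means no limit. This limit is ignored for streamed response types.
--storage.remote.read-concurrent-limit=10 #Maximum number of concurrent remote read calls. 0 means no limit.
--storage.remote.read-max-bytes-in-frame=1048576 #Maximum number of bytes in a single frame for streamed remote read response types before marshalling. Note that the client may have its own frame size limit. Defaults to 1MB.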

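--rules.alert.for-outage-tolerance=1h #Maximum Prometheus outage to tolerate when restoring the "for" state of alerts.
--rules.alert.for-grace-period=10m #Minimum duration between an alert and its restored "for" state. Maintained only for alerts whose configured "for" time is greater than the grace period.
--rules.alert.resend-delay=1m #Minimum amount of time to wait before resending an alert to Alertmanager.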

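--alertmanager.notification-queue-capacity=10000 #Capacity of the queue for pending alert notifications.
--alertmanager.timeout=10s #Timeout for sending alerts to Alertmanager.
--query.lookback-delta=5m #Maximum lookback duration for retrieving metrics during expression evaluation.
--query.timeout=2m #Maximum time a query may run before it is aborted (queries exceeding 2 minutes are killed automatically).
--query.max-concurrency=20 #Maximum number of queries executed concurrently; defaults to 20.
--query.max-samples=50000000 #Maximum number of samples a single query can load into memory. A query fails if it tries to load more samples than this, so the flag also limits the number of samples a query can return.
--enable-feature= #Comma-separated names of features to enable. Valid options: agent, exemplar-storage, expand-external-labels, memory-snapshot-on-shutdown, promql-at-modifier, promql-negative-offset, remote-write-receiver, extra-scrape-metrics, new-service-discovery-manager. See https://prometheus.io/docs/prometheus/latest/feature_flags/ for details.
--log.level=info #Only log messages at or above the given severity (debug, info, warn, error). Defaults to info.
--log.format=logfmt #Output format of log messages. One of: [logfmt, json].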

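For more articles in the enterprise-grade monitoring platform series, see: Building an Enterprise-Grade Monitoring Platform. The series is updated continuously.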

Special usage of flags

1) If Prometheus was started without --web.enable-lifecycle, running curl -X POST http://localhost:9090/-/reload returns an error.

# curl -X POST http://localhost:9090/-/reload
Lifecycle API is not enabled.

After starting Prometheus with --web.enable-lifecycle, the command curl -X POST http://localhost:9090/-/reload reloads the configuration file.

# ./prometheus --config.file ./prometheus.yml --web.enable-lifecycle
# curl -X POST http://localhost:9090/-/reload
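The same reload can be triggered from a script. The following is a minimal sketch using only the Python standard library; the function names `build_reload_request` and `reload_config` are hypothetical, and the default address `http://localhost:9090` is simply the one used in the curl command above.

```python
import urllib.request


def build_reload_request(base_url: str = "http://localhost:9090") -> urllib.request.Request:
    """Build the POST request for the /-/reload lifecycle endpoint."""
    url = base_url.rstrip("/") + "/-/reload"
    # method="POST" forces a POST even though no request body is sent.
    return urllib.request.Request(url, method="POST")


def reload_config(base_url: str = "http://localhost:9090", timeout: float = 10.0) -> int:
    """Send the reload request and return the HTTP status code.

    urlopen raises urllib.error.HTTPError for non-2xx answers, which is what
    happens when the lifecycle API is not enabled.
    """
    with urllib.request.urlopen(build_reload_request(base_url), timeout=timeout) as resp:
        return resp.status


if __name__ == "__main__":
    req = build_reload_request()
    print(req.get_method(), req.full_url)
```

Calling `reload_config()` needs a running Prometheus; the `__main__` block only prints the request that would be sent.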

2) Adjusting the log level

Pass --log.level=info when starting Prometheus.

# ./prometheus --config.file ./prometheus.yml --web.enable-lifecycle --log.level=info 2> prom.log &

View the log file:

# tail -f prom.log
level=info ts=2023-02-14T15:22:57.725Z caller=head.go:577 component=tsdb msg="WAL segment loaded" segment=43 maxSegment=46
level=info ts=2023-02-14T15:22:57.736Z caller=head.go:577 component=tsdb msg="WAL segment loaded" segment=44 maxSegment=46
level=info ts=2023-02-14T15:22:57.737Z caller=head.go:577 component=tsdb msg="WAL segment loaded" segment=45 maxSegment=46
level=info ts=2023-02-14T15:22:57.737Z caller=head.go:577 component=tsdb msg="WAL segment loaded" segment=46 maxSegment=46
level=info ts=2023-02-14T15:22:57.737Z caller=head.go:583 component=tsdb msg="WAL replay completed" checkpoint_replay_duration=5.246341ms wal_replay_duration=44.623041ms total_replay_duration=50.886206ms
level=info ts=2023-02-14T15:22:57.739Z caller=main.go:849 fs_type=XFS_SUPER_MAGIC
level=info ts=2023-02-14T15:22:57.739Z caller=main.go:852 msg="TSDB started"
level=info ts=2023-02-14T15:22:57.739Z caller=main.go:979 msg="Loading configuration file" filename=./prometheus.yml
level=info ts=2023-02-14T15:22:57.742Z caller=main.go:10


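Prometheus Server configuration

Overview

Prometheus is configured through prometheus.yml. The file is specified at startup and its contents are loaded.

The configuration file has four sections:

global: global settings
alerting: alerting settings
rule_files: rule files
scrape_configs: scrape target settings

A sample prometheus.yml: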

global:
  scrape_timeout: 10s
alerting:
  alertmanagers:
  - scheme: http
    timeout: 10s
    static_configs:
    - targets:
        - localhost:9093
rule_files:
  - "cpu_rules.yml"
  - "mem_rules.yml"
scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: 'node-exporter'
    static_configs:
    - targets:
      - '192.168.0.3:9100'
  - job_name: 'node'
    scrape_interval: 8s
    static_configs:
      - targets: ['127.0.0.1:9100', '127.0.0.12:9100']

The global section

Defines Prometheus's global settings.

global:
  scrape_interval:     15s # Scrape targets every 15 seconds.
  evaluation_interval: 15s # Evaluate rules every 15 seconds.
  scrape_timeout: 10s # Time out each scrape after 10 seconds (the global default).
  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: 'codelab-monitor'

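scrape_interval

Meaning: the global default interval between scrapes.

Configuration

scrape_interval: <duration>

Default: 1m.

scrape_timeout

Meaning: the global default timeout for a single scrape. If scrapes fail with context deadline exceeded, raise this value on the affected job.

Configuration

scrape_timeout: <duration>

Default: 10s.

evaluation_interval

Meaning: the global default interval at which rules (mainly alerting rules) are evaluated.

Configuration

evaluation_interval: <duration>

Default: 1m.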
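The <duration> values used by these fields can be checked before a configuration is rolled out. The sketch below is an assumption-laden helper, not Prometheus's own parser: `parse_duration` and `timeout_fits_interval` are hypothetical names, a year is taken as 365 days, and the bare value `0` is not handled. It relies on the understanding that an explicitly configured scrape_timeout is expected not to exceed the scrape_interval of the same job.

```python
import re

# Seconds per unit, following the unit list y, w, d, h, m, s, ms.
_UNIT_SECONDS = {"y": 365 * 86400, "w": 7 * 86400, "d": 86400,
                 "h": 3600, "m": 60, "s": 1, "ms": 0.001}
_UNITS = ("y", "w", "d", "h", "m", "s", "ms")
# Units must appear from largest to smallest, each at most once.
_DURATION_RE = re.compile(
    r"^(?:(\d+)y)?(?:(\d+)w)?(?:(\d+)d)?(?:(\d+)h)?"
    r"(?:(\d+)m)?(?:(\d+)s)?(?:(\d+)ms)?$")


def parse_duration(text: str) -> float:
    """Convert a duration such as '15s' or '1h30m' to seconds."""
    match = _DURATION_RE.match(text)
    if not text or match is None:
        raise ValueError(f"invalid duration: {text!r}")
    return sum(int(number) * _UNIT_SECONDS[unit]
               for number, unit in zip(match.groups(), _UNITS) if number)


def timeout_fits_interval(scrape_timeout: str, scrape_interval: str) -> bool:
    """Return True when the timeout is no longer than the interval."""
    return parse_duration(scrape_timeout) <= parse_duration(scrape_interval)


if __name__ == "__main__":
    print(parse_duration("1m"), timeout_fits_interval("10s", "15s"))
```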

external_labels

Meaning: labels the server attaches when communicating with external systems.

Configuration

<labelname>: <labelvalue> ...

No default; optional.

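The alerting section

Configures how Prometheus talks to Alertmanager. In the overall architecture, Prometheus fires alerts according to the configured alerting rules and sends them to the separate Alertmanager component, which manages them and notifies the relevant users.

Alertmanager can usually be configured through the -alertmanager.xxx command-line flags, but that is inflexible: it supports neither dynamic reloading nor dynamically defined alert attributes.

The alerting section addresses this and makes Alertmanager easier to manage. It has two main parameters:

alert_relabel_configs: rules that dynamically rewrite alert attributes.
alertmanagers: configuration for dynamically discovering Alertmanager instances.

The structure is roughly: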

# Alerting specifies settings related to the Alertmanager.
alerting:
  alert_relabel_configs:
    [ - <relabel_config> ... ]
  alertmanagers:
    [ - <alertmanager_config> ... ]

Here alertmanagers is an array of alertmanager_config entries.

Each entry is structured roughly as follows:

# Per-target Alertmanager timeout when pushing alerts.
[ timeout: <duration> | default = 10s ]
# Prefix for the HTTP path alerts are pushed to.
[ path_prefix: <path> | default = / ]
# Configures the protocol scheme used for requests.
[ scheme: <scheme> | default = http ]
# Sets the `Authorization` header on every request with the
# configured username and password.
basic_auth:
  [ username: <string> ]
  [ password: <string> ]
# Sets the `Authorization` header on every request with
# the configured bearer token. It is mutually exclusive with `bearer_token_file`.
[ bearer_token: <string> ]
# Sets the `Authorization` header on every request with the bearer token
# read from the configured file. It is mutually exclusive with `bearer_token`.
[ bearer_token_file: /path/to/bearer/token/file ]
# Configures the scrape request's TLS settings.
tls_config:
  [ <tls_config> ]
# Optional proxy URL.
[ proxy_url: <string> ]
# List of Azure service discovery configurations.
azure_sd_configs:
  [ - <azure_sd_config> ... ]
# List of Consul service discovery configurations.
consul_sd_configs:
  [ - <consul_sd_config> ... ]
# List of DNS service discovery configurations.
dns_sd_configs:
  [ - <dns_sd_config> ... ]
# List of EC2 service discovery configurations.
ec2_sd_configs:
  [ - <ec2_sd_config> ... ]
# List of file service discovery configurations.
file_sd_configs:
  [ - <file_sd_config> ... ]
# List of GCE service discovery configurations.
gce_sd_configs:
  [ - <gce_sd_config> ... ]
# List of Kubernetes service discovery configurations.
kubernetes_sd_configs:
  [ - <kubernetes_sd_config> ... ]
# List of Marathon service discovery configurations.
marathon_sd_configs:
  [ - <marathon_sd_config> ... ]
# List of AirBnB's Nerve service discovery configurations.
nerve_sd_configs:
  [ - <nerve_sd_config> ... ]
# List of Zookeeper Serverset service discovery configurations.
serverset_sd_configs:
  [ - <serverset_sd_config> ... ]
# List of Triton service discovery configurations.
triton_sd_configs:
  [ - <triton_sd_config> ... ]
# List of labeled statically configured Alertmanagers.
static_configs:
  [ - <static_config> ... ]
# List of Alertmanager relabel configurations.
relabel_configs:
  [ - <relabel_config> ... ]

scheme

Meaning: how to reach Alertmanager, either http or https.

Configuration

- scheme: http

timeout

Meaning: the timeout for connections to Alertmanager.

Configuration

timeout: 10s

static_configs

Meaning: the addresses of the Alertmanager instances.
targets: the Alertmanager service addresses; an array, so several can be listed.

The rule_files section

Specifies the paths of alerting rule files. The files are YAML; multiple files can be listed, and paths may be globs.

The structure is roughly:

rule_files:
  - "cpu_rules.yml"
  - "mem_rules.yml"

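All Prometheus alerting rules are configured through YAML files, which may feel unfamiliar to people used to graphical interfaces. Configuration as files is one of Prometheus's defining traits, though: it keeps customization open, so rules of any kind can be tailored to your needs.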

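The scrape_configs section

Specifies the targets Prometheus scrapes. Each scrape configuration mainly contains the following parameters:

job_name: the job name
honor_labels: resolves conflicts between scraped labels and server-side labels. When true, the scraped labels win; otherwise the server-side configuration wins
params: request parameters sent with each scrape
scrape_interval: the scrape interval
scrape_timeout: the scrape timeout
metrics_path: the metrics path on the target
scheme: the protocol used for scraping
sample_limit: limit on the number of samples accepted per scrape. If it is exceeded, the whole scrape is treated as failed and nothing is stored. The default 0 means no limit
relabel_configs: relabeling rules applied to targets
metric_relabel_configs: relabeling rules applied to metrics

The full scrape_configs entry: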

# The job name assigned to scraped metrics by default.
job_name: <job_name>
# How frequently to scrape targets from this job.
[ scrape_interval: <duration> | default = <global_config.scrape_interval> ]
# Per-scrape timeout when scraping this job.
[ scrape_timeout: <duration> | default = <global_config.scrape_timeout> ]
# The HTTP resource path on which to fetch metrics from targets.
[ metrics_path: <path> | default = /metrics ]
# honor_labels controls how Prometheus handles conflicts between labels that are
# already present in scraped data and labels that Prometheus would attach
# server-side ("job" and "instance" labels, manually configured target
# labels, and labels generated by service discovery implementations).
#
# If honor_labels is set to "true", label conflicts are resolved by keeping label
# values from the scraped data and ignoring the conflicting server-side labels.
#
# If honor_labels is set to "false", label conflicts are resolved by renaming
# conflicting labels in the scraped data to "exported_<original-label>" (for
# example "exported_instance", "exported_job") and then attaching server-side
# labels. This is useful for use cases such as federation, where all labels
# specified in the target should be preserved.
#
# Note that any globally configured "external_labels" are unaffected by this
# setting. In communication with external systems, they are always applied only
# when a time series does not have a given label yet and are ignored otherwise.
[ honor_labels: <boolean> | default = false ]
# Configures the protocol scheme used for requests.
[ scheme: <scheme> | default = http ]
# Optional HTTP URL parameters.
params:
  [ <string>: [<string>, ...] ]
# Sets the `Authorization` header on every scrape request with the
# configured username and password.
basic_auth:
  [ username: <string> ]
  [ password: <string> ]
# Sets the `Authorization` header on every scrape request with
# the configured bearer token. It is mutually exclusive with `bearer_token_file`.
[ bearer_token: <string> ]
# Sets the `Authorization` header on every scrape request with the bearer token
# read from the configured file. It is mutually exclusive with `bearer_token`.
[ bearer_token_file: /path/to/bearer/token/file ]
# Configures the scrape request's TLS settings.
tls_config:
  [ <tls_config> ]
# Optional proxy URL.
[ proxy_url: <string> ]
# List of Azure service discovery configurations.
azure_sd_configs:
  [ - <azure_sd_config> ... ]
# List of Consul service discovery configurations.
consul_sd_configs:
  [ - <consul_sd_config> ... ]
# List of DNS service discovery configurations.
dns_sd_configs:
  [ - <dns_sd_config> ... ]
# List of EC2 service discovery configurations.
ec2_sd_configs:
  [ - <ec2_sd_config> ... ]
# List of OpenStack service discovery configurations.
openstack_sd_configs:
  [ - <openstack_sd_config> ... ]
# List of file service discovery configurations.
file_sd_configs:
  [ - <file_sd_config> ... ]
# List of GCE service discovery configurations.
gce_sd_configs:
  [ - <gce_sd_config> ... ]
# List of Kubernetes service discovery configurations.
kubernetes_sd_configs:
  [ - <kubernetes_sd_config> ... ]
# List of Marathon service discovery configurations.
marathon_sd_configs:
  [ - <marathon_sd_config> ... ]
# List of AirBnB's Nerve service discovery configurations.
nerve_sd_configs:
  [ - <nerve_sd_config> ... ]
# List of Zookeeper Serverset service discovery configurations.
serverset_sd_configs:
  [ - <serverset_sd_config> ... ]
# List of Triton service discovery configurations.
triton_sd_configs:
  [ - <triton_sd_config> ... ]
# List of labeled statically configured targets for this job.
static_configs:
  [ - <static_config> ... ]
# List of target relabel configurations.
relabel_configs:
  [ - <relabel_config> ... ]
# List of metric relabel configurations.
metric_relabel_configs:
  [ - <relabel_config> ... ]
# Per-scrape limit on number of scraped samples that will be accepted.
# If more than this number of samples are present after metric relabelling
# the entire scrape will be treated as failed. 0 means no limit.
[ sample_limit: <int> | default = 0 ]

An example:

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: 'node-exporter'
    static_configs:
    - targets:
      - '192.168.0.3:9100'

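Prometheus scrapes monitoring data through jobs. A job specifies everything needed to scrape a group of targets, such as their addresses, ports, labels, and authentication details. The scrape interval comes from the global section above unless the job sets its own.

In practice, targets are usually split into several jobs by type, for example MySQL, MongoDB, and Kafka.

The default configuration has a single target, the Prometheus server itself on port 9090. If no path is specified, metrics are scraped from /metrics.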

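Remote write storage

remote_write configures writable remote storage. Its main parameters are:

url: the endpoint address
remote_timeout: the request timeout
write_relabel_configs: relabeling rules; scraped data is relabeled before it is sent to remote storage.

A complete configuration looks roughly like this: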

# The URL of the endpoint to send samples to.
url: <string>
# Timeout for requests to the remote write endpoint.
[ remote_timeout: <duration> | default = 30s ]
# List of remote write relabel configurations.
write_relabel_configs:
  [ - <relabel_config> ... ]
# Sets the `Authorization` header on every remote write request with the
# configured username and password.
basic_auth:
  [ username: <string> ]
  [ password: <string> ]
# Sets the `Authorization` header on every remote write request with
# the configured bearer token. It is mutually exclusive with `bearer_token_file`.
[ bearer_token: <string> ]
# Sets the `Authorization` header on every remote write request with the bearer token
# read from the configured file. It is mutually exclusive with `bearer_token`.
[ bearer_token_file: /path/to/bearer/token/file ]
# Configures the remote write request's TLS settings.
tls_config:
  [ <tls_config> ]
# Optional proxy URL.
[ proxy_url: <string> ]

Note: remote_write is experimental. Use it with care, because it may change in future releases.

Remote read storage

remote_read configures readable remote storage. Its main parameters are:

url: the endpoint address
remote_timeout: the request timeout

A complete configuration looks roughly like this:

# The URL of the endpoint to query from.
url: <string>
# Timeout for requests to the remote read endpoint.
[ remote_timeout: <duration> | default = 30s ]
# Sets the `Authorization` header on every remote read request with the
# configured username and password.
basic_auth:
  [ username: <string> ]
  [ password: <string> ]
# Sets the `Authorization` header on every remote read request with
# the configured bearer token. It is mutually exclusive with `bearer_token_file`.
[ bearer_token: <string> ]
# Sets the `Authorization` header on every remote read request with the bearer token
# read from the configured file. It is mutually exclusive with `bearer_token`.
[ bearer_token_file: /path/to/bearer/token/file ]
# Configures the remote read request's TLS settings.
tls_config:
  [ <tls_config> ]
# Optional proxy URL.
[ proxy_url: <string> ]

Note: remote_read is experimental. Use it with care, because it may change in future releases.

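Service discovery

One of the most important concepts in Prometheus configuration is the data source, or target. Targets are configured either statically or through dynamic discovery, which falls roughly into these kinds:

static_configs: static configuration
dns_sd_configs: DNS service discovery
file_sd_configs: file-based service discovery
consul_sd_configs: Consul service discovery
serverset_sd_configs: Serverset service discovery
nerve_sd_configs: Nerve service discovery
marathon_sd_configs: Marathon service discovery
kubernetes_sd_configs: Kubernetes service discovery
gce_sd_configs: GCE service discovery
ec2_sd_configs: EC2 service discovery
openstack_sd_configs: OpenStack service discovery
azure_sd_configs: Azure service discovery
triton_sd_configs: Triton service discovery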

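Prometheus Server data management

This part describes how Prometheus manages its data.

How data is stored

Prometheus stores time series in a custom format on local disk. The local TSDB groups data into blocks of two hours each. A block contains one or more chunk files holding the time series samples, plus a metadata file and an index file. The index file indexes metric names and labels so that the samples in the chunks can be located; the chunk is the basic unit of storage.

Prometheus first keeps collected data in memory, which acts like a cache and speeds up search and access. To protect against crashes, the write-ahead log (WAL) records incoming data on disk and is replayed into memory on restart.

The --storage.tsdb.path="/data" flag sets the TSDB path, i.e. where collected data is stored. The data under the wal directory is effectively a cold backup of the recent data held in memory.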

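How Prometheus collects data

Prometheus collects data from clients in two main ways:

pull: Prometheus actively scrapes the target
push: the client pushes its data

Pull

Each monitored host runs the exporters it needs. An exporter runs as a daemon, collects data on the host, and is itself an HTTP server that responds to HTTP requests with metrics (key/value metrics). Prometheus server pulls (HTTP GET) from the exporter on each node, samples the monitoring data, and stores it.

Push

Pushgateway is installed on the client or server side. Scripts written by the operations team format monitoring data as key/value metrics and send it to Pushgateway, from which Prometheus server then scrapes it.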

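Data format

Data model

In Prometheus monitoring, the indicators collected by the server are called metrics. Metrics are time series data: collections of consecutive values stored along the time dimension for the same series, in a key/value structure.

Metrics is a general term for sampled data (it does not denote one specific data format, but an abstraction over units of measurement). It has its own data format, which both day-to-day operators and monitoring developers need to understand.

Prometheus stores time series data, i.e. collections of consecutive values stored along the time dimension for the same series (same name and labels).

Series identity

A time series is defined by a name (the metric) and a set of key/value labels; samples with the same name and the same labels belong to the same series.

A series name consists of ASCII letters, digits, underscores, and colons, and must match the regular expression [a-zA-Z_:][a-zA-Z0-9_:]*. The name should be meaningful and usually denotes something measurable; for example, http_requests_total can stand for the total number of HTTP requests.

Labels enrich Prometheus data and distinguish individual instances; for example, http_requests_total{method="POST"} stands for all HTTP POST requests.

Label names consist of ASCII letters, digits, and underscores; names starting with __ are reserved by Prometheus. Label values may contain any Unicode characters, including Chinese.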
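These naming rules are easy to check mechanically. A minimal sketch: the metric-name pattern is the one quoted above, while the label-name pattern `[a-zA-Z_][a-zA-Z0-9_]*` is an assumption derived from the prose description (letters, digits, underscores, not starting with a digit). The function names are hypothetical.

```python
import re

METRIC_NAME_RE = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")
# Assumed from the description above: letters, digits, and underscores only.
LABEL_NAME_RE = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")


def is_valid_metric_name(name: str) -> bool:
    """True when the whole name matches the metric-name pattern."""
    return METRIC_NAME_RE.fullmatch(name) is not None


def is_valid_label_name(name: str) -> bool:
    """True when the whole name matches the label-name pattern."""
    return LABEL_NAME_RE.fullmatch(name) is not None


def is_reserved_label_name(name: str) -> bool:
    """Names starting with a double underscore are reserved by Prometheus."""
    return name.startswith("__")


if __name__ == "__main__":
    for candidate in ("http_requests_total", "job:rate5m", "2xx_total"):
        print(candidate, is_valid_metric_name(candidate))
```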

Samples

A sample is a data point collected along a time series. It consists of:

a float64 value
a millisecond-precision Unix timestamp

Data format

Prometheus time series are written in the following notation.

<metric name>{<label name>=<label value>, ...}

An example:

http_response_total{method="GET",endpoint="/api/tracks"} 100

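http_response_total is the metric name.
The labels inside {} describe the characteristics and dimensions of the sample.
The number 100 is the sample's value.

This data can be retrieved with curl localhost:9100/metrics.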
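To make the notation concrete, the sketch below splits one such line into name, labels, and value. It is a deliberately simplified illustration, not a full parser for the exposition format: comment lines, timestamps, and unescaping of label values are not handled, and `parse_sample` is a hypothetical name.

```python
import re

# metric name, optional {label block}, whitespace, value
_SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
# label_name="label value" pairs inside the braces
_LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_sample(line: str):
    """Return (metric_name, labels_dict, float_value) for one sample line."""
    match = _SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a sample line: {line!r}")
    name, label_text, value = match.groups()
    labels = dict(_LABEL_RE.findall(label_text or ""))
    return name, labels, float(value)


if __name__ == "__main__":
    print(parse_sample('http_response_total{method="GET",endpoint="/api/tracks"} 100'))
```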

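Data types

Metric types include Gauge, Counter, Histogram, and Summary.

Gauge

The simplest metric type: a single value that reflects the current state. Disk or memory usage, for example, should be measured with a gauge. Characteristics of the Gauge type:

A gauge is an ordinary number, such as a temperature or memory usage that changes over time.
It can go up and it can go down.
It is reset when the process restarts.

Example: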

memory_usage_bytes{host="master-01"} 100 < scraped value
memory_usage_bytes{host="master-01"} 30
memory_usage_bytes{host="master-01"} 50
memory_usage_bytes{host="master-01"} 80 < scraped value

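Counter

A counter starts at 0 and accumulates; ideally it only ever grows and never decreases. Sampled user visit counts are an example. Characteristics of the Counter type:

A counter holds a cumulative value, such as the number of requests, completed tasks, or errors.
It only increases and never decreases.
It is reset when the process restarts.

Example: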

http_response_total{method="GET",endpoint="/api/tracks"} 100
Scraped 10 seconds later:
http_response_total{method="GET",endpoint="/api/tracks"} 100
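Because a counter restarts from 0 when the process restarts, the growth between scrapes has to account for resets. A minimal sketch of that idea, assuming the scraped values are available as a plain list ordered by time; `counter_increase` is a hypothetical helper and simpler than what PromQL's own functions do (no extrapolation):

```python
def counter_increase(samples):
    """Total increase over consecutive counter samples, tolerating resets."""
    total = 0.0
    for previous, current in zip(samples, samples[1:]):
        if current >= previous:
            total += current - previous
        else:
            # The value dropped, so the counter restarted from 0 in between.
            total += current
    return total


if __name__ == "__main__":
    print(counter_increase([100, 100]))          # two scrapes, no new requests
    print(counter_increase([100, 150, 20, 50]))  # includes one process restart
```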


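Histogram

Tracks the distribution of observations, such as the minimum, maximum, and mean, as well as the median and the 75th, 90th, 95th, 98th, 99th, and 99.9th percentiles; these are approximate estimates. For example, Http_response_time (HTTP response time) is the total time a user's HTTP request spends in transit and in processing. With a histogram you can count how many of all response times fall into each range, for example at most 0.05 seconds or more than 2 seconds.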
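The bucket idea can be sketched in a few lines. The code below counts observations into cumulative buckets and estimates a quantile by linear interpolation inside the matching bucket. It is an illustration under stated assumptions, not the PromQL implementation: the bucket bounds and sample values are invented, the lowest bucket is assumed to start at 0, and both function names are hypothetical.

```python
import math


def cumulative_buckets(observations, upper_bounds):
    """Count observations <= each upper bound, plus a final +Inf bucket."""
    bounds = sorted(upper_bounds) + [math.inf]
    return [(le, sum(1 for value in observations if value <= le)) for le in bounds]


def estimate_quantile(q, buckets):
    """Estimate the q-quantile (0 < q <= 1) from cumulative (le, count) buckets."""
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0
    for le, count in buckets:
        if count >= rank:
            if math.isinf(le):
                # Nothing to interpolate towards; fall back to the last finite bound.
                return lower_bound
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (le - lower_bound) * fraction
        lower_bound, lower_count = le, count
    return lower_bound


if __name__ == "__main__":
    response_times = [0.01, 0.03, 0.2, 0.7, 1.5, 3.0]  # seconds, made-up data
    buckets = cumulative_buckets(response_times, [0.05, 0.5, 2])
    print(buckets)
    print(estimate_quantile(0.5, buckets))
```

The estimate depends entirely on where the bucket bounds sit, which is why badly chosen buckets produce large errors.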

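Summary

Reports results as computed quantiles and can be used to summarize the samples observed over a period of time, such as the median (quantile=0.5) or the 90th percentile (quantile=0.9).

Choosing between summary and histogram

Summary relies on frequent global lock operations, which can hurt the performance of highly concurrent programs. A histogram only increments an atomic counter per bucket, whereas a summary must run an algorithm on every observation to recompute the current quantile values, and that algorithm needs concurrency protection. This consumes CPU and memory on the client.
Quantile values produced by a Summary cannot be aggregated (sum, avg, and so on). For example, two instances serve traffic at the same time and each tracks its own response times. If their 0.5-quantiles come out as 60 and 80, taking the simple average (60 + 80) / 2 and treating it as the overall 0.5-quantile is wrong.
Summary quantiles are fixed in the client ahead of time; quantiles that were not configured cannot be obtained when observing the data on the server. With a histogram, any quantile can be requested through PromQL; the result is less accurate than a summary's, but it is more flexible.
A histogram cannot give exact quantiles, and poorly chosen buckets lead to very large errors. It also consumes compute resources on the server.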
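The point about averaging quantiles can be verified with a small calculation. The response times below are invented so that the two per-instance medians come out as 60 and 80; the busier instance dominates the true overall median.

```python
import statistics

# Made-up response times: instance A handled nine requests, instance B one.
instance_a = [56, 57, 58, 59, 60, 61, 62, 63, 64]
instance_b = [80]

median_a = statistics.median(instance_a)
median_b = statistics.median(instance_b)

# Naive aggregation: average of the two per-instance medians.
averaged = (median_a + median_b) / 2

# Correct answer: median over all observations from both instances.
true_median = statistics.median(instance_a + instance_b)

if __name__ == "__main__":
    print(median_a, median_b, averaged, true_median)
```

The averaged value is 70, while the median over all ten observations is 60.5.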

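The comparison leads to these conclusions:

If you need to aggregate, choose histograms.
If you have a good idea of the range and distribution of the values you observe, choose histograms. If you need accurate quantiles, choose summary.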

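Storage directory layout

How storage works

Prometheus stores data in blocks, each covering two hours. Samples are first kept in memory and written to disk once the two-hour window is reached. To avoid losing data on a crash, Prometheus uses a write-ahead log (WAL): data from the current two-hour window is kept in memory and also logged to the wal directory under the data directory. On restart, the WAL is replayed to restore that data. Deletions are recorded in tombstones rather than applied immediately.

Every block in the data directory is an independent database; this approach is called time sharding. Its advantage is query efficiency: a query only has to open the blocks that cover its time range and nothing else.

Data directory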

./data
├── 01BKGV7JBM69T2G1BGBGM6KB12    
│   └── meta.json
├── 01BKGTZQ1SYQJTR4PB43C8PD98  
│   ├── chunks        
│   │   └── 000001     
│   ├── tombstones     
│   ├── index         
│   └── meta.json      
├── chunks_head      
│   └── 000001        
└── wal                # write-ahead log
    ├── 000000002     
    └── checkpoint.00000001
        └── 00000000

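01BKGV7JBM69T2G1BGBGM6KB12: a block ID; each directory named like this is a complete block.
meta.json: the block's metadata.
chunks: the directory holding all chunks of the block; each file in it is a chunk data file.
index: the index file of the block.
tombstones: the deletion record file; it records what has been deleted.
wal: holds the most recent two hours of in-memory data, used to restore that data after a restart.
chunks_head: the memory-mapped head chunks on disk.
checkpoint: the checkpoint mechanism filters the data from truncated WAL segments and writes it into a new segment; the checkpoint is named checkpoint.X after the last segment number it was created from.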

File formats

meta.json

{
    "ulid": "01EM6Q6A1YPX4G9TEB20J22B2R",
    "minTime": 1602237600000,
    "maxTime": 1602244800000,
    "stats": {
        "numSamples": 553673232,
        "numSeries": 1346066,
        "numChunks": 4440437
    },
    "compaction": {
        "level": 1,
        "sources": [
            "01EM65SHSX4VARXBBHBF0M0FDS",
            "01EM6GAJSYWSQQRDY782EA5ZPN"
        ]
    },
    "version": 1
}

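Field meanings:

version: tells how to parse the meta file.
minTime, maxTime: the absolute minimum and maximum timestamps among all chunks present in the block.
stats: the number of series, samples, and chunks present in the block.
compaction: the history of the block.
- sources: the blocks this block was created from (i.e. the blocks that were merged to form it). If it was created from the Head block, sources is set to the block itself (01EM6Q6A1YPX4G9TEB20J22B2R in this case).
- level: how many generations of compaction this block has been through.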
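Since minTime and maxTime are millisecond Unix timestamps, the time range a block covers can be read straight from meta.json. A small sketch using the sample values shown above; `block_time_range` is a hypothetical helper:

```python
import json
from datetime import datetime, timezone

# Trimmed copy of the meta.json sample shown above.
SAMPLE_META = """
{
    "ulid": "01EM6Q6A1YPX4G9TEB20J22B2R",
    "minTime": 1602237600000,
    "maxTime": 1602244800000,
    "version": 1
}
"""


def block_time_range(meta_text: str):
    """Return (start, end, span) of a block from its meta.json content."""
    meta = json.loads(meta_text)
    # Timestamps are in milliseconds; datetime expects seconds.
    start = datetime.fromtimestamp(meta["minTime"] / 1000, tz=timezone.utc)
    end = datetime.fromtimestamp(meta["maxTime"] / 1000, tz=timezone.utc)
    return start, end, end - start


if __name__ == "__main__":
    start, end, span = block_time_range(SAMPLE_META)
    print(start.isoformat(), end.isoformat(), span)
```

For the sample file the span works out to two hours, matching the default block duration.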

chunks_head/chunks

The maximum size of a file is kept at 128MiB.

┌──────────────────────────────┐
│  magic(0x0130BC91) <4 byte>  │
├──────────────────────────────┤
│    version(1) <1 byte>       │
├──────────────────────────────┤
│    padding(0) <3 byte>       │
├──────────────────────────────┤
│ ┌──────────────────────────┐ │
│ │         Chunk 1          │ │
│ ├──────────────────────────┤ │
│ │          ...             │ │
│ ├──────────────────────────┤ │
│ │         Chunk N          │ │
│ └──────────────────────────┘ │
└──────────────────────────────┘

magic: identifies this file as a chunk file
version: tells how to parse this file
padding: reserved for any future headers
Chunk 1 - Chunk N: the list of chunks

Format of a single chunk:

┌─────────────────────┬───────────────────────┬───────────────────────┬───────────────────┬───────────────┬──────────────┬────────────────┐
│ series ref <8 byte> │ mint <8 byte, uint64> │ maxt <8 byte, uint64> │ encoding <1 byte> │ len <uvarint> │ data <bytes> │ CRC32 <4 byte> │
└─────────────────────┴───────────────────────┴───────────────────────┴───────────────────┴───────────────┴──────────────┴────────────────┘

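series ref: the series ID used to access the series in memory
mint and maxt: the minimum and maximum timestamps seen in the chunk's samples
encoding: the encoding used to compress the chunk
len: the number of bytes that follow; data holds the actual bytes of the compressed chunk
CRC32: a checksum of the chunk content above, used to verify data integrity

How is a chunk read? A chunk is addressed through an 8-byte reference. The first 4 bytes give the number of the file that holds the chunk, and the last 4 bytes give the offset in that file at which the chunk starts. If the chunk is in file 00093 and its series ref starts at byte offset 1234 in that file, the reference to the chunk is (93 << 32) | 1234 (shift left, then bitwise OR).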
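The packing of file number and offset into one 64-bit reference can be written out directly. A minimal sketch with hypothetical helper names:

```python
def encode_chunk_ref(file_number: int, offset: int) -> int:
    """Pack a file number (upper 4 bytes) and byte offset (lower 4 bytes)."""
    if not (0 <= file_number < 2**32 and 0 <= offset < 2**32):
        raise ValueError("file number and offset must each fit in 4 bytes")
    return (file_number << 32) | offset


def decode_chunk_ref(ref: int):
    """Split a reference back into (file_number, offset)."""
    return ref >> 32, ref & 0xFFFFFFFF


if __name__ == "__main__":
    ref = encode_chunk_ref(93, 1234)
    print(ref, decode_chunk_ref(ref))
```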

block/chunks

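The chunks directory contains a sequence of numbered files, each capped at 512MiB. A single block spans 2 hours by default, and Prometheus compacts in the background, merging blocks that are adjacent in time. The file format is: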

┌──────────────────────────────┐
│  magic(0x85BD40DD) <4 byte>  │
├──────────────────────────────┤
│    version(1) <1 byte>       │
├──────────────────────────────┤
│    padding(0) <3 byte>       │
├──────────────────────────────┤
│ ┌──────────────────────────┐ │
│ │         Chunk 1          │ │
│ ├──────────────────────────┤ │
│ │          ...             │ │
│ ├──────────────────────────┤ │
│ │         Chunk N          │ │
│ └──────────────────────────┘ │
└──────────────────────────────┘

magic: identifies this file as a chunk file
version: tells how to parse this file
padding: reserved for any future headers
Chunk 1 - Chunk N: the list of chunks

Format of a single chunk:

┌───────────────┬───────────────────┬──────────────┬────────────────┐
│ len <uvarint> │ encoding <1 byte> │ data <bytes> │ CRC32 <4 byte> │
└───────────────┴───────────────────┴──────────────┴────────────────┘

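Purpose: this is where the time series data lives. Chunks in these files are referenced from the index file by a uint64 composed of the in-file offset (lower 4 bytes) and the segment sequence number (upper 4 bytes). In other words, each entry in the index holds a 64-bit reference: four bytes say which file (segment number) holds the data and the other four give the offset within it, which is enough to locate the chunk for every entry.

Compared with the head chunks above, these chunks lack series ref, mint, and maxt. They are not needed here because that information lives in the index file: chunk data is looked up through the series ref, mint, and maxt stored in the index.

index: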

┌────────────────────────────┬─────────────────────┐
│ magic(0xBAAAD700) <4b>     │ version(1) <1 byte> │
├────────────────────────────┴─────────────────────┤
│ ┌──────────────────────────────────────────────┐ │
│ │                 Symbol Table                 │ │
│ ├──────────────────────────────────────────────┤ │
│ │                    Series                    │ │
│ ├──────────────────────────────────────────────┤ │
│ │                 Label Index 1                │ │
│ ├──────────────────────────────────────────────┤ │
│ │                      ...                     │ │
│ ├──────────────────────────────────────────────┤ │
│ │                 Label Index N                │ │
│ ├──────────────────────────────────────────────┤ │
│ │                   Postings 1                 │ │
│ ├──────────────────────────────────────────────┤ │
│ │                      ...                     │ │
│ ├──────────────────────────────────────────────┤ │
│ │                   Postings N                 │ │
│ ├──────────────────────────────────────────────┤ │
│ │               Label Offset Table             │ │
│ ├──────────────────────────────────────────────┤ │
│ │             Postings Offset Table            │ │
│ ├──────────────────────────────────────────────┤ │
│ │                      TOC                     │ │
│ └──────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────┘

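Field meanings:

magic: the number that identifies this file as an index file
version: tells how to parse this file
TOC: the entry point of the index; it is the table of contents.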

┌─────────────────────────────────────────┐
│ ref(symbols) <8b>                       │ -> Symbol Table
├─────────────────────────────────────────┤
│ ref(series) <8b>                        │ -> Series
├─────────────────────────────────────────┤
│ ref(label indices start) <8b>           │ -> Label Index 1
├─────────────────────────────────────────┤
│ ref(label offset table) <8b>            │ -> Label Offset Table
├─────────────────────────────────────────┤
│ ref(postings start) <8b>                │ -> Postings 1
├─────────────────────────────────────────┤
│ ref(postings offset table) <8b>         │ -> Postings Offset Table
├─────────────────────────────────────────┤
│ CRC32 <4b>                              │
└─────────────────────────────────────────┘

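Purpose: the TOC records where each component of the index starts (as a byte offset in the file); the index layout above marks what each reference points to. The start of the next component also marks where the previous one ends. Because the TOC has a fixed size, the last 52 bytes of the file can be read as the TOC.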
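The 52 bytes follow from the layout: six 8-byte references plus a 4-byte CRC32. The sketch below reads them with the standard struct module. Big-endian byte order is an assumption here, the field names are illustrative labels taken from the diagram, and the checksum is returned without being verified.

```python
import struct

# Six unsigned 64-bit references followed by one unsigned 32-bit CRC.
# Big-endian (">") is assumed for this sketch.
TOC_STRUCT = struct.Struct(">6QI")
TOC_FIELDS = ("symbols", "series", "label_indices_start",
              "label_offset_table", "postings_start", "postings_offset_table")


def parse_toc(index_bytes: bytes) -> dict:
    """Decode the TOC from the last 52 bytes of an index file's content."""
    if len(index_bytes) < TOC_STRUCT.size:
        raise ValueError("input is shorter than a TOC")
    *offsets, crc = TOC_STRUCT.unpack(index_bytes[-TOC_STRUCT.size:])
    toc = dict(zip(TOC_FIELDS, offsets))
    toc["crc32"] = crc  # returned as-is, not checked
    return toc


if __name__ == "__main__":
    print(TOC_STRUCT.size)
    fake_index = b"\x00" * 16 + TOC_STRUCT.pack(5, 100, 200, 300, 400, 500, 7)
    print(parse_toc(fake_index))
```

The `fake_index` bytes are fabricated for the demonstration; with a real block, the content of its index file would be passed in instead.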

Symbol Table

This section holds a sorted, deduplicated list of the strings found in the label pairs of all series in this block. For example, if a series is {a="y", x="b"}, the symbols are "a", "b", "x", "y".

┌────────────────────┬─────────────────────┐
│ len <4b>           │ #symbols <4b>       │
├────────────────────┴─────────────────────┤
│ ┌──────────────────────┬───────────────┐ │
│ │ len(str_1) <uvarint> │ str_1 <bytes> │ │
│ ├──────────────────────┴───────────────┤ │
│ │                . . .                 │ │
│ ├──────────────────────┬───────────────┤ │
│ │ len(str_n) <uvarint> │ str_n <bytes> │ │
│ └──────────────────────┴───────────────┘ │
├──────────────────────────────────────────┤
│ CRC32 <4b>                               │
└──────────────────────────────────────────┘

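len: the size of this section in bytes
#symbols: the number of symbols stored in this section
len(str_n) <uvarint> │ str_n <bytes>: the length and content of one symbol

Purpose: other parts of the index can reference this symbol table for any string, which significantly reduces the index size. The byte offset at which a symbol starts in the file (i.e. the start of len(str_i)) forms the reference to that symbol, which is used elsewhere in place of the actual string. When the actual string is needed, it is fetched from this table using the offset.

Series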

┌───────────────────────────────────────┐
│ ┌───────────────────────────────────┐ │
│ │   series_1                        │ │
│ ├───────────────────────────────────┤ │
│ │                 . . .             │ │
│ ├───────────────────────────────────┤ │
│ │   series_n                        │ │
│ └───────────────────────────────────┘ │
└───────────────────────────────────────┘

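Each series entry is 16-byte aligned, meaning the byte offset at which a series starts is divisible by 16. The series ID is therefore set to offset/16, where offset points to the start of the series entry. This ID is used to reference the series, and whenever the series needs to be accessed, its position in the index is obtained as ID*16.

Each series entry contains the series' label set and references to all chunks belonging to it: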

┌──────────────────────────────────────────────────────┐
│ len <uvarint>                                        │
├──────────────────────────────────────────────────────┤
│ ┌──────────────────────────────────────────────────┐ │
│ │            labels count <uvarint64>              │ │
│ ├──────────────────────────────────────────────────┤ │
│ │  ┌────────────────────────────────────────────┐  │ │
│ │  │ ref(l_i.name) <uvarint32>                  │  │ │
│ │  ├────────────────────────────────────────────┤  │ │
│ │  │ ref(l_i.value) <uvarint32>                 │  │ │
│ │  └────────────────────────────────────────────┘  │ │
│ │                       ...                        │ │
│ ├──────────────────────────────────────────────────┤ │
│ │            chunks count <uvarint64>              │ │
│ ├──────────────────────────────────────────────────┤ │
│ │  ┌────────────────────────────────────────────┐  │ │
│ │  │ c_0.mint <varint64>                        │  │ │
│ │  ├────────────────────────────────────────────┤  │ │
│ │  │ c_0.maxt - c_0.mint <uvarint64>            │  │ │
│ │  ├────────────────────────────────────────────┤  │ │
│ │  │ ref(c_0.data) <uvarint64>                  │  │ │
│ │  └────────────────────────────────────────────┘  │ │
│ │  ┌────────────────────────────────────────────┐  │ │
│ │  │ c_i.mint - c_i-1.maxt <uvarint64>          │  │ │
│ │  ├────────────────────────────────────────────┤  │ │
│ │  │ c_i.maxt - c_i.mint <uvarint64>            │  │ │
│ │  ├────────────────────────────────────────────┤  │ │
│ │  │ ref(c_i.data) - ref(c_i-1.data) <varint64> │  │ │
│ │  └────────────────────────────────────────────┘  │ │
│ │                       ...                        │ │
│ └──────────────────────────────────────────────────┘ │
├──────────────────────────────────────────────────────┤
│ CRC32 <4b>                                           │
└──────────────────────────────────────────────────────┘

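Field meanings:

labels count: the number of label pairs in this series.
ref(l_i.name) and ref(l_i.value): instead of the actual strings, references into the symbol table are stored; looking up the reference in the symbol table yields the string.
chunks count: the number of chunks that hold this series' data.
mint, maxt, ref: the three fields that block chunks lack compared with head chunks are stored here. A query for a series reads the mint, maxt, and ref of each chunk in the series' chunk list from the index and then fetches the data from the chunk files. The ref is eight bytes: four record which chunk file holds the data and four record the offset within that file.

Keeping mint and maxt in the index lets queries skip chunks outside the queried time range. As the layout shows, only the first mint is stored as a full timestamp; every later mint and maxt is stored as a delta from the preceding value to save space. That is why the first mint is a varint and the rest are uvarints: the deltas are always positive, and uvarint saves many leading zeros.

Label Offset Table and Label Index i

These two are no longer used. They are still written for backward compatibility but are not read by recent Prometheus versions.

Postings Offset Table and Postings i

Postings 1 to N store the postings lists, and the Postings Offset Table records the offsets of those entries. A posting is a series ID; in the context of the index file it is the offset at which the series entry starts divided by 16, because entries are 16-byte aligned.

The structure of a single postings list: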

┌────────────────────┬────────────────────┐
│ len <4b>           │ #entries <4b>      │
├────────────────────┴────────────────────┤
│ ┌─────────────────────────────────────┐ │
│ │ ref(series_1) <4b>                  │ │
│ ├─────────────────────────────────────┤ │
│ │ ...                                 │ │
│ ├─────────────────────────────────────┤ │
│ │ ref(series_n) <4b>                  │ │
│ └─────────────────────────────────────┘ │
├─────────────────────────────────────────┤
│ CRC32 <4b>                              │
└─────────────────────────────────────────┘

#entries is the number of series in the list that follows, and ref(series_1) is a series ID, i.e. the series ref. Postings work together with the Postings Offset Table, whose layout is:

// Postings Offset Table
┌─────────────────────┬──────────────────────┐
│ len <4b>            │ #entries <4b>        │
├─────────────────────┴──────────────────────┤
│ ┌────────────────────────────────────────┐ │
│ │  n = 2 <1b>                            │ │
│ ├──────────────────────┬─────────────────┤ │
│ │ len(name) <uvarint>  │ name <bytes>    │ │
│ ├──────────────────────┼─────────────────┤ │
│ │ len(value) <uvarint> │ value <bytes>   │ │
│ ├──────────────────────┴─────────────────┤ │
│ │  offset <uvarint64>                    │ │
│ └────────────────────────────────────────┘ │
│                    . . .                   │
├────────────────────────────────────────────┤
│  CRC32 <4b>                                │
└────────────────────────────────────────────┘


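What the Postings Offset Table does

It stores the name and value of each label pair together with an offset, i.e. the location of that label pair's postings list; the offset tells which postings list belongs to the label pair. The postings list records the series that carry the label pair, and the refs in each series entry lead to the time series data in the chunks.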
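Conceptually this is an inverted index from label pairs to series IDs. The in-memory sketch below illustrates the lookup path only; it ignores the on-disk encoding entirely, and the series IDs, labels, and function names are invented for the example.

```python
from collections import defaultdict


def build_postings(series: dict) -> dict:
    """Map each (label name, label value) pair to a sorted list of series IDs."""
    postings = defaultdict(list)
    for series_id in sorted(series):
        for pair in series[series_id].items():
            postings[pair].append(series_id)
    return dict(postings)


def select(postings: dict, *pairs) -> list:
    """Return the IDs of series that carry every requested label pair."""
    id_sets = [set(postings.get(pair, ())) for pair in pairs]
    return sorted(set.intersection(*id_sets)) if id_sets else []


if __name__ == "__main__":
    series = {
        1: {"__name__": "http_requests_total", "method": "GET"},
        2: {"__name__": "http_requests_total", "method": "POST"},
        3: {"__name__": "up", "job": "node"},
    }
    postings = build_postings(series)
    print(postings[("__name__", "http_requests_total")])
    print(select(postings, ("__name__", "http_requests_total"), ("method", "POST")))
```

A selector with several label matchers then reduces to intersecting the postings lists of its label pairs.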

Jobs and instances

Prometheus calls any individually scraped data source (target) an instance. A collection of instances of the same type is called a job. Below is a job with four replicated instances:

- job: api-server
    - instance 1: 1.2.3.4:5670
    - instance 2: 1.2.3.4:5671
    - instance 3: 5.6.7.8:5670
    - instance 4: 5.6.7.8:5671

Automatically generated labels and time series

While scraping, Prometheus automatically attaches labels to the scraped time series to identify the data source (target) they came from:

job: The configured job name that the target belongs to.
instance: The <host>:<port> part of the target's URL that was scraped.

If either of these labels is already present in the scraped data, the honor_labels setting decides how the conflict is resolved.

For every instance, Prometheus stores the following time series alongside the scraped samples:

up{job="<job-name>", instance="<instance-id>"}: 1 means the instance is healthy
up{job="<job-name>", instance="<instance-id>"}: 0 means the instance is down

scrape_duration_seconds{job="<job-name>", instance="<instance-id>"}: how long the scrape took

scrape_samples_post_metric_relabeling{job="<job-name>", instance="<instance-id>"}: the number of samples remaining after metric relabeling was applied

scrape_samples_scraped{job="<job-name>", instance="<instance-id>"}: the number of samples the target exposed

The up series is an effective way to monitor whether an instance is working.

Links: https://blog.csdn.net/ygq13572549874/article/details/129034316 https://alden.blog.csdn.net/article/details/129034909 https://alden.blog.csdn.net/article/details/129037380
