Observability Platform 4.3: Database Alert Configuration Reference

2023-12-14 17:41:14

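MySQL Monitoring Configuration

MySQL Log Exporter

To export MySQL logs, configure MySQL to record general queries, slow queries, and replication-related information. A log shipper such as Filebeat or Fluentd can then collect these logs and forward them for analysis.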
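
As a minimal sketch of the collection side, the Filebeat configuration below enables Filebeat's MySQL module for the error log and the slow-query log. The log paths and the Elasticsearch address are assumptions for illustration; point them at the files your MySQL instance actually writes and at your own output.

```yaml
# filebeat.yml (sketch; log paths and output host are assumed values)
filebeat.modules:
  - module: mysql
    error:
      enabled: true
      var.paths: ["/var/log/mysql/error.log*"]
    slowlog:
      enabled: true
      var.paths: ["/var/log/mysql/mysql-slow.log*"]

output.elasticsearch:
  hosts: ["localhost:9200"]
```

The slow-query log only has content if MySQL runs with `slow_query_log` enabled.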

MySQL Metrics Exporter

For MySQL metrics, run mysqld_exporter next to the database and have Prometheus scrape it to collect MySQL performance metrics.
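
A scrape configuration for this setup might look like the sketch below. The host name and rule file names are assumptions; the ports are the defaults of mysqld_exporter (9104) and node_exporter (9100). The node_exporter job is included because the host-level rules in this article read `node_*` metrics.

```yaml
# prometheus.yml (sketch; host names and rule file names are assumptions)
global:
  scrape_interval: 15s

rule_files:
  - mysql_metrics.yml   # hypothetical file holding the recording rules
  - mysql_alerts.yml    # hypothetical file holding the alerting rules

scrape_configs:
  - job_name: mysql
    static_configs:
      - targets: ["mysql-host.example:9104"]   # mysqld_exporter default port
  - job_name: node
    static_configs:
      - targets: ["mysql-host.example:9100"]   # node_exporter default port
```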

Prometheus Monitoring Rules for the MySQL Service (YAML)

The following example Prometheus recording rules precompute common MySQL monitoring metrics:

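```yaml
groups:
- name: mysql_metrics
  rules:
  # Queries per second, from mysqld_exporter's global status counters.
  - record: mysql_query_throughput
    expr: rate(mysql_global_status_queries[5m])

  # 95th-percentile query duration. mysql_query_duration_seconds_bucket is a
  # placeholder; substitute the histogram your exporter actually exposes.
  - record: mysql_query_response_time_seconds
    expr: histogram_quantile(0.95, rate(mysql_query_duration_seconds_bucket[5m]))

  # Slow queries per second.
  - record: mysql_slow_queries
    expr: rate(mysql_global_status_slow_queries[5m])

  # Host CPU utilization in percent (idle fraction scaled to 0-100).
  - record: mysql_cpu_usage_percentage
    expr: 100 - avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100

  # Host memory in use, not counting reclaimable cache.
  - record: mysql_memory_usage_bytes
    expr: node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes

  # Disk operations per second (reads plus writes).
  - record: mysql_disk_io_operations
    expr: rate(node_disk_reads_completed_total[5m]) + rate(node_disk_writes_completed_total[5m])

  # Network throughput in bytes per second (received plus transmitted).
  - record: mysql_network_io_bytes
    expr: rate(node_network_receive_bytes_total[5m]) + rate(node_network_transmit_bytes_total[5m])
```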

Prometheus Alerting Rules for the MySQL Service (YAML)

The following are example Prometheus alerting rules for MySQL:

```yaml
groups:
- name: mysql_alerts
  rules:
  - alert: HighQueryResponseTime
    expr: mysql_query_response_time_seconds > 0.5
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "High MySQL query response time"
      description: "95th-percentile MySQL query response time has exceeded 0.5 seconds for 5 minutes."

  - alert: HighSlowQueries
    expr: mysql_slow_queries > 10
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Too many MySQL slow queries"
      description: "MySQL has logged more than 10 slow queries per second for 5 minutes."

  - alert: HighCPUUsage
    expr: mysql_cpu_usage_percentage > 90
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "High CPU usage on MySQL server"
      description: "CPU usage on the MySQL server has exceeded 90% for 5 minutes."

  - alert: HighMemoryUsage
    expr: mysql_memory_usage_bytes > 536870912  # 512 MB
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "High memory usage on MySQL server"
      description: "The MySQL server has been using more than 512 MB of memory for 5 minutes."

  - alert: HighDiskIOOperations
    expr: mysql_disk_io_operations > 100
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "High disk I/O on MySQL server"
      description: "The MySQL server has sustained a high rate of disk I/O operations for 5 minutes."

  - alert: HighNetworkIO
    expr: mysql_network_io_bytes > 10000000  # 10 MB/s
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "High network I/O on MySQL server"
      description: "Network I/O on the MySQL server has exceeded 10 MB/s for 5 minutes."
```
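
The `severity` label on these alerts only matters once something routes on it. The Alertmanager configuration below is a sketch of one way to do that: critical alerts go to an on-call receiver, everything else to a team receiver. The receiver names and webhook URLs are hypothetical.

```yaml
# alertmanager.yml (sketch; receiver names and webhook URLs are hypothetical)
route:
  receiver: db-team                  # default receiver, gets warning alerts
  group_by: [alertname, instance]
  routes:
    - matchers:
        - severity="critical"
      receiver: db-oncall

receivers:
  - name: db-team
    webhook_configs:
      - url: "http://alert-gateway.example:8080/warning"
  - name: db-oncall
    webhook_configs:
      - url: "http://alert-gateway.example:8080/critical"
```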

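MySQL Grafana Dashboard

For a MySQL dashboard in Grafana, you can find an existing dashboard or build one that fits your needs. Such dashboards typically chart MySQL performance metrics such as query throughput, query response time, slow queries, CPU usage, memory usage, disk I/O, network I/O, and replication lag. You can import an existing dashboard by its ID, or download its JSON file from the Grafana dashboard repository and then customize it as needed.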
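
If you keep the downloaded JSON files on disk, Grafana can also load them at startup through a dashboard provisioning file. The sketch below assumes a provider name, folder name, and directory chosen for illustration.

```yaml
# /etc/grafana/provisioning/dashboards/mysql.yml (sketch; names and paths are assumed)
apiVersion: 1
providers:
  - name: mysql-dashboards
    folder: Databases                # Grafana folder the dashboards appear in
    type: file
    options:
      path: /var/lib/grafana/dashboards/mysql   # directory holding the JSON files
```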

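PostgreSQL Monitoring Configuration

PostgreSQL Log Exporter

To export PostgreSQL logs, configure PostgreSQL to record queries, errors, and other relevant information. A log shipping tool can then collect these logs and forward them for analysis.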
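
PostgreSQL log entries often span several lines (for example a statement followed by its detail lines), so the shipper should join them. The Filebeat input below is a sketch of that idea. It assumes the log directory shown and that `log_line_prefix` starts each entry with a `YYYY-MM-DD` timestamp; adjust the pattern if your prefix differs.

```yaml
# filebeat.yml fragment (sketch; path and timestamp prefix are assumptions)
filebeat.inputs:
  - type: filestream
    id: postgresql-logs
    paths:
      - /var/log/postgresql/*.log
    parsers:
      - multiline:
          type: pattern
          pattern: '^\d{4}-\d{2}-\d{2} '   # a new entry starts with a date
          negate: true
          match: after                     # other lines join the previous entry
    fields:
      service: postgresql
```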

PostgreSQL Metrics Exporter

For PostgreSQL metrics, run pg_exporter and have Prometheus scrape it to collect PostgreSQL performance metrics.

Prometheus Monitoring Rules for the PostgreSQL Service (YAML)

The following example Prometheus recording rules precompute common PostgreSQL monitoring metrics:

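```yaml
groups:
- name: postgresql_metrics
  rules:
  # The postgresql_* source metrics in this group are placeholders; map them
  # to the names your exporter actually exposes.
  - record: postgresql_transaction_throughput
    expr: rate(postgresql_transactions_total[5m])

  - record: postgresql_query_latency_seconds
    expr: histogram_quantile(0.95, rate(postgresql_query_duration_seconds_bucket[5m]))

  # Index hits as a fraction of index hits plus index scans.
  - record: postgresql_index_hit_rate
    expr: rate(postgresql_index_hits_total[5m]) / (rate(postgresql_index_scan_total[5m]) + rate(postgresql_index_hits_total[5m]))

  # Host CPU utilization in percent (idle fraction scaled to 0-100).
  - record: postgresql_cpu_usage_percentage
    expr: 100 - avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100

  - record: postgresql_memory_usage_bytes
    expr: node_memory_MemTotal_bytes - node_memory_MemFree_bytes

  # Bytes used on ext4 filesystems.
  - record: postgresql_disk_space_usage_bytes
    expr: node_filesystem_size_bytes{fstype="ext4"} - node_filesystem_free_bytes{fstype="ext4"}

  # The rules below record per-instance aggregates under new names, because a
  # rule should not write back to the same series it reads.
  - record: instance:postgresql_active_connections:sum
    expr: sum by (instance) (postgresql_active_connections)

  - record: instance:postgresql_idle_connections:sum
    expr: sum by (instance) (postgresql_idle_connections)

  - record: instance:postgresql_lock_wait_time_seconds:max
    expr: max by (instance) (postgresql_lock_wait_time_seconds)
```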

Prometheus Alerting Rules for the PostgreSQL Service (YAML)

The following are example Prometheus alerting rules for PostgreSQL:

```yaml
groups:
- name: postgresql_alerts
  rules:
  - alert: HighQueryLatency
    expr: postgresql_query_latency_seconds > 0.5
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "High PostgreSQL query latency"
      description: "95th-percentile PostgreSQL query latency has exceeded 0.5 seconds for 5 minutes."

  - alert: LowIndexHitRate
    expr: postgresql_index_hit_rate < 0.9
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Low PostgreSQL index hit rate"
      description: "The PostgreSQL index hit rate has been below 90% for 5 minutes."

  - alert: HighCPUUsage
    expr: postgresql_cpu_usage_percentage > 90
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "High CPU usage on PostgreSQL server"
      description: "CPU usage on the PostgreSQL server has exceeded 90% for 5 minutes."

  - alert: HighMemoryUsage
    expr: postgresql_memory_usage_bytes > 536870912  # 512 MB
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "High memory usage on PostgreSQL server"
      description: "The PostgreSQL server has been using more than 512 MB of memory for 5 minutes."

  # Compares available space, not used space, against the 5 GB floor.
  - alert: LowFreeDiskSpace
    expr: node_filesystem_avail_bytes{fstype="ext4"} < 5368709120  # 5 GB
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Low free disk space on PostgreSQL server"
      description: "Less than 5 GB of disk space is available on the PostgreSQL server."

  - alert: HighLockWaitTime
    expr: postgresql_lock_wait_time_seconds > 1
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "High PostgreSQL lock wait time"
      description: "PostgreSQL queries have been waiting more than 1 second for locks."
```

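PostgreSQL Grafana Dashboard

For a PostgreSQL dashboard in Grafana, you can find an existing dashboard or build one that fits your needs. Such dashboards typically chart PostgreSQL performance metrics such as transaction throughput, query latency, index hit rate, CPU usage, memory usage, disk space, connection counts, and lock wait time. You can import an existing dashboard by its ID, or download its JSON file from the Grafana dashboard repository and then customize it as needed.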
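
An imported dashboard needs a Prometheus data source to query. One way to define it without clicking through the UI is a data source provisioning file; the sketch below assumes a Prometheus server at a hypothetical address.

```yaml
# /etc/grafana/provisioning/datasources/prometheus.yml (sketch; URL is hypothetical)
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy                          # Grafana's backend issues the queries
    url: http://prometheus.example:9090
    isDefault: true
```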

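MongoDB Monitoring Configuration

MongoDB Log Exporter

To export MongoDB logs, configure MongoDB to record operations, query response times, and similar information. A log collector such as Filebeat or Fluentd can then gather these logs and forward them for analysis.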
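
On the MongoDB side, the relevant settings live in `mongod.conf`, which is itself YAML. The fragment below is a sketch that writes the log to a file and logs operations slower than a threshold; the path and the 100 ms threshold are assumed values.

```yaml
# mongod.conf fragment (sketch; path and threshold are assumed values)
systemLog:
  destination: file
  path: /var/log/mongodb/mongod.log
  logAppend: true          # append on restart instead of overwriting the log

operationProfiling:
  # Operations slower than this many milliseconds are written to the log.
  slowOpThresholdMs: 100
```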

MongoDB Metrics Exporter

For MongoDB metrics, run MongoDB Exporter and have Prometheus scrape it to collect MongoDB performance metrics.
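
MongoDB deployments usually have several members, so a target list kept in a separate file can be easier to maintain than static targets. The sketch below shows file-based service discovery as two YAML documents, one per file. The file path, host names, and the `replica_set` label are assumptions; 9216 is the default port of Percona's mongodb_exporter.

```yaml
# prometheus.yml fragment (sketch; file path is an assumption)
scrape_configs:
  - job_name: mongodb
    file_sd_configs:
      - files:
          - /etc/prometheus/targets/mongodb.yml
        refresh_interval: 1m
---
# /etc/prometheus/targets/mongodb.yml (sketch; hosts and label are hypothetical)
- targets: ["mongo-1.example:9216", "mongo-2.example:9216"]
  labels:
    replica_set: rs0
```

Prometheus picks up changes to the target file without a restart.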

Prometheus Monitoring Rules for the MongoDB Service (YAML)

The following example Prometheus recording rules precompute common MongoDB monitoring metrics:

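```yaml
groups:
- name: mongodb_metrics
  rules:
  # The mongodb_* source metrics in this group are placeholders; map them to
  # the names your exporter actually exposes.
  - record: mongodb_operations_throughput
    expr: rate(mongodb_operations_total[5m])

  - record: mongodb_query_response_time_seconds
    expr: histogram_quantile(0.95, rate(mongodb_query_response_time_seconds_bucket[5m]))

  # The gauge rules below record per-instance aggregates under new names,
  # because a rule should not write back to the same series it reads.
  - record: instance:mongodb_memory_usage_bytes:sum
    expr: sum by (instance) (mongodb_memory_usage_bytes)

  - record: instance:mongodb_disk_usage_bytes:sum
    expr: sum by (instance) (mongodb_disk_usage_bytes)

  - record: mongodb_network_traffic_bytes
    expr: rate(mongodb_network_traffic_bytes_total[5m])

  - record: instance:mongodb_replica_set_status:min
    expr: min by (instance) (mongodb_replica_set_status)

  - record: instance:mongodb_shard_balance:min
    expr: min by (instance) (mongodb_shard_balance)
```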

Prometheus Alerting Rules for the MongoDB Service (YAML)

The following are example Prometheus alerting rules for MongoDB:

```yaml
groups:
- name: mongodb_alerts
  rules:
  - alert: HighQueryResponseTime
    expr: mongodb_query_response_time_seconds > 0.5
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "High MongoDB query response time"
      description: "95th-percentile MongoDB query response time has exceeded 0.5 seconds for 5 minutes."

  - alert: HighMemoryUsage
    expr: mongodb_memory_usage_bytes > 536870912  # 512 MB
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "High MongoDB memory usage"
      description: "MongoDB has been using more than 512 MB of memory for 5 minutes."

  - alert: HighDiskUsage
    expr: mongodb_disk_usage_bytes > 10737418240  # 10 GB
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "High MongoDB disk usage"
      description: "MongoDB has been using more than 10 GB of disk space for 5 minutes."

  - alert: HighNetworkTraffic
    expr: mongodb_network_traffic_bytes > 10000000  # 10 MB/s
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "High MongoDB network traffic"
      description: "MongoDB network traffic has exceeded 10 MB/s for 5 minutes."

  - alert: ReplicaSetNotHealthy
    expr: mongodb_replica_set_status != 1
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "MongoDB replica set unhealthy"
      description: "The MongoDB replica set has reported an unhealthy status for 5 minutes."

  - alert: ShardNotBalanced
    expr: mongodb_shard_balance != 1
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "MongoDB shards unbalanced"
      description: "MongoDB shards have been unbalanced for 5 minutes."
```
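
When a replica set is unhealthy, the warning-level alerts for the same instance tend to fire as well and add noise. An Alertmanager inhibition rule can mute them while a critical alert is active. The fragment below is a sketch that assumes both alerts carry the same `instance` label.

```yaml
# alertmanager.yml fragment (sketch; assumes alerts share an instance label)
inhibit_rules:
  - source_matchers:
      - severity="critical"    # while a critical alert is firing...
    target_matchers:
      - severity="warning"     # ...mute warning alerts...
    equal: [instance]          # ...that have the same instance label value
```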

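MongoDB Grafana Dashboard

For a MongoDB dashboard in Grafana, you can find an existing dashboard or build one that fits your needs. Such dashboards typically chart MongoDB performance metrics such as operation throughput, query response time, memory usage, disk usage, network traffic, replica set status, and shard balance. You can import an existing dashboard by its ID, or download its JSON file from the Grafana dashboard repository and then customize it as needed.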

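Cassandra Monitoring Configuration

Cassandra Log Exporter

To export Cassandra logs, configure Cassandra to record read/write latency, node status, data replication delay, and similar information. A log collector can then gather these logs and forward them for analysis.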

Cassandra Metrics Exporter

For Cassandra metrics, run Cassandra Exporter and have Prometheus scrape it to collect Cassandra performance metrics.

Prometheus Monitoring Rules for the Cassandra Service (YAML)

The following example Prometheus recording rules precompute common Cassandra monitoring metrics:

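```yaml
groups:
- name: cassandra_metrics
  rules:
  # The cassandra_* source metrics in this group are placeholders; map them
  # to the names your exporter actually exposes.
  - record: cassandra_read_latency_seconds
    expr: histogram_quantile(0.95, rate(cassandra_read_latency_seconds_bucket[5m]))

  - record: cassandra_write_latency_seconds
    expr: histogram_quantile(0.95, rate(cassandra_write_latency_seconds_bucket[5m]))

  - record: cassandra_read_throughput
    expr: rate(cassandra_reads_total[5m])

  - record: cassandra_write_throughput
    expr: rate(cassandra_writes_total[5m])

  # Host CPU utilization in percent (idle fraction scaled to 0-100).
  - record: cassandra_cpu_usage_percentage
    expr: 100 - avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100

  - record: cassandra_memory_usage_bytes
    expr: node_memory_MemTotal_bytes - node_memory_MemFree_bytes

  # Disk operations per second (reads plus writes).
  - record: cassandra_disk_io_operations
    expr: rate(node_disk_reads_completed_total[5m]) + rate(node_disk_writes_completed_total[5m])

  # Network throughput in bytes per second (received plus transmitted).
  - record: cassandra_network_io_bytes
    expr: rate(node_network_receive_bytes_total[5m]) + rate(node_network_transmit_bytes_total[5m])

  # Recorded under new names, because a rule should not write back to the
  # same series it reads.
  - record: instance:cassandra_node_status:min
    expr: min by (instance) (cassandra_node_status)

  - record: instance:cassandra_replication_delay_seconds:max
    expr: max by (instance) (cassandra_replication_delay_seconds)
```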

Prometheus Alerting Rules for the Cassandra Service (YAML)

The following are example Prometheus alerting rules for Cassandra:

```yaml
groups:
- name: cassandra_alerts
  rules:
  - alert: HighReadLatency
    expr: cassandra_read_latency_seconds > 0.5
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "High Cassandra read latency"
      description: "95th-percentile Cassandra read latency has exceeded 0.5 seconds for 5 minutes."

  - alert: HighWriteLatency
    expr: cassandra_write_latency_seconds > 0.5
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "High Cassandra write latency"
      description: "95th-percentile Cassandra write latency has exceeded 0.5 seconds for 5 minutes."

  - alert: HighCPUUsage
    expr: cassandra_cpu_usage_percentage > 90
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "High CPU usage on Cassandra server"
      description: "CPU usage on the Cassandra server has exceeded 90% for 5 minutes."

  - alert: HighMemoryUsage
    expr: cassandra_memory_usage_bytes > 536870912  # 512 MB
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "High memory usage on Cassandra server"
      description: "The Cassandra server has been using more than 512 MB of memory for 5 minutes."

  - alert: HighDiskIOOperations
    expr: cassandra_disk_io_operations > 100
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "High disk I/O on Cassandra server"
      description: "The Cassandra server has sustained a high rate of disk I/O operations for 5 minutes."

  - alert: HighNetworkIO
    expr: cassandra_network_io_bytes > 10000000  # 10 MB/s
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "High network I/O on Cassandra server"
      description: "Network I/O on the Cassandra server has exceeded 10 MB/s for 5 minutes."

  - alert: NodeNotHealthy
    expr: cassandra_node_status != 1
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "Cassandra node unhealthy"
      description: "A Cassandra node has reported an unhealthy status for 5 minutes."

  - alert: ReplicationDelayTooHigh
    expr: cassandra_replication_delay_seconds > 60
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "High Cassandra replication delay"
      description: "Cassandra data replication delay has exceeded 60 seconds for 5 minutes."
```
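
The annotations in these examples are static text, so a notification does not say which node fired or how far over the threshold it is. Prometheus annotation templates can fill that in. The rule below is a sketch of the read-latency alert rewritten that way; the group name is hypothetical, and it assumes the evaluated series carries an `instance` label.

```yaml
groups:
- name: cassandra_alerts_templated   # hypothetical group name
  rules:
  - alert: HighReadLatency
    expr: cassandra_read_latency_seconds > 0.5
    for: 5m
    labels:
      severity: critical
    annotations:
      # $labels and $value refer to the series that triggered the alert.
      summary: "High Cassandra read latency on {{ $labels.instance }}"
      description: "p95 read latency is {{ $value | humanizeDuration }} (threshold 0.5s)."
```

The same pattern applies to any of the alerts in this article.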

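Cassandra Grafana Dashboard

For a Cassandra dashboard in Grafana, you can find an existing dashboard or build one that fits your needs. Such dashboards typically chart Cassandra performance metrics such as read/write latency, CPU usage, memory usage, disk I/O, network I/O, node status, and data replication delay. You can import an existing dashboard by its ID, or download its JSON file from the Grafana dashboard repository and then customize it as needed.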
