Observable Platform 4.3: Database Alert Configuration Reference

2023-12-14 17:45:03

MySQL Monitoring Configuration

MySQL Log Exporter

To export MySQL logs, enable MySQL's general query log and slow query log, along with replication-related logging. A log shipper such as Filebeat or Fluentd can then collect these log files and forward them for analysis.
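
As a minimal sketch of the shipping side, the configuration below enables Filebeat's `mysql` module for the error log and the slow query log. The log paths and the Elasticsearch address are assumptions for illustration; point them at wherever your MySQL instance writes its logs and at your own output.

```yaml
# filebeat.yml (sketch): ship MySQL error and slow query logs
filebeat.modules:
  - module: mysql
    error:
      enabled: true
      var.paths: ["/var/log/mysql/error.log*"]       # assumed path
    slowlog:
      enabled: true
      var.paths: ["/var/log/mysql/mysql-slow.log*"]  # assumed path

output.elasticsearch:
  hosts: ["localhost:9200"]                           # assumed destination
```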

MySQL Metrics Exporter

For MySQL metrics, run mysqld_exporter alongside MySQL and have Prometheus scrape the performance metrics it exposes.
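
A scrape job for the exporter might look like the sketch below. The host name is hypothetical; 9104 is the port mysqld_exporter listens on by default, so change it if your exporter is started with a different listen address.

```yaml
# prometheus.yml (sketch): scrape one mysqld_exporter instance
scrape_configs:
  - job_name: mysql
    scrape_interval: 15s
    static_configs:
      - targets: ["db1.example.internal:9104"]  # hypothetical host, default exporter port
```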

MySQL Service Prometheus Monitoring Rules (YAML)

Here are some sample Prometheus monitoring rules for MySQL:

groups:
- name: mysql_metrics
  rules:
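  - record: mysql_query_throughput
    # mysqld_exporter exposes the Queries status counter as mysql_global_status_queries
    expr: rate(mysql_global_status_queries[5m])

  - record: mysql_query_response_time_seconds
    # Assumes a latency histogram with this name is available; adjust to your exporter
    expr: histogram_quantile(0.95, rate(mysql_query_duration_seconds_bucket[5m]))

  - record: mysql_slow_queries
    expr: rate(mysql_global_status_slow_queries[5m])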

  - record: mysql_cpu_usage_percentage
    expr: 100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

  - record: mysql_memory_usage_bytes
    expr: node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes  # host-level; excludes reclaimable cache

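  - record: mysql_disk_io_operations
    # Completed read and write operations per second, per device
    expr: rate(node_disk_reads_completed_total[5m]) + rate(node_disk_writes_completed_total[5m])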

  - record: mysql_network_io_bytes
    expr: rate(node_network_receive_bytes_total[5m]) + rate(node_network_transmit_bytes_total[5m])
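
Prometheus only evaluates rules from files listed in its main configuration. The sketch below shows one way to wire them in; the file names are hypothetical and the one-minute evaluation interval is an assumption, not a recommendation from the rules above.

```yaml
# prometheus.yml (sketch): load rule files and set how often they are evaluated
global:
  evaluation_interval: 1m

rule_files:
  - "rules/mysql_metrics.yml"   # hypothetical file holding the recording rules
  - "rules/mysql_alerts.yml"    # hypothetical file holding the alert rules
```

Running `promtool check rules rules/mysql_metrics.yml` before reloading Prometheus catches syntax and expression errors early.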

MySQL Service Prometheus Alert Rules (YAML)

Here are some example Prometheus alert rules for MySQL:

groups:
- name: mysql_alerts
  rules:
  - alert: HighQueryResponseTime
    expr: mysql_query_response_time_seconds > 0.5
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "High Query Response Time in MySQL"
      description: "MySQL queries are experiencing high response time."

  - alert: HighSlowQueries
    expr: mysql_slow_queries > 10
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "High Slow Queries in MySQL"
      description: "MySQL is experiencing a high number of slow queries."

  - alert: HighCPUUsage
    expr: mysql_cpu_usage_percentage > 90
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "High CPU Usage in MySQL"
      description: "MySQL server CPU usage is above 90%."

  - alert: HighMemoryUsage
    expr: mysql_memory_usage_bytes > 536870912  # 512 MB
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "High Memory Usage in MySQL"
      description: "MySQL is using more than 512 MB of memory."

  - alert: HighDiskIOOperations
    expr: mysql_disk_io_operations > 100
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "High Disk I/O Operations in MySQL"
      description: "MySQL is performing a high number of disk I/O operations."

  - alert: HighNetworkIO
    expr: mysql_network_io_bytes > 10000000  # 10 MB/s
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "High Network I/O in MySQL"
      description: "MySQL is using more than 10 MB/s of network I/O."
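
Alert rules can be unit tested with `promtool` before they reach production. The sketch below checks that `HighSlowQueries` fires once the recorded series has stayed above 10 for longer than the 5-minute `for` window. The rule file name and the `instance` value are hypothetical, and the test feeds the `mysql_slow_queries` series directly instead of deriving it from exporter data.

```yaml
# mysql_alerts_test.yml (sketch): unit test for the HighSlowQueries alert
rule_files:
  - mysql_alerts.yml            # hypothetical file containing the mysql_alerts group

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # A constant value of 20, sampled every minute from 0m to 10m
      - series: 'mysql_slow_queries{instance="db1"}'
        values: '20+0x10'
    alert_rule_test:
      - eval_time: 10m
        alertname: HighSlowQueries
        exp_alerts:
          - exp_labels:
              severity: warning
              instance: db1
            exp_annotations:
              summary: "High Slow Queries in MySQL"
              description: "MySQL is experiencing a high number of slow queries."
```

The test is run with `promtool test rules mysql_alerts_test.yml`.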

MySQL Grafana Dashboard

For a MySQL Grafana dashboard, you can import an existing one or build your own. Such dashboards typically chart MySQL performance metrics such as query throughput, query response time, slow queries, CPU usage, memory usage, disk I/O, network I/O, and replication lag. To reuse an existing dashboard, import it by its ID or download its JSON file from the Grafana dashboard repository, then customize it to your requirements.
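
If you prefer to keep downloaded dashboard JSON files under version control instead of importing them by hand, Grafana can load them from disk through a dashboard provider. This is a sketch under assumed names: the provider name, folder, and path are placeholders.

```yaml
# provisioning/dashboards/databases.yml (sketch): load dashboard JSON files from disk
apiVersion: 1

providers:
  - name: "databases"            # hypothetical provider name
    orgId: 1
    folder: "Databases"          # hypothetical Grafana folder
    type: file
    disableDeletion: false
    updateIntervalSeconds: 30
    options:
      path: /var/lib/grafana/dashboards   # assumed directory containing the JSON files
```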

PostgreSQL Monitoring Configuration

PostgreSQL Log Exporter

To export PostgreSQL logs, configure PostgreSQL to log queries, errors, and other relevant information. A log shipping tool can then collect these logs and forward them for analysis.

PostgreSQL Metrics Exporter

For PostgreSQL metrics, run pg_exporter alongside PostgreSQL and have Prometheus scrape the performance metrics it exposes.

PostgreSQL Service Prometheus Monitoring Rules (YAML)

Here are some sample Prometheus monitoring rules for PostgreSQL:

groups:
- name: postgresql_metrics
  rules:
  - record: postgresql_transaction_throughput
    expr: rate(postgresql_transactions_total[5m])

  - record: postgresql_query_latency_seconds
    expr: histogram_quantile(0.95, rate(postgresql_query_duration_seconds_bucket[5m]))

  - record: postgresql_index_hit_rate
    expr: rate(postgresql_index_hits_total[5m]) / (rate(postgresql_index_scan_total[5m]) + rate(postgresql_index_hits_total[5m]))

  - record: postgresql_cpu_usage_percentage
    expr: 100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

  - record: postgresql_memory_usage_bytes
    expr: node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes  # host-level; excludes reclaimable cache

  - record: postgresql_disk_space_usage_bytes
    expr: node_filesystem_size_bytes{fstype="ext4"} - node_filesystem_free_bytes{fstype="ext4"}

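  # The rules below re-record each series under its own name. Give each record a
  # distinct name (for example, with a prefix) so it cannot collide with the scraped series.
  - record: postgresql_active_connections
    expr: postgresql_active_connections

  - record: postgresql_idle_connections
    expr: postgresql_idle_connections

  - record: postgresql_lock_wait_time_seconds
    expr: postgresql_lock_wait_time_seconds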

PostgreSQL Service Prometheus Alert Rules (YAML)

Here are some example Prometheus alert rules for PostgreSQL:

groups:
- name: postgresql_alerts
  rules:
  - alert: HighQueryLatency
    expr: postgresql_query_latency_seconds > 0.5
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "High Query Latency in PostgreSQL"
      description: "PostgreSQL queries are experiencing high latency."

  - alert: LowIndexHitRate
    expr: postgresql_index_hit_rate < 0.9
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Low Index Hit Rate in PostgreSQL"
      description: "PostgreSQL index hit rate is below 90%."

  - alert: HighCPUUsage
    expr: postgresql_cpu_usage_percentage > 90
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "High CPU Usage in PostgreSQL"
      description: "PostgreSQL server CPU usage is above 90%."

  - alert: HighMemoryUsage
    expr: postgresql_memory_usage_bytes > 536870912  # 512 MB
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "High Memory Usage in PostgreSQL"
      description: "PostgreSQL is using more than 512 MB of memory."

  - alert: LowFreeDiskSpace
    expr: node_filesystem_avail_bytes{fstype="ext4"} < 5368709120  # 5 GB free
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Low Free Disk Space in PostgreSQL"
      description: "PostgreSQL has less than 5 GB of free disk space."

  - alert: HighLockWaitTime
    expr: postgresql_lock_wait_time_seconds > 1
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "High Lock Wait Time in PostgreSQL"
      description: "PostgreSQL queries are waiting for locks for more than 1 second."
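
The `severity` label on these alerts only matters if something acts on it. One common arrangement, sketched here with hypothetical receiver names, is an Alertmanager route that sends `critical` alerts to an on-call receiver and everything else to a default one.

```yaml
# alertmanager.yml (sketch): route alerts by their severity label
route:
  receiver: team-default
  group_by: ["alertname", "instance"]
  routes:
    - matchers:
        - severity="critical"
      receiver: team-oncall

receivers:
  - name: team-default   # hypothetical; add email or webhook settings here
  - name: team-oncall    # hypothetical; add paging integration settings here
```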

PostgreSQL Grafana Dashboard

For a PostgreSQL Grafana dashboard, you can import an existing one or build your own. Such dashboards typically chart PostgreSQL performance metrics such as transaction throughput, query latency, index hit rate, CPU usage, memory usage, disk space, connection counts, and lock wait times. To reuse an existing dashboard, import it by its ID or download its JSON file from the Grafana dashboard repository, then customize it to your requirements.
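
Whichever dashboard you pick, it needs a Prometheus data source to query. A provisioning file along these lines registers one at startup; the data source name and URL are assumptions for a Prometheus server running on the same host as Grafana.

```yaml
# provisioning/datasources/prometheus.yml (sketch): register Prometheus as a data source
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090   # assumed Prometheus address
    isDefault: true
```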

MongoDB Monitoring Configuration

MongoDB Log Exporter

To export MongoDB logs, configure MongoDB to log operations, query response times, and other information. A log collection tool such as Filebeat or Fluentd can then collect these logs and send them for analysis.
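
On the MongoDB side, logging is controlled from the YAML configuration file. The sketch below writes the log to a file and records operations slower than a threshold; the log path is an assumption, and the 100 ms threshold is simply MongoDB's default. Note that `slowOp` mode also turns on the database profiler for those operations.

```yaml
# mongod.conf (sketch): file logging plus slow-operation recording
systemLog:
  destination: file
  path: /var/log/mongodb/mongod.log   # assumed path
  logAppend: true

operationProfiling:
  mode: slowOp
  slowOpThresholdMs: 100
```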

MongoDB Metrics Exporter

For MongoDB metrics, run the MongoDB Exporter alongside MongoDB and have Prometheus scrape the performance metrics it exposes.

Prometheus Monitoring Rules for MongoDB (YAML)

Here are some sample Prometheus monitoring rules for MongoDB:

groups:
- name: mongodb_metrics
  rules:
  - record: mongodb_operations_throughput
    expr: rate(mongodb_operations_total[5m])

  - record: mongodb_query_response_time_seconds
    expr: histogram_quantile(0.95, rate(mongodb_query_response_time_seconds_bucket[5m]))

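  # The pass-through rules below re-record a series under its own name. Give each
  # record a distinct name (for example, with a prefix) so it cannot collide with the scraped series.
  - record: mongodb_memory_usage_bytes
    expr: mongodb_memory_usage_bytes

  - record: mongodb_disk_usage_bytes
    expr: mongodb_disk_usage_bytes

  - record: mongodb_network_traffic_bytes
    expr: rate(mongodb_network_traffic_bytes_total[5m])

  - record: mongodb_replica_set_status
    expr: mongodb_replica_set_status

  - record: mongodb_shard_balance
    expr: mongodb_shard_balance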

Prometheus Alert Rules for MongoDB (YAML)

Here are some example Prometheus alert rules for MongoDB:

groups:
- name: mongodb_alerts
  rules:
  - alert: HighQueryResponseTime
    expr: mongodb_query_response_time_seconds > 0.5
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "High Query Response Time in MongoDB"
      description: "MongoDB queries are experiencing high response time."

  - alert: HighMemoryUsage
    expr: mongodb_memory_usage_bytes > 536870912  # 512 MB
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "High Memory Usage in MongoDB"
      description: "MongoDB is using more than 512 MB of memory."

  - alert: HighDiskUsage
    expr: mongodb_disk_usage_bytes > 10737418240  # 10 GB
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "High Disk Usage in MongoDB"
      description: "MongoDB is using more than 10 GB of disk space."

  - alert: HighNetworkTraffic
    expr: mongodb_network_traffic_bytes > 10000000  # 10 MB/s
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "High Network Traffic in MongoDB"
      description: "MongoDB is generating more than 10 MB/s of network traffic."

  - alert: ReplicaSetNotHealthy
    expr: mongodb_replica_set_status != 1
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "MongoDB Replica Set Not Healthy"
      description: "MongoDB replica set status is not healthy."

  - alert: ShardNotBalanced
    expr: mongodb_shard_balance != 1
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "MongoDB Shard Not Balanced"
      description: "MongoDB shard balance is not optimal."
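
The annotations above are static, so a notification does not say which instance is affected or how far past the threshold it is. Prometheus annotation templates can fill that in. The variant below is a sketch that would replace the `HighQueryResponseTime` rule, not sit next to it; it assumes the alerting series carries an `instance` label, and the group name is hypothetical.

```yaml
groups:
- name: mongodb_alerts_templated   # hypothetical group name
  rules:
  - alert: HighQueryResponseTime
    expr: mongodb_query_response_time_seconds > 0.5
    for: 5m
    labels:
      severity: critical
    annotations:
      # $labels and $value are filled in by Prometheus when the alert fires
      summary: "High Query Response Time in MongoDB on {{ $labels.instance }}"
      description: "95th percentile query response time is {{ $value | humanizeDuration }} (threshold 0.5s)."
```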

MongoDB Grafana Dashboard

For MongoDB Grafana dashboard, you can find or create one that suits your needs. These dashboards typically include charts and visualizations for various MongoDB performance metrics, such as operation throughput, query response time, memory usage, disk usage, network traffic, replica set status, and shard balance. You can import an existing dashboard using its ID or download a JSON file from the Grafana dashboard repository, and then customize and configure it according to your requirements.

Cassandra Monitoring Configuration

Cassandra Log Exporter

To export Cassandra logs, configure Cassandra to log read/write operation latency, node status, and data replication latency, among other information. A log collection tool can then collect these logs and send them for analysis.
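
For collection, a generic Filebeat `filestream` input is one option. The sketch below assumes the log lives at `/var/log/cassandra/system.log` and that each entry starts with a log level, so that stack-trace lines are folded into the entry they belong to; verify both assumptions against your logging configuration.

```yaml
# filebeat.yml (sketch): collect the Cassandra system log
filebeat.inputs:
  - type: filestream
    id: cassandra-system-log
    paths:
      - /var/log/cassandra/system.log   # assumed path
    parsers:
      - multiline:
          type: pattern
          # Lines that do not start with a log level are appended to the previous entry
          pattern: '^(TRACE|DEBUG|INFO|WARN|ERROR)'
          negate: true
          match: after
```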

Cassandra Metrics Exporter

For Cassandra metrics, run a Cassandra exporter on each node and have Prometheus scrape the performance metrics it exposes.
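
Because Cassandra runs as a cluster, the scrape job usually lists one target per node. In this sketch the host names, the port, and the `cluster` label value are all placeholders; the port in particular depends on which exporter you run and how it is configured.

```yaml
# prometheus.yml (sketch): scrape an exporter on every Cassandra node
scrape_configs:
  - job_name: cassandra
    static_configs:
      - targets:
          - "cassandra-1.example.internal:9500"   # hypothetical host and port
          - "cassandra-2.example.internal:9500"
          - "cassandra-3.example.internal:9500"
        labels:
          cluster: "example-cluster"               # hypothetical label added to every series
```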

Prometheus Monitoring Rules for Cassandra (YAML)

Here are some sample Prometheus monitoring rules for Cassandra:

groups:
- name: cassandra_metrics
  rules:
  - record: cassandra_read_latency_seconds
    expr: histogram_quantile(0.95, rate(cassandra_read_latency_seconds_bucket[5m]))

  - record: cassandra_write_latency_seconds
    expr: histogram_quantile(0.95, rate(cassandra_write_latency_seconds_bucket[5m]))

  - record: cassandra_read_throughput
    expr: rate(cassandra_reads_total[5m])

  - record: cassandra_write_throughput
    expr: rate(cassandra_writes_total[5m])

  - record: cassandra_cpu_usage_percentage
    expr: 100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

  - record: cassandra_memory_usage_bytes
    expr: node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes  # host-level; excludes reclaimable cache

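  - record: cassandra_disk_io_operations
    # Completed read and write operations per second, per device
    expr: rate(node_disk_reads_completed_total[5m]) + rate(node_disk_writes_completed_total[5m])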

  - record: cassandra_network_io_bytes
    expr: rate(node_network_receive_bytes_total[5m]) + rate(node_network_transmit_bytes_total[5m])

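  # The rules below re-record each series under its own name. Give each record a
  # distinct name (for example, with a prefix) so it cannot collide with the scraped series.
  - record: cassandra_node_status
    expr: cassandra_node_status

  - record: cassandra_replication_delay_seconds
    expr: cassandra_replication_delay_seconds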

Prometheus Alert Rules for Cassandra (YAML)

Here are some example Prometheus alert rules for Cassandra:

groups:
- name: cassandra_alerts
  rules:
  - alert: HighReadLatency
    expr: cassandra_read_latency_seconds > 0.5
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "High Read Latency in Cassandra"
      description: "Cassandra read operations are experiencing high latency."

  - alert: HighWriteLatency
    expr: cassandra_write_latency_seconds > 0.5
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "High Write Latency in Cassandra"
      description: "Cassandra write operations are experiencing high latency."

  - alert: HighCPUUsage
    expr: cassandra_cpu_usage_percentage > 90
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "High CPU Usage in Cassandra"
      description: "Cassandra server CPU usage is above 90%."

  - alert: HighMemoryUsage
    expr: cassandra_memory_usage_bytes > 536870912  # 512 MB
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "High Memory Usage in Cassandra"
      description: "Cassandra is using more than 512 MB of memory."

  - alert: HighDiskIOOperations
    expr: cassandra_disk_io_operations > 100
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "High Disk I/O Operations in Cassandra"
      description: "Cassandra is performing a high number of disk I/O operations."

  - alert: HighNetworkIO
    expr: cassandra_network_io_bytes > 10000000  # 10 MB/s
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "High Network I/O in Cassandra"
      description: "Cassandra is using more than 10 MB/s of network I/O."

  - alert: NodeNotHealthy
    expr: cassandra_node_status != 1
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "Cassandra Node Not Healthy"
      description: "Cassandra node status is not healthy."

  - alert: ReplicationDelayTooHigh
    expr: cassandra_replication_delay_seconds > 60
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "High Replication Delay in Cassandra"
      description: "Cassandra data replication delay is exceeding 60 seconds."
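
When a node is down, its latency, I/O, and replication warnings tend to fire as well and add noise. An Alertmanager inhibit rule can mute them while the node-level alert is active. This sketch assumes every alert involved carries the same `instance` label value for a given node.

```yaml
# alertmanager.yml (sketch): mute warnings from a node already reported unhealthy
inhibit_rules:
  - source_matchers:
      - alertname="NodeNotHealthy"
    target_matchers:
      - severity="warning"
    equal: ["instance"]   # only inhibit alerts from the same instance
```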

Cassandra Grafana Dashboard

For a Cassandra Grafana dashboard, you can import an existing one or build your own. Such dashboards typically chart Cassandra performance metrics such as read/write operation latency, CPU usage, memory usage, disk I/O, network I/O, node status, and data replication delay. To reuse an existing dashboard, import it by its ID or download its JSON file from the Grafana dashboard repository, then customize it to your requirements.
