原文链接 https://blogs.oracle.com/mysql/post/mysql-8-and-replication-observability
我们中的许多人,老 MySQL DBA 使用SHOW REPLICA STATUS中的Seconds_Behind_Source来找出(异步)复制的状态和正确执行。
请注意新术语。我确信我们都使用过旧术语。
然而,MySQL 复制已经发生了很大的变化,并且复制团队一直致力于包含有关 MySQL 可用的所有复制风格的大量有用信息。
例如,我们添加了并行复制、组复制……所有这些信息都从旧的SHOW REPLICA STATUS结果中丢失了。
使用Performance_Schema有更好的方法来监视和观察复制过程。
目前在Performance_Schema中,有 15 个与复制检测相关的表:
代码语言:javascript复制 ------------------------------------------------------
| Tables_in_performance_schema (replication%) |
------------------------------------------------------
| replication_applier_configuration |
| replication_applier_filters |
| replication_applier_global_filters |
| replication_applier_status |
| replication_applier_status_by_coordinator |
| replication_applier_status_by_worker |
| replication_asynchronous_connection_failover |
| replication_asynchronous_connection_failover_managed |
| replication_connection_configuration |
| replication_connection_status |
| replication_group_communication_information |
| replication_group_configuration_version |
| replication_group_member_actions |
| replication_group_member_stats |
| replication_group_members |
------------------------------------------------------
15 rows in set (0.0038 sec)
但确实,理解这些指标的含义以及在哪里寻找对我们 MySQL DBA 重要的有用信息并不总是那么容易:我的复制是否落后于源?
我创建了一些可以安装在sys中的视图,以使用大多数这些指标来获取与我们 DBA 相关的内容:mysql_8_replication_observability.sql
地址:https://gist.github.com/lefred/1bad64403923664a14e0f20f572d7526
注意:这个命令查询只支持启用GTID的复制,非GTID环境下会报错提示: ERROR 1772 (HY000): Malformed GTID set specification 'ANONYMOUS'.
让我们仔细看看这些观点。
复制滞后
代码语言:javascript复制select * from sys.replication_lag;
--------------------------- ----------------------- ------------------------
| channel_name | max_lag_from_original | max_lag_from_immediate |
--------------------------- ----------------------- ------------------------
| clusterset_replication | 00:00:04.963223 | 00:00:04.940782 |
| group_replication_applier | 0 | 0 |
--------------------------- ----------------------- ------------------------
从上面的输出中,我们可以看到MySQL实例是一个异步副本,但它也是组复制集群的一部分。
事实上,这是InnoDB ClusterSet中 DR 集群的主要成员。
我们还可以看到这个副本几乎迟到了 5 秒(滞后)。
然后,我们获得复制通道的名称以及原始提交者和直接源(在级联复制的情况下)的最大延迟/滞后(因为在并行复制的情况下可能有多个工作线程)。
在组复制集群( InnoDB Cluster )的辅助成员上,我们可以看到以下输出:
代码语言:javascript复制select * from sys.replication_lag;
---------------------------- ----------------------- ------------------------
| channel_name | max_lag_from_original | max_lag_from_immediate |
---------------------------- ----------------------- ------------------------
| group_replication_recovery | null | null |
| group_replication_applier | 00:00:02.733008 | 00:00:02.733008 |
---------------------------- ----------------------- ------------------------
我们可以看到用于恢复的通道(当节点加入组时读取丢失的二进制日志事件、事务)没有被使用,并且组复制的应用程序有点落后。
复制状态
这个视图更完整,每个工人都有一条线。
让我们以InnoDB ClusterSet的 DR 站点的主要成员为例:
代码语言:javascript复制select * from replication_status;
------------------------------- ---------- ---------- --------- ------------------- --------------------
| channel | io_state | co_state | w_state | lag_from_original | lag_from_immediate |
------------------------------- ---------- ---------- --------- ------------------- --------------------
| group_replication_applier (1) | ON | ON | ON | none | none |
| group_replication_applier (2) | ON | ON | ON | none | none |
| group_replication_applier (3) | ON | ON | ON | none | none |
| group_replication_applier (4) | ON | ON | ON | none | none |
| clusterset_replication (1) | ON | ON | ON | 00:00:15.395870 | 00:00:15.380884 |
| clusterset_replication (2) | ON | ON | ON | 00:00:15.395686 | 00:00:15.380874 |
| clusterset_replication (3) | ON | ON | ON | 00:00:15.411204 | 00:00:15.388451 |
| clusterset_replication (4) | ON | ON | ON | 00:00:15.406154 | 00:00:15.388434 |
------------------------------- ---------- ---------- --------- ------------------- --------------------
We can see that parallel (asynchronous) replication from the Primary cluster uses 4 parallel workers.
We can also see that they are falling behind…
You may have noticed that there are 3 states (all at
ON). Using
SHOW REPLICA STATUS
we can only see:
Replica_IO_Running: Yes
Replica_SQL_Running: Yes
通过并行复制,我们在应用 binlog 事件期间有另一个线程参与复制:协调器线程。
复制状态已满
当然,我们还可以了解更多关于复制的细节。
让我们看一个示例结果:
代码语言:javascript复制select * from sys.replication_status_fullG
*************************** 1. row ***************************
channel: group_replication_applier (1)
host: <NULL>
port: 0
user:
source_uuid: 7b6bf13d-40ed-11ee-bfdd-c8cb9e32df8e
group_name: 7b6bf13d-40ed-11ee-bfdd-c8cb9e32df8e
last_heartbeat_timestamp: 0000-00-00 00:00:00.000000
heartbeat_interval: 30
io_state: ON
io_thread_state: NULL
io_errno: 0
io_errmsg:
io_errtime: 0000-00-00 00:00:00.000000
co_state: ON
co_thread_state: Replica has read all relay log; waiting for more updates
co_errno: 0
co_errmsg:
co_errtime: 0000-00-00 00:00:00.000000
w_state: ON
w_thread_state: Waiting for an event from Coordinator
w_errno: 0
w_errmsg:
w_errtime: 0000-00-00 00:00:00.000000
time_since_last_message: 03:36:40.474223
applier_busy_state: IDLE
lag_from_original: none
lag_from_immediate: none
transport_time: 1.80 us
time_to_relay_log: 12.00 us
apply_time: 784.00 us
last_applied_transaction: 7b6bf4f0-40ed-11ee-bfdd-c8cb9e32df8e:3
last_queued_transaction: 7b6bf4f0-40ed-11ee-bfdd-c8cb9e32df8e:3
queued_gtid_set_to_apply:
*************************** 2. row ***************************
channel: group_replication_applier (2)
host: <NULL>
port: 0
user:
source_uuid: 7b6bf13d-40ed-11ee-bfdd-c8cb9e32df8e
group_name: 7b6bf13d-40ed-11ee-bfdd-c8cb9e32df8e
last_heartbeat_timestamp: 0000-00-00 00:00:00.000000
heartbeat_interval: 30
io_state: ON
io_thread_state: NULL
io_errno: 0
io_errmsg:
io_errtime: 0000-00-00 00:00:00.000000
co_state: ON
co_thread_state: Replica has read all relay log; waiting for more updates
co_errno: 0
co_errmsg:
co_errtime: 0000-00-00 00:00:00.000000
w_state: ON
w_thread_state: Waiting for an event from Coordinator
w_errno: 0
w_errmsg:
w_errtime: 0000-00-00 00:00:00.000000
time_since_last_message: 03:36:40.474223
applier_busy_state: IDLE
lag_from_original: none
lag_from_immediate: none
transport_time: 1.80 us
time_to_relay_log: 12.00 us
apply_time: 0 ps
last_applied_transaction:
last_queued_transaction: 7b6bf4f0-40ed-11ee-bfdd-c8cb9e32df8e:3
queued_gtid_set_to_apply:
*************************** 3. row ***************************
channel: group_replication_applier (3)
host: <NULL>
port: 0
user:
source_uuid: 7b6bf13d-40ed-11ee-bfdd-c8cb9e32df8e
group_name: 7b6bf13d-40ed-11ee-bfdd-c8cb9e32df8e
last_heartbeat_timestamp: 0000-00-00 00:00:00.000000
heartbeat_interval: 30
io_state: ON
io_thread_state: NULL
io_errno: 0
io_errmsg:
io_errtime: 0000-00-00 00:00:00.000000
co_state: ON
co_thread_state: Replica has read all relay log; waiting for more updates
co_errno: 0
co_errmsg:
co_errtime: 0000-00-00 00:00:00.000000
w_state: ON
w_thread_state: Waiting for an event from Coordinator
w_errno: 0
w_errmsg:
w_errtime: 0000-00-00 00:00:00.000000
time_since_last_message: 03:36:40.474223
applier_busy_state: IDLE
lag_from_original: none
lag_from_immediate: none
transport_time: 1.80 us
time_to_relay_log: 12.00 us
apply_time: 0 ps
last_applied_transaction:
last_queued_transaction: 7b6bf4f0-40ed-11ee-bfdd-c8cb9e32df8e:3
queued_gtid_set_to_apply:
*************************** 4. row ***************************
channel: group_replication_applier (4)
host: <NULL>
port: 0
user:
source_uuid: 7b6bf13d-40ed-11ee-bfdd-c8cb9e32df8e
group_name: 7b6bf13d-40ed-11ee-bfdd-c8cb9e32df8e
last_heartbeat_timestamp: 0000-00-00 00:00:00.000000
heartbeat_interval: 30
io_state: ON
io_thread_state: NULL
io_errno: 0
io_errmsg:
io_errtime: 0000-00-00 00:00:00.000000
co_state: ON
co_thread_state: Replica has read all relay log; waiting for more updates
co_errno: 0
co_errmsg:
co_errtime: 0000-00-00 00:00:00.000000
w_state: ON
w_thread_state: Waiting for an event from Coordinator
w_errno: 0
w_errmsg:
w_errtime: 0000-00-00 00:00:00.000000
time_since_last_message: 03:36:40.474223
applier_busy_state: IDLE
lag_from_original: none
lag_from_immediate: none
transport_time: 1.80 us
time_to_relay_log: 12.00 us
apply_time: 0 ps
last_applied_transaction:
last_queued_transaction: 7b6bf4f0-40ed-11ee-bfdd-c8cb9e32df8e:3
queued_gtid_set_to_apply:
*************************** 5. row ***************************
channel: clusterset_replication (1)
host: 127.0.0.1
port: 3310
user: mysql_innodb_cs_b0adbc6c
source_uuid: 2cb77a02-40eb-11ee-83f4-c8cb9e32df8e
group_name:
last_heartbeat_timestamp: 2023-08-22 18:48:41.037817
heartbeat_interval: 30
io_state: ON
io_thread_state: Waiting for source to send event
io_errno: 0
io_errmsg:
io_errtime: 0000-00-00 00:00:00.000000
co_state: ON
co_thread_state: Waiting for replica workers to process their queues
co_errno: 0
co_errmsg:
co_errtime: 0000-00-00 00:00:00.000000
w_state: ON
w_thread_state: waiting for handler commit
w_errno: 0
w_errmsg:
w_errtime: 0000-00-00 00:00:00.000000
time_since_last_message: 00:00:00.001134
applier_busy_state: APPLYING
lag_from_original: 00:00:01.799071
lag_from_immediate: 00:00:01.783404
transport_time: 2.26 ms
time_to_relay_log: 19.00 us
apply_time: 14.63 ms
last_applied_transaction: 54d83026-40eb-11ee-a5d3-c8cb9e32df8e:105180
last_queued_transaction: 54d83026-40eb-11ee-a5d3-c8cb9e32df8e:105547
queued_gtid_set_to_apply: 54d83026-40eb-11ee-a5d3-c8cb9e32df8e:105184-105547
*************************** 6. row ***************************
channel: clusterset_replication (2)
host: 127.0.0.1
port: 3310
user: mysql_innodb_cs_b0adbc6c
source_uuid: 2cb77a02-40eb-11ee-83f4-c8cb9e32df8e
group_name:
last_heartbeat_timestamp: 2023-08-22 18:48:41.037817
heartbeat_interval: 30
io_state: ON
io_thread_state: Waiting for source to send event
io_errno: 0
io_errmsg:
io_errtime: 0000-00-00 00:00:00.000000
co_state: ON
co_thread_state: Waiting for replica workers to process their queues
co_errno: 0
co_errmsg:
co_errtime: 0000-00-00 00:00:00.000000
w_state: ON
w_thread_state: waiting for handler commit
w_errno: 0
w_errmsg:
w_errtime: 0000-00-00 00:00:00.000000
time_since_last_message: 00:00:00.001134
applier_busy_state: APPLYING
lag_from_original: 00:00:01.797743
lag_from_immediate: 00:00:01.783390
transport_time: 2.26 ms
time_to_relay_log: 19.00 us
apply_time: 21.47 ms
last_applied_transaction: 54d83026-40eb-11ee-a5d3-c8cb9e32df8e:105181
last_queued_transaction: 54d83026-40eb-11ee-a5d3-c8cb9e32df8e:105547
queued_gtid_set_to_apply: 54d83026-40eb-11ee-a5d3-c8cb9e32df8e:105184-105547
*************************** 7. row ***************************
channel: clusterset_replication (3)
host: 127.0.0.1
port: 3310
user: mysql_innodb_cs_b0adbc6c
source_uuid: 2cb77a02-40eb-11ee-83f4-c8cb9e32df8e
group_name:
last_heartbeat_timestamp: 2023-08-22 18:48:41.037817
heartbeat_interval: 30
io_state: ON
io_thread_state: Waiting for source to send event
io_errno: 0
io_errmsg:
io_errtime: 0000-00-00 00:00:00.000000
co_state: ON
co_thread_state: Waiting for replica workers to process their queues
co_errno: 0
co_errmsg:
co_errtime: 0000-00-00 00:00:00.000000
w_state: ON
w_thread_state: waiting for handler commit
w_errno: 0
w_errmsg:
w_errtime: 0000-00-00 00:00:00.000000
time_since_last_message: 00:00:00.001134
applier_busy_state: APPLYING
lag_from_original: 00:00:01.786087
lag_from_immediate: 00:00:01.767563
transport_time: 2.26 ms
time_to_relay_log: 19.00 us
apply_time: 21.58 ms
last_applied_transaction: 54d83026-40eb-11ee-a5d3-c8cb9e32df8e:105182
last_queued_transaction: 54d83026-40eb-11ee-a5d3-c8cb9e32df8e:105547
queued_gtid_set_to_apply: 54d83026-40eb-11ee-a5d3-c8cb9e32df8e:105184-105547
*************************** 8. row ***************************
channel: clusterset_replication (4)
host: 127.0.0.1
port: 3310
user: mysql_innodb_cs_b0adbc6c
source_uuid: 2cb77a02-40eb-11ee-83f4-c8cb9e32df8e
group_name:
last_heartbeat_timestamp: 2023-08-22 18:48:41.037817
heartbeat_interval: 30
io_state: ON
io_thread_state: Waiting for source to send event
io_errno: 0
io_errmsg:
io_errtime: 0000-00-00 00:00:00.000000
co_state: ON
co_thread_state: Waiting for replica workers to process their queues
co_errno: 0
co_errmsg:
co_errtime: 0000-00-00 00:00:00.000000
w_state: ON
w_thread_state: waiting for handler commit
w_errno: 0
w_errmsg:
w_errtime: 0000-00-00 00:00:00.000000
time_since_last_message: 00:00:00.001134
applier_busy_state: APPLYING
lag_from_original: 00:00:01.785881
lag_from_immediate: 00:00:01.767550
transport_time: 2.26 ms
time_to_relay_log: 19.00 us
apply_time: 29.59 ms
last_applied_transaction: 54d83026-40eb-11ee-a5d3-c8cb9e32df8e:105183
last_queued_transaction: 54d83026-40eb-11ee-a5d3-c8cb9e32df8e:105547
queued_gtid_set_to_apply: 54d83026-40eb-11ee-a5d3-c8cb9e32df8e:105184-105547
通过这个视图,我们可以获得更多详细信息,例如复制心跳。我们还对 GTID(排队的、应用的……)进行了概述。
我们还可以看到有关传输时间(网络)、写入中继日志的时间以及最终应用时间的信息。
MySQL InnoDB 集群、ClusterSet、只读副本
如果您使用由Admin API和MySQL Shell管理的优秀集成解决方案,则可以使用status()方法获取所有这些信息。
status ( )方法可以扩展为3个值:
- 1 包括有关元数据版本、组协议版本、组名称、集群成员 UUID、集群成员角色和组复制报告的状态的信息以及受保护的系统变量列表;
- 2 包括连接和应用程序处理的交易信息
- 3 包括有关每个 集群成员的复制机制的更详细统计信息;
让我们看一下带有扩展选项3 的ClusterSet示例:
代码语言:javascript复制JS> cs.status({extended:3})
{
"clusters": {
"cluster2": {
"clusterRole": "REPLICA",
"clusterSetReplication": {
"applierQueuedTransactionSet": "54d83026-40eb-11ee-a5d3-c8cb9e32df8e:137385-138500",
"applierQueuedTransactionSetSize": 1116,
"applierState": "ON",
"applierStatus": "APPLYING",
"applierThreadState": "waiting for handler commit",
"applierWorkerThreads": 4,
"coordinatorState": "ON",
"coordinatorThreadState": "Waiting for replica workers to process their queues",
"options": {
"connectRetry": 3,
"delay": 0,
"heartbeatPeriod": 30,
"retryCount": 10
},
"receiver": "127.0.0.1:4420",
"receiverStatus": "ON",
"receiverThreadState": "Waiting for source to send event",
"receiverTimeSinceLastMessage": "00:00:00.002737",
"replicationSsl": null,
"source": "127.0.0.1:3310"
},
"clusterSetReplicationStatus": "OK",
"communicationStack": "MYSQL",
"globalStatus": "OK",
"groupName": "7b6bf13d-40ed-11ee-bfdd-c8cb9e32df8e",
"groupViewChangeUuid": "7b6bf4f0-40ed-11ee-bfdd-c8cb9e32df8e",
"paxosSingleLeader": "OFF",
"receivedTransactionSet": "54d83026-40eb-11ee-a5d3-c8cb9e32df8e:129-138500",
"ssl": "REQUIRED",
"status": "OK_NO_TOLERANCE",
"statusText": "Cluster is NOT tolerant to any failures.",
"topology": {
"127.0.0.1:4420": {
"address": "127.0.0.1:4420",
"applierWorkerThreads": 4,
"fenceSysVars": [
"read_only",
"super_read_only"
],
"memberId": "c3d726ac-40ec-11ee-ab38-c8cb9e32df8e",
"memberRole": "PRIMARY",
"memberState": "ONLINE",
"mode": "R/O",
"readReplicas": {},
"replicationLagFromImmediateSource": "00:00:05.420247",
"replicationLagFromOriginalSource": "00:00:05.433548",
"role": "HA",
"status": "ONLINE",
"version": "8.1.0"
},
"127.0.0.1:4430": {
"address": "127.0.0.1:4430",
"applierWorkerThreads": 4,
"fenceSysVars": [
"read_only",
"super_read_only"
],
"memberId": "709b15ea-40ed-11ee-a9b3-c8cb9e32df8e",
"memberRole": "SECONDARY",
"memberState": "ONLINE",
"mode": "R/O",
"readReplicas": {},
"replicationLagFromImmediateSource": "00:00:00.038075",
"replicationLagFromOriginalSource": "00:00:05.432536",
"role": "HA",
"status": "ONLINE",
"version": "8.1.0"
}
},
"transactionSet": "2cb77a02-40eb-11ee-83f4-c8cb9e32df8e:1-4,54d83026-40eb-11ee-a5d3-c8cb9e32df8e:1-137384,54d8329c-40eb-11ee-a5d3-c8cb9e32df8e:1-5,7b6bf4f0-40ed-11ee-bfdd-c8cb9e32df8e:1-3",
"transactionSetConsistencyStatus": "OK",
"transactionSetErrantGtidSet": "",
"transactionSetMissingGtidSet": "54d83026-40eb-11ee-a5d3-c8cb9e32df8e:137385-138552"
},
"myCluster": {
"clusterRole": "PRIMARY",
"communicationStack": "MYSQL",
"globalStatus": "OK",
"groupName": "54d83026-40eb-11ee-a5d3-c8cb9e32df8e",
"groupViewChangeUuid": "54d8329c-40eb-11ee-a5d3-c8cb9e32df8e",
"paxosSingleLeader": "OFF",
"primary": "127.0.0.1:3310",
"ssl": "REQUIRED",
"status": "OK",
"statusText": "Cluster is ONLINE and can tolerate up to ONE failure.",
"topology": {
"127.0.0.1:3310": {
"address": "127.0.0.1:3310",
"applierWorkerThreads": 4,
"fenceSysVars": [],
"memberId": "2cb77a02-40eb-11ee-83f4-c8cb9e32df8e",
"memberRole": "PRIMARY",
"memberState": "ONLINE",
"mode": "R/W",
"readReplicas": {
"127.0.0.1:4410": {
"address": "127.0.0.1:4410",
"applierStatus": "APPLYING",
"applierThreadState": "waiting for handler commit",
"applierWorkerThreads": 4,
"receiverStatus": "ON",
"receiverThreadState": "Waiting for source to send event",
"replicationSources": [
"PRIMARY"
],
"replicationSsl": "TLS_AES_256_GCM_SHA384 TLSv1.3",
"role": "READ_REPLICA",
"status": "ONLINE",
"version": "8.1.0"
}
},
"role": "HA",
"status": "ONLINE",
"version": "8.1.0"
},
"127.0.0.1:3320": {
"address": "127.0.0.1:3320",
"applierWorkerThreads": 4,
"fenceSysVars": [
"read_only",
"super_read_only"
],
"memberId": "327cb102-40eb-11ee-9904-c8cb9e32df8e",
"memberRole": "SECONDARY",
"memberState": "ONLINE",
"mode": "R/O",
"readReplicas": {},
"replicationLagFromImmediateSource": "00:00:04.536190",
"replicationLagFromOriginalSource": "00:00:04.536190",
"role": "HA",
"status": "ONLINE",
"version": "8.1.0"
},
"127.0.0.1:3330": {
"address": "127.0.0.1:3330",
"applierWorkerThreads": 4,
"fenceSysVars": [
"read_only",
"super_read_only"
],
"memberId": "3d141d7e-40eb-11ee-933b-c8cb9e32df8e",
"memberRole": "SECONDARY",
"memberState": "ONLINE",
"mode": "R/O",
"readReplicas": {},
"replicationLagFromImmediateSource": "00:00:04.652745",
"replicationLagFromOriginalSource": "00:00:04.652745",
"role": "HA",
"status": "ONLINE",
"version": "8.1.0"
}
},
"transactionSet": "2cb77a02-40eb-11ee-83f4-c8cb9e32df8e:1-4,54d83026-40eb-11ee-a5d3-c8cb9e32df8e:1-138552,54d8329c-40eb-11ee-a5d3-c8cb9e32df8e:1-5"
}
},
"domainName": "myClusterSet",
"globalPrimaryInstance": "127.0.0.1:3310",
"metadataServer": "127.0.0.1:3310",
"primaryCluster": "myCluster",
"status": "HEALTHY",
"statusText": "All Clusters available."
}
MySQL 热波
如果您在 OCI 上的 MySQL HeatWave 中使用任何类型的复制,则可以使用相同的视图,但您需要在不同的数据库上创建它们,因为sys是写保护的。
因此,如果您使用 HA、只读副本或手动复制通道,您还可以使用相同的视图来获得复制的准确概述。
结论
复制可观察性非常详细,并通过 MySQL 8 提供了大量信息。也许现在是改变查看或监视复制方式的好时机。