作者:xuty。
爱可生开源社区出品,原创内容未经授权不得随意使用,转载请联系小编并注明来源。
本文约 1400 字,预计阅读需要 5 分钟。
1问题现象
生产环境中 MySQL 5.7.26 版本下,当主库短时间内连续遇到 2 次 Crash 的特殊场景时,会导致备库重新建立复制时会抛出错误 Slave has more GTIDs than the master has
,IO 线程复制报错。
2023-12-11T10:32:43.433707 08:00 1457551 [ERROR] Error reading packet from server for channel '': Slave has more GTIDs than the master has, using the master's SERVER_UUID. This may indicate that the end of the binary log was truncated or that the last binary log file was lost, e.g., after a power or disk failure when sync_binlog != 1. The master may or may not have rolled back transactions that were already replicated to the slave. Suggest to replicate any transactions that master has rolled back from slave to master, and/or commit empty transactions on master to account for transactions that have been (server_errno=1236)
如下图所示,备库比主库多了几十万个 GTID,且解析对应 Binlog 都是业务操作,更像是主库把这部分 GTID “丢掉了”。
- 主库:1-1080678246;
- 备库:1-1081067155;
2问题分析
先来复习下 MySQL 5.7 GTID 持久化原理:
- gtid_executed 变量:它是一个处于内存中的
GTID SET
,表示数据库中执行了那些 GTID,会实时更新,但是一旦重启就会丢失。show slave status
的Executed_Gtid_Set
和show master status
中的Executed_Gtid_Set
都来自于这个变量。 - mysql.gtid_executed 表:GTID 持久化的介质,只有在 binlog 切换时才会触发更新。将该 binlog 中的
GTID SET
记录到表中,所以该表中会记录所有历史 binlog 中的GTID SET
。