先说结论: 使用者的姿势问题导致,但是pt-archiver也缺少相关的校验机制,最终导致数据错误没有被及时发现。
初始化数据
代码语言:txt复制use test;
CREATE TABLE `t1` (
`id` int NOT NULL AUTO_INCREMENT,
`batch_id` int DEFAULT NULL,
`res_status` int NOT NULL DEFAULT '0',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3;
CREATE TABLE `t2` (
`id` int NOT NULL AUTO_INCREMENT,
`batch_id` int DEFAULT NULL,
`res_status` int NOT NULL DEFAULT '0',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3;
CREATE TABLE `t3` (
`id` int NOT NULL AUTO_INCREMENT,
`batch_id` int DEFAULT NULL,
`res_status` int NOT NULL DEFAULT '0',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3;
insert into t1(batch_id,res_status) select 112,1;
insert into t1(batch_id,res_status) select batch_id,res_status from t1;
insert into t1(batch_id,res_status) select batch_id,res_status from t1;
insert into t1(batch_id,res_status) select batch_id,res_status from t1;
[test]> select * from t1 order by id asc limit 5;
---- ---------- ------------
| id | batch_id | res_status |
---- ---------- ------------
| 1 | 112 | 1 |
| 2 | 112 | 1 |
| 3 | 112 | 1 |
| 4 | 112 | 1 |
| 5 | 112 | 1 |
---- ---------- ------------
5 rows in set (0.00 sec)
案例
1 有问题的写法
代码语言:txt复制# 注意下面的字段 res_status是大写字母
pt-archiver
--source h=192.168.31.181,u=dts,p='123456',P=3306,D=test,t=t1
--dest h=192.168.31.181,u=dts,p='123456',P=3306,D=test,t=t2
--progress 100 --where '1=1'
--no-delete --charset=UTF8 --limit=100
--no-check-columns --columns batch_id,RES_STATUS
--statistics --txn-size 100 --no-version-check
同步完成后,查询t2表,可以看到res_status字段的值都错了
[test]> select * from t2 limit 5;
---- ---------- ------------
| id | batch_id | res_status |
---- ---------- ------------
| 1 | 112 | 0 |
| 2 | 112 | 0 |
| 3 | 112 | 0 |
| 4 | 112 | 0 |
| 5 | 112 | 0 |
---- ---------- ------------
5 rows in set (0.00 sec)
general_log部分内容如下:
2024-08-23T12:24:05.327275 08:00 42 Query USE `test`
2024-08-23T12:24:05.327542 08:00 42 Query SHOW CREATE TABLE `test`.`t1`
2024-08-23T12:24:05.339690 08:00 43 Query USE `test`
2024-08-23T12:24:05.339920 08:00 43 Query SHOW CREATE TABLE `test`.`t2`
2024-08-23T12:24:05.348830 08:00 42 Query SELECT MAX(`id`) FROM `test`.`t1`
2024-08-23T12:24:05.349553 08:00 42 Query SELECT /*!40001 SQL_NO_CACHE */ `batch_id`,`RES_STATUS`,`id` FROM `test`.`t1` FORCE INDEX(`PRIMARY`) WHERE (1=1) AND (`id` < '20') ORDER BY `id` LIMIT 100
2024-08-23T12:24:05.380780 08:00 43 Query INSERT INTO `test`.`t2`(`batch_id`) VALUES ('112')
2024-08-23T12:24:05.381702 08:00 43 Query INSERT INTO `test`.`t2`(`batch_id`) VALUES ('112')
2024-08-23T12:24:05.382146 08:00 43 Query INSERT INTO `test`.`t2`(`batch_id`) VALUES ('112')
2024-08-23T12:24:05.382448 08:00 43 Query INSERT INTO `test`.`t2`(`batch_id`) VALUES ('112')
2024-08-23T12:24:05.382752 08:00 43 Query INSERT INTO `test`.`t2`(`batch_id`) VALUES ('112')
2024-08-23T12:24:05.383147 08:00 43 Query INSERT INTO `test`.`t2`(`batch_id`) VALUES ('112')
2024-08-23T12:24:05.383461 08:00 43 Query INSERT INTO `test`.`t2`(`batch_id`) VALUES ('112')
2024-08-23T12:24:05.383759 08:00 43 Query INSERT INTO `test`.`t2`(`batch_id`) VALUES ('112')
2024-08-23T12:24:05.384169 08:00 43 Query INSERT INTO `test`.`t2`(`batch_id`) VALUES ('112')
2024-08-23T12:24:05.384503 08:00 43 Query INSERT INTO `test`.`t2`(`batch_id`) VALUES ('112')
2024-08-23T12:24:05.384831 08:00 43 Query INSERT INTO `test`.`t2`(`batch_id`) VALUES ('112')
2024-08-23T12:24:05.385259 08:00 43 Query INSERT INTO `test`.`t2`(`batch_id`) VALUES ('112')
2024-08-23T12:24:05.385578 08:00 43 Query INSERT INTO `test`.`t2`(`batch_id`) VALUES ('112')
2024-08-23T12:24:05.385855 08:00 43 Query INSERT INTO `test`.`t2`(`batch_id`) VALUES ('112')
2024-08-23T12:24:05.386262 08:00 43 Query INSERT INTO `test`.`t2`(`batch_id`) VALUES ('112')
2024-08-23T12:24:05.386618 08:00 43 Query INSERT INTO `test`.`t2`(`batch_id`) VALUES ('112')
2024-08-23T12:24:05.387013 08:00 43 Query INSERT INTO `test`.`t2`(`batch_id`) VALUES ('112')
2024-08-23T12:24:05.387331 08:00 43 Query INSERT INTO `test`.`t2`(`batch_id`) VALUES ('112')
2024-08-23T12:24:05.387627 08:00 43 Query INSERT INTO `test`.`t2`(`batch_id`) VALUES ('112')
2024-08-23T12:24:05.387991 08:00 42 Query SELECT /*!40001 SQL_NO_CACHE */ `batch_id`,`RES_STATUS`,`id` FROM `test`.`t1` FORCE INDEX(`PRIMARY`) WHERE (1=1) AND (`id` < '20') AND ((`id` > '19')) ORDER BY `id` LIMIT 100
2 没问题的写法
代码语言:txt复制pt-archiver
--source h=192.168.31.181,u=dts,p='123456',P=3306,D=test,t=t1
--dest h=192.168.31.181,u=dts,p='123456',P=3306,D=test,t=t3
--progress 100 --where '1=1'
--no-delete --charset=UTF8 --limit=100
--no-check-columns --columns batch_id,res_status
--statistics --txn-size 100 --no-version-check
[test]> select * from t3 limit 5;
---- ---------- ------------
| id | batch_id | res_status |
---- ---------- ------------
| 1 | 112 | 1 |
| 2 | 112 | 1 |
| 3 | 112 | 1 |
| 4 | 112 | 1 |
| 5 | 112 | 1 |
---- ---------- ------------
5 rows in set (0.00 sec)
可以看到 t3 表 的 res_status的是没问题的。
general_log部分内容如下:
2024-08-23T12:19:10.000235 08:00 38 Query USE `test`
2024-08-23T12:19:10.000482 08:00 38 Query SHOW CREATE TABLE `test`.`t1`
2024-08-23T12:19:10.015465 08:00 39 Query USE `test`
2024-08-23T12:19:10.015677 08:00 39 Query SHOW CREATE TABLE `test`.`t3`
2024-08-23T12:19:10.025237 08:00 38 Query SELECT MAX(`id`) FROM `test`.`t1`
2024-08-23T12:19:10.025925 08:00 38 Query SELECT /*!40001 SQL_NO_CACHE */ `batch_id`,`res_status`,`id` FROM `test`.`t1` FORCE INDEX(`PRIMARY`) WHERE (1=1) AND (`id` < '20') ORDER BY `id` LIMIT 100
2024-08-23T12:19:10.026402 08:00 39 Query INSERT INTO `test`.`t3`(`batch_id`,`res_status`) VALUES ('112','1')
2024-08-23T12:19:10.027093 08:00 39 Query INSERT INTO `test`.`t3`(`batch_id`,`res_status`) VALUES ('112','1')
2024-08-23T12:19:10.027416 08:00 39 Query INSERT INTO `test`.`t3`(`batch_id`,`res_status`) VALUES ('112','1')
2024-08-23T12:19:10.027720 08:00 39 Query INSERT INTO `test`.`t3`(`batch_id`,`res_status`) VALUES ('112','1')
2024-08-23T12:19:10.028154 08:00 39 Query INSERT INTO `test`.`t3`(`batch_id`,`res_status`) VALUES ('112','1')
2024-08-23T12:19:10.028552 08:00 39 Query INSERT INTO `test`.`t3`(`batch_id`,`res_status`) VALUES ('112','1')
2024-08-23T12:19:10.028938 08:00 39 Query INSERT INTO `test`.`t3`(`batch_id`,`res_status`) VALUES ('112','1')
2024-08-23T12:19:10.029324 08:00 39 Query INSERT INTO `test`.`t3`(`batch_id`,`res_status`) VALUES ('112','1')
2024-08-23T12:19:10.029695 08:00 39 Query INSERT INTO `test`.`t3`(`batch_id`,`res_status`) VALUES ('112','1')
2024-08-23T12:19:10.030029 08:00 39 Query INSERT INTO `test`.`t3`(`batch_id`,`res_status`) VALUES ('112','1')
2024-08-23T12:19:10.030351 08:00 39 Query INSERT INTO `test`.`t3`(`batch_id`,`res_status`) VALUES ('112','1')
2024-08-23T12:19:10.030679 08:00 39 Query INSERT INTO `test`.`t3`(`batch_id`,`res_status`) VALUES ('112','1')
2024-08-23T12:19:10.031042 08:00 39 Query INSERT INTO `test`.`t3`(`batch_id`,`res_status`) VALUES ('112','1')
2024-08-23T12:19:10.031480 08:00 39 Query INSERT INTO `test`.`t3`(`batch_id`,`res_status`) VALUES ('112','1')
2024-08-23T12:19:10.031830 08:00 39 Query INSERT INTO `test`.`t3`(`batch_id`,`res_status`) VALUES ('112','1')
2024-08-23T12:19:10.032214 08:00 39 Query INSERT INTO `test`.`t3`(`batch_id`,`res_status`) VALUES ('112','1')
2024-08-23T12:19:10.032545 08:00 39 Query INSERT INTO `test`.`t3`(`batch_id`,`res_status`) VALUES ('112','1')
2024-08-23T12:19:10.032839 08:00 39 Query INSERT INTO `test`.`t3`(`batch_id`,`res_status`) VALUES ('112','1')
2024-08-23T12:19:10.033294 08:00 39 Query INSERT INTO `test`.`t3`(`batch_id`,`res_status`) VALUES ('112','1')
2024-08-23T12:19:10.033736 08:00 38 Query SELECT /*!40001 SQL_NO_CACHE */ `batch_id`,`res_status`,`id` FROM `test`.`t1` FORCE INDEX(`PRIMARY`) WHERE (1=1) AND (`id` < '20') AND ((`id` > '19')) ORDER BY `id` LIMIT 100
3 也可以使用pt-archiver的debug功能
代码语言:txt复制此外,也可以启用pt-archiver的debug选项
export PTDEBUG=1
然后再执行pt-arhiver命令,即可在控制台输出明细日志
正常的
# pt_archiver:6721 47664 inst stmt: $VAR1 = {
# cols => [
# 'batch_id',
# 'res_status'
# ],
# slice => [
# 0,
# 1
# ]
# };
#
# pt_archiver:6721 47664 inst stmt: $VAR1 = {
# cols => [
# 'batch_id',
# 'res_status'
# ],
# slice => [
# 0,
# 1
# ]
# };
#
异常的
# pt_archiver:6721 38702 inst stmt: $VAR1 = {
# cols => [
# 'batch_id'
# ],
# slice => [
# 0
# ]
# };
#
# pt_archiver:6721 38702 inst stmt: $VAR1 = {
# cols => [
# 'batch_id'
# ],
# slice => [
# 0
# ]
# };
#