TiDB DM v2.0.1 stops replicating

不懂TiDB的程序猿不是好的猪肉佬 · May 10, 2021 07:51


[TiDB version]
v2.0.1

[Problem description]
1. The master status is as follows:

"result": true,
"msg": "",
"sources": [
    {
        "result": true,
        "msg": "",
        "sourceStatus": {
            "source": "mysql-dubheci",
            "worker": "worker6",
            "result": null,
            "relayStatus": null
        },
        "subTaskStatus": [
            {
                "name": "dubhe-ci-task",
                "stage": "Running",
                "unit": "Sync",
                "result": null,
                "unresolvedDDLLockID": "",
                "sync": {
                    "totalEvents": "0",
                    "totalTps": "0",
                    "recentTps": "0",
                    "masterBinlog": "(dubheci-mysql-bin.000025, 135010353)",
                    "masterBinlogGtid": "0e266468-560e-11e9-bdaa-024276ac8ec2:1,1ee46438-54d3-11eb-8b2e-da43e2f93ff5:1-2526466",
                    "syncerBinlog": "(dubheci-mysql-bin.000018, 179918754)",
                    "syncerBinlogGtid": "0e266468-560e-11e9-bdaa-024276ac8ec2:1,1ee46438-54d3-11eb-8b2e-da43e2f93ff5:1-1774244",
                    "blockingDDLs": [
                    ],
                    "unresolvedGroups": [
                    ],
                    "synced": false,
                    "binlogType": "remote"
                }
            }
        ]
    }
]

However, syncerBinlogGtid has not advanced for a long time (syncerBinlogGtid is at 1774244, while masterBinlogGtid is at 2526466).
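To put a number on the gap, the two GTID sets can be compared per source server UUID. The following is a minimal sketch, not part of DM; the helper names `parse_gtid_set` and `gtid_lag` are hypothetical, and it assumes MySQL-style GTID sets (`uuid:1-N`, several intervals of one UUID separated by colons, entries separated by commas) where the syncer set is a subset of the master set.

```python
from __future__ import annotations


def parse_gtid_set(gtid_set: str) -> dict[str, int]:
    """Return the number of transactions per server UUID in a MySQL GTID set.

    Accepts strings like "uuid:1,uuid2:1-100:200-300" (entries separated by
    commas, intervals of one UUID separated by colons).
    """
    counts: dict[str, int] = {}
    for entry in gtid_set.split(","):
        entry = entry.strip()
        if not entry:
            continue
        uuid, *intervals = entry.split(":")
        total = 0
        for interval in intervals:
            start, _, end = interval.partition("-")
            # A single transaction "N" is the interval N-N.
            total += int(end or start) - int(start) + 1
        key = uuid.lower()
        counts[key] = counts.get(key, 0) + total
    return counts


def gtid_lag(master_gtid: str, syncer_gtid: str) -> dict[str, int]:
    """Transactions in master_gtid not yet in syncer_gtid, per UUID.

    Assumes syncer_gtid is a subset of master_gtid (the syncer is only behind,
    not diverged), so the difference in counts equals the set difference.
    """
    master = parse_gtid_set(master_gtid)
    syncer = parse_gtid_set(syncer_gtid)
    return {uuid: n - syncer.get(uuid, 0) for uuid, n in master.items()}


master = "0e266468-560e-11e9-bdaa-024276ac8ec2:1,1ee46438-54d3-11eb-8b2e-da43e2f93ff5:1-2526466"
syncer = "0e266468-560e-11e9-bdaa-024276ac8ec2:1,1ee46438-54d3-11eb-8b2e-da43e2f93ff5:1-1774244"
print(gtid_lag(master, syncer))
# {'0e266468-560e-11e9-bdaa-024276ac8ec2': 0, '1ee46438-54d3-11eb-8b2e-da43e2f93ff5': 752222}
```

Counting transactions per UUID, rather than subtracting the visible end numbers, also works when a UUID has several intervals. With the values from this status output, the syncer is 752222 transactions behind on the `1ee46438` server UUID and fully caught up on `0e266468`.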

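2. The worker has no WARN or ERROR logs, but it keeps printing the logs below. It also exited on its own once, and after restarting it still prints the same logs: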
[2021/05/10 15:45:42.639 +08:00] [DEBUG] [server.go:209] ["etcd member list doesn't change"] ["client URLs"="[xxxx]"]
[2021/05/10 15:45:42.717 +08:00] [DEBUG] [checkpoint.go:610] ["try to rollback checkpoint"] [task=dubhe-ci-task] [unit="binlog replication"] [component="remote checkpoint"] [schema=dubhepayapply] [table=t_pay_apply_43] [checkpoint="position: (dubheci-mysql-bin.000017, 154376165), gtid-set: 1ee46438-54d3-11eb-8b2e-da43e2f93ff5:1-1731694,0e266468-560e-11e9-bdaa-024276ac8ec2:1(flushed position: (dubheci-mysql-bin.000017, 154376165), gtid-set: 1ee46438-54d3-11eb-8b2e-da43e2f93ff5:1-1731694,0e266468-560e-11e9-bdaa-024276ac8ec2:1)"]
[2021/05/10 15:45:42.717 +08:00] [INFO] [checkpoint.go:613] ["rollback checkpoint"] [task=dubhe-ci-task] [unit="binlog replication"] [component="remote checkpoint"] [schema=dubhepayapply] [table=t_pay_apply_43] [from="position: (dubheci-mysql-bin.000017, 154376165), gtid-set: 1ee46438-54d3-11eb-8b2e-da43e2f93ff5:1-1731694,0e266468-560e-11e9-bdaa-024276ac8ec2:1"] [to="position: (dubheci-mysql-bin.000017, 154376165), gtid-set: 1ee46438-54d3-11eb-8b2e-da43e2f93ff5:1-1731694,0e266468-560e-11e9-bdaa-024276ac8ec2:1"]
[2021/05/10 15:45:44.738 +08:00] [DEBUG] [checkpoint.go:610] ["try to rollback checkpoint"] [task=dubhe-ci-task] [unit="binlog replication"] [component="remote checkpoint"] [schema=dubhepayapply] [table=t_pay_apply_30] [checkpoint="position: (dubheci-mysql-bin.000017, 159704453), gtid-set: 1ee46438-54d3-11eb-8b2e-da43e2f93ff5:1-1732913,0e266468-560e-11e9-bdaa-024276ac8ec2:1(flushed position: (dubheci-mysql-bin.000017, 159704453), gtid-set: 1ee46438-54d3-11eb-8b2e-da43e2f93ff5:1-1732913,0e266468-560e-11e9-bdaa-024276ac8ec2:1)"]
[2021/05/10 15:45:44.739 +08:00] [INFO] [checkpoint.go:613] ["rollback checkpoint"] [task=dubhe-ci-task] [unit="binlog replication"] [component="remote checkpoint"] [schema=dubhepayapply] [table=t_pay_apply_30] [from="position: (dubheci-mysql-bin.000017, 159704453), gtid-set: 1ee46438-54d3-11eb-8b2e-da43e2f93ff5:1-1732913,0e266468-560e-11e9-bdaa-024276ac8ec2:1"] [to="position: (dubheci-mysql-bin.000017, 159704453), gtid-set: 0e266468-560e-11e9-bdaa-024276ac8ec2:1,1ee46438-54d3-11eb-8b2e-da43e2f93ff5:1-1732913"]
[2021/05/10 15:45:45.640 +08:00] [DEBUG] [server.go:209] ["etcd member list doesn't change"] ["client URLs"="[xxx]"]
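In both "rollback checkpoint" lines, `from` and `to` name the same binlog position and the same GTID set; in the `t_pay_apply_30` line only the order of the two GTID entries differs. To check whether that holds across a whole worker log, a small script could look like the sketch below. It is a hypothetical helper (the names `FIELD_RE`, `is_noop_rollback` and `count_rollbacks` are made up for illustration), it assumes the log format shown above, and it accepts both straight and typographic quotes because forum rendering may alter them.

```python
import re
from typing import Dict, Iterable, Optional, Tuple

# Matches the [from="..."] and [to="..."] fields of a "rollback checkpoint"
# line. Both straight (") and typographic (“ ”) quotes are accepted.
FIELD_RE = re.compile(
    r'\[(from|to)=["“]position: \(([^)]*)\), gtid-set: ([^"”]*)["”]\]'
)
TABLE_RE = re.compile(r"\[table=([^\]]+)\]")


def normalize_gtid_set(gtid_set: str) -> frozenset:
    """Order-insensitive form of a GTID set: the set of its comma-separated entries."""
    return frozenset(e.strip().lower() for e in gtid_set.split(",") if e.strip())


def is_noop_rollback(line: str) -> Optional[bool]:
    """True if from/to are the same checkpoint, False if they differ,
    None if the line does not carry both a from and a to field."""
    fields = {
        name: (pos.strip(), normalize_gtid_set(gtids))
        for name, pos, gtids in FIELD_RE.findall(line)
    }
    if "from" not in fields or "to" not in fields:
        return None
    return fields["from"] == fields["to"]


def count_rollbacks(lines: Iterable[str]) -> Dict[str, Tuple[int, int]]:
    """Per table: (number of rollback lines, how many of them were no-ops)."""
    stats: Dict[str, Tuple[int, int]] = {}
    for line in lines:
        noop = is_noop_rollback(line)
        if noop is None:
            continue
        m = TABLE_RE.search(line)
        table = m.group(1) if m else "?"
        total, noops = stats.get(table, (0, 0))
        stats[table] = (total + 1, noops + int(noop))
    return stats


sample = (
    '[2021/05/10 15:45:44.739 +08:00] [INFO] [checkpoint.go:613] ["rollback checkpoint"] '
    '[task=dubhe-ci-task] [unit="binlog replication"] [component="remote checkpoint"] '
    '[schema=dubhepayapply] [table=t_pay_apply_30] '
    '[from="position: (dubheci-mysql-bin.000017, 159704453), gtid-set: '
    '1ee46438-54d3-11eb-8b2e-da43e2f93ff5:1-1732913,0e266468-560e-11e9-bdaa-024276ac8ec2:1"] '
    '[to="position: (dubheci-mysql-bin.000017, 159704453), gtid-set: '
    '0e266468-560e-11e9-bdaa-024276ac8ec2:1,1ee46438-54d3-11eb-8b2e-da43e2f93ff5:1-1732913"]'
)
print(is_noop_rollback(sample))  # True
print(count_rollbacks([sample]))  # {'t_pay_apply_30': (1, 1)}
```

`count_rollbacks` takes any iterable of lines, such as an open worker log file. Tables whose rollbacks are all no-ops suggest that their checkpoints are not moving; this alone does not identify the cause.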


What could be the cause?

yilong · May 10, 2021 09:56

Could you run display and share the DM deployment topology?
Did you do anything unusual before the error occurred?

QBin · May 10, 2021 09:57

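Please check the following:

Whether dm-master has any error logs.
Whether the relay.meta file is being updated periodically.
Please upload the complete log of the dm-master leader and the complete log of the dm-worker running the affected task.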

不懂TiDB的程序猿不是好的猪肉佬 · May 11, 2021 09:32

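Deployment: 2 VMs, 3 masters, 6 workers.
It was probably caused by an earlier sharded-table DDL that failed to replicate. Does DM retry and roll back automatically in that case?
It is resolved now.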

Lucien · May 12, 2021 04:15

Did it recover automatically? Were there sharding DDL operations?

IANTHEREAL · May 14, 2021 06:47

Do you have more detailed information, such as the dm-worker runtime log for the corresponding task?

不懂TiDB的程序猿不是好的猪肉佬 · May 25, 2021 02:53

Resolved by restarting manually and resetting the GTID starting point. There were sharding DDLs.

小王同学 · May 26, 2021 02:14

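Hi, does this problem occur often?

If it happens again, please collect the complete log of the dm-master leader, the complete dm-worker log for the affected task, and the task configuration, so we can analyze it. Thanks.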