TiDB DM 2.0.1 stops replicating

[TiDB version]
v2.0.1

[Problem description]
1. The status reported by dm-master is as follows:

"result": true,
"msg": "",
"sources": [
    {
        "result": true,
        "msg": "",
        "sourceStatus": {
            "source": "mysql-dubheci",
            "worker": "worker6",
            "result": null,
            "relayStatus": null
        },
        "subTaskStatus": [
            {
                "name": "dubhe-ci-task",
                "stage": "Running",
                "unit": "Sync",
                "result": null,
                "unresolvedDDLLockID": "",
                "sync": {
                    "totalEvents": "0",
                    "totalTps": "0",
                    "recentTps": "0",
                    "masterBinlog": "(dubheci-mysql-bin.000025, 135010353)",
                    "masterBinlogGtid": "0e266468-560e-11e9-bdaa-024276ac8ec2:1,1ee46438-54d3-11eb-8b2e-da43e2f93ff5:1-2526466",
                    "syncerBinlog": "(dubheci-mysql-bin.000018, 179918754)",
                    "syncerBinlogGtid": "0e266468-560e-11e9-bdaa-024276ac8ec2:1,1ee46438-54d3-11eb-8b2e-da43e2f93ff5:1-1774244",
                    "blockingDDLs": [
                    ],
                    "unresolvedGroups": [
                    ],
                    "synced": false,
                    "binlogType": "remote"
                }
            }
        ]
    }
]

However, syncerBinlogGtid has not advanced for a long time (syncerBinlogGtid is at 1774244, while masterBinlogGtid is at 2526466).
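
For reference, the gap between the two GTID sets can be quantified with a small script. This is a minimal Python sketch, not part of DM: the helper names (parse_gtid_set, gtid_lag) are hypothetical, and it assumes the syncer's set is a subset of the master's, as it is in the status output above.

```python
MASTER = ("0e266468-560e-11e9-bdaa-024276ac8ec2:1,"
          "1ee46438-54d3-11eb-8b2e-da43e2f93ff5:1-2526466")
SYNCER = ("0e266468-560e-11e9-bdaa-024276ac8ec2:1,"
          "1ee46438-54d3-11eb-8b2e-da43e2f93ff5:1-1774244")


def parse_gtid_set(text):
    """Parse 'uuid:1-5:7,uuid2:1' into {uuid: [(start, end), ...]}."""
    result = {}
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *intervals = part.split(":")
        ranges = result.setdefault(uuid, [])
        for interval in intervals:
            start, _, end = interval.partition("-")
            # A single number such as '1' is the interval 1-1.
            ranges.append((int(start), int(end or start)))
    return result


def count_transactions(ranges):
    """Number of transactions covered by a list of inclusive intervals."""
    return sum(end - start + 1 for start, end in ranges)


def gtid_lag(master, syncer):
    """Per-UUID count of transactions in master but not yet in syncer.

    Assumption: the syncer set is a subset of the master set.
    """
    m = parse_gtid_set(master)
    s = parse_gtid_set(syncer)
    return {
        uuid: count_transactions(ranges) - count_transactions(s.get(uuid, []))
        for uuid, ranges in m.items()
    }


if __name__ == "__main__":
    for uuid, lag in gtid_lag(MASTER, SYNCER).items():
        print(uuid, lag)
```

With the values from the status output, the syncer is 752222 transactions behind on the 1ee46438-... server UUID and fully caught up on the 0e266468-... one.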

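2. The worker has no WARN or ERROR logs, but it keeps printing the log entries below. It also exited on its own once, and after a restart it still prints the same logs.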
[2021/05/10 15:45:42.639 +08:00] [DEBUG] [server.go:209] ["etcd member list doesn't change"] ["client URLs"="[xxxx]"]
[2021/05/10 15:45:42.717 +08:00] [DEBUG] [checkpoint.go:610] ["try to rollback checkpoint"] [task=dubhe-ci-task] [unit="binlog replication"] [component="remote checkpoint"] [schema=dubhepayapply] [table=t_pay_apply_43] [checkpoint="position: (dubheci-mysql-bin.000017, 154376165), gtid-set: 1ee46438-54d3-11eb-8b2e-da43e2f93ff5:1-1731694,0e266468-560e-11e9-bdaa-024276ac8ec2:1(flushed position: (dubheci-mysql-bin.000017, 154376165), gtid-set: 1ee46438-54d3-11eb-8b2e-da43e2f93ff5:1-1731694,0e266468-560e-11e9-bdaa-024276ac8ec2:1)"]
[2021/05/10 15:45:42.717 +08:00] [INFO] [checkpoint.go:613] ["rollback checkpoint"] [task=dubhe-ci-task] [unit="binlog replication"] [component="remote checkpoint"] [schema=dubhepayapply] [table=t_pay_apply_43] [from="position: (dubheci-mysql-bin.000017, 154376165), gtid-set: 1ee46438-54d3-11eb-8b2e-da43e2f93ff5:1-1731694,0e266468-560e-11e9-bdaa-024276ac8ec2:1"] [to="position: (dubheci-mysql-bin.000017, 154376165), gtid-set: 1ee46438-54d3-11eb-8b2e-da43e2f93ff5:1-1731694,0e266468-560e-11e9-bdaa-024276ac8ec2:1"]
[2021/05/10 15:45:44.738 +08:00] [DEBUG] [checkpoint.go:610] ["try to rollback checkpoint"] [task=dubhe-ci-task] [unit="binlog replication"] [component="remote checkpoint"] [schema=dubhepayapply] [table=t_pay_apply_30] [checkpoint="position: (dubheci-mysql-bin.000017, 159704453), gtid-set: 1ee46438-54d3-11eb-8b2e-da43e2f93ff5:1-1732913,0e266468-560e-11e9-bdaa-024276ac8ec2:1(flushed position: (dubheci-mysql-bin.000017, 159704453), gtid-set: 1ee46438-54d3-11eb-8b2e-da43e2f93ff5:1-1732913,0e266468-560e-11e9-bdaa-024276ac8ec2:1)"]
[2021/05/10 15:45:44.739 +08:00] [INFO] [checkpoint.go:613] ["rollback checkpoint"] [task=dubhe-ci-task] [unit="binlog replication"] [component="remote checkpoint"] [schema=dubhepayapply] [table=t_pay_apply_30] [from="position: (dubheci-mysql-bin.000017, 159704453), gtid-set: 1ee46438-54d3-11eb-8b2e-da43e2f93ff5:1-1732913,0e266468-560e-11e9-bdaa-024276ac8ec2:1"] [to="position: (dubheci-mysql-bin.000017, 159704453), gtid-set: 0e266468-560e-11e9-bdaa-024276ac8ec2:1,1ee46438-54d3-11eb-8b2e-da43e2f93ff5:1-1732913"]
[2021/05/10 15:45:45.640 +08:00] [DEBUG] [server.go:209] ["etcd member list doesn't change"] ["client URLs"="[xxx]"]
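
In the "rollback checkpoint" entries above, the from and to fields point at the same binlog position and the same GTID set (in the second entry the set is only listed in a different order), so these rollbacks do not appear to move the checkpoint. A minimal Python sketch for checking this across a whole log file follows. The function names are hypothetical and the regular expression is an assumption based only on the lines pasted here, not on DM's documented log format.

```python
import re

# Matches [from="position: (name, pos), gtid-set: ..."] and the same for to=.
FIELD_RE = re.compile(
    r'\[(from|to)="position: \(([^,]+), (\d+)\), gtid-set: ([^"]*)"\]'
)

SAMPLE = (
    '[2021/05/10 15:45:44.739 +08:00] [INFO] [checkpoint.go:613] '
    '["rollback checkpoint"] [task=dubhe-ci-task] '
    '[schema=dubhepayapply] [table=t_pay_apply_30] '
    '[from="position: (dubheci-mysql-bin.000017, 159704453), gtid-set: '
    '1ee46438-54d3-11eb-8b2e-da43e2f93ff5:1-1732913,'
    '0e266468-560e-11e9-bdaa-024276ac8ec2:1"] '
    '[to="position: (dubheci-mysql-bin.000017, 159704453), gtid-set: '
    '0e266468-560e-11e9-bdaa-024276ac8ec2:1,'
    '1ee46438-54d3-11eb-8b2e-da43e2f93ff5:1-1732913"]'
)


def parse_rollback(line):
    """Return {'from': (file, pos, gtids), 'to': (...)} for one log line."""
    # Normalize typographic quotes in case the log was pasted via a forum editor.
    line = line.replace("\u201c", '"').replace("\u201d", '"')
    fields = {}
    for key, name, pos, gtid in FIELD_RE.findall(line):
        # Compare GTID sets as unordered collections of 'uuid:interval' items.
        items = frozenset(g.strip() for g in gtid.split(",") if g.strip())
        fields[key] = (name, int(pos), items)
    return fields


def is_noop_rollback(line):
    """True if the line has from/to fields that describe the same checkpoint."""
    fields = parse_rollback(line)
    return "from" in fields and "to" in fields and fields["from"] == fields["to"]


if __name__ == "__main__":
    print(is_noop_rollback(SAMPLE))
```

Running is_noop_rollback over every "rollback checkpoint" line of the dm-worker log would show whether any rollback actually changes the checkpoint. That observation alone does not explain why replication stalled.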


What is causing this?

  1. Could you run display and share how DM is deployed?
  2. Did you perform any unusual operations before the problem appeared?

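Please check the following:

  1. Whether dm-master has any error logs.
  2. Whether the relay.meta file is being updated periodically.
  3. Please upload the complete log of the dm-master leader and the complete log of the dm-worker that runs the affected task.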

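Deployment: 2 VMs, 3 masters, 6 workers.
Could it be that a DDL on a sharded table failed to replicate earlier, and DM is automatically retrying and rolling back?
It is resolved now.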

Did it recover on its own? Was a sharding DDL operation involved?

Do you have more detailed information, such as the runtime log of the dm-worker for this task?

I resolved it by manually restarting and resetting the GTID starting point. Yes, there was a sharding DDL.

Hello, does this problem occur often?

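If it happens again, please collect the complete log of the dm-master leader, the complete log of the dm-worker that runs the affected task, and the task configuration, so that we can analyze it. Thank you.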