dm-master and dm-worker went down at the same time, and replication subtasks now report errors

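[TiDB environment] Production / Test / POC
[TiDB version] DM 2.0.7
[Problem] dm-master and dm-worker went down at the same time. After the machine was restarted, 2 of the 10 replication subtasks report errors. How can these subtasks be repaired?
[Reproduction steps] What operations led to the problem
[Symptoms and impact]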
    {
        "result": true,
        "msg": "",
        "sourceStatus": {
            "source": "source_tally1",
            "worker": "dm-127.0.0.1-8263",
            "result": null,
            "relayStatus": null
        },
        "subTaskStatus": [
            {
                "name": "task_new",
                "stage": "Paused",
                "unit": "Sync",
                "result": {
                    "isCanceled": false,
                    "errors": [
                        {
                            "ErrCode": 10006,
                            "ErrClass": "database",
                            "ErrScope": "not-set",
                            "ErrLevel": "high",
                            "Message": "startLocation: [position: (, 0), gtid-set: ], endLocation: [position: (binlog.xxxx, xxxx), gtid-set: ]: execute statement failed: commit",
                            "RawCause": "Error 1062: Duplicate entry '3459239597' for key 'uniq_guid'",
                            "Workaround": ""
                        }
                    ],
                    "detail": null
                },
                "unresolvedDDLLockID": "",
                "sync": {
                    "totalEvents": "20674",
                    "totalTps": "344",
                    "recentTps": "0",
                    "masterBinlog": "(binlog.xxxx, xxxx)",
                    "masterBinlogGtid": "",
                    "syncerBinlog": "(binlog.xxxx, xxxx)",
                    "syncerBinlogGtid": "",
                    "blockingDDLs": [],
                    "unresolvedGroups": [],
                    "synced": false,
                    "binlogType": "remote",
                    "secondsBehindMaster": "0"
                }
            }
        ]
    },

	{
        "result": true,
        "msg": "",
        "sourceStatus": {
            "source": "source_7",
            "worker": "dm-127.0.0.1-8266",
            "result": {
                "isCanceled": false,
                "errors": [
                    {
                        "ErrCode": 40071,
                        "ErrClass": "dm-worker",
                        "ErrScope": "internal",
                        "ErrLevel": "high",
                        "Message": "mysql source worker dm-127.0.0.1-8266 has already started with source source_abc, but get a request with source source_7",
                        "RawCause": "",
                        "Workaround": "Please try restart this DM-worker"
                    }
                ],
                "detail": null
            },
            "relayStatus": null
        },
        "subTaskStatus": [
            {
                "name": "task_new",
                "stage": "InvalidStage",
                "unit": "InvalidUnit",
                "result": null,
                "unresolvedDDLLockID": "",
                "msg": "no sub task with name task_new has started"
            }
        ]
    },
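With 10 subtasks, the two failing entries are easy to lose in the full status output. The following is a minimal Python sketch, not an official DM tool, that walks entries shaped like the two above and prints one line per problem. The function name `summarize_entry` and the trimmed sample dictionaries are illustrative assumptions; only the field names (`sourceStatus`, `subTaskStatus`, `result.errors`, `msg`) come from the output in this post.

```python
def summarize_entry(entry):
    """Return human-readable problem lines for one status entry."""
    problems = []
    source = entry.get("sourceStatus") or {}
    label = f'{source.get("source")}@{source.get("worker")}'

    # Source-level errors, e.g. the worker is already bound to another source.
    for err in (source.get("result") or {}).get("errors") or []:
        problems.append(f'{label}: source error {err["ErrCode"]}: {err["Message"]}')

    for sub in entry.get("subTaskStatus") or []:
        # Subtask-level errors, e.g. a failed statement that paused the syncer.
        for err in (sub.get("result") or {}).get("errors") or []:
            cause = err.get("RawCause") or err.get("Message")
            problems.append(
                f'{label}: task {sub["name"]} is {sub["stage"]} '
                f'(code {err["ErrCode"]}): {cause}'
            )
        # A subtask that never started carries only a "msg" field.
        if sub.get("msg"):
            problems.append(f'{label}: task {sub["name"]}: {sub["msg"]}')
    return problems


# Trimmed copies of the two entries shown above (sample data for illustration).
PAUSED_ENTRY = {
    "sourceStatus": {"source": "source_tally1", "worker": "dm-127.0.0.1-8263",
                     "result": None},
    "subTaskStatus": [{
        "name": "task_new", "stage": "Paused", "unit": "Sync",
        "result": {"errors": [{
            "ErrCode": 10006,
            "Message": "execute statement failed: commit",
            "RawCause": "Error 1062: Duplicate entry '3459239597' for key 'uniq_guid'",
        }]},
    }],
}

MISMATCH_ENTRY = {
    "sourceStatus": {
        "source": "source_7", "worker": "dm-127.0.0.1-8266",
        "result": {"errors": [{
            "ErrCode": 40071,
            "Message": "mysql source worker dm-127.0.0.1-8266 has already started "
                       "with source source_abc, but get a request with source source_7",
        }]},
    },
    "subTaskStatus": [{
        "name": "task_new", "stage": "InvalidStage", "unit": "InvalidUnit",
        "result": None,
        "msg": "no sub task with name task_new has started",
    }],
}

for entry in (PAUSED_ENTRY, MISMATCH_ENTRY):
    for line in summarize_entry(entry):
        print(line)
```

Healthy entries produce no lines, so only the subtasks that need attention show up.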


Could you describe what operations were performed? For example, how the cluster was deployed and how it was started.

Error 1062: Duplicate entry '3459239597' for key 'uniq_guid'

This shows a duplicate key error, which is why the syncer is in the Paused state. Did the crash you described happen on its own after the task was Paused?
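To check which row and unique index are involved before deciding how to repair the subtask, the `RawCause` string can be split into its parts. The sketch below is a hedged illustration: the helper name `parse_duplicate_entry` is hypothetical, and the regular expression assumes the message format seen in this thread (MySQL error 1062, a duplicate entry on a unique key), which may differ in other versions.

```python
import re

# Pattern for the message format seen in this thread (an assumption, not a spec).
DUPLICATE_ENTRY = re.compile(
    r"Error (?P<code>\d+): Duplicate entry '(?P<value>.*)' for key '(?P<key>.*)'"
)


def parse_duplicate_entry(raw_cause):
    """Return the error code, duplicated value, and key name, or None if no match."""
    match = DUPLICATE_ENTRY.search(raw_cause)
    if match is None:
        return None
    return {
        "code": int(match.group("code")),
        "value": match.group("value"),
        "key": match.group("key"),
    }


raw_cause = "Error 1062: Duplicate entry '3459239597' for key 'uniq_guid'"
print(parse_duplicate_entry(raw_cause))
```

The extracted value and key name can then be used to look up the conflicting row on the downstream side and compare it with the upstream row.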

It is fixed now. The problem was caused by dm-master and dm-worker going down at the same time. Thanks.


A DM cluster can be deployed with redundant master and worker nodes to achieve high availability. Thank you for your feedback!