【问题澄清】
- dm 出现报错,重启task后无法解决,删除 loader_checkpoint
与syncer_checkpoint 表,库、dump.data后 重新启动task,恢复成功 但syncer还是有问题,报错依旧。从dm服务器上使用同步指定账户登录源Mysql库没问题。权限与文档中一致为:
GRANT SELECT, RELOAD, REPLICATION SLAVE, REPLICATION CLIENT ON . TO ‘syncer’@’%';
- 并且 relay log 中的日志并不是最新的,为早些日志的日志。
query-status task_vccx
{
"result": true,
"msg": "",
"workers": [
{
"result": true,
"worker": "10.10.24.84:8264",
"msg": "",
"subTaskStatus": [
{
"name": "task_vccx",
"stage": "Running",
"unit": "Sync",
"result": {
"isCanceled": false,
"errors": [
{
"Type": "UnknownError",
"msg": "[code=40067:class=dm-worker:scope=internal:level=high] waiting for relay binlog pos to catch up with loader end binlog pos is timeout (exceeding 5m0s), loader end binlog pos: (mysql-bin.000020, 159702946), relay binlog pos: (mysql-bin.000002, 280333226)\
github.com/pingcap/dm/pkg/terror.(*Error).Generate\
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/pkg/terror/terror.go:232\
github.com/pingcap/dm/dm/worker.(*SubTask).unitTransWaitCondition\
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/dm/worker/subtask.go:684\
github.com/pingcap/dm/dm/worker.(*SubTask).run\
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/dm/worker/subtask.go:193\
github.com/pingcap/dm/dm/worker.(*SubTask).fetchResult\
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/dm/worker/subtask.go:280\
runtime.goexit\
\t/usr/local/go/src/runtime/asm_amd64.s:1357",
"error": null
}
],
"detail": null
},
"unresolvedDDLLockID": "",
"sync": {
"totalEvents": "0",
"totalTps": "0",
"recentTps": "0",
"masterBinlog": "(mysql-bin.000020, 180373475)",
"masterBinlogGtid": "55b4bea9-902e-11ea-85ca-0242ac120003:1-53472472",
"syncerBinlog": "(, 4)",
"syncerBinlogGtid": "",
"blockingDDLs": [
],
"unresolvedGroups": [
],
"synced": false
}
}
],
"relayStatus": {
"masterBinlog": "(mysql-bin.000020, 180373475)",
"masterBinlogGtid": "55b4bea9-902e-11ea-85ca-0242ac120003:1-53472472",
"relaySubDir": "55b4bea9-902e-11ea-85ca-0242ac120003.000001",
"relayBinlog": "(mysql-bin.000002, 280333226)",
"relayBinlogGtid": "55b4bea9-902e-11ea-85ca-0242ac120003:1-3883867",
"relayCatchUpMaster": false,
"stage": "Paused",
"result": {
"isCanceled": false,
"errors": [
{
"Type": "UnknownError",
"msg": "[code=30014:class=relay-unit:scope=upstream:level=high] start reader for UUID 55b4bea9-902e-11ea-85ca-0242ac120003.000001: start sync from GTID set 55b4bea9-902e-11ea-85ca-0242ac120003:1-3883867: dial tcp 10.139.60.82:3306: connect: connection refused\
github.com/pingcap/dm/pkg/terror.(*Error).Delegate\
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/pkg/terror/terror.go:267\
github.com/pingcap/dm/pkg/binlog/reader.(*TCPReader).StartSyncByGTID\
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/pkg/binlog/reader/tcp.go:102\
github.com/pingcap/dm/relay/reader.(*reader).setUpReaderByGTID\
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/relay/reader/reader.go:160\
github.com/pingcap/dm/relay/reader.(*reader).Start\
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/relay/reader/reader.go:109\
github.com/pingcap/dm/relay.(*Relay).setUpReader\
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/relay/relay.go:610\
github.com/pingcap/dm/relay.(*Relay).process\
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/relay/relay.go:304\
github.com/pingcap/dm/relay.(*Relay).Process\
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/relay/relay.go:191\
github.com/pingcap/dm/dm/worker.(*realRelayHolder).run\
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/dm/worker/relay.go:167\
github.com/pingcap/dm/dm/worker.(*realRelayHolder).Start.func1\
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/dm/worker/relay.go:143\
runtime.goexit\
\t/usr/local/go/src/runtime/asm_amd64.s:1357",
"error": null
}
],
"detail": null
}
},
"sourceID": "mysql-replica-82-vccx_test"
}
]
}
【排查思路】
- 此报错在 msg 中一般会被定义为 internal,通过报错信息,可以判断时 relay log 这部分的问题
【解决方案】
- stop.yml -l dm-worker02 - 避免 relay log 被重写
- 删除 dm-worker02 对应的 relay log
- start.yml -l dm-worker02 来恢复
【经典案例】