DM data replication: can it retry automatically after the connection to the upstream primary is lost?

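To help us respond efficiently, please provide the following information when asking a question. Clearly described questions get priority.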

  • 【TiDB version】: 4.0
    DM version: 1.0.6
  • 【Problem description】:

While dm-worker was replicating data, the following error occurred:
"relayStatus": {
    "masterBinlog": "(mysql-bin.000005, 644419821)",
    "masterBinlogGtid": "19e56fb6-ab06-11ea-8f8f-fa16098b7798:1-26305",
    "relaySubDir": "19e56fb6-ab06-11ea-8f8f-fa16098b7798.000001",
    "relayBinlog": "(mysql-bin.000005, 644391646)",
    "relayBinlogGtid": "",
    "relayCatchUpMaster": false,
    "stage": "Paused",
    "result": {
        "isCanceled": false,
        "errors": [
            {
                "Type": "UnknownError",
                "msg": "",
                "error": {
                    "ErrCode": 30012,
                    "ErrClass": 8,
                    "ErrScope": 1,
                    "ErrLevel": 3,
                    "Message": "start reader for UUID 19e56fb6-ab06-11ea-8f8f-fa16098b7798.000001: start sync from position (mysql-bin.000005, 644391646): dial tcp x.x.x.x:33066: connect: no route to host",
                    "RawCause": "dial tcp x.x.x.x:33066: connect: no route to host"
                }
            }
        ],
        "detail": null
    }
}
Replication resumed after I restarted the worker.
Can DM retry automatically in this situation?
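Until the relay unit recovers by itself, one option is to watch the status output and alert (or restart) when the relay stage turns to Paused. The following is a minimal sketch, assuming the value of the "relayStatus" object shown above has been extracted as JSON; the function name pausedCodes is hypothetical, and only the fields visible in that output are decoded.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// relayStatus mirrors a subset of the "relayStatus" object shown above.
// Fields not declared here are ignored by encoding/json.
type relayStatus struct {
	RelayCatchUpMaster bool   `json:"relayCatchUpMaster"`
	Stage              string `json:"stage"`
	Result             *struct {
		IsCanceled bool `json:"isCanceled"`
		Errors     []struct {
			Type  string `json:"Type"`
			Error struct {
				ErrCode  int    `json:"ErrCode"`
				Message  string `json:"Message"`
				RawCause string `json:"RawCause"`
			} `json:"error"`
		} `json:"errors"`
	} `json:"result"`
}

// pausedCodes (hypothetical helper) decodes a relayStatus object and, if the
// relay stage is "Paused", returns the ErrCode of every reported error.
func pausedCodes(raw []byte) (bool, []int, error) {
	var rs relayStatus
	if err := json.Unmarshal(raw, &rs); err != nil {
		return false, nil, err
	}
	if rs.Stage != "Paused" {
		return false, nil, nil
	}
	var codes []int
	if rs.Result != nil {
		for _, e := range rs.Result.Errors {
			codes = append(codes, e.Error.ErrCode)
		}
	}
	return true, codes, nil
}

func main() {
	// Trimmed copy of the status reported in this thread.
	sample := []byte(`{
		"relayCatchUpMaster": false,
		"stage": "Paused",
		"result": {"isCanceled": false, "errors": [{"Type": "UnknownError",
			"error": {"ErrCode": 30012, "RawCause": "dial tcp x.x.x.x:33066: connect: no route to host"}}]}
	}`)
	paused, codes, err := pausedCodes(sample)
	fmt.Println(paused, codes, err) // prints: true [30012] <nil>
}
```

What to do on a Paused stage (alert, or restart the worker as was done here) is left to the operator; this only detects the condition.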

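Hello. DM handles upstream failures differently depending on the error type. If the upstream MySQL restarts, DM does not retry. If the failure is something like a slave_net_timeout, DM does retry. Did something happen on the upstream? Alternatively, please share part of the dm-worker log so we can check whether the behavior is as expected.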

[2020/07/16 20:28:18.005 +08:00] [WARN] [relay.go:299] ["receive retryable error for binlog reader"] [component="relay log"] [error="[code=30015:class=relay-unit:scope=upstream:level=high] TCPReader get relay event with error: io.ReadFull(header) failed. err read tcp x.x.x.x:60894->x.x.x.x:33316: i/o timeout: connection was bad"]
[2020/07/16 20:28:20.673 +08:00] [WARN] [relay.go:299] ["receive retryable error for binlog reader"] [component="relay log"] [error="[code=30015:class=relay-unit:scope=upstream:level=high] TCPReader get relay event with error: io.ReadFull(header) failed. err read tcp x.x.x.x:51452->x.x.x.x:33066: i/o timeout: connection was bad"]
[2020/07/16 20:28:38.221 +08:00] [ERROR] [relay.go:302] ["fail to close binlog event reader"] [component="relay log"] [error="[code=10001:class=database:scope=upstream:level=high] kill connection 76986 for master x.x.x.x:33316: database driver error: dial tcp x.x.x.x:33316: connect: no route to hostconnect: no route to host"] [errorVerbose="[code=10001:class=database:scope=upstream:level=high] kill connection 76986 for master x.x.x.x:33316: database driver error: dial tcp x.x.x.x:33316: connect: no route to host\ngithub.com/pingcap/dm/pkg/terror.(*Error).Delegate\n\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/pkg/terror/terror.go:271\ngithub.com/pingcap/dm/pkg/terror.DBErrorAdaptArgs\n\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/pkg/terror/adapter.go:39\ngithub.com/pingcap/dm/pkg/terror.DBErrorAdapt\n\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/pkg/terror/adapter.go:46\ngithub.com/pingcap/dm/pkg/utils.KillConn\n\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/pkg/utils/db.go:356\ngithub.com/pingcap/dm/pkg/binlog/reader.(*TCPReader).Close\n\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/pkg/binlog/reader/tcp.go:131\ngithub.com/pingcap/dm/relay/reader.(*reader).Close\n\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/relay/reader/reader.go:126\ngithub.com/pingcap/dm/relay.(*Relay).process\n\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/relay/relay.go:300\ngithub.com/pingcap/dm/relay.(*Relay).Process\n\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/relay/relay.go:191\ngithub.com/pingcap/dm/dm/worker.(*realRelayHolder).run\n\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/dm/worker/relay.go:167\ngithub.com/pingcap/dm/dm/worker.(*realRelayHolder).Start.func1\n\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/dm/worker/relay.go:143\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1357"]

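It looks like reading the upstream binlog timed out, possibly because of network jitter. You can also check the state of the binlog file mysql-bin.000005 at position 644391646.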