Task

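To improve efficiency, please provide the following information when asking a question. Clearly described problems will be answered first.

  • [TiDB version]: DM version: 1.0.2, TiDB version: 3.0.4
  • [Problem description]: query-status reports an error for the task, while query-error shows the task as normal. I tried resume-task office_dip_test, but the task did not recover. How can I recover it?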


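For performance tuning or troubleshooting questions, please download and run the script. Be sure to select all of the terminal output, then copy, paste, and upload it.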

"msg": "[code=30012:class=relay-unit:scope=upstream:level=high] start reader for UUID 739342e7-4ec6-11e8-9f34-00505687548d.000001: start sync from position (mysql-bin.002392, 102514034): dial tcp 192.168.47.142:30112: connect: connection refused\ngithub.com/pingcap/dm/pkg/terror.(*Error).Delegate\n\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/pkg/terror/terror.go:267\ngithub.com/pingcap/dm/pkg/binlog/reader.(*TCPReader).StartSyncByPos\n\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/pkg/binlog/reader/tcp.go:79\ngithub.com/pingcap/dm/relay/reader.(*reader).setUpReaderByPos\n\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/relay/reader/reader.go:166\ngithub.com/pingcap/dm/relay/reader.(*reader).Start\n\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/relay/reader/reader.go:111\ngithub.com/pingcap/dm/relay.(*Relay).setUpReader\n\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/relay/relay.go:606\ngithub.com/pingcap/dm/relay.(*Relay).process\n\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/relay/relay.go:304\ngithub.com/pingcap/dm/relay.(*Relay).Process\n\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/relay/relay.go:191\ngithub.com/pingcap/dm/dm/worker.(*realRelayHolder).run\n\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/dm/worker/relay.go:164\ngithub.com/pingcap/dm/dm/worker.(*realRelayHolder).Start.func1\n\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/dm/worker/relay.go:140\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1337"

From the worker node, I can access the upstream database without any problem.

Is 192.168.47.142:30112 the IP address and port of the upstream MySQL?

Yes.
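Since the error is `dial tcp 192.168.47.142:30112: connect: connection refused`, one quick check is whether a plain TCP connection to that address can be opened from the dm-worker host. The following is only a minimal sketch: the helper name `can_connect` is made up for illustration, and a successful connection from a separate process does not prove that the connection held by the running dm-worker is healthy.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper for illustration; it only tests TCP reachability,
    not MySQL authentication or replication privileges.
    """
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError,
        # or a timeout) when the dial fails.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example call, using the address from the error message above:
# print(can_connect("192.168.47.142", 30112))
```

If this returns False on the worker host, the problem is at the network or MySQL listener level rather than inside DM.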

What does the dm-worker node log show when you run the resume operation?

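I recovered by restarting the worker node with ./scripts/stop_dm-worker.sh and ./scripts/start_dm-worker.sh. It looks like the connection between the worker node and the upstream MySQL was dropped and never re-established; after the restart it connected again. In a case like this the task status is still Running, but replication has actually stopped. Which metric should I monitor to detect this problem?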
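The symptom described here (status Running, but no data flowing) can be sketched as a comparison of binlog positions over time: if the upstream position moves forward between two samples while the relay position stays put, the relay is probably stalled. The sketch below assumes positions are available as strings in the `(mysql-bin.002392, 102514034)` form seen in the error message. How the two positions are sampled is left open, and the names `parse_pos` and `relay_stalled` are hypothetical.

```python
import re
from typing import Tuple

# Matches positions such as "(mysql-bin.002392, 102514034)".
_POS_RE = re.compile(r"\(\s*(\S+?)\.(\d+)\s*,\s*(\d+)\s*\)")


def parse_pos(text: str) -> Tuple[int, int]:
    """Parse a binlog position into (file sequence number, offset)."""
    m = _POS_RE.search(text)
    if m is None:
        raise ValueError("not a binlog position: %r" % text)
    return int(m.group(2)), int(m.group(3))


def relay_stalled(prev_relay: str, curr_relay: str,
                  prev_master: str, curr_master: str) -> bool:
    """Return True if the upstream advanced between two samples
    but the relay position did not."""
    relay_moved = parse_pos(curr_relay) > parse_pos(prev_relay)
    master_moved = parse_pos(curr_master) > parse_pos(prev_master)
    return master_moved and not relay_moved
```

A check like this only makes sense when the upstream is actually receiving writes; on an idle upstream neither position moves and the function returns False.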

The error here comes from the Relay module, so you can look at these monitoring metrics: