DM replication task: changing the upstream MySQL address does not take effect

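To help us respond efficiently, please provide the following information when asking a question. Clearly described problems are answered first.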

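  • [TiDB version]: v4.0.0-rc
  • [Problem description]: The IP address of the upstream database has changed, so I need to update the DM configuration to continue replicating data. I modified the inventory.ini file on the DM-master node and ran ansible-playbook /home/tidb/dm-ansible/rolling_update.yml. Judging from the DM-worker log, the new address has not taken effect: the running task still replicates from the old, now-invalid IP address and reports connection timeouts. How should I change the upstream database connection so that it takes effect? Thanks.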

Addendum: in inventory.ini on the DM-master node, I changed the mysql_host value under dm_worker_servers to the new upstream database address.

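Hello, there are two things you can check:

  1. Whether, when switching the upstream database IP, you completed all of the steps circled below.

  2. Whether the configuration file of the dm-worker that runs the task shows the upstream IP correctly updated to the new address.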

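Hello. When I run stop-task xxxx on the DM-master node, it reports that the task does not exist. When I try to refresh the task list, it returns "msg": "[code=38008:class=dm-master:scope=internal:level=high] grpc request error: rpc error: code = DeadlineExceeded desc = context deadline exceeded", which looks like a timeout. I have tried restarting the DM cluster and rebooting the servers, but neither helped. On the DM-worker node, dm-worker.toml already contains the new upstream database address, but the task does not pick it up.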

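Hello. The message suggests that dmctl has lost contact with dm-master. Please try the following:
1. Stop the DM cluster.
2. Check that the services on every node have shut down cleanly, so that no stale or hung processes from the previous run remain.
3. Restart the DM cluster.

If the problem persists, please package and upload the dm-master and dm-worker logs covering the period when the problem occurred, and state the time at which it occurred. Thanks.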
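
Editor's sketch of the three steps above. It assumes the cluster was deployed with DM-Ansible from /home/tidb/dm-ansible (the path used earlier in this thread) and that its stop.yml and start.yml playbooks are used. The function names and the HOST placeholder are illustrative and not part of DM. The filter reads a process listing on stdin, so it can be fed from each node over SSH.

```shell
#!/usr/bin/env bash
# Sketch only. Assumes a DM-Ansible deployment; adjust the directory as needed.
DM_ANSIBLE_DIR="${DM_ANSIBLE_DIR:-/home/tidb/dm-ansible}"

# Keep only the lines of a `ps -eo pid,args` listing whose command is a
# dm-master or dm-worker binary. Prints nothing if none are left.
leftover_dm_processes() {
    grep -E '(^|[ /])dm-(master|worker)( |$)' || true
}

# Step 1: stop the whole cluster.
stop_dm_cluster() {
    (cd "$DM_ANSIBLE_DIR" && ansible-playbook stop.yml)
}

# Step 3: start it again once every node is clean.
start_dm_cluster() {
    (cd "$DM_ANSIBLE_DIR" && ansible-playbook start.yml)
}

# Step 2, run once per node between the two calls above (HOST is a placeholder):
#   ssh tidb@HOST ps -eo pid,args | leftover_dm_processes
```

Any line printed in step 2 is a process that survived the stop and would probably need to be ended by hand before the cluster is started again.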

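I have already tried restarting the cluster. At the moment I cannot control the tasks. According to the DM-worker log, the tasks are resumed automatically as soon as the DM cluster starts. In dmctl, stop-task xxx returns "task xxx has no workers or not exist, can try refresh-worker-tasks cmd first", and refresh-worker-tasks fails with the same error. Running query-error lists all of my tasks, but every one of them is in the paused state. Is there a manual way to stop the replication tasks so that they re-read the configuration?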

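The odd thing is that when the cluster is stopped, the master can make the workers terminate their dm-worker processes, and it can start them as well, yet it cannot refresh the task list or start and stop tasks.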

If convenient, please upload the logs of the affected DM-worker and of dm-master so that we can analyze the cause.

Please also state the exact DM version.

DM-master log:
[2020/09/04 05:46:42.524 +00:00] [ERROR] [server.go:1308] ["create FetchDDLInfo stream"] [worker=172.31.1.50:8262] [error="[code=38008:class=dm-master:scope=internal:level=high] grpc request error: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 172.31.1.50:8262: connect: connection refused""]
[2020/09/04 05:46:47.524 +00:00] [ERROR] [server.go:1308] ["create FetchDDLInfo stream"] [worker=172.31.1.50:8262] [error="[code=38008:class=dm-master:scope=internal:level=high] grpc request error: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 172.31.1.50:8262: connect: connection refused""]
[2020/09/04 05:46:52.525 +00:00] [ERROR] [server.go:1308] ["create FetchDDLInfo stream"] [worker=172.31.1.50:8262] [error="[code=38008:class=dm-master:scope=internal:level=high] grpc request error: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 172.31.1.50:8262: connect: connection refused""]

The master and the workers can reach each other over passwordless SSH as the tidb user; all the ports are open and the network is reachable.

DM version: 1.0.5

DM-worker log:
[2020/09/04 05:13:16.713 +08:00] [INFO] [syncer.go:893] ["flushed checkpoint"] [task=xxxx] [unit="binlog replication"] [checkpoint="(mysql-bin-changelog|000001.053539, 133472905)(flushed (mysql-bin-changelog|000001.053539, 133472905))"]
[2020/09/04 05:13:16.713 +08:00] [INFO] [relay.go:113] ["current earliest active relay log"] [task=xxxx] [unit="binlog replication"] ["active relay log"=23b5ce5f-4a43-3bbf-b221-c193041bbf77.000001/mysql-bin-changelog.053539]
[2020/09/04 05:13:17.626 +08:00] [INFO] [syncer.go:2076] ["binlog replication progress"] [task=xxxx] [unit="binlog replication"] ["total binlog size"=126681855310] ["last binlog size"=126681461359] ["cost time"=30] [bytes/Second=13131] ["unsynced binlog size"=0] ["estimate time to catch up"=0]
[2020/09/04 05:13:17.627 +08:00] [INFO] [syncer.go:2101] ["binlog replication status"] [task=xxxx] [unit="binlog replication"] [total_events=9333453] [total_tps=19] [tps=0] [master_position="(mysql-bin-changelog.053539, 133494492)"] [master_gtid=] [checkpoint="(mysql-bin-changelog|000001.053539, 133494492)(flushed (mysql-bin-changelog|000001.053539, 133108937))"]
[2020/09/04 05:13:18.832 +08:00] [INFO] [server.go:252] [request=QueryStatus] [payload=]
[2020/09/04 05:13:22.320 +08:00] [INFO] [syncer.go:2076] ["binlog replication progress"] [task=xxxx] [unit="binlog replication"] ["total binlog size"=31582286919] ["last binlog size"=31581855493] ["cost time"=30] [bytes/Second=14380] ["unsynced binlog size"=0] ["estimate time to catch up"=0]
[2020/09/04 05:13:22.321 +08:00] [INFO] [syncer.go:2101] ["binlog replication status"] [task=xxxx] [unit="binlog replication"] [total_events=238933] [total_tps=1] [tps=0] [master_position="(mysql-bin-changelog.053539, 133544577)"] [master_gtid=] [checkpoint="(mysql-bin-changelog|000001.053539, 133544577)(flushed (mysql-bin-changelog|000001.053539, 133108937))"]
[2020/09/04 05:13:26.173 +08:00] [WARN] [status.go:39] [“fail to get master status”] [task=xxxx] [unit=“binlog replication”] [error="[code=10003:class=database:scope=not-set:level=high] database driver: invalid connection"] [errorVerbose="[code=10003:class=database:scope=not-set:level=high] database driver: invalid connection\ngithub.com/pingcap/dm/pkg/terror.(*Error).Delegate\ \t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/pkg/terror/terror.go:267\ngithub.com/pingcap/dm/pkg/terror.DBErrorAdaptArgs\ \t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/pkg/terror/adapter.go:37\ github.com/pingcap/dm/pkg/terror.DBErrorAdapt\ \t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/pkg/terror/adapter.go:46\ github.com/pingcap/dm/pkg/utils.GetMasterStatus\ \t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/pkg/utils/db.go:147\ github.com/pingcap/dm/syncer.(*UpStreamConn).getMasterStatus\ \t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/syncer/db.go:102\ngithub.com/pingcap/dm/syncer.(*Syncer).getMasterStatus\ \t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/syncer/syncer.go:642\ngithub.com/pingcap/dm/syncer.(*Syncer).Status\ \t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/syncer/status.go:37\ngithub.com/pingcap/dm/dm/worker.(*Worker).Status\ \t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/dm/worker/status.go:104\ngithub.com/pingcap/dm/dm/worker.(*Worker).StatusJSON\ \t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/dm/worker/status.go:125\ngithub.com/pingcap/dm/dm/worker.(*Worker).Start\ \t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/dm/worker/worker.go:192\ngithub.com/pingcap/dm/dm/worker.(*Server).Start.func1\ 
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/dm/worker/server.go:87\ runtime.goexit\ \t/usr/local/go/src/runtime/asm_amd64.s:1357"]
[2020/09/04 05:13:26.173 +08:00] [WARN] [status.go:44] [“fail to get flushed global point”] [task=xxxx] [unit=“binlog replication”] [error="[code=10003:class=database:scope=not-set:level=high] database driver: invalid connection"] [errorVerbose="[code=10003:class=database:scope=not-set:level=high] database driver: invalid connection\ngithub.com/pingcap/dm/pkg/terror.(*Error).Delegate\ \t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/pkg/terror/terror.go:267\ngithub.com/pingcap/dm/pkg/terror.DBErrorAdaptArgs\ \t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/pkg/terror/adapter.go:37\ github.com/pingcap/dm/pkg/terror.DBErrorAdapt\ \t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/pkg/terror/adapter.go:46\ github.com/pingcap/dm/pkg/utils.GetMasterStatus\ \t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/pkg/utils/db.go:147\ github.com/pingcap/dm/syncer.(*UpStreamConn).getMasterStatus\ \t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/syncer/db.go:102\ngithub.com/pingcap/dm/syncer.(*Syncer).getMasterStatus\ \t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/syncer/syncer.go:642\ngithub.com/pingcap/dm/syncer.(*Syncer).Status\ \t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/syncer/status.go:37\ngithub.com/pingcap/dm/dm/worker.(*Worker).Status\ \t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/dm/worker/status.go:104\ngithub.com/pingcap/dm/dm/worker.(*Worker).StatusJSON\ \t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/dm/worker/status.go:125\ngithub.com/pingcap/dm/dm/worker.(*Worker).Start\ \t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/dm/worker/worker.go:192\ngithub.com/pingcap/dm/dm/worker.(*Server).Start.func1\ 
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/dm/worker/server.go:87\ runtime.goexit\ \t/usr/local/go/src/runtime/asm_amd64.s:1357"]
[2020/09/04 05:13:26.173 +08:00] [WARN] [status.go:39] [“fail to get master status”] [task=xxxx] [unit=“binlog replication”] [error="[code=10003:class=database:scope=not-set:level=high] database driver: invalid connection"] [errorVerbose="[code=10003:class=database:scope=not-set:level=high] database driver: invalid connection\ngithub.com/pingcap/dm/pkg/terror.(*Error).Delegate\ \t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/pkg/terror/terror.go:267\ngithub.com/pingcap/dm/pkg/terror.DBErrorAdaptArgs\ \t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/pkg/terror/adapter.go:37\ github.com/pingcap/dm/pkg/terror.DBErrorAdapt\ \t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/pkg/terror/adapter.go:46\ github.com/pingcap/dm/pkg/utils.GetMasterStatus\ \t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/pkg/utils/db.go:147\ github.com/pingcap/dm/syncer.(*UpStreamConn).getMasterStatus\ \t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/syncer/db.go:102\ngithub.com/pingcap/dm/syncer.(*Syncer).getMasterStatus\ \t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/syncer/syncer.go:642\ngithub.com/pingcap/dm/syncer.(*Syncer).Status\ \t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/syncer/status.go:37\ngithub.com/pingcap/dm/dm/worker.(*Worker).Status\ \t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/dm/worker/status.go:104\ngithub.com/pingcap/dm/dm/worker.(*Worker).StatusJSON\ \t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/dm/worker/status.go:125\ngithub.com/pingcap/dm/dm/worker.(*Worker).Start\ \t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/dm/worker/worker.go:192\ngithub.com/pingcap/dm/dm/worker.(*Server).Start.func1\ 
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/dm/worker/server.go:87\ runtime.goexit\ \t/usr/local/go/src/runtime/asm_amd64.s:1357"]
[2020/09/04 05:13:26.173 +08:00] [WARN] [status.go:44] [“fail to get flushed global point”] [task=xxxx] [unit=“binlog replication”] [error="[code=10003:class=database:scope=not-set:level=high] database driver: invalid connection"] [errorVerbose="[code=10003:class=database:scope=not-set:level=high] database driver: invalid connection\ngithub.com/pingcap/dm/pkg/terror.(*Error).Delegate\ \t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/pkg/terror/terror.go:267\ngithub.com/pingcap/dm/pkg/terror.DBErrorAdaptArgs\ \t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/pkg/terror/adapter.go:37\ github.com/pingcap/dm/pkg/terror.DBErrorAdapt\ \t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/pkg/terror/adapter.go:46\ github.com/pingcap/dm/pkg/utils.GetMasterStatus\ \t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/pkg/utils/db.go:147\ github.com/pingcap/dm/syncer.(*UpStreamConn).getMasterStatus\ \t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/syncer/db.go:102\ngithub.com/pingcap/dm/syncer.(*Syncer).getMasterStatus\ \t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/syncer/syncer.go:642\ngithub.com/pingcap/dm/syncer.(*Syncer).Status\ \t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/syncer/status.go:37\ngithub.com/pingcap/dm/dm/worker.(*Worker).Status\ \t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/dm/worker/status.go:104\ngithub.com/pingcap/dm/dm/worker.(*Worker).StatusJSON\ \t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/dm/worker/status.go:125\ngithub.com/pingcap/dm/dm/worker.(*Worker).Start\ \t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/dm/worker/worker.go:192\ngithub.com/pingcap/dm/dm/worker.(*Server).Start.func1\ 
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/dm/worker/server.go:87\ runtime.goexit\ \t/usr/local/go/src/runtime/asm_amd64.s:1357"
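
Editor's sketch for reading logs like the ones above: the stack traces make it hard to see which errors actually occur, so this counts log lines per error code and class. It relies only on the `[error="[code=...:class=...` prefix visible in the pasted entries; the function name summarize_dm_errors and the log path in the usage comment are assumptions.

```shell
#!/usr/bin/env bash
# Read DM log lines on stdin and print "<count><TAB>code=N:class=X" for each
# distinct error code found in an [error="[code=...:class=...] field.
# Lines without such a field (for example INFO lines) are ignored.
summarize_dm_errors() {
    sed -nE 's/.*\[error="\[(code=[0-9]+:class=[a-z-]+).*/\1/p' \
        | sort \
        | uniq -c \
        | awk '{ print $1 "\t" $2 }'
}

# Usage (the path is a placeholder for wherever the worker log lives):
#   summarize_dm_errors < log/dm-worker.log
```

Each line is counted once even though the same code is repeated in its errorVerbose field.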

You can also first run the following commands to check whether dm-worker responds normally:
curl http://172.31.1.50:8262/status
curl http://172.31.1.50:8262/metrics
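
Editor's sketch of the same check with a timeout, so that an unreachable worker fails quickly instead of hanging. The function names worker_url and check_worker are illustrative; the address and port are the ones from the commands above, and only standard curl options are used.

```shell
#!/usr/bin/env bash
# Build the URL of a DM-worker HTTP endpoint from host, port and path.
worker_url() {
    printf 'http://%s:%s/%s\n' "$1" "$2" "$3"
}

# Probe /status and /metrics on one worker and report each result.
# curl exit code 7 means the connection failed (for example, refused);
# 28 means the --max-time limit was reached.
check_worker() {
    local host="$1" port="${2:-8262}" path
    for path in status metrics; do
        if curl --silent --fail --max-time 5 --output /dev/null \
                "$(worker_url "$host" "$port" "$path")"; then
            echo "$path: ok"
        else
            echo "$path: unreachable (curl exit $?)"
        fi
    done
}

# Usage:
#   check_worker 172.31.1.50 8262
```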

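Please also run query-status in dmctl and post a screenshot.
Check whether the task you want to stop exists.
If it does, use query-status to view the task's errors and post a screenshot.
Thanks.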

status:
[image]
metrics:
log.log (99.6 KB)
dmctl:

Refreshing the task list:

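Hello, one question first: did the original MySQL instance only change its IP address, or did replication switch from the master node to a slave node within a master-slave cluster?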

Hello, only the IP address of the upstream MySQL database changed; there was no master-slave switch.

If only the IP address changed, this is all that is needed:

  1. Stop DM-worker.
  2. Change the IP in the configuration.
  3. Restart DM-worker.
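
Editor's sketch of step 2, assuming the upstream address is the `host` key in the `[from]` section of dm-worker.toml (the layout used by DM 1.0.x worker configs as far as I know; please verify against your own file). The function name set_upstream_host is illustrative. The edit is limited to the `[from]` section so that a `host` key elsewhere is left alone, and a .bak copy of the file is kept.

```shell
#!/usr/bin/env bash
# Replace the value of `host = ...` inside the [from] section of a
# dm-worker.toml-style file. Intended for plain IPs or hostnames; values
# containing "/" or "&" would need escaping for sed.
set_upstream_host() {
    local file="$1" new_host="$2"
    sed -E -i.bak \
        "/^\[from\]/,/^\[/ s/^([[:space:]]*host[[:space:]]*=[[:space:]]*).*/\1\"${new_host}\"/" \
        "$file"
}

# Usage, with DM-worker stopped first and restarted afterwards
# (the path and address are placeholders):
#   set_upstream_host conf/dm-worker.toml 10.0.0.2
```

With a DM-Ansible deployment, steps 1 and 3 would presumably be `ansible-playbook stop.yml --tags=dm-worker -l dm_worker1` and the matching `start.yml` call, where dm_worker1 is a placeholder for the worker's alias in inventory.ini.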

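The information in DM-worker has already been synchronized from the master, and I have confirmed that it contains the new upstream MySQL address. However, after I kill dm-worker and run it again, the system reports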

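The DM-worker host can access the upstream database directly with the mysql client. DM-master and DM-worker can also communicate over passwordless SSH, and the ports are reachable via telnet.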