老司机1024
(.1024老司机)
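September 4, 2020, 02:58
1
To improve efficiency, please provide the following information when asking a question; clearly described problems get priority.
[TiDB version]: v4.0.0-rc
[Problem description]: The IP address of the upstream database has changed, so I need to update the DM configuration to resume replication. I have edited the inventory.ini file on the DM master node and run ansible-playbook /home/tidb/dm-ansible/rolling_update.yml. According to the DM worker log, the new address has not taken effect: the running task still replicates from the old, now-invalid IP address and reports connection timeouts. How can I change the upstream database connection and make the change take effect? Thank you.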
老司机1024
(.1024老司机)
September 4, 2020, 03:06
2
To add: in the inventory.ini file on the DM master, I changed the mysql_host value in the dm_worker_servers section to the new upstream database address.
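Before rolling the change out, it can help to list which `mysql_host` each worker entry in the inventory actually carries. The following is a minimal sketch, assuming the usual DM-Ansible layout in which every worker is a single line of `key=value` pairs under `[dm_worker_servers]`; the function name `list_mysql_hosts` is illustrative and not part of DM.

```shell
#!/usr/bin/env bash
# Read an inventory file on stdin and print "<worker alias> <mysql_host>"
# for every entry in the [dm_worker_servers] section.
list_mysql_hosts() {
  awk '
    # A section header switches the "inside dm_worker_servers" flag on or off.
    /^\[/ { in_section = ($0 == "[dm_worker_servers]"); next }
    # Inside the section, skip blank lines and comments, then scan the pairs.
    in_section && NF > 0 && $0 !~ /^#/ {
      host = ""
      for (i = 2; i <= NF; i++)
        if ($i ~ /^mysql_host=/) host = substr($i, length("mysql_host=") + 1)
      if (host != "") print $1, host
    }
  '
}
```

Usage, assuming the inventory sits next to the playbook mentioned above: `list_mysql_hosts < /home/tidb/dm-ansible/inventory.ini`.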
老司机1024
(.1024老司机)
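September 4, 2020, 05:37
4
Hello. When I run stop-task xxxx on the DM master node, it reports that the task does not exist. When I try to refresh the task list, I get "msg": "[code=38008:class=dm-master:scope=internal:level=high] grpc request error: rpc error: code = DeadlineExceeded desc = context deadline exceeded", so the request apparently timed out. I have tried restarting the DM cluster and rebooting the servers, but neither helped. The dm-worker.toml file on the DM worker node already contains the new upstream database address, but the task does not pick it up.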
sultan8252
(Sultan.Su@PingCAP)
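September 4, 2020, 07:01
5
Hello. Judging from the message, dm-ctl has lost its connection to dm-master. Please try the following:
1. Stop the DM cluster.
2. Check that the services on every node shut down cleanly, so that no stale or zombie processes from before are left behind.
3. Restart the DM cluster.
If the problem persists, please package and upload the dm-master and dm-worker logs covering the period when the problem occurred, and note the time at which it happened. Thank you.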
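The three steps above can be sketched as a small script run from the DM-Ansible directory. This is an illustration only: it assumes the deployment ships the `stop.yml` and `start.yml` playbooks alongside `rolling_update.yml`, and the helper names `run` and `restart_dm_cluster` are made up for this example. Setting `DRY_RUN=1` prints the commands instead of executing them.

```shell
#!/usr/bin/env bash
# Execute a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Stop the cluster, list leftover DM processes, then start the cluster again.
restart_dm_cluster() {
  local inventory="${1:-inventory.ini}"
  # 1. Stop the whole DM cluster (assumed playbook name: stop.yml).
  run ansible-playbook -i "$inventory" stop.yml || return 1
  # 2. List dm-master / dm-worker processes still alive on the managed hosts.
  #    "|| true" keeps hosts with no leftovers from being reported as failed.
  run ansible -i "$inventory" all -m shell -a "pgrep -a 'dm-(master|worker)' || true" || return 1
  # 3. Start the cluster again (assumed playbook name: start.yml).
  run ansible-playbook -i "$inventory" start.yml
}
```

A dry run such as `DRY_RUN=1 restart_dm_cluster inventory.ini` shows the three commands without touching the cluster. In a real run, any process listed by step 2 would need to be dealt with by hand before relying on the restart.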
老司机1024
(.1024老司机)
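September 4, 2020, 08:30
6
I had already tried restarting the cluster. Right now I cannot control the tasks at all. According to the DM worker log, the tasks are resumed automatically as soon as the DM cluster starts. In dmctl, running stop-task xxx returns "task xxx has no workers or not exist, can try refresh-worker-tasks cmd first", and running refresh-worker-tasks returns the same error. Running query-error lists all of my tasks, but they are all in the paused state. Is there a manual way to stop the replication tasks so that they reread the configuration?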
老司机1024
(.1024老司机)
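September 4, 2020, 08:35
7
One odd thing: when the cluster is shut down, the master is able to make the workers terminate their dm-worker processes, and it can also control them at startup, yet it cannot refresh the task list or start and stop tasks.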
sultan8252
(Sultan.Su@PingCAP)
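September 4, 2020, 08:36
8
If convenient, please upload the DM-worker log and dm-master log from the time of the problem so that we can analyze the root cause.
Please also state the exact DM version.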
老司机1024
(.1024老司机)
September 4, 2020, 08:46
10
DM master log:
[2020/09/04 05:46:42.524 +00:00] [ERROR] [server.go:1308] ["create FetchDDLInfo stream"] [worker=172.31.1.50:8262] [error="[code=38008:class=dm-master:scope=internal:level=high] grpc request error: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 172.31.1.50:8262: connect: connection refused""]
[2020/09/04 05:46:47.524 +00:00] [ERROR] [server.go:1308] ["create FetchDDLInfo stream"] [worker=172.31.1.50:8262] [error="[code=38008:class=dm-master:scope=internal:level=high] grpc request error: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 172.31.1.50:8262: connect: connection refused""]
[2020/09/04 05:46:52.525 +00:00] [ERROR] [server.go:1308] ["create FetchDDLInfo stream"] [worker=172.31.1.50:8262] [error="[code=38008:class=dm-master:scope=internal:level=high] grpc request error: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 172.31.1.50:8262: connect: connection refused""]
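The repeated "connection refused" entries mean that dm-master could not open a TCP connection to the worker address at that moment. To see at a glance which workers are affected and how often, the log can be summarized with standard tools. This is a rough sketch; the function name `refused_workers` is illustrative, and it assumes the `worker=<ip>:<port>` field format shown in the lines above.

```shell
#!/usr/bin/env bash
# Read dm-master log lines on stdin and print "<worker address> <count>"
# for every worker that appears in a "connection refused" error.
refused_workers() {
  grep 'connection refused' \
    | grep -o 'worker=[0-9.]*:[0-9]*' \
    | sed 's/^worker=//' \
    | sort | uniq -c \
    | awk '{ print $2, $1 }'
}
```

Usage: `refused_workers < dm-master.log`, where the log file name is an assumption about the local deployment.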
老司机1024
(.1024老司机)
September 4, 2020, 08:48
11
The master and the workers can reach each other over passwordless SSH as the tidb user; all ports are open and the network is reachable.
老司机1024
(.1024老司机)
September 4, 2020, 08:59
13
DM worker log:
[2020/09/04 05:13:16.713 +08:00] [INFO] [syncer.go:893] ["flushed checkpoint"] [task=xxxx] [unit="binlog replication"] [checkpoint="(mysql-bin-changelog|000001.053539, 133472905)(flushed (mysql-bin-changelog|000001.053539, 133472905))"]
[2020/09/04 05:13:16.713 +08:00] [INFO] [relay.go:113] ["current earliest active relay log"] [task=xxxx] [unit="binlog replication"] ["active relay log"=23b5ce5f-4a43-3bbf-b221-c193041bbf77.000001/mysql-bin-changelog.053539]
[2020/09/04 05:13:17.626 +08:00] [INFO] [syncer.go:2076] ["binlog replication progress"] [task=xxxx] [unit="binlog replication"] ["total binlog size"=126681855310] ["last binlog size"=126681461359] ["cost time"=30] [bytes/Second=13131] ["unsynced binlog size"=0] ["estimate time to catch up"=0]
[2020/09/04 05:13:17.627 +08:00] [INFO] [syncer.go:2101] ["binlog replication status"] [task=xxxx] [unit="binlog replication"] [total_events=9333453] [total_tps=19] [tps=0] [master_position="(mysql-bin-changelog.053539, 133494492)"] [master_gtid=] [checkpoint="(mysql-bin-changelog|000001.053539, 133494492)(flushed (mysql-bin-changelog|000001.053539, 133108937))"]
[2020/09/04 05:13:18.832 +08:00] [INFO] [server.go:252] [request=QueryStatus] [payload=]
[2020/09/04 05:13:22.320 +08:00] [INFO] [syncer.go:2076] ["binlog replication progress"] [task=xxxx] [unit="binlog replication"] ["total binlog size"=31582286919] ["last binlog size"=31581855493] ["cost time"=30] [bytes/Second=14380] ["unsynced binlog size"=0] ["estimate time to catch up"=0]
[2020/09/04 05:13:22.321 +08:00] [INFO] [syncer.go:2101] ["binlog replication status"] [task=xxxx] [unit="binlog replication"] [total_events=238933] [total_tps=1] [tps=0] [master_position="(mysql-bin-changelog.053539, 133544577)"] [master_gtid=] [checkpoint="(mysql-bin-changelog|000001.053539, 133544577)(flushed (mysql-bin-changelog|000001.053539, 133108937))"]
[2020/09/04 05:13:26.173 +08:00] [WARN] [status.go:39] ["fail to get master status"] [task=xxxx] [unit="binlog replication"] [error="[code=10003:class=database:scope=not-set:level=high] database driver: invalid connection"] [errorVerbose="[code=10003:class=database:scope=not-set:level=high] database driver: invalid connection\ngithub.com/pingcap/dm/pkg/terror.(*Error ).Delegate\
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/pkg/terror/terror.go:267\ngithub.com/pingcap/dm/pkg/terror.DBErrorAdaptArgs\
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/pkg/terror/adapter.go:37\
github.com/pingcap/dm/pkg/terror.DBErrorAdapt\
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/pkg/terror/adapter.go:46\
github.com/pingcap/dm/pkg/utils.GetMasterStatus\
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/pkg/utils/db.go:147\
github.com/pingcap/dm/syncer.(*UpStreamConn ).getMasterStatus\
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/syncer/db.go:102\ngithub.com/pingcap/dm/syncer.(*Syncer ).getMasterStatus\
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/syncer/syncer.go:642\ngithub.com/pingcap/dm/syncer.(*Syncer ).Status\
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/syncer/status.go:37\ngithub.com/pingcap/dm/dm/worker.(*Worker ).Status\
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/dm/worker/status.go:104\ngithub.com/pingcap/dm/dm/worker.(*Worker ).StatusJSON\
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/dm/worker/status.go:125\ngithub.com/pingcap/dm/dm/worker.(*Worker ).Start\
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/dm/worker/worker.go:192\ngithub.com/pingcap/dm/dm/worker.(*Server ).Start.func1\
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/dm/worker/server.go:87\
runtime.goexit\
\t/usr/local/go/src/runtime/asm_amd64.s:1357"]
[2020/09/04 05:13:26.173 +08:00] [WARN] [status.go:44] ["fail to get flushed global point"] [task=xxxx] [unit="binlog replication"] [error="[code=10003:class=database:scope=not-set:level=high] database driver: invalid connection"] [errorVerbose="[code=10003:class=database:scope=not-set:level=high] database driver: invalid connection\ngithub.com/pingcap/dm/pkg/terror.(*Error ).Delegate\
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/pkg/terror/terror.go:267\ngithub.com/pingcap/dm/pkg/terror.DBErrorAdaptArgs\
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/pkg/terror/adapter.go:37\
github.com/pingcap/dm/pkg/terror.DBErrorAdapt\
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/pkg/terror/adapter.go:46\
github.com/pingcap/dm/pkg/utils.GetMasterStatus\
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/pkg/utils/db.go:147\
github.com/pingcap/dm/syncer.(*UpStreamConn ).getMasterStatus\
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/syncer/db.go:102\ngithub.com/pingcap/dm/syncer.(*Syncer ).getMasterStatus\
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/syncer/syncer.go:642\ngithub.com/pingcap/dm/syncer.(*Syncer ).Status\
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/syncer/status.go:37\ngithub.com/pingcap/dm/dm/worker.(*Worker ).Status\
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/dm/worker/status.go:104\ngithub.com/pingcap/dm/dm/worker.(*Worker ).StatusJSON\
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/dm/worker/status.go:125\ngithub.com/pingcap/dm/dm/worker.(*Worker ).Start\
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/dm/worker/worker.go:192\ngithub.com/pingcap/dm/dm/worker.(*Server ).Start.func1\
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/dm/worker/server.go:87\
runtime.goexit\
\t/usr/local/go/src/runtime/asm_amd64.s:1357"]
[2020/09/04 05:13:26.173 +08:00] [WARN] [status.go:39] ["fail to get master status"] [task=xxxx] [unit="binlog replication"] [error="[code=10003:class=database:scope=not-set:level=high] database driver: invalid connection"] [errorVerbose="[code=10003:class=database:scope=not-set:level=high] database driver: invalid connection\ngithub.com/pingcap/dm/pkg/terror.(*Error ).Delegate\
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/pkg/terror/terror.go:267\ngithub.com/pingcap/dm/pkg/terror.DBErrorAdaptArgs\
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/pkg/terror/adapter.go:37\
github.com/pingcap/dm/pkg/terror.DBErrorAdapt\
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/pkg/terror/adapter.go:46\
github.com/pingcap/dm/pkg/utils.GetMasterStatus\
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/pkg/utils/db.go:147\
github.com/pingcap/dm/syncer.(*UpStreamConn ).getMasterStatus\
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/syncer/db.go:102\ngithub.com/pingcap/dm/syncer.(*Syncer ).getMasterStatus\
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/syncer/syncer.go:642\ngithub.com/pingcap/dm/syncer.(*Syncer ).Status\
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/syncer/status.go:37\ngithub.com/pingcap/dm/dm/worker.(*Worker ).Status\
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/dm/worker/status.go:104\ngithub.com/pingcap/dm/dm/worker.(*Worker ).StatusJSON\
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/dm/worker/status.go:125\ngithub.com/pingcap/dm/dm/worker.(*Worker ).Start\
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/dm/worker/worker.go:192\ngithub.com/pingcap/dm/dm/worker.(*Server ).Start.func1\
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/dm/worker/server.go:87\
runtime.goexit\
\t/usr/local/go/src/runtime/asm_amd64.s:1357"]
[2020/09/04 05:13:26.173 +08:00] [WARN] [status.go:44] ["fail to get flushed global point"] [task=xxxx] [unit="binlog replication"] [error="[code=10003:class=database:scope=not-set:level=high] database driver: invalid connection"] [errorVerbose="[code=10003:class=database:scope=not-set:level=high] database driver: invalid connection\ngithub.com/pingcap/dm/pkg/terror.(*Error ).Delegate\
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/pkg/terror/terror.go:267\ngithub.com/pingcap/dm/pkg/terror.DBErrorAdaptArgs\
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/pkg/terror/adapter.go:37\
github.com/pingcap/dm/pkg/terror.DBErrorAdapt\
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/pkg/terror/adapter.go:46\
github.com/pingcap/dm/pkg/utils.GetMasterStatus\
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/pkg/utils/db.go:147\
github.com/pingcap/dm/syncer.(*UpStreamConn ).getMasterStatus\
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/syncer/db.go:102\ngithub.com/pingcap/dm/syncer.(*Syncer ).getMasterStatus\
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/syncer/syncer.go:642\ngithub.com/pingcap/dm/syncer.(*Syncer ).Status\
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/syncer/status.go:37\ngithub.com/pingcap/dm/dm/worker.(*Worker ).Status\
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/dm/worker/status.go:104\ngithub.com/pingcap/dm/dm/worker.(*Worker ).StatusJSON\
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/dm/worker/status.go:125\ngithub.com/pingcap/dm/dm/worker.(*Worker ).Start\
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/dm/worker/worker.go:192\ngithub.com/pingcap/dm/dm/worker.(*Server ).Start.func1\
\t/home/jenkins/agent/workspace/build_dm_master/go/src/github.com/pingcap/dm/dm/worker/server.go:87\
runtime.goexit\
\t/usr/local/go/src/runtime/asm_amd64.s:1357"
sultan8252
(Sultan.Su@PingCAP)
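September 4, 2020, 10:05
14
You can also first run the following commands to confirm whether dm-worker responds normally:
curl http://172.31.1.50:8262/status
curl http://172.31.1.50:8262/metrics
Please also run query-status in dm-ctl and post a screenshot,
and check whether the task you want to stop exists.
If it does, use query-status to view the task's error and post a screenshot.
Thank you.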
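The two curl checks above can be wrapped so that they fail fast and print only the HTTP status code for each endpoint. This is a minimal sketch; the helper names `worker_urls` and `probe_worker` and the 5-second timeout are choices made for this example, not DM conventions.

```shell
#!/usr/bin/env bash
# Print the two HTTP endpoints to probe for a dm-worker at <ip>:<port>.
worker_urls() {
  local addr="$1"
  printf 'http://%s/status\nhttp://%s/metrics\n' "$addr" "$addr"
}

# Request each endpoint and report its HTTP status code.
probe_worker() {
  worker_urls "$1" | while read -r url; do
    # --max-time bounds the wait; curl reports 000 when no HTTP response arrives.
    code=$(curl -s -o /dev/null --max-time 5 -w '%{http_code}' "$url")
    echo "$url -> ${code:-000}"
  done
}
```

Usage: `probe_worker 172.31.1.50:8262`. A `000` result on both URLs would point the same way as the "connection refused" entries in the dm-master log, namely that nothing is answering on that port.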
老司机1024
(.1024老司机)
September 7, 2020, 01:25
15
status:
metrics:
log.log (99.6 KB)
dmctl:
yilong
(yi888long)
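September 7, 2020, 02:42
17
Hello. First, a question: did the original MySQL instance only change its IP address, or did you switch from the primary node to a replica within a primary-replica cluster?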
老司机1024
(.1024老司机)
September 7, 2020, 03:03
18
Hello. Only the IP address of the upstream MySQL database changed; there was no primary-replica switchover.
老司机1024
(.1024老司机)
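September 7, 2020, 03:24
20
The configuration on the DM worker has already been synced from the master, and I have confirmed that it contains the new upstream MySQL address. However, after I killed dm-worker and started it again, the system displayed a message.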
老司机1024
(.1024老司机)
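September 7, 2020, 03:35
21
The DM worker can access the upstream database directly with the mysql client; the DM master and worker can also reach each other over passwordless SSH, and the ports are reachable via telnet.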