TiDBit DM startup: worker node reports "fail to dial dm-master"

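【TiDB environment】Test
【TiDB version】V7.4.0
【Steps to reproduce】Occurs on the first run of tiup dm start dm-test
【Problem: symptoms and impact】The worker node reports "fail to dial dm-master" and the DM cluster fails to start
【Resources】2 cores, 8 GB
【Attachments: screenshots/logs/monitoring】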

Log 1:
[2023/10/16 11:53:08.137 +08:00] [ERROR] [join.go:65] [“fail to dial dm-master”] [endpoint=http://43.138.205.213:8261] [error=“context deadline exceeded”]
[2023/10/16 11:53:08.137 +08:00] [INFO] [main.go:71] [“join the cluster meet error”] [error=“[code=40077:class=dm-worker:scope=internal:level=high], Message: cannot join with master endpoints: [http://43.138.205.213:8261], error: context deadline exceeded, Workaround: Please check network connection of worker and check worker name is unique.”] [errorVerbose=“[code=40077:class=dm-worker:scope=internal:level=high], Message: cannot join with master endpoints: [http://43.138.205.213:8261], error: context deadline exceeded, Workaround: Please check network connection of worker and check worker name is unique.\ngithub.com/pingcap/tiflow/dm/pkg/terror.(*Error).Generate\n\tgithub.com/pingcap/tiflow/dm/pkg/terror/terror.go:293\ngithub.com/pingcap/tiflow/dm/worker.(*Server).JoinMaster\n\tgithub.com/pingcap/tiflow/dm/worker/join.go:86\nmain.main\n\tgithub.com/pingcap/tiflow/cmd/dm-worker/main.go:69\nruntime.main\n\truntime/proc.go:267\nruntime.goexit\n\truntime/asm_amd64.s:1650”]

Log 2:

2023-10-16T11:46:17.738+0800 DEBUG retry error {error: operation timed out after 2m0s}
2023-10-16T11:46:17.738+0800 DEBUG TaskFinish {task: StartCluster, error: failed to start dm-worker: failed to start: 43.138.205.213 dm-worker-8265.service, please check the instance’s log(/home/tidb/dm/deploy/dm-worker-8265/log) for more detail.: timed out waiting for port 8265 to be started after 2m0s, errorVerbose: timed out waiting for port 8265 to be started after 2m0s\ngithub.com/pingcap/tiup/pkg/cluster/module.(*WaitFor).Execute\n\tgithub.com/pingcap/tiup/pkg/cluster/module/wait_for.go:91\ngithub.com/pingcap/tiup/pkg/cluster/spec.PortStarted\n\tgithub.com/pingcap/tiup/pkg/cluster/spec/instance.go:123\ngithub.com/pingcap/tiup/pkg/cluster/spec.(*BaseInstance).Ready\n\tgithub.com/pingcap/tiup/pkg/cluster/spec/instance.go:157\ngithub.com/pingcap/tiup/pkg/cluster/operation.startInstance\n\tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:405\ngithub.com/pingcap/tiup/pkg/cluster/operation.StartComponent.func1\n\tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:534\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\tgolang.org/x/sync@v0.1.0/errgroup/errgroup.go:75\nruntime.goexit\n\truntime/asm_amd64.s:1650\nfailed to start: 43.138.205.213 dm-worker-8265.service, please check the instance’s log(/home/tidb/dm/deploy/dm-worker-8265/log) for more detail.\nfailed to start dm-worker}
2023-10-16T11:46:17.738+0800 INFO Execute command finished {code: 1, error: failed to start dm-worker: failed to start: 43.138.205.213 dm-worker-8265.service, please check the instance’s log(/home/tidb/dm/deploy/dm-worker-8265/log) for more detail.: timed out waiting for port 8265 to be started after 2m0s, errorVerbose: timed out waiting for port 8265 to be started after 2m0s\ngithub.com/pingcap/tiup/pkg/cluster/module.(*WaitFor).Execute\n\tgithub.com/pingcap/tiup/pkg/cluster/module/wait_for.go:91\ngithub.com/pingcap/tiup/pkg/cluster/spec.PortStarted\n\tgithub.com/pingcap/tiup/pkg/cluster/spec/instance.go:123\ngithub.com/pingcap/tiup/pkg/cluster/spec.(*BaseInstance).Ready\n\tgithub.com/pingcap/tiup/pkg/cluster/spec/instance.go:157\ngithub.com/pingcap/tiup/pkg/cluster/operation.startInstance\n\tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:405\ngithub.com/pingcap/tiup/pkg/cluster/operation.StartComponent.func1\n\tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:534\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\tgolang.org/x/sync@v0.1.0/errgroup/errgroup.go:75\nruntime.goexit\n\truntime/asm_amd64.s:1650\nfailed to start: 43.138.205.213 dm-worker-8265.service, please check the instance’s log(/home/tidb/dm/deploy/dm-worker-8265/log) for more detail.\nfailed to start dm-worker}

DM configuration:

Can DM connect to the configured upstream normally?

Run tiup dm display dm-test and check: did the master fail to come up as well?

The error is reported while the DM cluster is starting (tiup dm start dm-test); it has not reached the step of connecting to the upstream yet.

When tiup dm start dm-test starts the DM cluster, step 1 (starting the master) succeeds, and step 2 (starting the worker node) fails with the error.

So when I run tiup dm display dm-test, the master status is DOWN.

[code=40066:class=dm-worker:scope=internal:level=high] ExecuteDDL timeout, try use query-status to query whether the DDL is still blocking

https://docs.pingcap.com/zh/tidb-data-migration/v5.3/dm-error-handling#dm-错误系统

Is this the error your cluster reported?

[error=“[code=40077:class=dm-worker:scope=internal:level=high], Message: cannot join with master endpoints: [http://43.138.205.213:8261], error: context deadline exceeded, Workaround: Please check network connection of worker and check worker name is unique.”]

My error code is 40077.
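
Since the two errors quoted above differ only in the code field (40066 vs 40077), here is a minimal sketch for pulling that bracketed header out of a DM log line so the codes can be compared directly. The function name `parse_dm_error_header` and the regular expression are my own assumptions based on the two messages in this thread, not part of DM.

```python
import re

# Assumed layout, taken from the two messages quoted in this thread:
# [code=<digits>:class=<name>:scope=<name>:level=<name>]
DM_ERROR_HEADER = re.compile(
    r"\[code=(?P<code>\d+):class=(?P<cls>[\w-]+)"
    r":scope=(?P<scope>[\w-]+):level=(?P<level>\w+)\]"
)


def parse_dm_error_header(line):
    """Return the fields of the first DM error header found in line, or None."""
    match = DM_ERROR_HEADER.search(line)
    if match is None:
        return None
    return {
        "code": int(match.group("code")),
        "class": match.group("cls"),
        "scope": match.group("scope"),
        "level": match.group("level"),
    }


# Shortened copies of the two messages from this thread.
worker_error = (
    "[code=40077:class=dm-worker:scope=internal:level=high], "
    "Message: cannot join with master endpoints"
)
ddl_error = "[code=40066:class=dm-worker:scope=internal:level=high] ExecuteDDL timeout"

print(parse_dm_error_header(worker_error)["code"])
print(parse_dm_error_header(ddl_error)["code"])
```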

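If your first step succeeded, the master should already be up. But the worker error says it cannot connect to the master, so either the master is unreachable or it never came up. Check the master log.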
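
Besides the master log, a quick way to separate "master is down" from "master is unreachable" is to try a plain TCP connection to the master endpoint from the worker host. A minimal sketch, assuming Python is available on that host; `can_dial` is a hypothetical helper, not a DM or tiup command, and it only tests TCP reachability, not whether dm-master answers correctly.

```python
import socket
from urllib.parse import urlparse


def can_dial(endpoint, timeout=3.0):
    """Return True if a TCP connection to the endpoint's host:port opens within timeout seconds."""
    parsed = urlparse(endpoint)
    host, port = parsed.hostname, parsed.port
    if host is None or port is None:
        raise ValueError(f"endpoint needs an explicit host and port: {endpoint!r}")
    try:
        # create_connection raises OSError on refusal, timeout, or no route.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example, run on the dm-worker host with the endpoint from the log:
#   can_dial("http://43.138.205.213:8261")
```

If this returns False while dm-master is running, the cause is more likely the network path (firewall, security group, bind address) than DM itself.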

43.138.205.213

You didn't bind DM to a public address, did you?
In that case port 8261 is almost certainly not reachable.

43.138.205.213 is a public address. Port 8261 is probably not open; I'll try that.

No, no, no. Don't expose it on the public network, that's very insecure. :joy:

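Besides, public network traffic is billed. Bind to an internal address instead: it costs nothing and it's safer. What's not to like?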
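
To catch this before deploying, the hosts in the topology can be checked with Python's standard `ipaddress` module. A minimal sketch; `is_public_address` is a hypothetical helper, and 10.0.0.5 is only an illustrative internal address, not one from this cluster.

```python
import ipaddress


def is_public_address(host):
    """Return True if host is a globally routable IP address."""
    return ipaddress.ip_address(host).is_global


print(is_public_address("43.138.205.213"))  # True: the address from the log
print(is_public_address("10.0.0.5"))        # False: illustrative RFC 1918 address
```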

Problem solved. The cause was that the port was not open on the public address.

Nice. I didn't expect a public address to be used directly.

Uh... a public address... :call_me_hand: :call_me_hand: :call_me_hand:

I think the best fix would be to not use the public network at all.

Awesome.