TiDB 4.0 cluster fails to start: pd 10.21.122.70:2379 failed to start: timed out waiting for port 2379 to be started after 2m0s

2020-08-01T02:41:43.343-0700 INFO SSHCommand {“host”: “192.168.198.128”, “port”: “22”, “cmd”: “export LANG=C; PATH=$PATH:/usr/bin:/usr/sbin ss -ltn”, “stdout”: “State Recv-Q Send-Q Local Address:Port Peer Address:Port \ LISTEN 0 128 0.0.0.0:111 0.0.0.0:* \ LISTEN 0 32 192.168.122.1:53 0.0.0.0:* \ LISTEN 0 128 0.0.0.0:22 0.0.0.0:* \ LISTEN 0 5 127.0.0.1:631 0.0.0.0:* \ LISTEN 0 128 [::]:111 [::]:* \ LISTEN 0 128 [::]:22 [::]:* \ LISTEN 0 5 [::1]:631 [::]:* \ ”, “stderr”: “”}
2020-08-01T02:41:43.343-0700 DEBUG retry error: operation timed out after 2m0s
2020-08-01T02:41:43.344-0700 ERROR pd 192.168.198.128:2379 failed to start: timed out waiting for port 2379 to be started after 2m0s, please check the log of the instance
2020-08-01T02:41:43.887-0700 INFO SSHCommand {“host”: “192.168.198.134”, “port”: “22”, “cmd”: “export LANG=C; PATH=$PATH:/usr/bin:/usr/sbin ss -ltn”, “stdout”: “State Recv-Q Send-Q Local Address:Port Peer Address:Port \ LISTEN 0 128 0.0.0.0:111 0.0.0.0:* \ LISTEN 0 32 192.168.122.1:53 0.0.0.0:* \ LISTEN 0 128 0.0.0.0:22 0.0.0.0:* \ LISTEN 0 5 127.0.0.1:631 0.0.0.0:* \ LISTEN 0 128 [::]:111 [::]:* \ LISTEN 0 128 [::]:22 [::]:* \ LISTEN 0 5 [::1]:631 [::]:* \ ”, “stderr”: “”}
2020-08-01T02:41:43.888-0700 DEBUG retry error: operation timed out after 2m0s
2020-08-01T02:41:43.888-0700 ERROR pd 192.168.198.134:2379 failed to start: timed out waiting for port 2379 to be started after 2m0s, please check the log of the instance
2020-08-01T02:41:44.488-0700 INFO SSHCommand {“host”: “192.168.198.133”, “port”: “22”, “cmd”: “export LANG=C; PATH=$PATH:/usr/bin:/usr/sbin ss -ltn”, “stdout”: “State Recv-Q Send-Q Local Address:Port Peer Address:Port \ LISTEN 0 128 0.0.0.0:111 0.0.0.0:* \ LISTEN 0 128 0.0.0.0:22 0.0.0.0:* \ LISTEN 0 5 127.0.0.1:631 0.0.0.0:* \ LISTEN 0 128 [::]:111 [::]:* \ LISTEN 0 128 [::]:22 [::]:* \ LISTEN 0 5 [::1]:631 [::]:* \ ”, “stderr”: “”}
2020-08-01T02:41:44.488-0700 DEBUG retry error: operation timed out after 2m0s
2020-08-01T02:41:44.489-0700 ERROR pd 192.168.198.133:2379 failed to start: timed out waiting for port 2379 to be started after 2m0s, please check the log of the instance
2020-08-01T02:41:44.489-0700 DEBUG TaskFinish {“task”: “ClusterOperate: operation=StartOperation, options={Roles:[] Nodes:[] Force:false SSHTimeout:5 OptTimeout:120 APITimeout:300 IgnoreConfigCheck:false RetainDataRoles:[] RetainDataNodes:[]}”, “error”: “failed to start: failed to start pd: \tpd 192.168.198.128:2379 failed to start: timed out waiting for port 2379 to be started after 2m0s, please check the log of the instance: timed out waiting for port 2379 to be started after 2m0s”, “errorVerbose”: “timed out waiting for port 2379 to be started after 2m0s\ngithub.com/pingcap/tiup/pkg/cluster/module.(*WaitFor).Execute\ \tgithub.com/pingcap/tiup@/pkg/cluster/module/wait_for.go:90\ github.com/pingcap/tiup/pkg/cluster/spec.PortStarted\ \tgithub.com/pingcap/tiup@/pkg/cluster/spec/instance.go:90\ github.com/pingcap/tiup/pkg/cluster/spec.(*instance).Ready\ \tgithub.com/pingcap/tiup@/pkg/cluster/spec/instance.go:121\ github.com/pingcap/tiup/pkg/cluster/operation.startInstance\ \tgithub.com/pingcap/tiup@/pkg/cluster/operation/action.go:466\ github.com/pingcap/tiup/pkg/cluster/operation.StartComponent.func1\ \tgithub.com/pingcap/tiup@/pkg/cluster/operation/action.go:502\ golang.org/x/sync/errgroup.(*Group).Go.func1\ \tgolang.org/x/sync@v0.0.0-20190911185100-cd5d95a43a6e/errgroup/errgroup.go:57\ runtime.goexit\ \truntime/asm_amd64.s:1357\ \tpd 192.168.198.128:2379 failed to start: timed out waiting for port 2379 to be started after 2m0s, please check the log of the instance\ failed to start pd\ failed to start”}
2020-08-01T02:41:44.489-0700 INFO Execute command finished {“code”: 1, “error”: “failed to start: failed to start pd: \tpd 192.168.198.128:2379 failed to start: timed out waiting for port 2379 to be started after 2m0s, please check the log of the instance: timed out waiting for port 2379 to be started after 2m0s”, “errorVerbose”: “timed out waiting for port 2379 to be started after 2m0s\ngithub.com/pingcap/tiup/pkg/cluster/module.(*WaitFor).Execute\ \tgithub.com/pingcap/tiup@/pkg/cluster/module/wait_for.go:90\ github.com/pingcap/tiup/pkg/cluster/spec.PortStarted\ \tgithub.com/pingcap/tiup@/pkg/cluster/spec/instance.go:90\ github.com/pingcap/tiup/pkg/cluster/spec.(*instance).Ready\ \tgithub.com/pingcap/tiup@/pkg/cluster/spec/instance.go:121\ github.com/pingcap/tiup/pkg/cluster/operation.startInstance\ \tgithub.com/pingcap/tiup@/pkg/cluster/operation/action.go:466\ github.com/pingcap/tiup/pkg/cluster/operation.StartComponent.func1\ \tgithub.com/pingcap/tiup@/pkg/cluster/operation/action.go:502\ golang.org/x/sync/errgroup.(*Group).Go.func1\ \tgolang.org/x/sync@v0.0.0-20190911185100-cd5d95a43a6e/errgroup/errgroup.go:57\ runtime.goexit\ \truntime/asm_amd64.s:1357\ \tpd 192.168.198.128:2379 failed to start: timed out waiting for port 2379 to be started after 2m0s, please check the log of the instance\ failed to start pd\ failed to start”}

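The error means that the check on port 2379 of the PD server at 192.168.198.128 timed out. One possible cause is that port 2379 is already occupied by another service, producing a port conflict. The other is that the PD server itself failed to start. Note that the `ss -ltn` output captured in the log shows no listener on 2379 on any of the three hosts, which suggests the second cause is the more likely one here. Log in to the host 192.168.198.128, find the corresponding PD log and the systemd journal, and confirm why the port check failed.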

# Check the state of PD port 2379 with netstat
netstat -anp | grep 2379

# Or check the systemd startup log for PD in the journal
journalctl -u pd-2379.service

# Check the PD log
tail -n 1000 $deploy_dir/log/pd.log
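
The port check can also be scripted, so the same test can be repeated on each PD host. The following is only a sketch, not part of TiUP: the function name `port_listening` is hypothetical, and it assumes the `ss -ltn` column layout shown in the log above, with the local address and port in the fourth column.

```shell
#!/usr/bin/env bash
# port_listening PORT
# Hypothetical helper: reads `ss -ltn` output on stdin and returns 0 if some
# LISTEN socket is bound to PORT, 1 otherwise.
port_listening() {
  local port="$1"
  awk -v p="$port" '
    $1 == "LISTEN" {
      # The local address looks like 0.0.0.0:2379 or [::]:2379,
      # so the port is whatever follows the last colon.
      n = split($4, parts, ":")
      if (parts[n] == p) found = 1
    }
    END { exit !found }
  '
}
```

On a PD host it could be used as `ss -ltn | port_listening 2379 && echo "2379 is listening" || echo "nothing is listening on 2379"`. If nothing is listening, no other service is holding the port, so the PD log and the journal are the next places to look.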