TiDB 4.0.0 deployed with TiUP: PD fails to start

The PD log file is empty.
This is a new cluster.
I deployed the three PD nodes on a single machine, each on a different port.

Hello,

Please upload the topology file, the debug log, and the PD log file that contains the error.

Where is the debug log? There is no error log.

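The debug log is the log that TiUP writes when a command fails; the error output shows where the .log file is located.

The PD log is under deploy dir/log, the same as with other deployment methods. The logs of every other node are in the corresponding location.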

Debug log:
2020-06-09T17:27:03.517+0800 INFO Starting cluster tidb-ceph…
2020-06-09T17:27:03.518+0800 INFO + [ Serial ] - SSHKeySet: privateKey=/home/mysql/.tiup/storage/cluster/clusters/tidb-ceph/ssh/id_rsa, publicKey=/home/mysql/.tiup/storage/cluster/clusters/tidb-ceph/ssh/id_rsa.pub
2020-06-09T17:27:03.518+0800 DEBUG TaskBegin {“task”: “SSHKeySet: privateKey=/home/mysql/.tiup/storage/cluster/clusters/tidb-ceph/ssh/id_rsa, publicKey=/home/mysql/.tiup/storage/cluster/clusters/tidb-ceph/ssh/id_rsa.pub”}
2020-06-09T17:27:03.518+0800 DEBUG TaskFinish {“task”: “SSHKeySet: privateKey=/home/mysql/.tiup/storage/cluster/clusters/tidb-ceph/ssh/id_rsa, publicKey=/home/mysql/.tiup/storage/cluster/clusters/tidb-ceph/ssh/id_rsa.pub”}
2020-06-09T17:27:03.518+0800 DEBUG TaskBegin {“task”: “UserSSH: user=tidb, host=xxx.xxx.xx.52\ UserSSH: user=tidb, host=xxx.xx.xx.55\ UserSSH: user=tidb, host=xxx.xx.xx.12\ UserSSH: user=tidb, host=xxx.xxx.xx.41\ UserSSH: user=tidb, host=xxx.xxx.xx.47\ UserSSH: user=tidb, host=xxx.xxx.xx.38\ UserSSH: user=tidb, host=xxx.xxx.xx1.59\ UserSSH: user=tidb, host=xxx.xxx.xx.52\ UserSSH: user=tidb, host=xxx.xxx.xx.52\ UserSSH: user=tidb, host=xxx.xxx.xx.52”}
2020-06-09T17:27:03.518+0800 INFO + [Parallel] - UserSSH: user=tidb, host=xxx.xxx.xx.52
2020-06-09T17:27:03.518+0800 INFO + [Parallel] - UserSSH: user=tidb, host=xxx.xxx.xx.52
2020-06-09T17:27:03.518+0800 DEBUG TaskBegin {“task”: “UserSSH: user=tidb, host=xxx.xxx.xx.52”}
2020-06-09T17:27:03.518+0800 INFO + [Parallel] - UserSSH: user=tidb, host=xxx.xxx.xx1.55
2020-06-09T17:27:03.518+0800 DEBUG TaskFinish {“task”: “UserSSH: user=tidb, host=xxx.xxx.xx.52”}
2020-06-09T17:27:03.518+0800 DEBUG TaskBegin {“task”: “UserSSH: user=tidb, host=xxx.xxx.xx.52”}
2020-06-09T17:27:03.518+0800 DEBUG TaskBegin {“task”: “UserSSH: user=tidb, host=xxx.xxx.xx1.55”}
2020-06-09T17:27:03.518+0800 DEBUG TaskFinish {“task”: “UserSSH: user=tidb, host=xxx.xxx.xx.52”}
2020-06-09T17:27:03.518+0800 DEBUG TaskFinish {“task”: “UserSSH: user=tidb, host=xxx.xxx.xx1.55”}
2020-06-09T17:27:03.518+0800 INFO + [Parallel] - UserSSH: user=tidb, host=xxx.xxx.xx.47
2020-06-09T17:27:03.518+0800 INFO + [Parallel] - UserSSH: user=tidb, host=xxx.xxx.xx.41
…skipping… the instance
2020-06-09T17:28:04.728+0800 INFO SSHCommand {“host”: “xxx.xxx.xx1.55”, “port”: “22”, “cmd”: “PATH=$PATH:/usr/bin:/usr/sbin ss -ltn”, “stdout”: “State Recv-Q Send-Q Local Address:Port Peer Address:Port \ LISTEN 0 100 :9422 : \ LISTEN 0 128 127.0.0.1:1999 : \ LISTEN 0 128 xxx.xxx.xx1.55:1999 : \ LISTEN 0 128 :22 : \ LISTEN 0 128 127.0.0.1:8600 : \ LISTEN 0 100 127.0.0.1:59996 : \ LISTEN 0 128 :15998 : \ LISTEN 0 128 :15999 : \ LISTEN 0 100 127.0.0.1:41855 : \ LISTEN 0 128 127.0.0.1:1991 : \ LISTEN 0 128 :::23211 ::: \ LISTEN 0 128 :::6604 ::: \ LISTEN 0 128 :::31949 ::: \ LISTEN 0 128 :::29741 ::: \ LISTEN 0 128 :::24877 :::* \ LISTEN 0 128 :::8301 :::* \ LISTEN 0 128 :::25198 :::* \ LISTEN 0 128 :::20270 :::* \ LISTEN 0 128 :::24559 :::* \ LISTEN 0 128 :::23663 :::* \ LISTEN 0 128 :::20240 :::* \ LISTEN 0 128 :::6608 :::* \ LISTEN 0 128 :::26801 :::* \ LISTEN 0 128 :::28659 :::* \ LISTEN 0 128 :::8500 :::* \ LISTEN 0 128 :::25846 :::* \ LISTEN 0 128 :::22 :::* \ LISTEN 0 128 :::21496 :::* \ LISTEN 51 50 :::6620 :::* \ LISTEN 0 128 :::29053 :::* \ LISTEN 0 128 :::26240 :::* \ LISTEN 0 128 :::25152 :::* \ LISTEN 0 128 :::25185 :::* \ LISTEN 0 128 :::20930 :::* \ LISTEN 0 128 :::22851 :::* \ LISTEN 0 128 :::29028 :::* \ LISTEN 0 128 :::30916 :::* \ LISTEN 0 128 :::27909 :::* \ LISTEN 0 128 :::4646 :::* \ LISTEN 0 128 :::31783 :::* \ LISTEN 0 128 :::25577 :::* \ LISTEN 0 128 :::29865 :::* \ LISTEN 0 128 :::21481 :::* \ ”, “stderr”: “”}
2020-06-09T17:28:04.728+0800 DEBUG retry error: operation timed out after 1m0s
2020-06-09T17:28:04.728+0800 ERROR pd xxx.xxx.xx1.55:2379 failed to start: timed out waiting for port 2379 to be started after 1m0s, please check the log of the instance
2020-06-09T17:28:04.728+0800 DEBUG TaskFinish {“task”: “ClusterOperate: operation=StartOperation, options={Roles:[] Nodes:[] Force:false SSHTimeout:5 OptTimeout:60 APITimeout:300}”, “error”: “failed to start: failed to start pd: \tpd xxx.xxx.xx.52:2379 failed to start: timed out waiting for port 2379 to be started after 1m0s, please check the log of the instance: timed out waiting for port 2379 to be started after 1m0s”, “errorVerbose”: “timed out waiting for port 2379 to be started after 1m0s\ngithub.com/pingcap/tiup/pkg/cluster/module.(*WaitFor).Execute\ \tgithub.com/pingcap/tiup@/pkg/cluster/module/wait_for.go:90\ github.com/pingcap/tiup/pkg/cluster/meta.PortStarted\ \tgithub.com/pingcap/tiup@/pkg/cluster/meta/logic.go:116\ github.com/pingcap/tiup/pkg/cluster/meta.(*instance).Ready\ \tgithub.com/pingcap/tiup@/pkg/cluster/meta/logic.go:146\ github.com/pingcap/tiup/pkg/cluster/operation.startInstance\ \tgithub.com/pingcap/tiup@/pkg/cluster/operation/action.go:468\ github.com/pingcap/tiup/pkg/cluster/operation.StartComponent.func1\ \tgithub.com/pingcap/tiup@/pkg/cluster/operation/action.go:504\ golang.org/x/sync/errgroup.(*Group).Go.func1\ \tgolang.org/x/sync@v0.0.0-20190911185100-cd5d95a43a6e/errgroup/errgroup.go:57\ runtime.goexit\ \truntime/asm_amd64.s:1357\ \tpd xxx.xxx.xx.52:2379 failed to start: timed out waiting for port 2379 to be started after 1m0s, please check the log of the instance\ failed to start pd\ failed to start”}
2020-06-09T17:28:04.728+0800 INFO Execute command finished {“code”: 1, “error”: “failed to start: failed to start pd: \tpd xxx.xxx.xx.52:2379 failed to start: timed out waiting for port 2379 to be started after 1m0s, please check the log of the instance: timed out waiting for port 2379 to be started after 1m0s”, “errorVerbose”: “timed out waiting for port 2379 to be started after 1m0s\ngithub.com/pingcap/tiup/pkg/cluster/module.(*WaitFor).Execute\ \tgithub.com/pingcap/tiup@/pkg/cluster/module/wait_for.go:90\ github.com/pingcap/tiup/pkg/cluster/meta.PortStarted\ \tgithub.com/pingcap/tiup@/pkg/cluster/meta/logic.go:116\ github.com/pingcap/tiup/pkg/cluster/meta.(*instance).Ready\ \tgithub.com/pingcap/tiup@/pkg/cluster/meta/logic.go:146\ github.com/pingcap/tiup/pkg/cluster/operation.startInstance\ \tgithub.com/pingcap/tiup@/pkg/cluster/operation/action.go:468\ github.com/pingcap/tiup/pkg/cluster/operation.StartComponent.func1\ \tgithub.com/pingcap/tiup@/pkg/cluster/operation/action.go:504\ golang.org/x/sync/errgroup.(*Group).Go.func1\ \tgolang.org/x/sync@v0.0.0-20190911185100-cd5d95a43a6e/errgroup/errgroup.go:57\ runtime.goexit\ \truntime/asm_amd64.s:1357\ \tpd xxx.xxx.xx.52:2379 failed to start: timed out waiting for port 2379 to be started after 1m0s, please check the log of the instance\ failed to start pd\ failed to start”}
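
The log shows TiUP running `ss -ltn` on the host and giving up after one minute because port 2379 never appeared in the listening list. As a rough illustration of that kind of check (not TiUP's actual code), here is a minimal Python sketch that parses `ss -ltn` text and reports whether a port is listening. The function name `listening_ports` and the sample rows are assumptions made for this example; the sample is modeled on the output above, not copied from it.

```python
def listening_ports(ss_output):
    """Collect the local ports of LISTEN sockets from `ss -ltn` text output."""
    ports = set()
    for line in ss_output.splitlines():
        fields = line.split()
        # Data rows look like: LISTEN 0 128 127.0.0.1:1999 *:*
        # The header row and malformed rows are skipped.
        if len(fields) < 4 or fields[0] != "LISTEN":
            continue
        local_address = fields[3]
        # The port is whatever follows the last colon,
        # which also handles IPv6 forms such as :::22.
        _, _, port = local_address.rpartition(":")
        if port.isdigit():
            ports.add(int(port))
    return ports


# Illustrative sample only (hypothetical rows in the usual `ss -ltn` layout).
sample = """State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0 128 127.0.0.1:1999 *:*
LISTEN 0 128 :::22 :::*
LISTEN 0 128 :::8500 :::*
"""

ports = listening_ports(sample)
print(2379 in ports)  # False: nothing is listening on the PD client port
```

If 2379 is missing from the list, as it is here, the PD process most likely exited before it could bind the port, so the next place to look is the PD log or the system log.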

Please upload the three logs as attachments.

They are all on the server, so downloading them is inconvenient. topology.yaml (2.6 KB) debug.log (9.0 KB)

After deploying three PD instances on the same server failed, I deployed PD on three separate servers, and it still failed.

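Please upload pd.log, which is in deploy dir / log. The debug log shows no obvious error. In the topology file, the global deploy_dir and data_dir differ from where the nodes are actually deployed, but that does not affect startup.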

There is no log.

Jun 9 19:03:21 bjfk-staging-ls418 run_pd.sh: [2020/06/09 19:03:21.863 +08:00] [FATAL] [main.go:56] [“parse cmd flags error”] [error=“log directory shouldn’t be the subdirectory of data directory”] [errorVerbose=“log directory shouldn’t be the subdirectory of data directory\ngithub.com/pingcap/pd/v4/server/config.(*Config).Validate\ \t/home/jenkins/agent/workspace/build_pd_multi_branch_v4.0.0/go/src/github.com/pingcap/pd/server/config/config.go:330\ngithub.com/pingcap/pd/v4/server/config.(*Config).Adjust\ \t/home/jenkins/agent/workspace/build_pd_multi_branch_v4.0.0/go/src/github.com/pingcap/pd/server/config/config.go:396\ngithub.com/pingcap/pd/v4/server/config.(*Config).Parse\ \t/home/jenkins/agent/workspace/build_pd_multi_branch_v4.0.0/go/src/github.com/pingcap/pd/server/config/config.go:308\ main.main\ \t/home/jenkins/agent/workspace/build_pd_multi_branch_v4.0.0/go/src/github.com/pingcap/pd/cmd/pd-server/main.go:42\ runtime.main\ \t/usr/local/go/src/runtime/proc.go:203\ runtime.goexit\ \t/usr/local/go/src/runtime/asm_amd64.s:1357”] [stack=“github.com/pingcap/log.Fatal\ \t/home/jenkins/agent/workspace/build_pd_multi_branch_v4.0.0/go/pkg/mod/github.com/pingcap/log@v0.0.0-20200117041106-d28c14d3b1cd/global.go:59\ main.main\ \t/home/jenkins/agent/workspace/build_pd_multi_branch_v4.0.0/go/src/github.com/pingcap/pd/cmd/pd-server/main.go:56\ runtime.main\ \t/usr/local/go/src/runtime/proc.go:203”]
Jun 9 19:03:21 bjfk-staging-ls418 systemd: pd-2379.service: main process exited, code=exited, status=1/FAILURE

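It looks like a problem with the log and data directories.

The message means that the log directory must not be a subdirectory of the data directory. Please reconfigure the directory locations. Thanks. You can use this configuration file as a reference: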

https://github.com/pingcap/docs-cn/blob/release-4.0/config-templates/complex-mini.yaml
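
To make the rule concrete, here is a minimal Python sketch of a "log directory nested inside data directory" check that could be run against the paths in a topology file before deploying. It is an illustration only, not the validation code in PD or TiUP; the function name `is_subdirectory` and all paths are hypothetical, and the sketch only flags strict nesting.

```python
from pathlib import PurePosixPath


def is_subdirectory(child, parent):
    """Return True if `child` lies strictly inside `parent` (pure path comparison)."""
    child_path = PurePosixPath(child)
    parent_path = PurePosixPath(parent)
    # `parents` lists every ancestor of child_path, excluding child_path itself.
    return parent_path in child_path.parents


# Hypothetical directories, not taken from the poster's topology file.
data_dir = "/tidb-data/pd-2379"
bad_log_dir = "/tidb-data/pd-2379/log"     # nested in data_dir: PD would refuse to start
good_log_dir = "/tidb-deploy/pd-2379/log"  # outside data_dir: acceptable

print(is_subdirectory(bad_log_dir, data_dir))   # True
print(is_subdirectory(good_log_dir, data_dir))  # False
```

The comparison works on path strings only, so it does not resolve symlinks or touch the file system.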

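OK. I suggest adding this check to TiUP. None of this information appears in the PD log; I found it in the system messages, which is a rather obscure place to look.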
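
For anyone hitting the same situation, here is a minimal Python sketch of pulling the FATAL or ERROR entry out of system log lines like the ones above. The regular expression is an assumption based on the bracketed layout visible in this thread and uses plain double quotes (the forum rendered them as curly quotes); `find_fatal` is a hypothetical helper name, and the sample lines are shortened copies of the entries posted earlier.

```python
import re

# Matches: [LEVEL] [source] ["message"] and an optional [error="..."] field.
ENTRY = re.compile(
    r'\[(?P<level>FATAL|ERROR)\] \[(?P<source>[^\]]+)\] '
    r'\["(?P<msg>[^"]*)"\](?: \[error="(?P<error>[^"]*)"\])?'
)


def find_fatal(lines):
    """Return (level, message, error) tuples for FATAL/ERROR entries in the given lines."""
    hits = []
    for line in lines:
        match = ENTRY.search(line)
        if match:
            hits.append((match.group("level"), match.group("msg"), match.group("error")))
    return hits


# Shortened sample lines; the stack trace fields are omitted.
sample = [
    'Jun 9 19:03:21 bjfk-staging-ls418 run_pd.sh: [2020/06/09 19:03:21.863 +08:00] '
    '[FATAL] [main.go:56] ["parse cmd flags error"] '
    '[error="log directory shouldn\'t be the subdirectory of data directory"]',
    'Jun 9 19:03:21 bjfk-staging-ls418 systemd: pd-2379.service: '
    'main process exited, code=exited, status=1/FAILURE',
]

for level, msg, error in find_fatal(sample):
    print(level, msg, error, sep=" | ")
```

In practice the lines would come from the system log file rather than a hard-coded list.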

Thanks for the feedback.


Please don't install such an old version. It is almost outside the range of officially maintained versions.