Scaling out PD

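To help us respond efficiently, please provide the following information when asking a question. Clearly described problems get priority.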

  • [TiDB version]: v4.0.0-alpha-1292-gb1c08ee27
  • [Problem description]:

TASK [wait until the PD port is up] *****************************************************************************
fatal: [10.105.1.165]: FAILED! => changed=false elapsed: 300 msg: the PD port 2379 is not up
fatal: [10.105.1.166]: FAILED! => changed=false elapsed: 300 msg: the PD port 2379 is not up
to retry, use: --limit @/home/tidb/tidb-ansible/retry_files/start.retry

PLAY RECAP ******************************************************************************************************
10.105.1.165 : ok=10 changed=0 unreachable=0 failed=1
10.105.1.166 : ok=10 changed=2 unreachable=0 failed=1

ERROR MESSAGE SUMMARY *******************************************************************************************
[10.105.1.165]: Ansible Failed! ==> changed=false elapsed: 300 msg: the PD port 2379 is not up
[10.105.1.166]: Ansible Failed! ==> changed=false elapsed: 300 msg: the PD port 2379 is not up

Ask TiDB User Group for help: It seems that you have encountered some problem. Please describe your operation steps and provide error information as much as possible on https://asktug.com (in Chinese) or https://stackoverflow.com/questions/tagged/tidb (in English). We will do our best to help solve your problem. Thanks. :slight_smile:

The OS is Red Hat 7.5 x64.
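
The failing task only reports that port 2379 never started accepting connections. A minimal sketch of the same kind of check, run by hand, might look like the following. The helper name `is_port_open` is hypothetical, and this is not the Ansible task's actual implementation; the hosts and port in the usage comment are the ones from the output above.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example usage (hosts taken from the Ansible output):
#   for host in ("10.105.1.165", "10.105.1.166"):
#       print(host, "up" if is_port_open(host, 2379) else "not up")
```

A result of "not up" only confirms the symptom; the reason the PD process failed to start still has to come from its log.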

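If your question is about performance tuning or troubleshooting, please download the script and run it. Be sure to select all of the terminal output, then copy, paste, and upload it.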

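Please confirm the following:

  1. Whether the scale-out followed the steps described here: https://pingcap.com/docs-cn/stable/how-to/scale/with-ansible/#扩容-pd-节点
  2. Run {deploy_dir}/scripts/run_pd.sh on the 10.105.1.166 machine and see whether PD starts normally.
  3. Check whether {deploy_dir}/log/pd.log contains any errors.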
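
For step 3, a small filter can pull just the ERROR and FATAL entries out of pd.log. This is a rough sketch that assumes each entry starts with `[timestamp] [LEVEL] [file:line]`, as the PD log lines quoted in this thread do; the names `LOG_LINE` and `find_problems` are hypothetical.

```python
import re

# Assumed entry layout: [timestamp] [LEVEL] [file:line] followed by the message fields.
LOG_LINE = re.compile(
    r"^\[(?P<time>[^\]]+)\] \[(?P<level>[A-Z]+)\] \[(?P<source>[^\]]+)\] (?P<rest>.*)$"
)


def find_problems(log_text: str, levels=("ERROR", "FATAL")):
    """Return (time, level, source, rest) tuples for entries at the given levels."""
    problems = []
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match and match.group("level") in levels:
            problems.append(
                (match.group("time"), match.group("level"),
                 match.group("source"), match.group("rest"))
            )
    return problems


# Example usage (replace log_path with your own {deploy_dir}/log/pd.log):
#   with open(log_path, encoding="utf-8") as f:
#       for entry in find_problems(f.read()):
#           print(entry)
```

Lines that do not match the assumed layout are skipped, so it is still worth reading the surrounding log if nothing is reported.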

I simply followed the official documentation.

Please go through the checks listed above. If something fails, look at the log to see what specific information it gives.

[2020/01/07 15:16:12.519 +08:00] [ERROR] [join.go:213] ["failed to open directory"] [error="open /data/tidb/deploy/data.pd/member: no such file or directory"]
[2020/01/07 15:16:12.525 +08:00] [FATAL] [main.go:92] ["join meet error"] [error="there is a member that has not joined successfully"] [stack="github.com/pingcap/log.Fatal\n\t/home/jenkins/agent/workspace/build_pd_multi_branch_master/go/pkg/mod/github.com/pingcap/log@v0.0.0-20191012051959-b742a5d432e9/global.go:59\nmain.main\n\t/home/jenkins/agent/workspace/build_pd_multi_branch_master/go/src/github.com/pingcap/pd/cmd/pd-server/main.go:92\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:203"]

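Use pd-ctl to check the current state of the nodes with the health command. If any node reports a health status of false, it must be removed before PD can be scaled out. Then delete the data.pd directory and try bringing PD up again with start_pd.sh.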
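
To pick out the nodes that report false, the health output can be filtered with a short script. This sketch assumes the pd-ctl health command prints a JSON array whose objects carry a `name` and a boolean `health` field; check that against the actual output of your pd-ctl version. The function name `unhealthy_members` is hypothetical.

```python
import json


def unhealthy_members(health_json: str):
    """Return the names of members whose "health" field is missing or not true."""
    members = json.loads(health_json)
    return [m.get("name", "<unknown>") for m in members if not m.get("health", False)]


# Example usage: save the output of the pd-ctl health command to a file
# (health_path is a placeholder), then:
#   with open(health_path, encoding="utf-8") as f:
#       print(unhealthy_members(f.read()))
```

An empty list means every member reported healthy, in which case the cause lies elsewhere.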

On the newly added PD nodes, port 2379 will not come up.
The member directory cannot be created either.

[ERROR] [join.go:213] ["failed to open directory"] [error="open /data/tidb/deploy/data.pd/member: no such file or directory"]

Hello, please check the status of each PD as described above, then follow the steps and run start_pd.sh again.

If the directory cannot be created, check whether the corresponding permissions are in place.
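
One way to check this is to find the deepest directory on the path that actually exists and test whether the current user can write into it. The sketch below is only an illustration; the helper names `first_existing_ancestor` and `can_create` are hypothetical, and it should be run as the same user that starts PD for the result to be meaningful.

```python
import os


def first_existing_ancestor(path: str) -> str:
    """Walk up from path until something that exists on disk is found."""
    current = os.path.abspath(path)
    while not os.path.exists(current):
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            break
        current = parent
    return current


def can_create(path: str) -> bool:
    """Return True if the current user can write into the nearest existing directory."""
    ancestor = first_existing_ancestor(path)
    return os.path.isdir(ancestor) and os.access(ancestor, os.W_OK | os.X_OK)


# Example usage (path taken from the error message in this thread):
#   target = "/data/tidb/deploy/data.pd/member"
#   print(first_existing_ancestor(target), can_create(target))
```

If `first_existing_ancestor` stops above data.pd, the data directory itself is missing, which is a different problem from a permissions failure.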

I checked with pd-ctl and every node reports true. The directory is owned by the tidb user group.

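Have you redone the operation? Please try the steps suggested above. If that turns out not to be the cause, we can investigate other possibilities, which will also be more efficient. Thank you.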