
I have a question: all the earlier steps reported success, but the step that starts the TiDB cluster fails every time. It gets stuck at TASK [wait until the PD port is up] and at TASK [wait until the PD health page is available].

The error messages are, respectively, fatal: []: FAILED! => {"changed": false, "elapsed": 300, "msg": "the PD port 2379 is not up"} and FAILED - RETRYING: wait until the PD health page is available (12 retries left).


ERROR MESSAGE SUMMARY *****************************************************************************************************
[]: Ansible FAILED! => playbook: start.yml; TASK: wait until the PD port is up; message: {"changed": false, "elapsed": 300, "msg": "the PD port 2379 is not up"}

[]: Ansible FAILED! => playbook: start.yml; TASK: wait until the PD port is up; message: {"changed": false, "elapsed": 300, "msg": "the PD port 2379 is not up"}

[]: Ansible FAILED! => playbook: start.yml; TASK: wait until the PD health page is available; message: {"attempts": 12, "changed": false, "content": "", "msg": "Status code was -1 and not [200]: Request failed: <urlopen error [Errno 111] Connection refused>", "redirected": false, "status": -1, "url": ""}
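For anyone hitting the same two failures: both tasks boil down to "can I open a TCP connection to the PD client port" and "does an HTTP GET on the health URL return 200". Below is a minimal sketch that repeats those two checks by hand, outside Ansible. The function names are mine, not part of tidb-ansible, and the host and URL are arguments you have to supply yourself. A status of -1 here corresponds to the "Connection refused" case in the output above.

```python
import socket
import urllib.error
import urllib.request

# Talk to the host directly and ignore any http_proxy settings in the environment.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def port_is_up(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, and timeouts.
        return False


def health_status(url: str, timeout: float = 3.0) -> int:
    """Return the HTTP status code for url, or -1 if the request itself fails."""
    try:
        with _opener.open(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with a non-2xx status.
        return exc.code
    except (urllib.error.URLError, OSError):
        # Nothing answered at all (for example, connection refused).
        return -1
```

Run port_is_up(pd_host, 2379) from the control machine and again on the PD machine itself, with pd_host set to the address from your inventory. If it is False in both places, the PD process most likely never started listening, and pd.log is the next place to look.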

Please fill in the required information as prompted. Also, start troubleshooting from pd.log.
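If pd.log is large, it can help to pull out only the lines that look like errors before reading it in full. Here is a rough sketch. It assumes the log marks severity with a bracketed level such as [error] or [fatal]; that is an assumption about the log format, so adjust the pattern if your pd.log looks different. The function name is made up for illustration.

```python
import re
from typing import Iterable, List

# Assumed level markers; change this pattern to match your actual pd.log format.
LEVEL_RE = re.compile(r"\[(error|fatal|panic)\]", re.IGNORECASE)


def problem_lines(lines: Iterable[str]) -> List[str]:
    """Return the lines that contain an error-like level marker, in order."""
    return [line.rstrip("\n") for line in lines if LEVEL_RE.search(line)]
```

Usage would be something like opening pd.log on each PD machine and printing problem_lines(f) for the open file f. The last few matching lines before the process exited are usually the most informative.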

  • OS version & kernel version: CentOS 7
  • TiDB version

tidb_version = v2.1.8

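  • Disk model
  • Cluster node distribution: [tidb_servers]

  • Data volume & number of Regions & number of replicas
  • Problem description (what I did): while running start.yml, at the point where pd-server is started:

PLAY [pd_servers]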

TASK [start PD by supervise] **********************************************************************************************

TASK [start PD by systemd] ************************************************************************************************
changed: []
changed: []
changed: []

TASK [wait until the PD port is up] ***************************************************************************************
ok: []
fatal: []: FAILED! => {"changed": false, "elapsed": 300, "msg": "the PD port 2379 is not up"}
fatal: []: FAILED! => {"changed": false, "elapsed": 300, "msg": "the PD port 2379 is not up"}

TASK [wait until the PD health page is available] *************************************************************************
FAILED - RETRYING: wait until the PD health page is available (12 retries left).
FAILED - RETRYING: wait until the PD health page is available (11 retries left).
FAILED - RETRYING: wait until the PD health page is available (10 retries left).
FAILED - RETRYING: wait until the PD health page is available (9 retries left).
FAILED - RETRYING: wait until the PD health page is available (8 retries left).
FAILED - RETRYING: wait until the PD health page is available (7 retries left).
FAILED - RETRYING: wait until the PD health page is available (6 retries left).
FAILED - RETRYING: wait until the PD health page is available (5 retries left).
FAILED - RETRYING: wait until the PD health page is available (4 retries left).
FAILED - RETRYING: wait until the PD health page is available (3 retries left).
FAILED - RETRYING: wait until the PD health page is available (2 retries left).
FAILED - RETRYING: wait until the PD health page is available (1 retries left).
fatal: []: FAILED! => {"attempts": 12, "changed": false, "content": "", "msg": "Status code was -1 and not [200]: Request failed: <urlopen error [Errno 111] Connection refused>", "redirected": false, "status": -1, "url": ""}
    to retry, use: --limit @/home/tidb/tidb-ansible/retry_files/start.retry

PLAY RECAP ****************************************************************************************************************
: ok=19 changed=6 unreachable=0 failed=1
: ok=10 changed=3 unreachable=0 failed=1
: ok=10 changed=3 unreachable=0 failed=1
localhost : ok=1 changed=0 unreachable=0 failed=0

ERROR MESSAGE SUMMARY *****************************************************************************************************
[]: Ansible FAILED! => playbook: start.yml; TASK: wait until the PD port is up; message: {"changed": false, "elapsed": 300, "msg": "the PD port 2379 is not up"}

[]: Ansible FAILED! => playbook: start.yml; TASK: wait until the PD port is up; message: {"changed": false, "elapsed": 300, "msg": "the PD port 2379 is not up"}

[]: Ansible FAILED! => playbook: start.yml; TASK: wait until the PD health page is available; message: {"attempts": 12, "changed": false, "content": "", "msg": "Status code was -1 and not [200]: Request failed: <urlopen error [Errno 111] Connection refused>", "redirected": false, "status": -1, "url": ""}

  • Keywords

Please provide the pd.log file.

pd.log (917.6 KB)

Each PD server has its own pd.log. Please also provide the logs from the other two PD servers.

pd.log from machine 151 (15.5 KB) pd.log from machine 152 (38.5 KB)


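I looked at the PD error logs. Please check whether this is caused by a mix-up between internal and external network IPs. There is a similar thread you can refer to: "PD端口无法起来" ("PD port fails to come up").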
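One quick way to test the internal/external IP theory: a process can only listen on an address that is actually assigned to one of the machine's network interfaces. On hosts where the public IP is mapped by NAT, the public address is often not local, so binding to it fails. The sketch below tries to bind a throwaway socket to a given IP. The function name is hypothetical, and this only checks bindability; it does not confirm that this is the cause in this particular cluster.

```python
import socket


def can_bind(ip: str) -> bool:
    """Return True if this machine can bind a TCP socket to the given IPv4 address."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((ip, 0))  # port 0 lets the OS pick any free port
        return True
    except OSError:
        # Typically "Cannot assign requested address" when ip is not local.
        return False
    finally:
        sock.close()
```

Run it on each PD machine with the IP that the inventory lists for that machine. If it returns False, PD would probably not be able to listen on that address either, and using the internal IP in the inventory is worth trying.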


