TiDB fails to start after a power outage

[TiDB version]
tidb-ansible-3.1.0

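[Problem description]
I recently took over this project. Reportedly, the application started failing after the cluster went down, and TiDB would not come back up when started manually.
The startup command and the error output are as follows: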

[tidb@b8 tidb-ansible-3.1.0]$ sudo ansible-playbook start.yml

PLAY [check config locally] **************************************************************************************************************************************************

PLAY [gather all facts, and check dest] **************************************************************************************************************************************

TASK [check_config_dynamic : Set enable_binlog variable] *********************************************************************************************************************

TASK [check_config_dynamic : Set deploy_dir if not set] **********************************************************************************************************************

TASK [check_config_dynamic : environment check (deploy dir)] *****************************************************************************************************************
ok: [192.168.10.106]
ok: [192.168.10.109]
ok: [192.168.10.107]
ok: [192.168.10.102]
ok: [192.168.10.101]
ok: [192.168.10.108]

TASK [check_config_dynamic : Preflight check - Does deploy dir have appropriate permission] **********************************************************************************

TASK [check_config_dynamic : environment check (supervise)] ******************************************************************************************************************

TASK [check_config_dynamic : config skip variables (default)] ****************************************************************************************************************
ok: [192.168.10.101]
ok: [192.168.10.102]
ok: [192.168.10.106]
ok: [192.168.10.109]
ok: [192.168.10.107]
ok: [192.168.10.108]

TASK [check_config_dynamic : config skip variables] **************************************************************************************************************************
ok: [192.168.10.101]
ok: [192.168.10.102]
ok: [192.168.10.106]
ok: [192.168.10.109]
ok: [192.168.10.107]
ok: [192.168.10.108]

TASK [check_config_dynamic : config skip variables] **************************************************************************************************************************

PLAY [monitored_servers] *****************************************************************************************************************************************************

PLAY [monitored_servers] *****************************************************************************************************************************************************

PLAY [alertmanager_servers] **************************************************************************************************************************************************

PLAY [monitoring_servers] ****************************************************************************************************************************************************

PLAY [monitoring_servers] ****************************************************************************************************************************************************

PLAY [kafka_exporter_servers] ************************************************************************************************************************************************
skipping: no hosts matched

PLAY [pd_servers] ************************************************************************************************************************************************************

TASK [start PD by supervise] *************************************************************************************************************************************************

TASK [start PD by systemd] ***************************************************************************************************************************************************
ok: [192.168.10.106]
ok: [192.168.10.102]
ok: [192.168.10.101]

TASK [wait until the PD port is up] ******************************************************************************************************************************************
ok: [192.168.10.106]
ok: [192.168.10.102]
ok: [192.168.10.101]

TASK [wait until the PD health page is available] ****************************************************************************************************************************
ok: [192.168.10.106]
ok: [192.168.10.102]
ok: [192.168.10.101]

TASK [wait until the PD health page is available when enable_tls] ************************************************************************************************************

PLAY [tikv_servers] **********************************************************************************************************************************************************

TASK [start TiKV by supervise] ***********************************************************************************************************************************************

TASK [start TiKV by systemd] *************************************************************************************************************************************************
ok: [192.168.10.108]
ok: [192.168.10.109]
ok: [192.168.10.107]

TASK [wait until the TiKV port is up] ****************************************************************************************************************************************
ok: [192.168.10.107]
ok: [192.168.10.108]
ok: [192.168.10.109]

TASK [wait until the TiKV status page is available] **************************************************************************************************************************
ok: [192.168.10.107]
ok: [192.168.10.108]
ok: [192.168.10.109]

TASK [wait until the TiKV status page is available when enable_tls] **********************************************************************************************************

TASK [command] ***************************************************************************************************************************************************************
ok: [192.168.10.108]
ok: [192.168.10.107]
ok: [192.168.10.109]

TASK [display new tikv pid] **************************************************************************************************************************************************
ok: [192.168.10.107] => {
    "msg": "tikv binary or docker pid: 28832"
}
ok: [192.168.10.108] => {
    "msg": "tikv binary or docker pid: 7055"
}
ok: [192.168.10.109] => {
    "msg": "tikv binary or docker pid: 31609"
}

PLAY [pd_servers[0]] *********************************************************************************************************************************************************

TASK [wait for region replication complete] **********************************************************************************************************************************
FAILED - RETRYING: wait for region replication complete (20 retries left).
FAILED - RETRYING: wait for region replication complete (19 retries left).
FAILED - RETRYING: wait for region replication complete (18 retries left).
FAILED - RETRYING: wait for region replication complete (17 retries left).
FAILED - RETRYING: wait for region replication complete (16 retries left).
FAILED - RETRYING: wait for region replication complete (15 retries left).
FAILED - RETRYING: wait for region replication complete (14 retries left).
FAILED - RETRYING: wait for region replication complete (13 retries left).
FAILED - RETRYING: wait for region replication complete (12 retries left).
FAILED - RETRYING: wait for region replication complete (11 retries left).
FAILED - RETRYING: wait for region replication complete (10 retries left).
FAILED - RETRYING: wait for region replication complete (9 retries left).
FAILED - RETRYING: wait for region replication complete (8 retries left).
FAILED - RETRYING: wait for region replication complete (7 retries left).
FAILED - RETRYING: wait for region replication complete (6 retries left).
FAILED - RETRYING: wait for region replication complete (5 retries left).
FAILED - RETRYING: wait for region replication complete (4 retries left).
FAILED - RETRYING: wait for region replication complete (3 retries left).
FAILED - RETRYING: wait for region replication complete (2 retries left).
FAILED - RETRYING: wait for region replication complete (1 retries left).
fatal: [192.168.10.101]: FAILED! => {"attempts": 20, "changed": false, "msg": "Unsupported parameters for (ansible.legacy.uri) module: return content Supported parameters include: attributes, body, body_format, client_cert, client_key, creates, dest, follow_redirects, force, force_basic_auth, group, headers, http_agent, method, mode, owner, remote_src, removes, return_content, selevel, serole, setype, seuser, src, status_code, timeout, unix_socket, unsafe_writes, url, url_password, url_username, use_proxy, validate_certs"}

PLAY RECAP *******************************************************************************************************************************************************************
192.168.10.101 : ok=6 changed=0 unreachable=0 failed=1 skipped=7 rescued=0 ignored=0
192.168.10.102 : ok=6 changed=0 unreachable=0 failed=0 skipped=7 rescued=0 ignored=0
192.168.10.106 : ok=6 changed=0 unreachable=0 failed=0 skipped=7 rescued=0 ignored=0
192.168.10.107 : ok=8 changed=0 unreachable=0 failed=0 skipped=7 rescued=0 ignored=0
192.168.10.108 : ok=8 changed=0 unreachable=0 failed=0 skipped=7 rescued=0 ignored=0
192.168.10.109 : ok=8 changed=0 unreachable=0 failed=0 skipped=7 rescued=0 ignored=0

ERROR MESSAGE SUMMARY ********************************************************************************************************************************************************
[192.168.10.101]: Ansible FAILED! => playbook: start.yml; TASK: wait for region replication complete; message: {"attempts": 20, "changed": false, "msg": "Unsupported parameters for (ansible.legacy.uri) module: return content Supported parameters include: attributes, body, body_format, client_cert, client_key, creates, dest, follow_redirects, force, force_basic_auth, group, headers, http_agent, method, mode, owner, remote_src, removes, return_content, selevel, serole, setype, seuser, src, status_code, timeout, unix_socket, unsafe_writes, url, url_password, url_username, use_proxy, validate_certs"}
Ask for help:
Contact us: support@pingcap.com
[WARNING]: Failure using method (v2_playbook_on_stats) in callback plugin (<ansible.plugins.callback.help.CallbackModule object at 0x7f0f6a61ee50>): Invalid color supplied
to display: white
[tidb@b8 tidb-ansible-3.1.0]$

You could start by looking at this thread:

I followed the documentation and ran deploy first, but deploy failed with an error as well.

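Check the parameters. Judging by the error, the playbook is passing a parameter that the module does not support. If the cluster ran successfully at some point, ask whoever deployed it what the original files looked like. If this cluster has never been started, I recommend installing version 4.0 with TiUP rather than staying on 3.1, because 3.1 will not receive any further releases.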

FAILED! => {"attempts": 20, "changed": false, "msg": "Unsupported parameters for (ansible.legacy.uri) module: return content Supported parameters include: attributes, body, body_format, client_cert, client_key, creates, dest, follow_redirects, force, force_basic_auth, group, headers, http_agent, method, mode, owner, remote_src, removes, return_content, selevel, serole, setype, seuser, src, status_code, timeout, unix_socket, unsafe_writes, url, url_password, url_username, use_proxy, validate_certs"}
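
To see which parameter Ansible rejected, the message can be split into the rejected names and the supported list, and the two compared. The following is a minimal Python sketch and not part of tidb-ansible; the function name `diagnose_unsupported` and the regular expression are assumptions based only on the message format quoted above, and the supported list in the example is abbreviated.

```python
import difflib
import re

# Assumed message layout, taken from the error quoted in this thread:
# "Unsupported parameters for (<module>) module: <bad> Supported parameters include: <list>"
PATTERN = re.compile(
    r"Unsupported parameters for \((?P<module>[^)]+)\) module: "
    r"(?P<bad>.+?)\.? Supported parameters include: (?P<good>.+)"
)


def diagnose_unsupported(msg):
    """Return (module, {rejected_name: [closest supported names]})."""
    m = PATTERN.search(msg)
    if m is None:
        raise ValueError("message does not match the expected format")
    supported = [p.strip() for p in m.group("good").split(",")]
    rejected = [p.strip() for p in m.group("bad").split(",")]
    hints = {}
    for name in rejected:
        # Turn spaces and dashes into underscores so that a misspelled
        # option can still be matched against the supported names.
        candidate = re.sub(r"[\s-]+", "_", name)
        hints[name] = difflib.get_close_matches(candidate, supported, n=3, cutoff=0.6)
    return m.group("module"), hints


# Supported list abbreviated from the message in this thread.
msg = (
    "Unsupported parameters for (ansible.legacy.uri) module: return content "
    "Supported parameters include: body, method, return_content, "
    "status_code, timeout, url"
)
module, hints = diagnose_unsupported(msg)
print(module, hints)
```

For this message the closest supported name to `return content` is `return_content`. That would be consistent with the task spelling the option with a space instead of an underscore, although the thread does not confirm the cause.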

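Two questions:

1. If I reinstall a newer 4.x version with TiUP now, which version should I install? Are all the versions shown in the list output stable releases?
2. Processes from the old version's components are still running in this environment. Can I install directly over them, or do I need to kill all of those processes before installing?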

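  1. Yes, the v4.0.x versions you see are all stable releases; the rc ones are not.
  2. If you are rebuilding the cluster, it is best to clean up the old processes by hand, and the old directories too.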
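
Both answers can be checked mechanically before reinstalling. The following is a rough Python sketch under stated assumptions: stable releases are taken to be plain `vMAJOR.MINOR.PATCH` strings with no suffix such as `-rc`, the component binaries are taken to be named `pd-server`, `tikv-server` and `tidb-server`, and the sample version list and process lines are illustrative values, not output captured from this cluster. The helper names `stable_versions` and `leftover_components` are hypothetical.

```python
import re

# Assumption: a stable release is vMAJOR.MINOR.PATCH with no suffix;
# pre-releases carry a suffix such as -rc or -rc.2.
STABLE = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")

# Assumed names of the component binaries to look for.
COMPONENTS = ("pd-server", "tikv-server", "tidb-server")


def stable_versions(versions, major=None):
    """Keep only suffix-free versions, optionally for one major version, sorted numerically."""
    picked = []
    for raw in versions:
        version = raw.strip()
        m = STABLE.match(version)
        if m is None:
            continue  # rc, nightly and other non-release strings are skipped
        key = tuple(int(part) for part in m.groups())
        if major is not None and key[0] != major:
            continue
        picked.append((key, version))
    return [version for _, version in sorted(picked)]


def leftover_components(ps_lines):
    """Return the component names that still appear in ps-style command lines."""
    found = set()
    for line in ps_lines:
        for name in COMPONENTS:
            if name in line:
                found.add(name)
    return found


# Illustrative inputs only.
sample_versions = ["v3.1.0", "v4.0.0-rc", "v4.0.0-rc.2", "v4.0.0", "v4.0.2", "v4.0.10", "nightly"]
sample_ps = ["bin/pd-server --name=pd1", "bin/tikv-server --addr 0.0.0.0:20160", "sshd: tidb"]

print(stable_versions(sample_versions, major=4))
print(sorted(leftover_components(sample_ps)))
```

If `leftover_components` returns anything, those processes and their directories should be cleaned up before the new deployment, as the second answer recommends.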

OK, the reinstall is done and it went smoothly! Thanks.

:+1:
