The TiDB disk filled up, and stopping the cluster now fails. The exact error is as follows:

[10.10.24.61]: Ansible UNREACHABLE! => changed=False
playbook: stop.yml
TASK: check_config_dynamic : environment check (deploy dir)
stderr: Authentication or permission failure. In some cases, you may have been able to authenticate and did not have permissions on the target directory. Consider changing the remote tmp path in ansible.cfg to a path rooted in "/tmp". Failed command was: ( umask 77 && mkdir -p "` echo ~/.ansible/tmp/ansible-tmp-1601175065.52-44768727857955 `" && echo ansible-tmp-1601175065.52-44768727857955="` echo ~/.ansible/tmp/ansible-tmp-1601175065.52-44768727857955 `" ), exited with result 1

Try deleting some files on the TiKV node whose disk is full, so that tidb-ansible has enough space to operate.

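  1. Check whether anything under tidb-deploy-dir/log/* can be cleaned up.
  2. The RocksDB logs are in the raft and db directories under {deploy_dir}/data, with file names LOG and LOG.old.xxx. TiKV logs are currently rotated according to log-rotation-timespan (every 24h by default); historical logs have to be cleaned up by a scheduled job.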
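
As a rough illustration of the scheduled cleanup mentioned in step 2, the sketch below only lists candidate files and leaves deletion to the operator. The function name `list_old_logs` and the `*.log.*` pattern for rotated TiKV logs are assumptions made for this example; check the actual file names in your deployment before removing anything.

```shell
#!/usr/bin/env bash
# List rotated log files older than a given number of days.
# Prints paths only; nothing is deleted here.
# NOTE: the name patterns are assumptions; adjust them to your deployment.
list_old_logs() {
    local dir="$1" days="$2"
    # -mtime +N matches files whose age in whole days is greater than N.
    find "$dir" -type f \( -name '*.log.*' -o -name 'LOG.old.*' \) \
        -mtime +"$days" -print
}
```

For example, `list_old_logs {deploy_dir}/data 7` would print rotated logs older than a week. Once the list has been reviewed, the same `find` expression could be run from cron with `-delete` in place of `-print`.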

Cleaning up the logs didn't help. Is there any other approach?

What exactly do you mean by "didn't help"?

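Run df -h to check the disk space on the TiKV node.

Use ps to confirm whether tikv-server is still alive.

You can also try stopping it manually with systemctl stop tikv-xxx.service, or with kill.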
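
The two checks above can be wrapped into small helpers, sketched below. The function names are hypothetical, and the sketch assumes a POSIX `df` and a `ps` that supports `-o comm=`.

```shell
#!/usr/bin/env bash
# Print the usage percentage (digits only) of the filesystem holding a path.
disk_use_percent() {
    # df -P prints stable POSIX columns; column 5 is the capacity, e.g. "87%".
    df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# Succeed if a process with exactly this command name is running.
is_running() {
    ps -e -o comm= | grep -qx -- "$1"
}
```

For instance, `disk_use_percent {deploy_dir}` shows how full the data disk is, and `is_running tikv-server` tells you whether a manual stop is still needed.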

Disk space has now been added, so the out-of-space problem is resolved, but the earlier error still occurs. Details below:

[10.10.24.66]: Ansible UNREACHABLE! => changed=False
playbook: stop.yml
TASK: check_config_dynamic : environment check (deploy dir)
stderr: Authentication or permission failure. In some cases, you may have been able to authenticate and did not have permissions on the target directory. Consider changing the remote tmp path in ansible.cfg to a path rooted in "/tmp". Failed command was: ( umask 77 && mkdir -p "` echo ~/.ansible/tmp/ansible-tmp-1601195337.01-185794460975888 `" && echo ansible-tmp-1601195337.01-185794460975888="` echo ~/.ansible/tmp/ansible-tmp-1601195337.01-185794460975888 `" ), exited with result 1

[10.10.24.68]: Ansible UNREACHABLE! => changed=False
playbook: stop.yml
TASK: check_config_dynamic : environment check (deploy dir)
stderr: Authentication or permission failure. In some cases, you may have been able to authenticate and did not have permissions on the target directory. Consider changing the remote tmp path in ansible.cfg to a path rooted in "/tmp". Failed command was: ( umask 77 && mkdir -p "` echo ~/.ansible/tmp/ansible-tmp-1601195337.06-81529809666726 `" && echo ansible-tmp-1601195337.06-81529809666726="` echo ~/.ansible/tmp/ansible-tmp-1601195337.06-81529809666726 `" ), exited with result 1

[10.10.24.67]: Ansible UNREACHABLE! => changed=False
playbook: stop.yml
TASK: check_config_dynamic : environment check (deploy dir)
stderr: Authentication or permission failure. In some cases, you may have been able to authenticate and did not have permissions on the target directory. Consider changing the remote tmp path in ansible.cfg to a path rooted in "/tmp". Failed command was: ( umask 77 && mkdir -p "` echo ~/.ansible/tmp/ansible-tmp-1601195337.07-212872874241640 `" && echo ansible-tmp-1601195337.07-212872874241640="` echo ~/.ansible/tmp/ansible-tmp-1601195337.07-212872874241640 `" ), exited with result 1

Run the command manually and see whether it succeeds:

umask 77 && mkdir -p "` echo ~/.ansible/tmp/ansible-tmp-1601195337.07-212872874241640 `" && echo ansible-tmp-1601195337.07-212872874241640="` echo ~/.ansible/tmp/ansible-tmp-1601195337.07-212872874241640 `"
Also confirm that the /tmp directory has enough free space.
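
In the command Ansible actually runs, each inner `echo` sits inside backticks (command substitution), which forum formatting tends to swallow. The sketch below reproduces the same step with explicit variables so it can be pasted safely. The function name and the directory name passed to it are hypothetical placeholders, not a real Ansible run ID.

```shell
#!/usr/bin/env bash
# Mimic Ansible's remote-tmp step: create a private directory under
# ~/.ansible/tmp and print "name=path" on success.
make_ansible_tmp() {
    local name="$1"
    (
        # Subshell, so the restrictive umask does not leak into the caller.
        umask 77
        dir="$HOME/.ansible/tmp/$name"
        mkdir -p "$dir" && echo "$name=$dir"
    )
}
```

Running `make_ansible_tmp ansible-tmp-demo` as the deployment user should print the new path. A failure here points to a problem with the home directory (missing, not writable, or on a full disk) rather than with Ansible itself.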

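Should I run it on the affected machine itself, or from the control machine while connected to that machine? And should I switch to the tidb user first?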

Run the statement above as the tidb user on the control machine, to simulate the command that tidb-ansible executes.

[tidb@tidb-61 tidb-ansible]$ umask 77 && mkdir -p “echo ~/.ansible/tmp/ansible-tmp-1601195337.01-185794460975888” && echo ansible-tmp-1601195337.01-185794460975888="echo/.ansible/tmp/ansible-tmp-1601195337.01-185794460975888

After entering the command, the terminal stops responding and I can't do anything.

It's an ordinary Linux shell command. Check whether the quotes are closed.
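
When a quote is left open, an interactive shell does not fail; it waits at the continuation prompt for the rest of the string, which looks like a hang. One way to check a command before running it is to let bash parse it without executing it, as in this minimal sketch (the helper name `parses_ok` is made up for illustration):

```shell
#!/usr/bin/env bash
# Return 0 if the command string parses cleanly, non-zero otherwise.
# bash -n reads and parses its input but does not execute anything.
parses_ok() {
    printf '%s\n' "$1" | bash -n 2>/dev/null
}
```

`parses_ok 'echo "closed"'` succeeds, while `parses_ok 'echo "unclosed'` fails because bash reaches end of input while still looking for the matching quote.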

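After adding the missing quote, the command runs normally, but it produces no output at all.

Starting PD afterwards still reports an error: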

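Hello. According to the screenshot you posted, the failure is caused by insufficient permission to create a directory on the remote server.

Please confirm the following on server 10.10.24.66: that passwordless login for the deployment account works, that the account has a home directory, and that the home directory is writable.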
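
The home-directory part of that checklist can be sketched as a small helper to run on the target host as the deployment user. The function name `check_home` is hypothetical. The `ssh` line in the trailing comment uses the standard OpenSSH `BatchMode` option and assumes the deployment account is `tidb`.

```shell
#!/usr/bin/env bash
# Check that a home directory exists and is writable by the current user.
# Prints a short diagnosis; returns non-zero on the first failed check.
check_home() {
    local home="$1"
    if [ ! -d "$home" ]; then
        echo "missing: $home"
        return 1
    fi
    if [ ! -w "$home" ]; then
        echo "not writable: $home"
        return 2
    fi
    echo "ok: $home"
}

# Passwordless SSH can be probed from the control machine without any
# password prompt (assumes the deployment account is tidb):
#   ssh -o BatchMode=yes tidb@10.10.24.66 true
```

Calling `check_home "$HOME"` as the deployment user covers the last two items. With `BatchMode=yes`, `ssh` fails immediately instead of asking for a password, so a broken key setup shows up as a non-zero exit status.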

You can also check with the following commands:
ansible -i inventory.ini all -m shell -a 'whoami'
ansible -i inventory.ini all -m shell -a 'whoami' -b

If the checks above reveal a problem, go through the official documentation step by step:
https://docs.pingcap.com/zh/tidb/v3.0/online-deployment-using-ansible#使用-tidb-ansible-部署-tidb-集群

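PS. tidb-ansible is gradually being replaced by TiUP. Where possible, we recommend switching to tiup for machine management and maintenance as soon as you can.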

Thanks. SSH had indeed stopped working, and the problem is now solved.
