Error when scaling out a PD node: how do I fix it?

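[TiDB environment] Production
[TiDB version] v6.1.0
[Operating system] CentOS 7
[Deployment method] Bare-metal (machine) deployment
[Cluster data volume]
[Number of cluster nodes]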

  • Generate config prometheus → 10.173.17.4:9090 … Error
  • Generate config grafana → 10.173.17.4:3000 … Error
  • Generate config alertmanager → 10.173.17.4:9093 … Error

Error: init config failed: 10.173.17.4:9090: transfer from /root/.tiup/storage/cluster/clusters/tidb-iap/config-cache/prometheus-10.173.17.4-9090.service to /tmp/prometheus_d21b6d81-b2f7-4d71-9ed7-60228725b874.service failed: failed to scp /root/.tiup/storage/cluster/clusters/tidb-iap/config-cache/prometheus-10.173.17.4-9090.service to tidb@10.173.17.4:/tmp/prometheus_d21b6d81-b2f7-4d71-9ed7-60228725b874.service: ssh: handshake failed: read tcp 10.173.17.4:59468->10.173.17.4:22: read: connection reset by peer

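I scaled out a new PD node, and the error occurred at the last step, while updating and starting the monitoring components. The cluster status shows that the PD node was added successfully, and I have since fixed the scp problem. How do I resume the remaining steps?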

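Since that attempt failed, I wanted to scale in this PD node and then scale it out again, but hit a new error. I ran ./bin/tiup cluster scale-in tidb-iap --node 10.173.191.94:2379. The node is now in the Down state, but it cannot be removed. How can I fix this?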
Stopping component pd
Stopping instance 10.173.191.94
Stop pd 10.173.191.94:2379 success
Destroying component pd
Destroying instance 10.173.191.94
Destroy 10.173.191.94 success

  • Destroy pd paths: [/home/data/tidb-deploy/pd-2379/log /home/data/tidb-deploy/pd-2379 /etc/systemd/system/pd-2379.service /home/data/tidb-data/pd-2379]
    Stopping component node_exporter
    Stopping instance 10.173.191.94

Error: failed to destroy: failed to stop monitor: failed to stop: 10.173.191.94 node_exporter-9100.service, please check the instance’s log() for more detail.: timed out waiting for port 9100 to be stopped after 2m0s

Test SSH to the target machine, then check the logs on that machine.
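A minimal sketch of those checks, assuming the deploy user is `tidb` with passwordless sudo, that the unit is named `node_exporter-9100.service` as in the error above, and that `journalctl` and `ss` are available on CentOS 7. `run`, `DRY_RUN` and `diagnose_host` are hypothetical helpers; with the default `DRY_RUN=1` the commands are only printed.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic helper: prints commands unless DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# diagnose_host HOST [USER]
diagnose_host() {
  local host="$1" user="${2:-tidb}"
  # BatchMode=yes fails instead of prompting for a password,
  # which is closer to how tiup logs in non-interactively.
  local ssh_opts=(-o BatchMode=yes -o ConnectTimeout=5)
  # 1. Can USER log in with a key at all?
  run ssh "${ssh_opts[@]}" "$user@$host" true
  # 2. Recent log lines of the monitor unit that failed to stop.
  run ssh "${ssh_opts[@]}" "$user@$host" sudo journalctl -u node_exporter-9100.service -n 50 --no-pager
  # 3. Which process, if any, still listens on port 9100.
  run ssh "${ssh_opts[@]}" "$user@$host" "sudo ss -ltnp | grep ':9100'"
}
```

For example, `DRY_RUN=0 diagnose_host 10.173.191.94` actually runs the three checks against the PD host.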

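So SSH was broken at first and was fixed later, but now stopping fails? Just scale in this node directly; if that doesn't work, add --force, and then scale out again.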


Force-remove this node with scale-in --force, then add it again with scale-out, and see whether that works.
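The two steps above as a sketch. `--force` is intended for nodes that cannot be stopped or reached cleanly, which matches the Down state here, so use it only for this stuck node. The topology file name, `readd_pd`, `run` and `DRY_RUN` are hypothetical; `TIUP` defaults to `tiup`, while the original post runs `./bin/tiup`.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: force-remove the stuck PD node, then scale out again.
# Prints the commands unless DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"
TIUP="${TIUP:-tiup}"
CLUSTER="${CLUSTER:-tidb-iap}"

run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# readd_pd NODE TOPOLOGY_FILE
readd_pd() {
  local node="$1" topo="$2"
  # Remove the node even though stopping its services keeps failing
  run "$TIUP" cluster scale-in "$CLUSTER" --node "$node" --force
  # Check that the node no longer appears before adding it back
  run "$TIUP" cluster display "$CLUSTER"
  # TOPOLOGY_FILE should list only the PD node being added
  run "$TIUP" cluster scale-out "$CLUSTER" "$topo"
}
```

For example: `DRY_RUN=0 TIUP=./bin/tiup readd_pd 10.173.191.94:2379 scale-out.yaml`, where `scale-out.yaml` stands for whatever topology file was used for the original scale-out. tiup asks for confirmation before each change.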

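Set up passwordless SSH login again and redo the scale-in/scale-out. Ideally, also open the required firewall ports or turn the firewall off.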

From the control machine, run ssh-copy-id -i ~/.ssh/id_rsa.pub 10.173.17.4 to send the key to host 10.173.17.4.
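Note that the command above installs root's key for root on the target. A sketch that instead installs a key for the deploy user, assuming tiup's default per-cluster key pair under `~/.tiup/storage/cluster/clusters/<cluster-name>/ssh/` (check that this file exists on your control machine). `trust_cluster_key`, `run` and `DRY_RUN` are hypothetical; `ssh-copy-id` will prompt once for the `tidb` password if key login does not work yet.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints commands unless DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"
CLUSTER="${CLUSTER:-tidb-iap}"
# Assumed default location of the key pair tiup generated for this cluster
CLUSTER_KEY="${CLUSTER_KEY:-$HOME/.tiup/storage/cluster/clusters/$CLUSTER/ssh/id_rsa}"

run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# trust_cluster_key HOST [USER]
trust_cluster_key() {
  local host="$1" user="${2:-tidb}"
  # Append the cluster public key to USER's authorized_keys on HOST
  run ssh-copy-id -i "$CLUSTER_KEY.pub" "$user@$host"
  # Verify a key-only, non-interactive login as USER
  run ssh -i "$CLUSTER_KEY" -o BatchMode=yes "$user@$host" true
}
```

For example, `DRY_RUN=0 trust_cluster_key 10.173.17.4` sets up and checks login as `tidb` on the monitoring host.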

The SSH trust issue is fixed, but after that the scale-in still failed:

[root@10.173.17.4 ~]$ scp /root/.tiup/storage/cluster/clusters/tidb-iap/config-cache/prometheus-10.173.17.4-9090.service tidb@10.173.17.4:/tmp/prometheus_3756f0f7-4b8c-4830-b501-dc2de7372f27.service
prometheus-10.173.17.4-9090.service 100% 437 1.1MB/s 00:00

Running it manually succeeds, but the tiup command reports this error:
Error: init config failed: 10.173.17.4:9090: transfer from /root/.tiup/storage/cluster/clusters/tidb-iap/config-cache/prometheus-10.173.17.4-9090.service to /tmp/prometheus_3756f0f7-4b8c-4830-b501-dc2de7372f27.service failed: failed to scp /root/.tiup/storage/cluster/clusters/tidb-iap/config-cache/prometheus-10.173.17.4-9090.service to tidb@10.173.17.4:/tmp/prometheus_3756f0f7-4b8c-4830-b501-dc2de7372f27.service: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain

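Did you install with the tidb user or with root? The scp you ran manually was run as root, but the failing command uses the tidb user. Have you set up passwordless SSH for the tidb user?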
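To compare like with like, the manual test should copy as `tidb` with a key and no password prompt, the way tiup does. A minimal sketch, assuming tiup's default per-cluster key location (an assumption, not confirmed in this thread); `replay_scp`, `run` and `DRY_RUN` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the command unless DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"
CLUSTER="${CLUSTER:-tidb-iap}"
# Assumed default location of the key pair tiup generated for this cluster
CLUSTER_KEY="${CLUSTER_KEY:-$HOME/.tiup/storage/cluster/clusters/$CLUSTER/ssh/id_rsa}"

run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# replay_scp SRC HOST DEST [USER]
replay_scp() {
  local src="$1" host="$2" dest="$3" user="${4:-tidb}"
  # -i: the cluster key instead of root's ~/.ssh/id_rsa
  # BatchMode=yes: fail instead of asking for a password
  run scp -i "$CLUSTER_KEY" -o BatchMode=yes "$src" "$user@$host:$dest"
}
```

For example, `DRY_RUN=0 replay_scp /root/.tiup/storage/cluster/clusters/tidb-iap/config-cache/prometheus-10.173.17.4-9090.service 10.173.17.4 /tmp/prometheus-test.service`. If this fails with the same "unable to authenticate" message while the root scp succeeds, the problem is the tidb user's SSH key setup rather than scp itself.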

Please post your scale-out command.

scp is the problem. Try running the scp to 10.173.17.4 manually.