After scale-in, a TiKV node stays in Tombstone state and the prune command fails because the machine is unreachable

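Background: machine 37 has failed and cannot be reached over SSH, so the TiKV node on it needs to be taken offline.

After running the scale-in command, the TiKV node shows as Tombstone.

Running tiup cluster prune cluster_name then reports the error below, which is also caused by the SSH connection failing: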

Error: failed to destroy tombstone: failed to stop tikv: failed to stop: 10.20.70.37 tikv-20160.service, please check the instance's log(/ssd/tidb-deploy/tikv-20160/log) for more detail.:executor.ssh.execute_failed: Failed to execute command over SSH for 'tidb@10.20.70.37:22' {ssh_stderr: , ssh_stdout: , ssh_command: export LANG=C; PATH=$PATH:/bin:/sbin:/usr/bin:/usr/sbin/usr/bin/sudo -H bash -c "systemctl daemon-reload && systemctl stop tikv-20160.service"}, cause: ssh: handshake failed: read tcp 10.20.70.39:33600->10.20.70.37:22: read: connection reset by peer

Verbose debug logs has been written to /home/tidb/.tiup/logs/tiup-cluster-debug-2022-06-29-01-32-17.log.
Error: run `/home/tidb/.tiup/components/cluster/v1.7.0/tiup-cluster` (wd:/home/tidb/.tiup/data/TA6dw59) failed: exit status 1

Besides the prune command, is there any other way to remove a TiKV node that is stuck in Tombstone state?

How about trying this:
https://docs.pingcap.com/zh/tidb/stable/pd-control#store-delete--cancel-delete--label--weight--remove-tombstone--limit--store_id---jqquery-string

pd-ctl store remove-tombstone

tiup cluster prune --force
Then run pd-ctl store remove-tombstone
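
For reference, the two commands suggested above could be wrapped in a small script like the sketch below. It is only an illustration: CLUSTER_NAME, PD_ADDR and DRY_RUN are hypothetical variables introduced for this example, the PD address is a placeholder, and the --force flag on prune is taken from the reply as written, so it is worth checking tiup cluster prune --help on your tiup version first. By default the script only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch only: clean up a Tombstone TiKV store whose host is unreachable over SSH.
# CLUSTER_NAME, PD_ADDR and DRY_RUN are hypothetical variables for this example.
set -eu

CLUSTER_NAME="${CLUSTER_NAME:-cluster_name}"   # cluster name, as in the original prune command
PD_ADDR="${PD_ADDR:-http://127.0.0.1:2379}"    # assumed PD endpoint; replace with a real one
DRY_RUN="${DRY_RUN:-1}"                        # 1 = only print the commands, 0 = run them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

cleanup_tombstone() {
  # Step 1: prune with --force, as suggested in the reply (flag not verified here).
  run tiup cluster prune "$CLUSTER_NAME" --force
  # Step 2: ask PD to remove the records of stores in Tombstone state.
  run pd-ctl -u "$PD_ADDR" store remove-tombstone
}

cleanup_tombstone
```

To actually run the commands, set DRY_RUN=0 and point CLUSTER_NAME and PD_ADDR at the real cluster. Keeping dry-run as the default makes it easy to review what will be executed before touching cluster metadata.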


That worked, thanks.

I've run into this too; adding force fixed it. :smile:
