Destroying the cluster always times out and exits when stopping node_exporter, so the TiDB cluster cannot be destroyed

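[TiDB environment] Production, test, or POC
[TiDB version] v5.4.0
[Problem] After running the destroy command, the operation always times out and exits when it reaches the step that stops node_exporter.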

tiup cluster destroy tidb-test

tiup is checking updates for component cluster …
Starting component cluster: /root/.tiup/components/cluster/v1.9.0/tiup-cluster /root/.tiup/components/cluster/v1.9.0/tiup-cluster destroy tidb-test

██ ██ █████ ██████ ███ ██ ██ ███ ██ ██████
██ ██ ██ ██ ██ ██ ████ ██ ██ ████ ██ ██
██ █ ██ ███████ ██████ ██ ██ ██ ██ ██ ██ ██ ██ ███
██ ███ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██ ██
███ ███ ██ ██ ██ ██ ██ ████ ██ ██ ████ ██████

This operation will destroy tidb v5.4.0 cluster pro-sunac-tidb and its data.
Are you sure to continue?
(Type “Yes, I know my cluster and data will be deleted.” to continue)
: Yes, I know my cluster and data will be deleted.
Destroying cluster…

  • [ Serial ] - SSHKeySet: privateKey=/root/.tiup/storage/cluster/clusters/pro-sunac-tidb/ssh/id_rsa, publicKey=/root/.tiup/storage/cluster/clusters/pro-sunac-tidb/ssh/id_rsa.pub
  • [Parallel] - UserSSH: user=tidb, host=10.3.8.204
  • [Parallel] - UserSSH: user=tidb, host=10.3.8.200
  • [Parallel] - UserSSH: user=tidb, host=10.3.8.198
  • [Parallel] - UserSSH: user=tidb, host=10.3.8.196
  • [Parallel] - UserSSH: user=tidb, host=10.3.8.196
  • [Parallel] - UserSSH: user=tidb, host=10.3.8.196
  • [Parallel] - UserSSH: user=tidb, host=10.3.8.198
  • [Parallel] - UserSSH: user=tidb, host=10.3.8.198
  • [Parallel] - UserSSH: user=tidb, host=10.3.8.202
  • [Parallel] - UserSSH: user=tidb, host=10.3.8.204
  • [Parallel] - UserSSH: user=tidb, host=10.3.8.194
  • [Parallel] - UserSSH: user=tidb, host=10.3.8.200
  • [Parallel] - UserSSH: user=tidb, host=10.3.8.202
  • [Parallel] - UserSSH: user=tidb, host=10.3.8.194
  • [ Serial ] - StopCluster
    Stopping component alertmanager
    Stopping instance 10.3.8.194
    Stop alertmanager 10.3.8.194:9093 success
    Stopping component grafana
    Stopping instance 10.3.8.194
    Stop grafana 10.3.8.194:3000 success
    Stopping component prometheus
    Stopping instance 10.3.8.204
    Stopping instance 10.3.8.196
    Stopping instance 10.3.8.200
    Stopping instance 10.3.8.198
    Stopping instance 10.3.8.202
    Stop prometheus 10.3.8.202:9090 success
    Stop prometheus 10.3.8.196:9090 success
    Stop prometheus 10.3.8.200:9090 success
    Stop prometheus 10.3.8.204:9090 success
    Stop prometheus 10.3.8.198:9090 success
    Stopping component tidb
    Stopping instance 10.3.8.198
    Stopping instance 10.3.8.196
    Stop tidb 10.3.8.196:4000 success
    Stop tidb 10.3.8.198:4000 success
    Stopping component tikv
    Stopping instance 10.3.8.204
    Stopping instance 10.3.8.202
    Stopping instance 10.3.8.200
    Stop tikv 10.3.8.204:20160 success
    Stop tikv 10.3.8.200:20160 success
    Stop tikv 10.3.8.202:20160 success
    Stopping component pd
    Stopping instance 10.3.8.198
    Stopping instance 10.3.8.196
    Stop pd 10.3.8.198:2379 success
    Stop pd 10.3.8.196:2379 success
    Stopping component node_exporter
    Stopping instance 10.3.8.194
    Stopping instance 10.3.8.196
    Stopping instance 10.3.8.198
    Stopping instance 10.3.8.202
    Stopping instance 10.3.8.200
    Stopping instance 10.3.8.204

Error: failed to stop: 10.3.8.196 node_exporter-9100.service, please check the instance’s log() for more detail.: timed out waiting for port 9100 to be stopped after 1m0s

Verbose debug logs has been written to /root/.tiup/logs/tiup-cluster-debug-2022-04-11-16-49-51.log.

The debug log contains the following error:
2022-04-11T16:49:51.553+0800 DEBUG retry error {“error”: “operation timed out after 1m0s”}

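[Steps to reproduce] Which operations led to the problem
[Symptoms and impact]

After the timeout, tiup exits immediately, so the TiDB cluster cannot be destroyed.

[Attachments]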

Log in to that machine and use ps to check which exporter processes are running.

ps -ef|grep exporter

root 2172 17722 0 18:18 pts/0 00:00:00 grep --color=auto exporter
tidb 6174 1 0 15:36 ? 00:01:18 bin/blackbox_exporter/blackbox_exporter --web.listen-address=:9115 --log.level=info --config.file=conf/blackbox.yml
tidb 6175 6174 0 15:36 ? 00:00:00 /bin/bash /acdata/tidb-cluster/tidb-deploy/monitor-9100/scripts/run_blackbox_exporter.sh
tidb 6176 6175 0 15:36 ? 00:00:00 tee -i -a /acdata/tidb-cluster/tidb-deploy/monitor-9100/log/blackbox_exporter.log
root 13856 1 0 Mar30 ? 02:20:43 /acdata/exporter/mysqld_exporter/mysqld_exporter --web.listen-address=0.0.0.0:9104 --config.my-cnf=/acdata/exporter/mysqld_exporter/my_prom.cnf --log.level=error --collect.info_schema.processlist --collect.info_schema.innodb_metrics --collect.info_schema.innodb_tablespaces --collect.info_schema.innodb_cmp --collect.info_schema.innodb_cmpmem

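This conflicts with a process started by root. Temporarily stop the one root started, or manually stop each TiDB component and delete the corresponding files.

Manually stop these TiDB services.

Kill the application that is holding port 9100.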
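To make the last suggestion concrete, here is a minimal sketch for finding out what is still listening on port 9100 on 10.3.8.196. The helper name `pids_on_port` is made up for illustration, and it assumes the usual column layout of `ss -ltnp` (state in column 1, local address in column 4). The unit name in the comments is the one from the error message above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the PIDs of processes listening on a TCP port.
# It reads `ss -ltnp` style output on stdin, so the parsing can be checked
# without touching a live host.
pids_on_port() {
  local port="$1"
  awk -v port="$port" '
    # Keep only LISTEN rows whose local address ends in ":<port>".
    $1 == "LISTEN" && $4 ~ (":" port "$") {
      s = $0
      # A socket can have several owners; collect every pid=N on the row.
      while (match(s, /pid=[0-9]+/)) {
        print substr(s, RSTART + 4, RLENGTH - 4)
        s = substr(s, RSTART + RLENGTH)
      }
    }' | sort -u
}

# Usage on the affected host (run as root so ss can show the owning process):
#   ss -ltnp | pids_on_port 9100
#   systemctl stop node_exporter-9100.service   # unit name from the error above
#   kill <pid>                                   # only if a process still holds the port
# Then retry from the control machine:
#   tiup cluster destroy tidb-test
```

Check which user and binary own the PID (for example with `ps -fp <pid>`) before killing it, since the process may belong to another monitoring setup on the same machine rather than to this cluster.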