TiUP fails to stop a TiDB node: failed to stop: timed out waiting for port 9100 to be stopped after 2m0s

Hi TiDB consultants,

After scaling out a TiDB node, 10.210.1.117, running tiup cluster stop [cluster_name]
always times out while stopping node_exporter.

What do we need to adjust so that the cluster can be stopped normally?

Stopping component node_exporter
retry error: operation timed out after 2m0s
tidb 10.210.1.117:4000 failed to stop: timed out waiting for port 9100 to be stopped after 2m0s

Error: tidb 10.210.1.117:4000 failed to stop: timed out waiting for port 9100 to be stopped after 2m0s: timed out waiting for port 9100 to be stopped after 2m0s

Verbose debug logs has been written to /home/tidb/logs/tiup-cluster-debug-2020-11-16-13-36-26.log.
Error: run /home/tidb/.tiup/components/cluster/v1.2.3/tiup-cluster (wd:/home/tidb/.tiup/data/SGTmlrm) failed: exit status 1

The log on 10.210.1.117 shows the following:
[2020/11/16 13:34:22.400 +08:00] [ERROR] [http_status.go:354] ["start status/rpc server error"] [error="accept tcp [::]:10080: use of closed network connection"]
[2020/11/16 13:34:22.400 +08:00] [ERROR] [http_status.go:349] ["http server error"] [error="http: Server closed"]
[2020/11/16 13:34:22.400 +08:00] [ERROR] [http_status.go:344] ["grpc server error"] []

You can search AskTUG for fixes for cluster start/stop failures caused by an abnormal node_exporter service installation.

Hi 東北神,

It turned out that our MIS team enables a node_exporter Docker service in the default machine setup, which caused a port conflict:
time="2020-11-13T18:20:47+08:00" level=fatal msg="listen tcp :9100: bind: address already in use" source="node_exporter.go:114"

After I shut down that Docker service, the cluster starts and stops normally.
sudo docker stop bfc719790f2a
sudo docker rm bfc719790f2a
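
To confirm this kind of conflict on a host, one option is to check whether anything is already accepting connections on port 9100. The following is only a minimal sketch in Python and is not part of TiUP; the function name port_in_use and the default host 127.0.0.1 are assumptions made for illustration, and a service bound to a different interface would need a different host value.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 when the connection succeeds,
        # i.e. when some process is already listening on the port.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # 9100 is the node_exporter port named in the error message.
    if port_in_use(9100):
        print("port 9100 is already in use")
    else:
        print("port 9100 is free")
```

If this reports the port as in use while the cluster's own node_exporter is stopped, another process (here, the Docker container) is probably holding it.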

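:+1::+1::+1: For this kind of problem, the usual approach is to first check the node's log for detailed information, then check the system messages log, and then run dmesg -T | grep xxx to see whether it reports anything.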
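
The log-checking step can be sketched as a small keyword filter, roughly what grep does. This is only an illustration in Python: the function name find_matching_lines is hypothetical, and the sample lines are placeholders apart from the message text quoted earlier in this thread.

```python
def find_matching_lines(lines, keyword):
    """Return the lines that contain keyword, ignoring case."""
    needle = keyword.lower()
    return [line.rstrip("\n") for line in lines if needle in line.lower()]


if __name__ == "__main__":
    # Placeholder input; in practice this would be the lines of a log file.
    sample = [
        'level=info msg="example unrelated line"\n',
        'level=fatal msg="listen tcp :9100: bind: address already in use"\n',
    ]
    for hit in find_matching_lines(sample, "address already in use"):
        print(hit)
```

Filtering on a phrase such as "address already in use" narrows a long log down to the lines that point at a port conflict.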
