Prometheus still fires alerts after a node is scaled in normally with tiup

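TiDBer_E5BPfmPA · July 12, 2023, 07:53

[TiDB environment] Test
[TiDB version] v6.4.0
[Problem: symptoms and impact]
I deployed a new cluster with tiup (the cluster includes Alertmanager, Prometheus, and the other monitoring components), then scaled TiDB instances out and in with tiup. Prometheus still fires "node down" alerts for the nodes that were scaled in.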

[Attachment: screenshots]

Rilakkuma · July 17, 2023, 02:03

Check whether the template file .tiup/storage/cluster/clusters/${clustername}/config-cache/run_prometheus_{ip}_{port}.sh still contains entries for the nodes that were scaled in. If it does, delete them, then restart Prometheus.
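
The check described above can be sketched as a small shell helper. This is only an illustration: `find_stale_refs` is a hypothetical function name, and the directory and node address you pass in are placeholders that must come from your own cluster. It uses nothing beyond standard `grep`.

```shell
#!/usr/bin/env bash
# Sketch: list cached monitoring files that still mention a removed node.
# find_stale_refs is a hypothetical helper, not part of tiup.

# find_stale_refs DIR ADDR
# Prints every line under DIR that contains ADDR as a literal string,
# prefixed with the file name and line number.
# Returns 0 if at least one match is found, 1 if there is none.
find_stale_refs() {
  local dir="$1" addr="$2"
  # -r: recurse into DIR, -n: show line numbers, -F: treat ADDR literally
  grep -rnF -- "$addr" "$dir"
}
```

For example, `find_stale_refs ~/.tiup/storage/cluster/clusters/<cluster-name>/config-cache <ip>:<port>` would print any cached file that still references the removed instance. Because the match is a plain substring, pass the address together with its port so that `10.0.1.5` does not also match `10.0.1.50`.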

buptzhoutian · July 17, 2023, 03:43

Is the audit log from that scale-in operation still available? Please post it here. tiup should have refreshed the Prometheus targets as part of the operation.
tiup cluster audit <ID>

redgame · July 17, 2023, 08:14

Check the Prometheus configuration file and confirm that the newly added TiDB instances were added to it correctly.

Raymond · July 17, 2023, 13:25

I ran into node_exporter alerts after scaling in cluster nodes before. Reloading Prometheus should clear them.
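
One way to read the reload suggestion is to reload only the Prometheus role through tiup. The sketch below assumes that interpretation; `reload_prometheus` is a hypothetical wrapper name, and the cluster name is a placeholder. It relies on `tiup cluster reload` with the `-R` role filter.

```shell
#!/usr/bin/env bash
# Sketch: reload only the Prometheus role of a tiup-managed cluster.
# reload_prometheus is a hypothetical wrapper, not a tiup subcommand.

# reload_prometheus CLUSTER_NAME
# Asks tiup to regenerate and push the configuration for the prometheus
# role only, then restart that role.
reload_prometheus() {
  local cluster="$1"
  if [ -z "$cluster" ]; then
    echo "usage: reload_prometheus CLUSTER_NAME" >&2
    return 2
  fi
  tiup cluster reload "$cluster" -R prometheus
}
```

Calling `reload_prometheus <cluster-name>` leaves the TiDB, TiKV, and PD instances untouched, since the role filter limits the operation to Prometheus.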

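Jellybean · July 18, 2023, 05:10

Was the scale-out or scale-in operation interrupted at any point? In principle, tiup also updates the monitoring components automatically.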