Monitoring server cannot collect data after vMotion

Hacker_Lc1rnkne · February 28, 2020 06:50


[TiDB version]: 3.0.5
[Problem description]: Monitoring runs on a VM. After a vMotion, no monitoring data was available.

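I found that the system date was wrong, corrected it via NTP, and then restarted the monitor with ansible-playbook rolling_update_monitor.yml, but data still could not be collected. The network is fine.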

Prometheus log on the monitoring server:

level=warn ts=2020-02-28T06:15:34.885101831Z caller=scrape.go:1126 component="scrape manager" scrape_pool=blackbox_exporter_xxx.xxx.xxx.xxx_icmp target="http://xxx.xxx.xxx.xxx:9115/probe?module=icmp&target=xxx.xxx.xxx.xxx" msg="Error on ingesting samples that are too old or are too far into the future" num_dropped=4
level=warn ts=2020-02-28T06:15:34.885154284Z caller=scrape.go:882 component="scrape manager" scrape_pool=blackbox_exporter_xxx.xxx.xxx.xxx_icmp target="http://xxx.xxx.xxx.xxx:9115/probe?module=icmp&target=xxx.xxx.xxx.xxx" msg="appending scrape report failed" err="out of bounds"

The ts=2020-02-28T06:15:34.885154284Z in the log does not match the current time; it is 8 hours earlier.
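A side note on the 8-hour gap: the trailing Z in the log timestamp denotes UTC, so part of the difference may simply be a time-zone offset rather than clock drift. The sketch below converts the logged timestamp to local time. The UTC+8 offset is an assumption (the post does not state the server's time zone), and `parse_prometheus_ts` is a hypothetical helper written for this example.

```python
from datetime import datetime, timedelta, timezone


def parse_prometheus_ts(ts: str) -> datetime:
    """Parse a log timestamp such as 2020-02-28T06:15:34.885154284Z.

    Hypothetical helper: the log carries nanoseconds, while Python's
    datetime keeps microseconds, so the fraction is truncated to 6 digits.
    """
    base, frac = ts.rstrip("Z").split(".")
    dt = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S")
    micro = int(frac[:6].ljust(6, "0"))
    return dt.replace(microsecond=micro, tzinfo=timezone.utc)


log_ts = parse_prometheus_ts("2020-02-28T06:15:34.885154284Z")

# Assumption: the monitoring server's local zone is UTC+8.
local_tz = timezone(timedelta(hours=8))
local_ts = log_ts.astimezone(local_tz)

print("log time (UTC):  ", log_ts.isoformat())
print("log time (UTC+8):", local_ts.isoformat())
```

If the converted value matches the wall clock on the server, the timestamp in the log is consistent and the offset alone is not the cause of the missing data.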


zhenjiaogao · February 28, 2020 07:36

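1. Please confirm the order of operations:

1) The VM was migrated.

2) While Prometheus was running normally, the system time was updated via NTP, that is, the incorrect time plus 8 hours was set as the correct time.

3) Prometheus started reporting the errors above.

2. If that is the order of operations, I suggest looking at the related Prometheus issue on GitHub: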

https://github.com/prometheus/prometheus/issues/6554

Hacker_Lc1rnkne · March 2, 2020 01:05

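The problem is solved. It was caused by a time jump, and I suspect it is related to how Prometheus works. My guess is as follows: after the vMotion, the operating system time jumped forward, and data collection continued from the jumped time, so Grafana showed no data for the current time. After the operating system time was corrected, it conflicted with the timestamp of the last sample Prometheus had collected, so Prometheus reported "Error on ingesting samples that are too old or are too far into the future" and could not collect data. Fix: after clearing the prometheus2.0.0.data.metrics files and restarting Prometheus, the data is normal again.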
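The guess above can be illustrated with a toy model. This is not Prometheus code: `ToyHead`, `try_append`, and the one-hour tolerance window are assumptions made for the example, and the scenario assumes the clock was 8 hours ahead before it was corrected. The model only keeps the newest timestamp it has seen and rejects samples that are too far behind it.

```python
HOUR_MS = 3_600_000


class ToyHead:
    """Toy stand-in for a time-series store that rejects too-old samples."""

    def __init__(self, window_ms: int):
        self.window_ms = window_ms  # assumed tolerance behind the newest sample
        self.max_ts = None          # newest timestamp accepted so far

    def append(self, ts_ms: int) -> None:
        if self.max_ts is not None and ts_ms < self.max_ts - self.window_ms:
            raise ValueError("out of bounds")
        self.max_ts = ts_ms if self.max_ts is None else max(self.max_ts, ts_ms)


def try_append(head: ToyHead, ts_ms: int) -> bool:
    """Return True if the sample is accepted, False if it is rejected."""
    try:
        head.append(ts_ms)
        return True
    except ValueError:
        return False


true_now = 100 * HOUR_MS  # arbitrary "real" time for the example

head = ToyHead(window_ms=HOUR_MS)
# Samples scraped while the clock was (by assumption) 8 hours ahead.
head.append(true_now + 8 * HOUR_MS)

# After the clock is corrected, new samples look 8 hours too old.
accepted_after_fix = try_append(head, true_now)

# Clearing the stored data corresponds to starting with an empty head.
fresh_head = ToyHead(window_ms=HOUR_MS)
accepted_after_wipe = try_append(fresh_head, true_now)

print("accepted after clock fix: ", accepted_after_fix)
print("accepted after data wipe: ", accepted_after_wipe)
```

In this model, waiting until the real time passes the newest stored timestamp would also let samples through again, which is consistent with clearing the data being a shortcut rather than the only way out. Whether that holds for a given Prometheus version is not confirmed by this thread.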

飞与非-PingCAP · March 2, 2020 02:28

Nice, thanks for your question.
