Abnormal systeminfo charts for the cluster

This is the error reported when starting it again.

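I mean, have you checked this? I think it is better to rule things out in reverse order: Grafana -> Prometheus -> node-exporter. Your Prometheus says there is no data, but have you checked its configuration file?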

Is this what you mean?
# vi prometheus.yml
- targets:
  - '20.100.6.71:3000'
  labels:
    group: 'grafana'
- targets:
  - '20.100.6.71:9100'
  - '20.100.6.72:9100'
  - '20.100.6.73:9100'
  labels:
    group: 'node_exporter'
- targets:
  - '20.100.6.71:9115'
  - '20.100.6.72:9115'
  - '20.100.6.73:9115'
  labels:
    group: 'blackbox_exporter'

Yes, I checked on the abnormal node.

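Everything looks fine. The current symptoms are: the node-exporter port and service are normal (nothing wrong in the logs) and the Prometheus configuration is fine, yet there is no data.
All that is left is to check permissions, or to find out when the data stopped and see what happened at that time.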
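
One way to see what Prometheus itself thinks of each target is its HTTP API endpoint `/api/v1/targets`, which reports a health state and the last scrape error per target. The sketch below only parses such a response. The function name `unhealthy_targets` and the sample payload are illustrative assumptions, not output from this cluster.

```python
import json
from typing import List, Tuple


def unhealthy_targets(payload: str) -> List[Tuple[str, str]]:
    """Return (scrapeUrl, lastError) for every active target whose health is not "up".

    `payload` is the JSON body returned by GET /api/v1/targets on Prometheus.
    """
    data = json.loads(payload)
    bad = []
    for target in data.get("data", {}).get("activeTargets", []):
        if target.get("health") != "up":
            bad.append((target.get("scrapeUrl", ""), target.get("lastError", "")))
    return bad


# Hypothetical sample shaped like a /api/v1/targets response (not real cluster data).
SAMPLE = json.dumps({
    "status": "success",
    "data": {
        "activeTargets": [
            {"scrapeUrl": "http://20.100.6.71:9100/metrics",
             "health": "up", "lastError": ""},
            {"scrapeUrl": "http://20.100.6.72:9100/metrics",
             "health": "down", "lastError": "context deadline exceeded"},
        ]
    },
})

if __name__ == "__main__":
    for url, error in unhealthy_targets(SAMPLE):
        print(url, "->", error)
```

To run it against a live server, the payload could be fetched with `urllib.request.urlopen` from the Prometheus address, which is not given in this thread.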

Right, the services and everything else are normal.

Does the standalone node-exporter dashboard also show no data?

node-exporter is running on all three machines in the cluster, but the dashboard only shows data for one of them.

I'm out of ideas too :joy:

There is still no data, right? Please send me the node-exporter log. Also, which metric does the expression you ran on Prometheus above query?

The expression above is the query behind the node IO chart.

[root@CRM-tidb1 log]# tail -n 20 blackbox_exporter.log
level=info ts=2021-08-20T03:28:54.055943387Z caller=main.go:213 msg="Starting blackbox_exporter" version="(version=0.12.0, branch=HEAD, revision=4a22506cf0cf139d9b2f9cde099f0012d9fcabde)"
level=info ts=2021-08-20T03:28:54.056762951Z caller=main.go:220 msg="Loaded config file"
level=info ts=2021-08-20T03:28:54.056958145Z caller=main.go:324 msg="Listening on address" address=:9115
level=info ts=2021-09-30T02:39:46.805460608Z caller=main.go:213 msg="Starting blackbox_exporter" version="(version=0.12.0, branch=HEAD, revision=4a22506cf0cf139d9b2f9cde099f0012d9fcabde)"
level=info ts=2021-09-30T02:39:46.806395794Z caller=main.go:220 msg="Loaded config file"
level=info ts=2021-09-30T02:39:46.806535036Z caller=main.go:324 msg="Listening on address" address=:9115
[root@CRM-tidb1 log]# tail -n 20 node_exporter.log
time="2021-09-30T10:41:32+08:00" level=error msg="error encoding and sending metric family: write tcp 20.100.6.71:9100->20.100.6.72:45348: write: broken pipe\n" source="log.go:172"
time="2021-09-30T10:41:32+08:00" level=error msg="error encoding and sending metric family: write tcp 20.100.6.71:9100->20.100.6.72:45348: write: broken pipe\n" source="log.go:172"
time="2021-09-30T10:41:32+08:00" level=error msg="error encoding and sending metric family: write tcp 20.100.6.71:9100->20.100.6.72:45348: write: broken pipe\n" source="log.go:172"
time="2021-09-30T10:41:32+08:00" level=error msg="error encoding and sending metric family: write tcp 20.100.6.71:9100->20.100.6.72:45348: write: broken pipe\n" source="log.go:172"
time="2021-09-30T10:41:32+08:00" level=error msg="error encoding and sending metric family: write tcp 20.100.6.71:9100->20.100.6.72:45348: write: broken pipe\n" source="log.go:172"
time="2021-09-30T10:41:32+08:00" level=error msg="error encoding and sending metric family: write tcp 20.100.6.71:9100->20.100.6.72:45348: write: broken pipe\n" source="log.go:172"
time="2021-09-30T10:41:32+08:00" level=error msg="error encoding and sending metric family: write tcp 20.100.6.71:9100->20.100.6.72:45348: write: broken pipe\n" source="log.go:172"
time="2021-09-30T10:41:32+08:00" level=error msg="error encoding and sending metric family: write tcp 20.100.6.71:9100->20.100.6.72:45348: write: broken pipe\n" source="log.go:172"
time="2021-09-30T10:41:32+08:00" level=error msg="error encoding and sending metric family: write tcp 20.100.6.71:9100->20.100.6.72:45348: write: broken pipe\n" source="log.go:172"
time="2021-09-30T10:41:32+08:00" level=error msg="error encoding and sending metric family: write tcp 20.100.6.71:9100->20.100.6.72:45348: write: broken pipe\n" source="log.go:172"
time="2021-09-30T10:41:32+08:00" level=error msg="error encoding and sending metric family: write tcp 20.100.6.71:9100->20.100.6.72:45348: write: broken pipe\n" source="log.go:172"
time="2021-09-30T10:41:32+08:00" level=error msg="error encoding and sending metric family: write tcp 20.100.6.71:9100->20.100.6.72:45348: write: broken pipe\n" source="log.go:172"
time="2021-09-30T10:41:32+08:00" level=error msg="error encoding and sending metric family: write tcp 20.100.6.71:9100->20.100.6.72:45348: write: broken pipe\n" source="log.go:172"
time="2021-09-30T10:41:32+08:00" level=error msg="error encoding and sending metric family: write tcp 20.100.6.71:9100->20.100.6.72:45348: write: broken pipe\n" source="log.go:172"
time="2021-09-30T10:41:32+08:00" level=error msg="error encoding and sending metric family: write tcp 20.100.6.71:9100->20.100.6.72:45348: write: broken pipe\n" source="log.go:172"
time="2021-09-30T10:41:32+08:00" level=error msg="error encoding and sending metric family: write tcp 20.100.6.71:9100->20.100.6.72:45348: write: broken pipe\n" source="log.go:172"
time="2021-09-30T10:41:32+08:00" level=error msg="error encoding and sending metric family: write tcp 20.100.6.71:9100->20.100.6.72:45348: write: broken pipe\n" source="log.go:172"

Port 9100 is connected to a port that does not exist!
[root@CRM-tidb2 log]# tail -n 2 node_exporter.log
time="2021-10-08T14:38:03+08:00" level=error msg="error encoding and sending metric family: write tcp 20.100.6.72:9100->20.100.6.72:43182: write: broken pipe\n" source="log.go:172"
2021/10/08 14:38:03 http: multiple response.WriteHeader calls
[root@CRM-tidb2 log]# netstat -an|grep 43182


Accessing the URL http://20.100.6.72:9100/metrics takes ten seconds. That is where the problem lies: the request is clearly timing out. Why is it so slow?

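1. The two pieces of information above (the log and the error) are not useful.
2. The timeout below is the real cause, but to find out why it happens you need to look at the network (for a network timeout like this, the only way is to check the network and the Prometheus log).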

Requests from the machine to itself (IP 72) also time out, so the network can be ruled out.
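
If the network is ruled out, one place to look is the time node_exporter spends in each collector. The sketch below assumes the running node_exporter exposes a per-collector gauge named `node_scrape_collector_duration_seconds` in its /metrics output; older releases may use a different name, so treat the metric name, the helper `slowest_collectors`, and the sample values as assumptions for illustration.

```python
import re
from typing import List, Tuple

# Assumed metric name; older node_exporter releases may expose a different one.
_PATTERN = re.compile(
    r'^node_scrape_collector_duration_seconds\{collector="([^"]+)"\}\s+(\S+)$'
)


def slowest_collectors(metrics_text: str, top: int = 3) -> List[Tuple[str, float]]:
    """Return the `top` collectors with the longest scrape duration, slowest first."""
    durations = []
    for line in metrics_text.splitlines():
        match = _PATTERN.match(line.strip())
        if match:
            durations.append((match.group(1), float(match.group(2))))
    durations.sort(key=lambda item: item[1], reverse=True)
    return durations[:top]


# Hypothetical excerpt of /metrics output (values invented for illustration).
SAMPLE_METRICS = """\
# TYPE node_scrape_collector_duration_seconds gauge
node_scrape_collector_duration_seconds{collector="cpu"} 0.0004
node_scrape_collector_duration_seconds{collector="filesystem"} 9.8
node_scrape_collector_duration_seconds{collector="meminfo"} 0.0002
"""

if __name__ == "__main__":
    for name, seconds in slowest_collectors(SAMPLE_METRICS):
        print(f"{name}: {seconds:.4f}s")
```

A single collector taking close to the whole ten seconds would point at that subsystem rather than at Prometheus or the network.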

Can you call that API manually and see what the response time is?

http://20.100.6.72:9100/metrics
In a browser it returns a result after about 10 seconds.
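
The browser timing can be reproduced from a script and compared with the scrape timeout. This is a minimal sketch: `time_fetch` and `exceeds_scrape_timeout` are hypothetical helper names, and the 10-second default is Prometheus's documented default `scrape_timeout`; the value actually configured in this cluster is not shown in the thread.

```python
import time
import urllib.request


def time_fetch(url: str, timeout: float = 30.0) -> float:
    """Return the seconds taken to download the full response body from `url`."""
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()
    return time.monotonic() - start


def exceeds_scrape_timeout(elapsed: float, scrape_timeout: float = 10.0) -> bool:
    """True if a response this slow would not finish within the scrape timeout."""
    return elapsed >= scrape_timeout


if __name__ == "__main__":
    # URL taken from the thread; adjust for the node being checked.
    elapsed = time_fetch("http://20.100.6.72:9100/metrics")
    print(f"fetched in {elapsed:.2f}s, too slow: {exceeds_scrape_timeout(elapsed)}")
```

A response that takes about as long as the scrape timeout would be consistent with Prometheus dropping the connection mid-transfer, which is one possible explanation for the broken pipe entries in node_exporter.log.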

Click into it and see whether there are any errors. [image]

I did not find any errors. Attachment: results (490.7 KB)

How about trying this: https://blog.csdn.net/lyf0327/article/details/99971590