"send keepalive message fail" errors in pd.log

[Overview]
Checking pd.log shows the following errors:
cat pd.log | grep "send keepalive message fail"
[2021/07/06 23:21:03.698 +08:00] [ERROR] [heartbeat_streams.go:122] ["send keepalive message fail"] [target-store-id=12] [error=EOF]
[2021/07/06 23:21:03.698 +08:00] [ERROR] [heartbeat_streams.go:122] ["send keepalive message fail"] [target-store-id=11] [error=EOF]
[2021/07/06 23:44:03.698 +08:00] [ERROR] [heartbeat_streams.go:122] ["send keepalive message fail"] [target-store-id=11] [error=EOF]
[2021/07/06 23:44:03.698 +08:00] [ERROR] [heartbeat_streams.go:122] ["send keepalive message fail"] [target-store-id=12] [error=EOF]
[2021/07/06 23:50:03.698 +08:00] [ERROR] [heartbeat_streams.go:122] ["send keepalive message fail"] [target-store-id=83374041] [error=EOF]
[2021/07/06 23:50:03.699 +08:00] [ERROR] [heartbeat_streams.go:122] ["send keepalive message fail"] [target-store-id=83374029] [error=EOF]
[2021/07/06 23:54:03.698 +08:00] [ERROR] [heartbeat_streams.go:122] ["send keepalive message fail"] [target-store-id=12] [error=EOF]
[2021/07/06 23:54:03.699 +08:00] [ERROR] [heartbeat_streams.go:122] ["send keepalive message fail"] [target-store-id=11] [error=EOF]
[2021/07/07 00:00:03.698 +08:00] [ERROR] [heartbeat_streams.go:122] ["send keepalive message fail"] [target-store-id=83374041] [error=EOF]
[2021/07/07 00:00:03.699 +08:00] [ERROR] [heartbeat_streams.go:122] ["send keepalive message fail"] [target-store-id=83374029] [error=EOF]
[2021/07/07 00:04:03.698 +08:00] [ERROR] [heartbeat_streams.go:122] ["send keepalive message fail"] [target-store-id=11] [error=EOF]
[2021/07/07 00:04:03.698 +08:00] [ERROR] [heartbeat_streams.go:122] ["send keepalive message fail"] [target-store-id=12] [error=EOF]
[2021/07/07 00:10:03.698 +08:00] [ERROR] [heartbeat_streams.go:122] ["send keepalive message fail"] [target-store-id=83374041] [error=EOF]
[2021/07/07 00:10:03.698 +08:00] [ERROR] [heartbeat_streams.go:122] ["send keepalive message fail"] [target-store-id=83374029] [error=EOF]
[2021/07/07 00:14:03.698 +08:00] [ERROR] [heartbeat_streams.go:122] ["send keepalive message fail"] [target-store-id=11] [error=EOF]
[2021/07/07 00:14:03.698 +08:00] [ERROR] [heartbeat_streams.go:122] ["send keepalive message fail"] [target-store-id=12] [error=EOF]
[2021/07/07 00:20:03.698 +08:00] [ERROR] [heartbeat_streams.go:122] ["send keepalive message fail"] [target-store-id=83374041] [error=EOF]
[2021/07/07 00:20:03.699 +08:00] [ERROR] [heartbeat_streams.go:122] ["send keepalive message fail"] [target-store-id=83374029] [error=EOF]
[2021/07/07 00:24:03.698 +08:00] [ERROR] [heartbeat_streams.go:122] ["send keepalive message fail"] [target-store-id=12] [error=EOF]

The pd.log of another cluster shows similar errors:
[2021/06/22 03:44:28.820 +08:00] [ERROR] [heartbeat_streams.go:122] ["send keepalive message fail"] [target-store-id=4] [error=EOF]
[2021/06/22 03:44:28.820 +08:00] [ERROR] [heartbeat_streams.go:122] ["send keepalive message fail"] [target-store-id=11] [error=EOF]
[2021/06/22 03:54:28.820 +08:00] [ERROR] [heartbeat_streams.go:122] ["send keepalive message fail"] [target-store-id=11] [error=EOF]
[2021/06/22 03:54:28.820 +08:00] [ERROR] [heartbeat_streams.go:122] ["send keepalive message fail"] [target-store-id=4] [error=EOF]
[2021/06/22 04:04:28.820 +08:00] [ERROR] [heartbeat_streams.go:122] ["send keepalive message fail"] [target-store-id=4] [error=EOF]
[2021/06/22 04:04:28.820 +08:00] [ERROR] [heartbeat_streams.go:122] ["send keepalive message fail"] [target-store-id=11] [error=EOF]
[2021/06/22 04:14:28.820 +08:00] [ERROR] [heartbeat_streams.go:122] ["send keepalive message fail"] [target-store-id=4] [error=EOF]
[2021/06/22 04:14:28.820 +08:00] [ERROR] [heartbeat_streams.go:122] ["send keepalive message fail"] [target-store-id=11] [error=EOF]
[2021/06/22 04:24:28.820 +08:00] [ERROR] [heartbeat_streams.go:122] ["send keepalive message fail"] [target-store-id=4] [error=EOF]
[2021/06/22 04:24:28.820 +08:00] [ERROR] [heartbeat_streams.go:122] ["send keepalive message fail"] [target-store-id=11] [error=EOF]
[2021/06/22 04:34:28.820 +08:00] [ERROR] [heartbeat_streams.go:122] ["send keepalive message fail"] [target-store-id=4] [error=EOF]
[2021/06/22 04:34:28.820 +08:00] [ERROR] [heartbeat_streams.go:122] ["send keepalive message fail"] [target-store-id=11] [error=EOF]
[2021/06/22 04:44:28.820 +08:00] [ERROR] [heartbeat_streams.go:122] ["send keepalive message fail"] [target-store-id=4] [error=EOF]
[2021/06/22 04:44:28.820 +08:00] [ERROR] [heartbeat_streams.go:122] ["send keepalive message fail"] [target-store-id=11] [error=EOF]
[2021/06/22 04:54:28.820 +08:00] [ERROR] [heartbeat_streams.go:122] ["send keepalive message fail"] [target-store-id=11] [error=EOF]
[2021/06/22 04:54:28.820 +08:00] [ERROR] [heartbeat_streams.go:122] ["send keepalive message fail"] [target-store-id=4] [error=EOF]
[2021/06/22 05:04:28.820 +08:00] [ERROR] [heartbeat_streams.go:122] ["send keepalive message fail"] [target-store-id=4] [error=EOF]
[2021/06/22 05:04:28.820 +08:00] [ERROR] [heartbeat_streams.go:122] ["send keepalive message fail"] [target-store-id=11] [error=EOF]
[2021/06/22 05:14:28.820 +08:00] [ERROR] [heartbeat_streams.go:122] ["send keepalive message fail"] [target-store-id=4] [error=EOF]
[2021/06/22 05:14:28.820 +08:00] [ERROR] [heartbeat_streams.go:122] ["send keepalive message fail"] [target-store-id=11] [error=EOF]
[2021/06/22 05:24:28.820 +08:00] [ERROR] [heartbeat_streams.go:122] ["send keepalive message fail"] [target-store-id=4] [error=EOF]
[2021/06/22 05:24:28.820 +08:00] [ERROR] [heartbeat_streams.go:122] ["send keepalive message fail"] [target-store-id=11] [error=EOF]
[2021/06/22 05:34:28.820 +08:00] [ERROR] [heartbeat_streams.go:122] ["send keepalive message fail"] [target-store-id=4] [error=EOF]
[2021/06/22 05:34:28.820 +08:00] [ERROR] [heartbeat_streams.go:122] ["send keepalive message fail"] [target-store-id=11] [error=EOF]
[2021/06/22 05:44:28.820 +08:00] [ERROR] [heartbeat_streams.go:122] ["send keepalive message fail"] [target-store-id=4] [error=EOF]
[2021/06/22 05:44:28.820 +08:00] [ERROR] [heartbeat_streams.go:122] ["send keepalive message fail"] [target-store-id=11] [error=EOF]
[2021/06/22 05:54:28.820 +08:00] [ERROR] [heartbeat_streams.go:122] ["send keepalive message fail"] [target-store-id=4] [error=EOF]
[2021/06/22 05:54:28.820 +08:00] [ERROR] [heartbeat_streams.go:122] ["send keepalive message fail"] [target-store-id=11] [error=EOF]
[2021/06/22 06:04:28.820 +08:00] [ERROR] [heartbeat_streams.go:122] ["send keepalive message fail"] [target-store-id=4] [error=EOF]
[2021/06/22 06:04:28.820 +08:00] [ERROR] [heartbeat_streams.go:122] ["send keepalive message fail"] [target-store-id=11] [error=EOF]
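
Before looking for a cause, it may help to check whether the failures follow a pattern. In the second excerpt above, each store logs the error exactly 10 minutes apart. The sketch below groups the log lines by target-store-id and computes the gap between consecutive failures. The function name keepalive_gaps and the three-timestamp SAMPLE are illustrative only, and the regex assumes the log layout shown above (it accepts straight or curly quotes around the message).

```python
import re
from collections import defaultdict
from datetime import datetime

# Captures the timestamp and the store id of one failure line.
# Assumes the pd.log layout shown in this thread.
LINE_RE = re.compile(
    r'^\[(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) [+-]\d{2}:\d{2}\]'
    r'.*send keepalive message fail.*\[target-store-id=(?P<store>\d+)\]'
)


def keepalive_gaps(lines):
    """Return {store_id: [seconds between consecutive failures]}."""
    seen = defaultdict(list)
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            ts = datetime.strptime(m.group("ts"), "%Y/%m/%d %H:%M:%S.%f")
            seen[int(m.group("store"))].append(ts)
    gaps = {}
    for store, stamps in seen.items():
        stamps.sort()
        gaps[store] = [
            round((later - earlier).total_seconds())
            for earlier, later in zip(stamps, stamps[1:])
        ]
    return gaps


# Three timestamps per store, copied from the second log excerpt.
_TAIL = '[ERROR] [heartbeat_streams.go:122] ["send keepalive message fail"]'
SAMPLE = [
    f'[2021/06/22 {hhmm}:28.820 +08:00] {_TAIL} [target-store-id={store}] [error=EOF]'
    for hhmm in ("03:44", "03:54", "04:04")
    for store in (4, 11)
]

if __name__ == "__main__":
    print(keepalive_gaps(SAMPLE))
```

To run it against a real file, pass the open file object, for example keepalive_gaps(open("pd.log")). A fixed gap per store would suggest something periodic rather than random network loss, but the numbers alone do not say what that something is.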

1. Both clusters carry very little workload, there is almost no NIC traffic, and ping latency from the stores to PD looks normal.

I found a related thread:

Does this error indicate a problem?

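Are there any related errors in the TiDB log? In related threads, the cause was found by analyzing the logs of other components.

Have you scaled the cluster out or in? In the thread below, similar errors also appeared after a scale-in.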

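1. We did scale out the TiKV servers, but another of our clusters, which has never been scaled out or in, shows the same error.
2. I checked tidb.log and found no related errors.

What does this error mean?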

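It should be the heartbeat, which is used to check whether each node is in a normal state.

HeartbeatStreams: the scheduling operations that PD generates ultimately have to be sent to the TiKV nodes, which carry them out so that the cluster reaches the healthy state defined by the user or the system.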
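
To make the log line easier to read, here is a toy model of a keepalive sent over per-store streams. It is not PD's implementation: the classes OpenStream and ClosedStream and the function keepalive_tick are hypothetical, and dropping a stream after a failed send is an assumption of this sketch. It only illustrates the general idea that an EOF on send means the other end of the stream is already closed.

```python
class OpenStream:
    """Hypothetical stream whose peer is still connected."""

    def __init__(self):
        self.sent = []

    def send(self, msg):
        self.sent.append(msg)


class ClosedStream:
    """Hypothetical stream whose peer has already closed the connection."""

    def send(self, msg):
        raise EOFError("EOF")


def keepalive_tick(streams, log):
    """Send one keepalive per store; log and drop streams whose send fails."""
    for store_id in list(streams):  # copy the keys, the dict is mutated below
        try:
            streams[store_id].send("keepalive")
        except EOFError as err:
            log.append(
                f"send keepalive message fail "
                f"target-store-id={store_id} error={err}"
            )
            del streams[store_id]  # assumption: a failed stream is discarded


if __name__ == "__main__":
    streams = {4: OpenStream(), 11: ClosedStream()}
    log = []
    keepalive_tick(streams, log)
    print(log)
```

In this model the error says nothing about why the peer closed the stream, which is why the other components' logs and the network are checked next.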

Are there any errors in the TiKV log?

I did not see any related ERROR entries in the TiKV log either.

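If neither TiDB nor TiKV reports any errors, and a cluster that was never scaled out or in shows the same problem, then the network is the most likely cause. Could you run a continuous ping between the servers involved and check for packet loss?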
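
A small helper can turn the summary of a long ping run into numbers that are easy to compare across hosts. This is a sketch that assumes the summary format printed by Linux ping (iputils); the function name parse_ping_summary, the address 192.0.2.10, and the SAMPLE text are made up for illustration and are not measurements from this cluster.

```python
import re


def parse_ping_summary(output):
    """Extract packet loss (%) and average RTT (ms) from Linux ping output.

    Returns None for a field that cannot be found in the text.
    """
    loss = re.search(r'(\d+(?:\.\d+)?)% packet loss', output)
    # Summary line looks like: rtt min/avg/max/mdev = a/b/c/d ms
    rtt = re.search(r'= ([\d.]+)/([\d.]+)/([\d.]+)', output)
    return {
        "loss_pct": float(loss.group(1)) if loss else None,
        "avg_ms": float(rtt.group(2)) if rtt else None,
    }


# Illustrative summary text only, not real data from the thread.
SAMPLE = """\
--- 192.0.2.10 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4005ms
rtt min/avg/max/mdev = 0.460/0.550/0.640/0.060 ms
"""

if __name__ == "__main__":
    print(parse_ping_summary(SAMPLE))
```

The input can be collected with something like ping -c 600 against each TiKV host from the PD leader, saving the output to a file per host.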

Do you mean between PD and TiKV, and between PD and TiDB?

Check whether there is any packet loss between PD and TiKV.

How is the environment deployed? Please run display and share the cluster topology.

display.txt (8.0 KB)

Please see the attachment.

This is the ping result from the current PD leader node to all other nodes.

Put it this way: check whether TiKV, PD, and TiDB are co-deployed on the same machines. See https://github.com/pingcap/tidb/issues/14240

PD and TiDB are co-deployed on the same server.
PD and TiDB each use their own dedicated SSD.

Is the PD log that contains the errors from the PD leader node?

Yes, it is the log of the PD leader node.

Our ping round-trip times are currently between 0.46 ms and 0.64 ms. What is the criterion for this error? Above what ping latency would it be reported?

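Two questions: 1. Could you try moving the PD leader node onto a machine of its own? 2. Does the machine configuration follow the recommendations on the official site?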

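We are already in production, and this cluster was configured according to the official recommendations. Splitting PD out now would be difficult. Besides, the deployment was done by the vendor's engineers at the time.

By splitting it out, do you mean deploying PD on a server of its own?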

Besides, at the network level there is no latency and no packet loss.

Could you try moving PD, especially the PD leader, onto a dedicated server?
