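【TiDB Environment】
Production
【TiDB Version】
v7.1.3
【Operating System】
openEuler
【Deployment Method】
On-premises deployment on physical machines with SSD disks
【Cluster Data Volume】
1 TB
【Number of Cluster Nodes】
1 monitor, 6 TiDB, 6 PD, 8 TiKV
【Steps to Reproduce】
No operations were performed before the problem appeared.
【Problem: Symptoms and Impact】
A TiKV node first shows as Disconnected and then Down. Logging in to that TiKV machine and restarting the TiKV service restores it, but the problem recurs a few hours later.
【Resource Configuration】Go to TiDB Dashboard > Cluster Info > Hosts and attach a screenshot of that page
【Copy and Paste the ERROR Logs】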
[2025/05/21 05:49:51.444 +08:00] [ERROR] [kv.rs:753] ["KvService::batch_raft send response fail"] [err=RemoteStopped]
[2025/05/21 05:49:57.298 +08:00] [ERROR] [raft_client.rs:581] ["connection aborted"] [addr=172.29.156.16:20161] [receiver_err="Some(RpcFailure(RpcStatus { code: 14-UNAVAILABLE, message: "keepalive watchdog timeout", details: }))"] [sink_error="Some(RpcFinished(Some(RpcStatus { code: 14-UNAVAILABLE, message: "keepalive watchdog timeout", details: })))"] [store_id=5]
[2025/05/21 05:49:57.298 +08:00] [ERROR] [raft_client.rs:883] ["connection abort"] [addr=172.29.156.16:20161] [store_id=5]
[2025/05/21 05:50:14.886 +08:00] [ERROR] [snap.rs:546] ["failed to send snap"] [err="Grpc(RpcFailure(RpcStatus { code: 14-UNAVAILABLE, message: "keepalive watchdog timeout", details: }))"] [region_id=46768] [to_addr=172.29.156.16:20161]
[2025/05/21 05:50:24.376 +08:00] [ERROR] [raft_client.rs:581] ["connection aborted"] [addr=172.29.156.16:20161] [receiver_err="Some(RpcFailure(RpcStatus { code: 14-UNAVAILABLE, message: "keepalive watchdog timeout", details: }))"] [sink_error="Some(RpcFinished(Some(RpcStatus { code: 14-UNAVAILABLE, message: "keepalive watchdog timeout", details: })))"] [store_id=5]
[2025/05/21 05:50:24.376 +08:00] [ERROR] [raft_client.rs:883] ["connection abort"] [addr=172.29.156.16:20161] [store_id=5]
[2025/05/21 05:50:29.377 +08:00] [ERROR] [raft_client.rs:851] ["wait connect timeout"] [addr=172.29.156.16:20161] [store_id=5]
[2025/05/21 05:50:29.995 +08:00] [ERROR] [snap.rs:546] ["failed to send snap"] [err="Grpc(RpcFinished(Some(RpcStatus { code: 14-UNAVAILABLE, message: "Connection timed out", details: })))"] [region_id=46768] [to_addr=172.29.156.16:20161]
[2025/05/21 05:50:30.514 +08:00] [ERROR] [kv.rs:753] ["KvService::batch_raft send response fail"] [err=RemoteStopped]
[2025/05/21 05:50:34.382 +08:00] [ERROR] [raft_client.rs:851] ["wait connect timeout"] [addr=172.29.156.16:20161] [store_id=5]
[2025/05/21 05:51:12.644 +08:00] [ERROR] [raft_client.rs:581] ["connection aborted"] [addr=172.29.156.16:20161] [receiver_err="Some(RpcFailure(RpcStatus { code: 14-UNAVAILABLE, message: "keepalive watchdog timeout", details: }))"] [sink_error="Some(RpcFinished(Some(RpcStatus { code: 14-UNAVAILABLE, message: "keepalive watchdog timeout", details: })))"] [store_id=5]
[2025/05/21 05:51:12.644 +08:00] [ERROR] [raft_client.rs:883] ["connection abort"] [addr=172.29.156.16:20161] [store_id=5]
[2025/05/21 05:51:17.646 +08:00] [ERROR] [raft_client.rs:851] ["wait connect timeout"] [addr=172.29.156.16:20161] [store_id=5]
[2025/05/21 05:51:22.650 +08:00] [ERROR] [raft_client.rs:851] ["wait connect timeout"] [addr=172.29.156.16:20161] [store_id=5]
[2025/05/21 05:51:27.655 +08:00] [ERROR] [raft_client.rs:851] ["wait connect timeout"] [addr=172.29.156.16:20161] [store_id=5]
[2025/05/21 05:51:29.563 +08:00] [ERROR] [kv.rs:753] ["KvService::batch_raft send response fail"] [err=RemoteStopped]
[2025/05/21 05:51:32.661 +08:00] [ERROR] [raft_client.rs:851] ["wait connect timeout"] [addr=172.29.156.16:20161] [store_id=5]
[2025/05/21 05:51:37.667 +08:00] [ERROR] [raft_client.rs:851] ["wait connect timeout"] [addr=172.29.156.16:20161] [store_id=5]
[2025/05/21 05:51:42.673 +08:00] [ERROR] [raft_client.rs:851] ["wait connect timeout"] [addr=172.29.156.16:20161] [store_id=5]
[2025/05/21 05:51:47.679 +08:00] [ERROR] [raft_client.rs:851] ["wait connect timeout"] [addr=172.29.156.16:20161] [store_id=5]
[2025/05/21 05:51:52.686 +08:00] [ERROR] [raft_client.rs:851] ["wait connect timeout"] [addr=172.29.156.16:20161] [store_id=5]
[2025/05/21 05:51:57.692 +08:00] [ERROR] [raft_client.rs:851] ["wait connect timeout"] [addr=172.29.156.16:20161] [store_id=5]
[2025/05/21 05:52:02.697 +08:00] [ERROR] [raft_client.rs:851] ["wait connect timeout"] [addr=172.29.156.16:20161] [store_id=5]
[2025/05/21 05:52:07.703 +08:00] [ERROR] [raft_client.rs:851] ["wait connect timeout"] [addr=172.29.156.16:20161] [store_id=5]
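
Every error above that names a peer points at 172.29.156.16:20161 (store_id=5). To check whether that also holds across a full tikv.log, something like the following rough sketch could count ERROR lines per message and peer address. It assumes the bracketed line layout seen in the lines pasted above; the names `summarize`, `LINE_RE`, `ADDR_RE` and `SAMPLE` are made up for illustration and are not part of any TiKV tooling.

```python
import re
from collections import Counter

# Leading fields of a log line as pasted above:
# [timestamp] [LEVEL] [file:line] ["message"] ...
# Both straight and curly quotes are accepted around the message,
# because forum software sometimes converts one into the other.
LINE_RE = re.compile(
    r'^\[(?P<ts>[^\]]+)\] \[(?P<level>[A-Z]+)\] \[(?P<src>[^\]]+)\] '
    r'\[["“](?P<msg>[^"”]*)["”]\]'
)
# Peer address field, written as either [addr=...] or [to_addr=...].
ADDR_RE = re.compile(r'\[(?:addr|to_addr)=(?P<addr>[^\]]+)\]')


def summarize(lines):
    """Count ERROR lines grouped by (message, peer address)."""
    counts = Counter()
    for line in lines:
        head = LINE_RE.match(line)
        if head is None or head.group("level") != "ERROR":
            continue  # skip non-ERROR lines and lines in another layout
        peer = ADDR_RE.search(line)
        addr = peer.group("addr") if peer else "-"
        counts[(head.group("msg"), addr)] += 1
    return counts


# A few of the lines from this post, used as sample input.
SAMPLE = [
    r'[2025/05/21 05:50:24.376 +08:00] [ERROR] [raft_client.rs:883] ["connection abort"] [addr=172.29.156.16:20161] [store_id=5]',
    r'[2025/05/21 05:50:29.377 +08:00] [ERROR] [raft_client.rs:851] ["wait connect timeout"] [addr=172.29.156.16:20161] [store_id=5]',
    r'[2025/05/21 05:50:29.995 +08:00] [ERROR] [snap.rs:546] ["failed to send snap"] [err="Grpc(RpcFinished(Some(RpcStatus { code: 14-UNAVAILABLE, message: "Connection timed out", details: })))"] [region_id=46768] [to_addr=172.29.156.16:20161]',
    r'[2025/05/21 05:50:30.514 +08:00] [ERROR] [kv.rs:753] ["KvService::batch_raft send response fail"] [err=RemoteStopped]',
    r'[2025/05/21 05:50:34.382 +08:00] [ERROR] [raft_client.rs:851] ["wait connect timeout"] [addr=172.29.156.16:20161] [store_id=5]',
]

for (msg, addr), n in summarize(SAMPLE).most_common():
    print(f"{n:3d}  {addr:22s}  {msg}")
```

To run it against a real log, the `SAMPLE` list can be replaced with an open file handle, since `summarize` only needs an iterable of lines. Lines without an address field, such as the `kv.rs` ones, are grouped under "-".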
【Other Attachments: Screenshots/Logs/Monitoring】