TiDB fails to restart

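[TiDB environment] Production / Test / PoC
[TiDB version] TiDB 6.5.2
[Reproduction path] What operations led to the problem
A TiKV node went down and fails to restart; the error reports that it cannot communicate with PD.
[Problem encountered: symptoms and impact]
[Resource configuration] Go to TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page
[Attachments: screenshots / logs / monitoring]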

[2024/04/07 13:11:53.334 +08:00] [ERROR] [pd.rs:663] ["failed to send split infos to pd worker"] [err="channel has been closed"]
[2024/04/07 13:11:53.359 +08:00] [ERROR] [pd.rs:690] ["failed to send min resolved ts to pd worker"] [err="channel has been closed"]
[2024/04/07 13:11:54.041 +08:00] [INFO] [util.rs:260] ["update pd client"] [via=] [leader=http://10.10.40.24:2379] [prev_via=] [prev_leader=http://10.10.40.24:2379]
[2024/04/07 13:11:54.041 +08:00] [WARN] [util.rs:268] ["PD client refresh region heartbeat"] [takes=2498]
[2024/04/07 13:11:54.041 +08:00] [INFO] [util.rs:394] ["trying to update PD client done"] [spend=783.247373497s]
[2024/04/07 13:11:54.041 +08:00] [INFO] [util.rs:763] ["connected to PD member"] [endpoints=http://10.10.40.24:2379]
[2024/04/07 13:11:54.041 +08:00] [INFO] [util.rs:220] ["heartbeat sender and receiver are stale, refreshing ..."]
[2024/04/07 13:11:54.041 +08:00] [INFO] [util.rs:233] ["buckets sender and receiver are stale, refreshing ..."]
[2024/04/07 13:11:54.044 +08:00] [ERROR] [client.rs:652] ["failed to send heartbeat"] [err_code=KV:Pd:Grpc] [err="Grpc(RpcFinished(Some(RpcStatus { code: 1-CANCELLED, message: "CANCELLED", details: })))"]
[2024/04/07 13:11:54.085 +08:00] [INFO] [tso.rs:162] ["TSO worker terminated"] [receiver_cause=None] [sender_cause=None]
[2024/04/07 13:11:54.356 +08:00] [ERROR] [pd.rs:663] ["failed to send split infos to pd worker"] [err="channel has been closed"]
[2024/04/07 13:11:54.378 +08:00] [ERROR] [pd.rs:690] ["failed to send min resolved ts to pd worker"] [err="channel has been closed"]
[2024/04/07 13:11:55.397 +08:00] [ERROR] [pd.rs:663] ["failed to send split infos to pd worker"] [err="channel has been closed"]
[2024/04/07 13:11:55.422 +08:00] [ERROR] [pd.rs:690] ["failed to send min resolved ts to pd worker"] [err="channel has been closed"]
[2024/04/07 13:11:55.730 +08:00] [INFO] [] ["subchannel 0x7f035aa4b400 {address=ipv4:10.0.61.198:20161, args=grpc.client_channel_factory=0x7f036469c270, grpc.default_authority=10.0.61.198:20161, grpc.default_compression_algorithm=0, grpc.gprc_min_message_size_to_compress=4096, grpc.gzip_compression_level=2, grpc.http2.lookahead_bytes=2097152, grpc.initial_reconnect_backoff_ms=1000, grpc.internal.subchannel_pool=0x7f0364638950, grpc.keepalive_time_ms=10000, grpc.keepalive_timeout_ms=3000, grpc.max_reconnect_backoff_ms=5000, grpc.primary_user_agent=grpc-rust/0.10.4, grpc.resource_quota=0x7f03646bf110, grpc.server_uri=dns:///10.0.61.198:20161, random id=595}: failed to connect to channel, retrying"]
[2024/04/07 13:11:56.432 +08:00] [ERROR] [pd.rs:663] ["failed to send split infos to pd worker"] [err="channel has been closed"]
[2024/04/07 13:11:56.459 +08:00] [ERROR] [pd.rs:690] ["failed to send min resolved ts to pd worker"] [err="channel has been closed"]
[2024/04/07 13:11:56.489 +08:00] [INFO] [util.rs:260] ["update pd client"] [via=] [leader=http://10.10.40.24:2379] [prev_via=] [prev_leader=http://10.10.40.24:2379]
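
Most of the excerpt above is the same two errors repeating about once per second, so it can help to collapse the log into counts before reading it line by line. The following is a minimal Python sketch, not part of the original post. It assumes each entry starts with the four bracketed fields seen above (timestamp, level, source, message), and the names `LOG_LINE` and `summarize` are made up for illustration.

```python
import re
from collections import Counter

# Matches the leading fields of a log line as they appear in the excerpt:
#   [timestamp] [LEVEL] [source] ["message"] ...
# Both straight and curly double quotes are accepted, because forum
# software often converts quotes when logs are pasted.
LOG_LINE = re.compile(
    r'^\[(?P<ts>[^\]]+)\] \[(?P<level>[A-Z]+)\] \[(?P<src>[^\]]*)\] '
    r'\[["\u201c](?P<msg>.*?)["\u201d]\]'
)


def summarize(lines, level="ERROR"):
    """Count log entries of the given level, keyed by (source, message)."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m.group("level") == level:
            counts[(m.group("src"), m.group("msg"))] += 1
    return counts


# Two lines copied from the excerpt, used here as a tiny sample.
sample = [
    '[2024/04/07 13:11:53.334 +08:00] [ERROR] [pd.rs:663] '
    '["failed to send split infos to pd worker"] [err="channel has been closed"]',
    '[2024/04/07 13:11:54.041 +08:00] [INFO] [util.rs:763] '
    '["connected to PD member"] [endpoints=http://10.10.40.24:2379]',
]
for (src, msg), n in summarize(sample).most_common():
    print(n, src, msg)
```

Passing an open log file instead of `sample` works the same way, since `summarize` only iterates over lines. Entries that were hard-wrapped across several lines, as in the pasted excerpt, are counted once via their first line.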

Is the network reachable? From the log, it looks like the node cannot reach PD.

First run tiup cluster display on the cluster and check the status of each component.

How many TiKV nodes are there?

[Resource configuration] Go to TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page
Please take a look at this.

The network is reachable.

It cannot reach PD, which is a bit odd. How about accessing it manually: [http://10.40.24:2379]
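
One way to do the manual access from a script is to query PD's HTTP health API. The sketch below is a hedged illustration, not something posted in the thread. It assumes the PD endpoint is the leader address shown in the log (http://10.10.40.24:2379), that `/pd/api/v1/health` returns a JSON list of members each carrying `name` and `health` fields, and the helper name `pd_health` is invented here.

```python
import json
import urllib.error
import urllib.request


def pd_health(pd_endpoint, timeout=3.0):
    """Return (ok, detail) for a PD endpoint such as 'http://10.10.40.24:2379'."""
    url = pd_endpoint.rstrip("/") + "/pd/api/v1/health"
    # PD is an internal cluster address, so ignore any HTTP proxy that
    # happens to be configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            members = json.loads(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError) as exc:
        # Connection refused, timeout, HTTP error status, or a non-JSON body.
        return False, f"request to {url} failed: {exc}"
    if not isinstance(members, list):
        # Assumed response shape did not hold; report instead of guessing.
        return False, f"unexpected response from {url}: {members!r}"
    unhealthy = [m.get("name") for m in members
                 if not (isinstance(m, dict) and m.get("health"))]
    if unhealthy:
        return False, f"unhealthy PD members: {unhealthy}"
    return True, f"{len(members)} PD member(s) report healthy"
```

Calling `pd_health("http://10.10.40.24:2379")` from the TiKV host that fails to start exercises the same network path the TiKV process uses. A result of `False` with a connection error points at the network or the PD process, while a `True` result suggests looking elsewhere.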

Please post the cluster topology.

It seems the PD server cannot be reached.

This cannot be accessed. Check the network.

It looks like the PD server cannot be reached.

It looks like the heartbeat is not getting through.

tiup cluster display XXX
On the TiKV server: telnet 10.10.40.24 2379
Also check firewall and iptables.
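
If telnet is not installed on the TiKV host, the same TCP reachability test can be done with a few lines of Python. This is only a sketch under stated assumptions: the helper names `can_connect` and `check_targets` are hypothetical, and it tests nothing beyond whether a TCP connection can be opened, so a success says the port is reachable but not that PD itself is healthy.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return (True, '') if a TCP connection succeeds, else (False, reason)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, ""
    except OSError as exc:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False, str(exc)


def check_targets(targets, timeout=3.0):
    """Print one line per (host, port) pair and return the list of results."""
    results = []
    for host, port in targets:
        ok, reason = can_connect(host, port, timeout=timeout)
        status = "reachable" if ok else f"NOT reachable ({reason})"
        print(f"{host}:{port} {status}")
        results.append((host, port, ok))
    return results
```

Run on the affected TiKV server, `check_targets([("10.10.40.24", 2379), ("10.0.61.198", 20161)])` covers the PD leader address and the peer address from the gRPC "failed to connect to channel" message in the log. A timeout rather than a refused connection is the more typical sign of a firewall or iptables rule dropping packets.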

It is reachable; telnet 10.10.40.24 2379 works.

Reboot the machine and the problem goes away.

Scale in, then scale out again.

PD is in an abnormal state.

Time for the trump card: reboot the server! :smile:

Has the problem been solved? What was the cause?

A reboot fixes a lot of problems.