One PD node will not start after a power outage

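【TiDB environment】Test
【TiDB version】v5.4.3
【Steps to reproduce】What operations led to the problem
The office park lost power for 3 hours. After power was restored, 2 of the 3 servers started automatically; the third did not start until several days later.
【Problem: symptoms and impact】
One PD node is stuck in the Down state and will not come up. Has all of the PD data been lost? How exactly should it be rebuilt and recovered?
【Resource configuration】Go to TiDB Dashboard > Cluster Info > Hosts and take a screenshot of that page
【Attachments: screenshots/logs/monitoring】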
[2024/11/05 14:21:33.501 +08:00] [WARN] [server.go:2045] ["failed to publish local member to cluster through raft"] [local-member-id=b731fe7fcba5038b] [local-member-attributes="{Name:pd-10.0.0.40-2379 ClientURLs:[http://10.0.0.40:2379]}"] [request-path=/0/members/b731fe7fcba5038b/attributes] [publish-timeout=11s] [error="etcdserver: request timed out"]
[2024/11/05 14:21:44.501 +08:00] [WARN] [server.go:2045] ["failed to publish local member to cluster through raft"] [local-member-id=b731fe7fcba5038b] [local-member-attributes="{Name:pd-10.0.0.40-2379 ClientURLs:[http://10.0.0.40:2379]}"] [request-path=/0/members/b731fe7fcba5038b/attributes] [publish-timeout=11s] [error="etcdserver: request timed out"]
[2024/11/05 14:21:55.502 +08:00] [WARN] [server.go:2045] ["failed to publish local member to cluster through raft"] [local-member-id=b731fe7fcba5038b] [local-member-attributes="{Name:pd-10.0.0.40-2379 ClientURLs:[http://10.0.0.40:2379]}"] [request-path=/0/members/b731fe7fcba5038b/attributes] [publish-timeout=11s] [error="etcdserver: request timed out"]
[2024/11/05 14:22:06.502 +08:00] [WARN] [server.go:2045] ["failed to publish local member to cluster through raft"] [local-member-id=b731fe7fcba5038b] [local-member-attributes="{Name:pd-10.0.0.40-2379 ClientURLs:[http://10.0.0.40:2379]}"] [request-path=/0/members/b731fe7fcba5038b/attributes] [publish-timeout=11s] [error="etcdserver: request timed out"]
[2024/11/05 14:22:17.503 +08:00] [WARN] [server.go:2045] ["failed to publish local member to cluster through raft"] [local-member-id=b731fe7fcba5038b] [local-member-attributes="{Name:pd-10.0.0.40-2379 ClientURLs:[http://10.0.0.40:2379]}"] [request-path=/0/members/b731fe7fcba5038b/attributes] [publish-timeout=11s] [error="etcdserver: request timed out"]
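
A quick way to read the log above is to look at the spacing of the warnings. The following is a minimal Python sketch, assuming the bracketed header layout seen in these lines (timestamp, level, source); the helper names `parse_header` and `gaps_seconds` are made up for illustration, and the sample lines are shortened to the header plus the message field.

```python
import re
from datetime import datetime

# Matches the leading "[timestamp] [LEVEL] [source]" fields of a log line.
HEADER = re.compile(r"^\[(?P<ts>[^\]]+)\] \[(?P<level>[A-Z]+)\] \[(?P<src>[^\]]+)\]")


def parse_header(line):
    """Return (timestamp, level, source) or None if the line does not match."""
    m = HEADER.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%Y/%m/%d %H:%M:%S.%f %z")
    return ts, m.group("level"), m.group("src")


def gaps_seconds(lines):
    """Seconds elapsed between consecutive parsable log lines."""
    stamps = [h[0] for h in map(parse_header, lines) if h is not None]
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]


# Shortened copies of the five warnings posted above.
MSG = '["failed to publish local member to cluster through raft"]'
sample = [
    f"[2024/11/05 14:21:33.501 +08:00] [WARN] [server.go:2045] {MSG}",
    f"[2024/11/05 14:21:44.501 +08:00] [WARN] [server.go:2045] {MSG}",
    f"[2024/11/05 14:21:55.502 +08:00] [WARN] [server.go:2045] {MSG}",
    f"[2024/11/05 14:22:06.502 +08:00] [WARN] [server.go:2045] {MSG}",
    f"[2024/11/05 14:22:17.503 +08:00] [WARN] [server.go:2045] {MSG}",
]

if __name__ == "__main__":
    print(gaps_seconds(sample))
```

The gaps come out at roughly 11 seconds each, which lines up with the `publish-timeout=11s` field: the node keeps retrying to publish itself through raft and times out every time.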

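If you really have to deal with it, a forced scale-in followed by a scale-out will do. Normally, though, a node should not fail to rejoin, so there are probably other errors in the log as well?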

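If there are 3 PD nodes in total and only 1 has failed, the cluster should still be usable. Can't you just scale that node in and then scale it out again?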
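
The reasoning behind "1 failure out of 3 is fine" is majority quorum: PD embeds etcd, which uses raft, and raft only needs more than half of the members to agree. A small Python sketch of that arithmetic (the function names are made up for illustration):

```python
def quorum(members: int) -> int:
    """Smallest number of members that forms a strict majority."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can be down while a majority is still reachable."""
    return members - quorum(members)


def has_quorum(members: int, healthy: int) -> bool:
    """True if the healthy members alone can still form a majority."""
    return healthy >= quorum(members)


if __name__ == "__main__":
    for n in (1, 3, 5):
        print(n, quorum(n), tolerated_failures(n))
    # The situation in this thread: 3 PD members, 2 of them healthy.
    print(has_quorum(3, 2))
```

With 3 members the quorum is 2, so one Down node leaves the cluster working, but a second failure would not be survivable until the broken member is replaced.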

[2024/10/03 22:18:52.278 +08:00] [WARN] [retry_interceptor.go:61] ["retrying of unary invoker failed"] [target=endpoint://client-83c8d0d0-3534-43ba-9bbb-9a3ccd80342b/10.0.0.40:2379] [attempt=0] [error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"]
[2024/10/03 22:19:03.784 +08:00] [WARN] [retry_interceptor.go:61] ["retrying of unary invoker failed"] [target=endpoint://client-83c8d0d0-3534-43ba-9bbb-9a3ccd80342b/10.0.0.40:2379] [attempt=0] [error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"]
[2024/10/03 22:19:09.868 +08:00] [WARN] [retry_interceptor.go:61] ["retrying of unary invoker failed"] [target=endpoint://client-83c8d0d0-3534-43ba-9bbb-9a3ccd80342b/10.0.0.40:2379] [attempt=0] [error="rpc error: code = Canceled desc = context canceled"]
There doesn't seem to be anything else. The PD error log stops at that moment.

Yes, there are 3 PD nodes, and usage is not affected. I kept hoping it would come back up on its own one day.

https://docs.pingcap.com/zh/tidb/stable/pd-recover#方式二完全重建-pd-集群

Can I use this method?

There's no need. Only one of your nodes is broken; that procedure is meant for when the entire PD cluster is down.

Handling it with a scale-in followed by a scale-out is the better option.

It shouldn't affect anything. Just do a forced scale-in and then scale the node out again.
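
For anyone following along, here is a rough Python sketch of what the forced scale-in plus scale-out suggested in this thread could look like with TiUP. It only prints the commands (a dry run) and does not execute anything. The cluster name `tidb-test` and the file name `scale-out-pd.yaml` are assumptions made up for illustration; the node address `10.0.0.40:2379` is taken from the logs above. Check `tiup cluster display` for the real cluster name and node ID before running anything, and note that `--force` removes the node without waiting for it to go offline cleanly.

```python
import shlex

CLUSTER = "tidb-test"              # hypothetical cluster name
NODE = "10.0.0.40:2379"            # PD node ID, taken from the logs in this thread
TOPOLOGY = "scale-out-pd.yaml"     # hypothetical topology file name


def scale_in_cmd(cluster, node, force=False):
    """Build the argv for removing one node from the cluster."""
    cmd = ["tiup", "cluster", "scale-in", cluster, "--node", node]
    if force:
        cmd.append("--force")
    return cmd


def scale_out_cmd(cluster, topology_file):
    """Build the argv for adding the nodes described in a topology file."""
    return ["tiup", "cluster", "scale-out", cluster, topology_file]


def pd_topology(host):
    """Minimal scale-out topology that adds a single PD node on `host`."""
    return f"pd_servers:\n  - host: {host}\n"


if __name__ == "__main__":
    # Dry run: show what would be executed, in order.
    print(shlex.join(["tiup", "cluster", "display", CLUSTER]))
    print(shlex.join(scale_in_cmd(CLUSTER, NODE, force=True)))
    print(f"# contents of {TOPOLOGY}:")
    print(pd_topology("10.0.0.40"), end="")
    print(shlex.join(scale_out_cmd(CLUSTER, TOPOLOGY)))
    print(shlex.join(["tiup", "cluster", "display", CLUSTER]))
```

Running `display` before and after is just a sanity check that the Down PD member is gone after the scale-in and shows as Up after the scale-out.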
