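【TiDB environment】Production
【TiDB version】4.0.2
【Steps to reproduce】The production cluster has a PD cluster made up of 3 PD nodes. One of the PD nodes had a disk failure and lost its data. I wanted to keep the PD cluster at 3 nodes by adding a new PD node and then removing the failed node from the cluster. Adding the new PD node to the PD cluster failed, with the following log: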
[WARN] [server.go:1617] ["rejecting member add request; local member has not been connected to all peers, reconfigure breaks active quorum"] [local-member-id=ef7d3edecdf7ded7] [requested-member-add="{ID:1b567dd9bd29b638 RaftAttributes:{PeerURLs:[http://xxx:xxx] IsLearner:false} Attributes:{Name: ClientURLs:}}"] [error="etcdserver: unhealthy cluster"]
【Problem encountered】
【Resource configuration】
【Attachments: screenshots/logs/monitoring】
» member
{
  "header": {
    "cluster_id":
  },
  "members": [
    {
      "name": "pd1",
      "member_id": ,
      "peer_urls": [
        "http://"
      ],
      "client_urls": [
        "http://"
      ],
      "deploy_path": "/usr/local/tidb/bin",
      "binary_version": "v4.0.2",
      "git_hash": "0bd6bbd53600b4770e7fa0707d131f8a71be90e4"
    },
    {
      "name": "pd3",
      "member_id": ,
      "peer_urls": [
        "http://"
      ],
      "client_urls": [
        "http://"
      ],
      "deploy_path": "/usr/local/tidb/bin",
      "binary_version": "v4.0.2",
      "git_hash": "0bd6bbd53600b4770e7fa0707d131f8a71be90e4"
    },
    {
      "name": "pd2",
      "member_id": ,
      "peer_urls": [
        "http://"
      ],
      "client_urls": [
        "http://"
      ],
      "deploy_path": "/usr/local/tidb/bin",
      "binary_version": "v4.0.2",
      "git_hash": "0bd6bbd53600b4770e7fa0707d131f8a71be90e4"
    }
  ],
  "leader": {
    "name": "pd3",
    "member_id": ,
    "peer_urls": [
      "http://"
    ],
    "client_urls": [
      "http://"
    ]
  },
  "etcd_leader": {
    "name": "pd3",
    "member_id": ,
    "peer_urls": [
      "http://"
    ],
    "client_urls": [
      "http://"
    ],
    "deploy_path": "/usr/local/tidb/bin",
    "binary_version": "v4.0.2",
    "git_hash": "0bd6bbd53600b4770e7fa0707d131f8a71be90e4"
  }
}
Is PD3 the failed node? Or is it one of the other two, which are not the leader?
The failed node is pd1; pd2 and pd3 are healthy. I wanted to add pd4 first and then remove pd1. The error above was reported while adding pd4.
My steps were in the wrong order.
Correct order: remove pd1 first, then add pd4 to the cluster.
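Remove PD1 first so that the cluster is healthy, then scale out by adding PD4. After that it works.
The add was rejected because the cluster still contained a failed node when you tried to scale out with PD4.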
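The reasoning above can be illustrated with a little quorum arithmetic. This is only a simplified model, not etcd's or PD's actual code: the log message suggests the real check rejects the add because the local member is not connected to all peers, while the sketch below only shows why adding a member to a cluster that already has a failed node puts quorum at risk. The function names `quorum` and `survives_add` are made up for illustration.

```python
def quorum(voting_members: int) -> int:
    """Majority size for a Raft group with the given number of voting members."""
    return voting_members // 2 + 1


def survives_add(total: int, healthy: int) -> bool:
    """Return True if quorum still holds right after a member add is committed.

    Assumption of this sketch: the new member counts toward the total as soon
    as the add is committed, but it is not running yet, so the number of
    healthy members stays the same.
    """
    return healthy >= quorum(total + 1)


if __name__ == "__main__":
    # Order tried in the original post: add pd4 while failed pd1 is still a member.
    # 3 members, 2 healthy -> 4 members need a quorum of 3, only 2 are healthy.
    print(survives_add(total=3, healthy=2))  # False

    # Order that worked: remove pd1 first, then add pd4.
    # 2 members, 2 healthy -> 3 members need a quorum of 2, 2 are healthy.
    print(survives_add(total=2, healthy=2))  # True
```

In this model, removing pd1 first shrinks the membership to two healthy nodes, so the later add of pd4 never leaves the cluster below a majority.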
Thanks for the correction.