Etcd cluster ID mismatch when recovering PD with pd-recover

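【TiDB environment】Production / Test / PoC
【TiDB version】v6.5.5
【Operating system】Red Hat 7.9
【Deployment】Cloud (which cloud) / bare metal (machine specs, disk type)
【Cluster data size】
【Number of cluster nodes】
【Steps to reproduce】Recovering PD with pd-recover fails with "Etcd cluster ID mismatch".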

##2.71
[2025/09/16 09:49:30.595 +08:00] [INFO] [stream.go:166] ["started stream writer with remote peer"] [local-member-id=5b383763007fc455] [remote-peer-id=42843e713797aa2a]
[2025/09/16 09:49:30.595 +08:00] [INFO] [stream.go:166] ["started stream writer with remote peer"] [local-member-id=5b383763007fc455] [remote-peer-id=42843e713797aa2a]
[2025/09/16 09:49:30.596 +08:00] [INFO] [stream.go:406] ["started stream reader with remote peer"] [stream-reader-type="stream MsgApp v2"] [local-member-id=5b383763007fc455] [remote-peer-id=42843e713797aa2a]
[2025/09/16 09:49:30.596 +08:00] [INFO] [stream.go:406] ["started stream reader with remote peer"] [stream-reader-type="stream Message"] [local-member-id=5b383763007fc455] [remote-peer-id=42843e713797aa2a]
[2025/09/16 09:49:30.598 +08:00] [INFO] [server.go:704] ["starting initial election tick advance"] [election-ticks=6]
[2025/09/16 09:49:30.601 +08:00] [INFO] [etcd.go:585] ["serving peer traffic"] [address="[::]:2380"]
[2025/09/16 09:49:30.601 +08:00] [INFO] [etcd.go:247] ["now serving peer/client/metrics"] [local-member-id=5b383763007fc455] [initial-advertise-peer-urls="[http://192.168.2.71:2380]"] [listen-peer-urls="[http://0.0.0.0:2380]"] [advertise-client-urls="[http://192.168.2.71:2379]"] [listen-client-urls="[http://0.0.0.0:2379]"] [listen-metrics-urls=""]
[2025/09/16 09:49:30.601 +08:00] [WARN] [stream.go:682] ["request sent was ignored by remote peer due to cluster ID mismatch"] [remote-peer-id=42843e713797aa2a] [remote-peer-cluster-id=60f3404a4674d551] [local-member-id=5b383763007fc455] [local-member-cluster-id=9e63bd2785c31a28] [error="cluster ID mismatch"]
[2025/09/16 09:49:30.601 +08:00] [WARN] [stream.go:682] ["request sent was ignored by remote peer due to cluster ID mismatch"] [remote-peer-id=42843e713797aa2a] [remote-peer-cluster-id=60f3404a4674d551] [local-member-id=5b383763007fc455] [local-member-cluster-id=9e63bd2785c31a28] [error="cluster ID mismatch"]
[2025/09/16 09:49:30.602 +08:00] [WARN] [stream.go:682] ["request sent was ignored by remote peer due to cluster ID mismatch"] [remote-peer-id=36bda1842c68a23a] [remote-peer-cluster-id=2e5217ae4e76876a] [local-member-id=5b383763007fc455] [local-member-cluster-id=9e63bd2785c31a28] [error="cluster ID mismatch"]
[2025/09/16 09:49:30.602 +08:00] [WARN] [stream.go:682] ["request sent was ignored by remote peer due to cluster ID mismatch"] [remote-peer-id=36bda1842c68a23a] [remote-peer-cluster-id=2e5217ae4e76876a] [local-member-id=5b383763007fc455] [local-member-cluster-id=9e63bd2785c31a28] [error="cluster ID mismatch"]
[2025/09/16 09:49:30.605 +08:00] [FATAL] [main.go:120] ["run server failed"] [error="Etcd cluster ID mismatch, expect 11413173858132498984, got 3337756311243097962"] [stack="main.main\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/pd/cmd/pd-server/main.go:120\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"]
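
The WARN lines print etcd cluster IDs in hexadecimal, while the FATAL line prints them in decimal, so it is not obvious that they are the same values. A minimal sketch, assuming a shell whose `printf` handles unsigned 64-bit `%u` conversions (the helper name `etcd_id_to_dec` is hypothetical), converts the three hex IDs seen in these logs:

```shell
#!/bin/sh
# Convert a hexadecimal etcd cluster ID (as printed in the WARN lines)
# to the unsigned decimal form used in the FATAL line.
# etcd_id_to_dec is a hypothetical helper name, not part of PD or etcd.
etcd_id_to_dec() {
    printf '%u\n' "0x$1"
}

etcd_id_to_dec 9e63bd2785c31a28   # local-member-cluster-id on 2.71
etcd_id_to_dec 2e5217ae4e76876a   # cluster ID of member 36bda1842c68a23a
etcd_id_to_dec 60f3404a4674d551   # cluster ID of member 42843e713797aa2a
```

The first two values come out as 11413173858132498984 and 3337756311243097962, the "expect" and "got" numbers in the FATAL line. Each PD member therefore reports a different etcd cluster ID. This etcd-level ID appears to be separate from the PD `cluster-id` printed by the `init cluster id` log entries.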

##2.72

[2025/09/16 09:52:59.792 +08:00] [WARN] [http.go:547] ["request cluster ID mismatch"] [local-member-id=36bda1842c68a23a] [local-member-cluster-id=2e5217ae4e76876a] [local-member-server-version=3.4.21] [local-member-server-minimum-cluster-version=3.0.0] [remote-peer-server-name=5b383763007fc455] [remote-peer-server-version=3.4.21] [remote-peer-server-minimum-cluster-version=3.0.0] [remote-peer-cluster-id=9e63bd2785c31a28]
[2025/09/16 09:52:59.792 +08:00] [WARN] [http.go:547] ["request cluster ID mismatch"] [local-member-id=36bda1842c68a23a] [local-member-cluster-id=2e5217ae4e76876a] [local-member-server-version=3.4.21] [local-member-server-minimum-cluster-version=3.0.0] [remote-peer-server-name=5b383763007fc455] [remote-peer-server-version=3.4.21] [remote-peer-server-minimum-cluster-version=3.0.0] [remote-peer-cluster-id=9e63bd2785c31a28]
[2025/09/16 09:53:15.795 +08:00] [WARN] [http.go:547] ["request cluster ID mismatch"] [local-member-id=36bda1842c68a23a] [local-member-cluster-id=2e5217ae4e76876a] [local-member-server-version=3.4.21] [local-member-server-minimum-cluster-version=3.0.0] [remote-peer-server-name=5b383763007fc455] [remote-peer-server-version=3.4.21] [remote-peer-server-minimum-cluster-version=3.0.0] [remote-peer-cluster-id=9e63bd2785c31a28]
[2025/09/16 09:53:15.795 +08:00] [WARN] [http.go:547] ["request cluster ID mismatch"] [local-member-id=36bda1842c68a23a] [local-member-cluster-id=2e5217ae4e76876a] [local-member-server-version=3.4.21] [local-member-server-minimum-cluster-version=3.0.0] [remote-peer-server-name=5b383763007fc455] [remote-peer-server-version=3.4.21] [remote-peer-server-minimum-cluster-version=3.0.0] [remote-peer-cluster-id=9e63bd2785c31a28]
[2025/09/16 09:53:15.802 +08:00] [WARN] [http.go:547] ["request cluster ID mismatch"] [local-member-id=36bda1842c68a23a] [local-member-cluster-id=2e5217ae4e76876a] [local-member-server-version=3.4.21] [local-member-server-minimum-cluster-version=3.0.0] [remote-peer-server-name=5b383763007fc455] [remote-peer-server-version=3.4.21] [remote-peer-server-minimum-cluster-version=3.0.0] [remote-peer-cluster-id=9e63bd2785c31a28]
[2025/09/16 09:53:15.803 +08:00] [WARN] [http.go:547] ["request cluster ID mismatch"] [local-member-id=36bda1842c68a23a] [local-member-cluster-id=2e5217ae4e76876a] [local-member-server-version=3.4.21] [local-member-server-minimum-cluster-version=3.0.0] [remote-peer-server-name=5b383763007fc455] [remote-peer-server-version=3.4.21] [remote-peer-server-minimum-cluster-version=3.0.0] [remote-peer-cluster-id=9e63bd2785c31a28]

##2.73

[2025/09/16 09:56:10.901 +08:00] [WARN] [balance_region.go:226] ["region have no leader"] [scheduler=balance-region-scheduler] [region-id=323025]
[2025/09/16 09:56:12.304 +08:00] [WARN] [http.go:547] ["request cluster ID mismatch"] [local-member-id=42843e713797aa2a] [local-member-cluster-id=60f3404a4674d551] [local-member-server-version=3.4.21] [local-member-server-minimum-cluster-version=3.0.0] [remote-peer-server-name=5b383763007fc455] [remote-peer-server-version=3.4.21] [remote-peer-server-minimum-cluster-version=3.0.0] [remote-peer-cluster-id=9e63bd2785c31a28]
[2025/09/16 09:56:12.304 +08:00] [WARN] [http.go:547] ["request cluster ID mismatch"] [local-member-id=42843e713797aa2a] [local-member-cluster-id=60f3404a4674d551] [local-member-server-version=3.4.21] [local-member-server-minimum-cluster-version=3.0.0] [remote-peer-server-name=5b383763007fc455] [remote-peer-server-version=3.4.21] [remote-peer-server-minimum-cluster-version=3.0.0] [remote-peer-cluster-id=9e63bd2785c31a28]
[2025/09/16 09:56:12.307 +08:00] [WARN] [http.go:547] ["request cluster ID mismatch"] [local-member-id=42843e713797aa2a] [local-member-cluster-id=60f3404a4674d551] [local-member-server-version=3.4.21] [local-member-server-minimum-cluster-version=3.0.0] [remote-peer-server-name=5b383763007fc455] [remote-peer-server-version=3.4.21] [remote-peer-server-minimum-cluster-version=3.0.0] [remote-peer-cluster-id=9e63bd2785c31a28]
[2025/09/16 09:56:12.307 +08:00] [WARN] [http.go:547] ["request cluster ID mismatch"] [local-member-id=42843e713797aa2a] [local-member-cluster-id=60f3404a4674d551] [local-member-server-version=3.4.21] [local-member-server-minimum-cluster-version=3.0.0] [remote-peer-server-name=5b383763007fc455] [remote-peer-server-version=3.4.21] [remote-peer-server-minimum-cluster-version=3.0.0] [remote-peer-cluster-id=9e63bd2785c31a28]
[2025/09/16 09:56:15.901 +08:00] [WARN] [balance_region.go:226] ["region have no leader"] [scheduler=balance-region-scheduler] [region-id=197029]
[2025/09/16 09:56:15.901 +08:00] [WARN] [balance_region.go:226] ["region have no leader"] [scheduler=balance-region-scheduler] [region-id=323017]
[2025/09/16 09:56:15.901 +08:00] [WARN] [balance_region.go:226] ["region have no leader"] [scheduler=balance-region-scheduler] [region-id=323037]
[2025/09/16 09:56:15.901 +08:00] [WARN] [balance_region.go:226] ["region have no leader"] [scheduler=balance-region-scheduler] [region-id=323097]
[2025/09/16 09:56:15.901 +08:00] [WARN] [balance_region.go:226] ["region have no leader"] [scheduler=balance-region-scheduler] [region-id=197029]

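【Problem: symptoms and impact】
【Resource configuration】Go to TiDB Dashboard - Cluster Info - Hosts and attach a screenshot of that page
【Copy and paste the ERROR log】
【Other attachments: screenshots / logs / monitoring】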


The PD logs above were taken after recovering with pd-recover. It looks like the cluster ID has become inconsistent. What should I do?

The screenshots are from the recovery of the PD nodes on 2.72 and 2.73 that day.

##2.71

[tidb@tidb01 mhxy]$ cat /u01/tidb-deploy/pd-2379/log/pd.log | grep "cluster id"
[2024/02/05 20:05:58.317 +08:00] [INFO] [server.go:399] ["init cluster id"] [cluster-id=7332087958836394704]
[2024/02/06 08:53:26.330 +08:00] [INFO] [server.go:399] ["init cluster id"] [cluster-id=7332087958836394704]
[2024/02/07 11:47:18.625 +08:00] [INFO] [server.go:399] ["init cluster id"] [cluster-id=7332087958836394704]
[2024/02/12 13:12:30.348 +08:00] [INFO] [server.go:399] ["init cluster id"] [cluster-id=7332087958836394704]

[2025/09/15 10:17:41.938 +08:00] [INFO] [server.go:399] ["init cluster id"] [cluster-id=7332087958836394704]
[2025/09/15 10:50:54.916 +08:00] [INFO] [server.go:399] ["init cluster id"] [cluster-id=7332087958836394704]
[2025/09/15 15:01:05.230 +08:00] [INFO] [server.go:399] ["init cluster id"] [cluster-id=7332087958836394704]
[2025/09/15 16:40:08.318 +08:00] [INFO] [server.go:399] ["init cluster id"] [cluster-id=7332087958836394704]
[2025/09/15 16:54:21.906 +08:00] [INFO] [server.go:399] ["init cluster id"] [cluster-id=7332087958836394704]
[tidb@tidb01 mhxy]$

[tidb@tidb01 mhxy]$ grep "idAllocator allocates a new id" /u01/tidb-deploy/pd-2379/log/pd*.log | awk -F'=' '{print $2}' | awk -F']' '{print $1}' | sort -r -n | head -n 1
326000
[tidb@tidb01 mhxy]$

##2.72

[root@tidb02 ~]# cat /u01/tidb-deploy/pd-2379/log/pd.log | grep "cluster id"
[2025/09/15 16:06:08.951 +08:00] [INFO] [server.go:399] ["init cluster id"] [cluster-id=7550224237002577996]
[2025/09/15 16:27:38.248 +08:00] [INFO] [server.go:399] ["init cluster id"] [cluster-id=7332087958836394704]
[2025/09/15 16:33:35.030 +08:00] [INFO] [server.go:399] ["init cluster id"] [cluster-id=7332087958836394704]
[2025/09/15 16:40:11.406 +08:00] [INFO] [server.go:399] ["init cluster id"] [cluster-id=7332087958836394704]
[2025/09/15 17:10:09.691 +08:00] [INFO] [server.go:399] ["init cluster id"] [cluster-id=7332087958836394704]
[2025/09/16 09:04:23.963 +08:00] [INFO] [server.go:399] ["init cluster id"] [cluster-id=7332087958836394704]
[2025/09/16 09:34:48.400 +08:00] [INFO] [server.go:399] ["init cluster id"] [cluster-id=7332087958836394704]
[root@tidb02 ~]#

alloc-id
[root@tidb02 ~]# grep "idAllocator allocates a new id" /u01/tidb-deploy/pd-2379/log/pd*.log | awk -F'=' '{print $2}' | awk -F']' '{print $1}' | sort -r -n | head -n 1
108000
[root@tidb02 ~]#
[root@tidb02 ~]#

##2.73

[root@tidb03 ~]# cat /u01/tidb-deploy/pd-2379/log/pd.log | grep "cluster id"
[2025/09/15 16:58:45.263 +08:00] [INFO] [server.go:399] ["init cluster id"] [cluster-id=7550237795537912436]
[2025/09/15 17:10:07.640 +08:00] [INFO] [server.go:399] ["init cluster id"] [cluster-id=7332087958836394704]
[2025/09/16 09:04:22.162 +08:00] [INFO] [server.go:399] ["init cluster id"] [cluster-id=7332087958836394704]
[2025/09/16 09:34:50.195 +08:00] [INFO] [server.go:399] ["init cluster id"] [cluster-id=7332087958836394704]
[root@tidb03 ~]#
[root@tidb03 ~]# grep "idAllocator allocates a new id" /u01/tidb-deploy/pd-2379/log/pd*.log | awk -F'=' '{print $2}' | awk -F']' '{print $1}' | sort -r -n | head -n 1
330000
[root@tidb03 ~]#
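
pd-recover takes an `-alloc-id` that must be larger than any ID PD has already allocated, so with three hosts reporting different maxima, the largest one is the relevant figure. A minimal sketch using the three values printed above; the safety margin and the endpoint are assumptions, and the command is only echoed for review, not run:

```shell
#!/bin/sh
# Largest "idAllocator allocates a new id" value found on each PD host
# (2.71, 2.72, 2.73), copied from the outputs above.
max_alloc_id=$(printf '%s\n' 326000 108000 330000 | sort -n | tail -n 1)

# Assumed safety margin; any value safely above the maximum should do.
margin=100000000
alloc_id=$((max_alloc_id + margin))

# cluster-id from the most recent "init cluster id" entries on all three hosts.
cluster_id=7332087958836394704

# Print the pd-recover invocation instead of executing it.
# The endpoint is an assumption based on the 2.71 advertise-client-urls.
echo "pd-recover -endpoints http://192.168.2.71:2379 -cluster-id $cluster_id -alloc-id $alloc_id"
```

Taking the maximum across all PD logs rather than a single host matters here because 2.72 alone reports only 108000, well below the 330000 seen on 2.73.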

Post #3 is the current check of cluster-id and alloc_id; the cluster-id is the same in all three PD logs.
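
Although the most recent `init cluster id` entries agree, the 2.72 and 2.73 logs each begin with a different value. A minimal sketch for listing the distinct cluster-ids in a PD log; the function name `distinct_cluster_ids` is hypothetical, and the sample input is two entries from the 2.72 output:

```shell
#!/bin/sh
# Print each distinct cluster-id found in "init cluster id" log lines
# read from stdin, prefixed with the number of times it occurs.
distinct_cluster_ids() {
    sed -n 's/.*init cluster id.*\[cluster-id=\([0-9][0-9]*\)].*/\1/p' | sort | uniq -c
}

# Sample input: two entries copied from the 2.72 log.
distinct_cluster_ids <<'EOF'
[2025/09/15 16:06:08.951 +08:00] [INFO] [server.go:399] ["init cluster id"] [cluster-id=7550224237002577996]
[2025/09/15 16:27:38.248 +08:00] [INFO] [server.go:399] ["init cluster id"] [cluster-id=7332087958836394704]
EOF
```

On a PD host the same function could be fed the real log, for example `distinct_cluster_ids < /u01/tidb-deploy/pd-2379/log/pd.log`. More than one line of output means the node has initialized with more than one cluster-id at some point.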

Was the cluster-id you used at the start obtained incorrectly? Where did you get it from?

Latest update: is this a PD split-brain?

All of them were taken from pd.log.

Check whether the cluster-id in the TiDB/TiKV logs matches this one.

TiKV still has problems.

How can there be two different cluster-ids here?

Scale in this PD, then scale out a new PD. That is simpler.

tiup cluster scale-in mhxy --node 192.168.2.72:2379 --force
tiup cluster scale-in mhxy --node 192.168.2.73:2379 --force

tiup cluster scale-out mhxy ./scaleout-pd.yaml
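
The contents of `scaleout-pd.yaml` are not shown in the thread. A minimal sketch of what such a topology file might contain, assuming the two PD nodes removed above are re-added on the same hosts with default ports and directories:

```shell
#!/bin/sh
# Write a minimal TiUP scale-out topology containing only PD nodes.
# The host list is an assumption: the thread does not show the real file.
topology_file=${TOPOLOGY_FILE:-./scaleout-pd.yaml}

cat > "$topology_file" <<'EOF'
pd_servers:
  - host: 192.168.2.72
  - host: 192.168.2.73
EOF

# Show the file for review before passing it to `tiup cluster scale-out`.
cat "$topology_file"
```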

Scaling in and then scaling out PD again solved the problem.