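【TiDB Environment】Production
【TiDB Version】
【Reproduction Path】After a PD node is scaled in successfully, the cluster still keeps trying to reconnect to that PD node.
【Problem: Symptoms and Impact】If the removed node's address is later scaled out into a different PD cluster, the two clusters merge and their data gets mixed up.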
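10.25.248.131:2380 (VMS584328) used to belong to the tikv-oversea cluster. At 2024/04/08 10:19:26 we ran scale-in on 10.25.248.131:2380. tiup cluster display tikv-oversea then showed that 10.25.248.131:2380 had been removed, and we decommissioned the server VMS584328. However, pd.log shows that tikv-oversea kept trying to connect to 10.25.248.131:2380 and kept failing. These errors continued until 2024/04/10.
On 2024/04/10 a new server, VMS602679, came online and happened to reuse the IP 10.25.248.131. At 2024/04/10 13:47 we scaled out 10.25.248.131:2380 (VMS602679) into the tikv-dal-test cluster, which turned tikv-dal-test into a 3+1 cluster. At that same moment the 6 tikv-oversea nodes reconnected to 10.25.248.131:2380 and became a 6+1 cluster. After that, all 3+1+6 = 10 PD nodes were connected to each other and formed a single 10-node PD cluster, and at that point the data was mixed up.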
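To check this timeline against pd.log, a minimal sketch like the one below can list dial failures that happen after a member was removed. It assumes the log line shapes shown in this post ("removed member" with a single URL in removed-remote-peer-urls, and grpc warnings containing "dial tcp host:port"). The function name stale_dials is hypothetical and is not part of PD or tiup.

```python
import re
from datetime import datetime

# Leading timestamp such as [2024/04/08 10:19:26.254 +08:00]; fractional seconds are ignored.
TS_RE = re.compile(r'^\[(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})')
# Host of the removed peer; accepts straight or curly quotes around the URL list.
REMOVED_RE = re.compile(r'removed-remote-peer-urls=["“]\[https?://([\d.]+):\d+\]')
# Host that a failed gRPC dial was aimed at.
DIAL_RE = re.compile(r'dial tcp ([\d.]+):\d+')


def stale_dials(lines):
    """Map each removed host to the timestamps of dial failures seen after its removal."""
    removed = {}  # host -> time of the "removed member" event
    hits = {}     # host -> timestamps of later dial failures
    for line in lines:
        ts_match = TS_RE.match(line)
        if not ts_match:
            continue  # skip elided or non-log lines
        ts = datetime.strptime(ts_match.group(1), "%Y/%m/%d %H:%M:%S")
        removed_match = REMOVED_RE.search(line)
        if removed_match:
            removed[removed_match.group(1)] = ts
            continue
        dial_match = DIAL_RE.search(line)
        if dial_match:
            host = dial_match.group(1)
            if host in removed and ts >= removed[host]:
                hits.setdefault(host, []).append(ts)
    return hits
```

Feeding it the lines of the tikv-oversea pd.log would show whether 10.25.248.131 is still being dialed after the scale-in, and for how long.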
tikv-oversea
10.109.220.10:2379
10.109.220.9:2379
10.25.248.208:2379
10.25.248.246:2379
10.58.228.76:2379
10.58.228.86:2379
tikv-dal-test
10.58.228.37
10.109.216.124
10.25.248.212
tikv-oversea pd log:
[2024/04/07 18:37:25.977 +08:00] [INFO] [etcdutil.go:309] ["update endpoints"] [num-change=7->8] [last-endpoints="[http://10.58.228.76:2379,http://10.58.228.86:2379,http://10.109.220.9:2379,http://10.109.220.10:2379,http://10.25.248.246:2379,http://10.25.248.131:2379,http://10.25.249.164:2379]"] [endpoints="[http://10.58.228.76:2379,http://10.58.228.86:2379,http://10.109.220.10:2379,http://10.25.248.246:2379,http://10.109.220.9:2379,http://10.25.248.131:2379,http://10.25.249.164:2379,http://10.25.248.208:2379]"]
[2024/04/08 10:19:26.254 +08:00] [INFO] [cluster.go:422] ["removed member"] [cluster-id=468758231b5b0393] [local-member-id=edff54aa33575887] [removed-remote-peer-id=f67c161a4e9b9cb8] [removed-remote-peer-urls="[http://10.25.248.131:2380]"]
[2024/04/08 10:19:27.958 +08:00] [WARN] [grpclog.go:60] ["grpc: addrConn.createTransport failed to connect to {http://10.25.248.131:2379 0 }. Err :connection error: desc = "transport: Error while dialing dial tcp 10.25.248.131:2379: connect: connection refused". Reconnecting..."]
[2024/04/08 10:19:27.958 +08:00] [WARN] [grpclog.go:60] ["grpc: addrConn.createTransport failed to connect to {http://10.25.248.131:2379 0 }. Err :connection error: desc = "transport: Error while dialing dial tcp 10.25.248.131:2379: connect: connection refused". Reconnecting..."]
…
[2024/04/09 14:46:33.395 +08:00] [WARN] [grpclog.go:60] ["grpc: addrConn.createTransport failed to connect to {http://10.25.248.131:2379 0 }. Err :connection error: desc = "transport: Error while dialing dial tcp 10.25.248.131:2379: connect: connection timed out". Reconnecting..."]
[2024/04/09 14:49:25.265 +08:00] [WARN] [grpclog.go:60] ["grpc: addrConn.createTransport failed to connect to {http://10.25.248.131:2379 0 }. Err :connection error: desc = "transport: Error while dialing dial tcp 10.25.248.131:2379: i/o timeout". Reconnecting..."]
…
[2024/04/10 13:44:05.323 +08:00] [WARN] [grpclog.go:60] ["grpc: addrConn.createTransport failed to connect to {http://10.25.248.131:2379 0 }. Err :connection error: desc = "transport: Error while dialing dial tcp 10.25.248.131:2379: connect: connection refused". Reconnecting..."]
[2024/04/10 13:45:57.545 +08:00] [WARN] [grpclog.go:60] ["grpc: addrConn.createTransport failed to connect to {http://10.25.248.131:2379 0 }. Err :connection error: desc = "transport: Error while dialing dial tcp 10.25.248.131:2379: connect: connection refused". Reconnecting..."]
[2024/04/10 13:46:21.890 +08:00] [WARN] [grpclog.go:60] ["grpc: addrConn.createTransport failed to connect to {http://10.25.248.131:2379 0 }. Err :connection error: desc = "transport: Error while dialing dial tcp 10.25.248.131:2379: connect: connection refused". Reconnecting..."]
[2024/04/10 13:47:58.088 +08:00] [INFO] [etcdutil.go:309] ["update endpoints"] [num-change=6->7] [last-endpoints="[http://10.25.248.246:2379,http://10.58.228.76:2379,http://10.25.248.208:2379,http://10.58.228.86:2379,http://10.109.220.10:2379,http://10.109.220.9:2379]"] [endpoints="[http://10.58.228.76:2379,http://10.58.228.86:2379,http://10.109.220.10:2379,http://10.109.220.9:2379,http://10.25.248.208:2379,http://10.25.248.246:2379,http://10.25.248.131:2379]"]
[2024/04/10 13:48:08.085 +08:00] [INFO] [etcdutil.go:309] ["update endpoints"] [num-change=6->7] [last-endpoints="[http://10.58.228.76:2379,http://10.58.228.86:2379,http://10.109.220.9:2379,http://10.109.220.10:2379,http://10.25.248.246:2379,http://10.25.248.208:2379]"] [endpoints="[http://10.58.228.86:2379,http://10.109.220.10:2379,http://10.25.248.208:2379,http://10.58.228.76:2379,http://10.109.220.9:2379,http://10.25.248.246:2379,http://10.25.248.131:2379]"]
[2024/04/10 13:48:18.090 +08:00] [INFO] [etcdutil.go:309] ["update endpoints"] [num-change=7->10] [last-endpoints="[http://10.58.228.76:2379,http://10.58.228.86:2379,http://10.109.220.10:2379,http://10.109.220.9:2379,http://10.25.248.208:2379,http://10.25.248.246:2379,http://10.25.248.131:2379]"] [endpoints="[http://10.109.220.10:2379,http://10.58.228.76:2379,http://10.109.220.9:2379,http://10.58.228.86:2379,http://10.109.216.124:2379,http://10.25.248.212:2379,http://10.25.248.208:2379,http://10.58.228.37:2379,http://10.25.248.246:2379,http://10.25.248.131:2379]"]
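
The "update endpoints" lines above can also be checked mechanically. The following is a minimal sketch, assuming the log format shown in this post, that extracts the endpoints list from one such line and reports every endpoint outside the expected tikv-oversea member list. The names parse_update_endpoints, unexpected_endpoints and EXPECTED are hypothetical helpers for illustration only, not PD APIs.

```python
import re

# Captures the URL list of the last-endpoints and endpoints fields.
# Accepts straight or curly quotes, since pasted logs often carry curly ones.
FIELD_RE = re.compile(r'\[(last-endpoints|endpoints)=["“]\[([^\]]*)\]["”]\]')

# The six tikv-oversea PD members listed in this post.
EXPECTED = {
    "http://10.109.220.10:2379",
    "http://10.109.220.9:2379",
    "http://10.25.248.208:2379",
    "http://10.25.248.246:2379",
    "http://10.58.228.76:2379",
    "http://10.58.228.86:2379",
}


def parse_update_endpoints(line):
    """Return (previous, current) endpoint sets from one "update endpoints" line."""
    fields = {
        name: {url for url in body.split(",") if url}
        for name, body in FIELD_RE.findall(line)
    }
    return fields.get("last-endpoints", set()), fields.get("endpoints", set())


def unexpected_endpoints(line, expected=EXPECTED):
    """Return the sorted endpoints in the current list that are not expected members."""
    _, current = parse_update_endpoints(line)
    return sorted(current - expected)
```

Run against the 13:48:18 line, this should flag 10.25.248.131 together with the three tikv-dal-test addresses, which is the point where the two clusters appear as one endpoint list.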