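We upgraded the operating system on the servers and rebooted the machines.
After that, 3 TiFlash nodes would not come up. They kept erroring out and restarting in a loop. I didn't keep a screenshot of the error, but it was roughly that these 3 nodes could not communicate with each other.
So I asked our ops colleague to force a scale-in with pd-ctl, because the normal scale-in stayed in Pending state and the nodes were still stuck in the error/restart loop.
Then, when scaling out again, it failed with an error saying these 3 stores still exist: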
[2023/12/23 01:53:34.636 +08:00] [FATAL] [run.rs:1267] ["failed to start node: Other(\"[components/pd_client/src/util.rs:885]: duplicated store address: id:445645339 address:\\\"tidb01:3930\\\" labels:<key:\\\"engine\\\" value:\\\"tiflash\\\" > version:\\\"v7.1.2\\\" peer_address:\\\"tidb01:20170\\\" status_address:\\\"tidb01:20292\\\" git_hash:\\\"1b60452040258606e96b830b040aabf54625a8f3\\\" start_timestamp:1703267614 deploy_path:\\\"/tidb-deploy/tiflash-9000/bin/tiflash\\\" , already registered by id:7512016 address:\\\"tidb01:3930\\\" state:Offline labels:<key:\\\"engine\\\" value:\\\"tiflash\\\" > labels:<key:\\\"host\\\" value:\\\"data01\\\" > labels:<key:\\\"region\\\" value:\\\"mctech\\\" > labels:<key:\\\"zone\\\" value:\\\"tc-beijing\\\" > version:\\\"v7.1.2\\\" peer_address:\\\"tidb01:20170\\\" status_address:\\\"tidb01:20292\\\" git_hash:\\\"1b60452040258606e96b830b040aabf54625a8f3\\\" start_timestamp:1703267460 deploy_path:\\\"/tidb-deploy/tiflash-9000/bin/tiflash\\\" last_heartbeat:1703264497055671655 node_state:Removing \")"]
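For anyone reading that message, the fields that matter are the new store id, the already registered store id, and the old store's state. Below is a minimal sketch that pulls them out, assuming the message keeps the shape shown above. `parse_duplicate_store` and the shortened `MSG` excerpt are illustrative names of my own, not part of TiDB, TiFlash, or PD.

```python
import re

# Shortened excerpt of the FATAL message above (labels, version, hashes omitted).
MSG = (
    r'duplicated store address: id:445645339 address:\\\"tidb01:3930\\\" '
    r'start_timestamp:1703267614 , already registered by id:7512016 '
    r'address:\\\"tidb01:3930\\\" state:Offline node_state:Removing '
)

# Matches "id:<digits> address:<addr>", tolerating escaped quotes around the address.
STORE_RE = re.compile(r'id:(\d+) address:[\\"]*([^\\" ]+)')


def parse_duplicate_store(msg):
    """Return ids/address/state from a 'duplicated store address' message, or None."""
    new_part, sep, old_part = msg.partition("already registered by")
    if not sep:
        return None
    new = STORE_RE.search(new_part)
    old = STORE_RE.search(old_part)
    if not (new and old):
        return None
    # \b keeps "state:" from matching inside "node_state:".
    state = re.search(r'\bstate:(\w+)', old_part)
    node_state = re.search(r'\bnode_state:(\w+)', old_part)
    return {
        "new_id": int(new.group(1)),
        "old_id": int(old.group(1)),
        "address": old.group(2),
        "old_state": state.group(1) if state else None,
        "old_node_state": node_state.group(1) if node_state else None,
    }


if __name__ == "__main__":
    print(parse_duplicate_store(MSG))
```

In this log, the old store 7512016 is still registered at tidb01:3930 in state Offline, which is why the new store 445645339 on the same address is rejected.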
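So we deleted these 3 nodes with pd-ctl and scaled out again, and this time they came up.
The cluster is now working normally, except that these 3 tombstone stores are left behind and cannot be cleaned up.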