TiDB 5.0.0: TiKV node fails to start after scale-in

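There were originally 3 TiKV nodes. I scaled in one of them, but it stayed in the intermediate Pending Offline state for a week, so I then forced the scale-in. Since the forced scale-in, the node will not start. I tried changing the reserve-space parameter, but it still fails to start with the following error: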
[2021/06/09 11:14:21.585 +08:00] [ERROR] [server.rs:862] ["failed to init io snooper"] [err_code=KV:Unknown] [err="\"IO snooper is not started due to not compiling with BCC\""]
[2021/06/09 11:14:21.585 +08:00] [INFO] [mod.rs:116] ["encryption: none of key dictionary and file dictionary are found."]
[2021/06/09 11:14:21.585 +08:00] [INFO] [mod.rs:477] ["encryption is disabled."]
[2021/06/09 11:14:21.629 +08:00] [INFO] [future.rs:146] ["starting working thread"] [worker=gc-worker]
[2021/06/09 11:14:21.688 +08:00] [INFO] [mod.rs:214] ["Storage started."]
[2021/06/09 11:14:21.690 +08:00] [INFO] [node.rs:176] ["put store to PD"] [store="id: 105001 address: \"192.168.5.46:20160\" version: \"5.0.0\" status_address: \"192.168.5.46:20180\" git_hash: \"7706b9634bd901c9fe8dbe6a556025abbfd0793d\" start_timestamp: 1623208461 deploy_path: \"/data/tidb-deploy/tikv-20160/bin\""]
[2021/06/09 11:14:21.692 +08:00] [ERROR] [util.rs:433] ["request failed"] [err_code=KV:PD:gRPC] [err="Grpc(RpcFailure(RpcStatus { status: 2-UNKNOWN, details: Some("duplicated store address: id:105001 address:\"192.168.5.46:20160\" version:\"5.0.0\" status_address:\"192.168.5.46:20180\" git_hash:\"7706b9634bd901c9fe8dbe6a556025abbfd0793d\" start_timestamp:1623208461 deploy_path:\"/data/tidb-deploy/tikv-20160/bin\" , already registered by id:4 address:\"192.168.5.46:20160\" state:Offline version:\"5.0.0\" status_address:\"192.168.5.46:20180\" git_hash:\"7706b9634bd901c9fe8dbe6a556025abbfd0793d\" start_timestamp:1622709652 deploy_path:\"/data/tidb-deploy/tikv-20160/bin\" last_heartbeat:1621448738596752984 ") }))"]
[2021/06/09 11:14:21.692 +08:00] [INFO] [util.rs:512] ["connecting to PD endpoint"] [endpoints=http://192.168.5.43:2379]
[2021/06/09 11:14:21.693 +08:00] [INFO] [] ["New connected subchannel at 0x7efd5361d790 for subchannel 0x7efd59018d40"]
[2021/06/09 11:14:21.694 +08:00] [INFO] [util.rs:512] ["connecting to PD endpoint"] [endpoints=http://192.168.5.44:2379]
[2021/06/09 11:14:21.695 +08:00] [INFO] [util.rs:627] ["connected to PD member"] [endpoints=http://192.168.5.44:2379]
[2021/06/09 11:14:21.695 +08:00] [INFO] [util.rs:150] ["heartbeat sender and receiver are stale, refreshing …"]
[2021/06/09 11:14:21.695 +08:00] [INFO] [util.rs:181] ["update pd client"] [forworded_host=] [prev_forwarded_host=]
[2021/06/09 11:14:21.695 +08:00] [INFO] [util.rs:313] ["tring to update PD client done"] [spend=2.983463ms]
[2021/06/09 11:14:21.697 +08:00] [ERROR] [util.rs:433] ["request failed"] [err_code=KV:PD:gRPC] [err="Grpc(RpcFailure(RpcStatus { status: 2-UNKNOWN, details: Some("duplicated store address: id:105001 address:\"192.168.5.46:20160\" version:\"5.0.0\" status_address:\"192.168.5.46:20180\" git_hash:\"7706b9634bd901c9fe8dbe6a556025abbfd0793d\" start_timestamp:1623208461 deploy_path:\"/data/tidb-deploy/tikv-20160/bin\" , already registered by id:4 address:\"192.168.5.46:20160\" state:Offline version:\"5.0.0\" status_address:\"192.168.5.46:20180\" git_hash:\"7706b9634bd901c9fe8dbe6a556025abbfd0793d\" start_timestamp:1622709652 deploy_path:\"/data/tidb-deploy/tikv-20160/bin\" last_heartbeat:1621448738596752984 ") }))"]
[2021/06/09 11:14:21.697 +08:00] [ERROR] [util.rs:442] ["reconnect failed"] [err_code=KV:PD:Unknown] [err="Other(\"[components/pd_client/src/util.rs:256]: cancel reconnection due to too small interval\")"]
[2021/06/09 11:14:22.700 +08:00] [ERROR] [util.rs:433] ["request failed"] [err_code=KV:PD:gRPC] [err="Grpc(RpcFailure(RpcStatus { status: 2-UNKNOWN, details: Some("duplicated store address: id:105001 address:\"192.168.5.46:20160\" version:\"5.0.0\" status_address:\"192.168.5.46:20180\" git_hash:\"7706b9634bd901c9fe8dbe6a556025abbfd0793d\" start_timestamp:1623208461 deploy_path:\"/data/tidb-deploy/tikv-20160/bin\" , already registered by id:4 address:\"192.168.5.46:20160\" state:Offline version:\"5.0.0\" status_address:\"192.168.5.46:20180\" git_hash:\"7706b9634bd901c9fe8dbe6a556025abbfd0793d\" start_timestamp:1622709652 deploy_path:\"/data/tidb-deploy/tikv-20160/bin\" last_heartbeat:1621448738596752984 ") }))"]

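  • A cluster with three replicas on three nodes cannot be scaled in to two nodes.

  • Forced scale-in is not recommended: if you do not know the state and distribution of the region replicas, it can cause data loss.

  • The startup error comes from a duplicate store left over from the incomplete scale-in. The log reports that the new store id:105001 at address "192.168.5.46:20160" (version "5.0.0") uses an address already registered by store id:4, which is still in state Offline.

  • The simple fix is to scale out a new node with a different port and deploy directory instead of restarting the old one in place. Alternatively, check that the old directories have been fully cleaned up and that the store information shown by pd-ctl has been removed, and only after confirming both, scale out a node.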
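To pull the two conflicting store ids out of a log line like the ones above, a small parser helps. This is a minimal sketch that assumes the PD error text keeps the "duplicated store address: id:N address:... already registered by id:M address:... state:S" wording seen in this thread; the names DuplicateStore and parse_duplicate_store are hypothetical, not part of any TiDB tool.

```python
import re
from typing import NamedTuple, Optional


class DuplicateStore(NamedTuple):
    new_id: int      # id PD was asked to register for the restarted TiKV
    old_id: int      # id of the store that still owns the address
    address: str     # the contested host:port
    old_state: str   # state of the old store, e.g. "Offline"


# Matches the PD error text as quoted in the TiKV log; the backslash before
# each quote is optional so both escaped and unescaped copies match.
_DUP_RE = re.compile(
    r'duplicated store address: id:(?P<new_id>\d+) address:\\?"(?P<addr>[^"\\]+)\\?"'
    r'.*?already registered by id:(?P<old_id>\d+) address:\\?"[^"\\]+\\?" state:(?P<state>\w+)'
)


def parse_duplicate_store(log_line: str) -> Optional[DuplicateStore]:
    """Return the conflicting store ids from one log line, or None if absent."""
    m = _DUP_RE.search(log_line)
    if m is None:
        return None
    return DuplicateStore(
        new_id=int(m["new_id"]),
        old_id=int(m["old_id"]),
        address=m["addr"],
        old_state=m["state"],
    )


if __name__ == "__main__":
    line = ('Some("duplicated store address: id:105001 address:\\"192.168.5.46:20160\\" '
            'version:\\"5.0.0\\" , already registered by id:4 '
            'address:\\"192.168.5.46:20160\\" state:Offline version:\\"5.0.0\\"")')
    print(parse_duplicate_store(line))
```

For the log in this thread it reports old_id 4 in state Offline, which is the store to inspect with pd-ctl.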

tiup ctl:v5.0.0 pd -u http://192.168.5.43:2379 store
tiup ctl:v5.0.0 pd -u http://192.168.5.43:2379 store delete 4

curl -X POST 'http://192.168.5.43:2379/pd/api/v1/store/4/state?state=Offline'

The store ID of node 192.168.5.46 is 4. None of the commands above removes it: they report success, but the store information is still there.
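A store that still shows Offline after "store delete" is one whose regions PD is still waiting to move elsewhere; it becomes Tombstone only once that finishes, and only then should it stop blocking the address. To check which stores are registered at 192.168.5.46:20160, the sketch below filters the JSON printed by "tiup ctl:v5.0.0 pd -u http://192.168.5.43:2379 store". It assumes the output has a top-level "stores" list whose entries hold a "store" object with "id", "address", and "state_name"; adjust the keys if your output differs. The function name is hypothetical.

```python
import json
from typing import Any, Dict, List


def stores_at_address(stores_json: str, address: str) -> List[Dict[str, Any]]:
    """List id and state of every store registered at `address`.

    Assumes the shape {"count": N, "stores": [{"store": {...}, "status": {...}}]}.
    """
    data = json.loads(stores_json)
    matches = []
    for entry in data.get("stores") or []:
        store = entry.get("store", {})
        if store.get("address") == address:
            matches.append({"id": store.get("id"), "state": store.get("state_name")})
    return matches
```

Save only the JSON part of the pd-ctl output (tiup may print a "Starting component" banner first), then call stores_at_address(text, "192.168.5.46:20160").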

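As mentioned earlier, a cluster with three replicas on three nodes cannot be scaled in, because by default each store holds at most one replica of a region; after scale-in, the three-replica constraint can no longer be satisfied.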
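The constraint reduces to simple arithmetic: with at most one replica per store, the stores left after removal must number at least max-replicas (3 by default) for PD to place every region's replicas, otherwise the removed store can never finish draining. A minimal sketch with a hypothetical function name:

```python
def can_finish_scale_in(up_stores: int, stores_to_remove: int, max_replicas: int = 3) -> bool:
    """Whether PD can finish taking `stores_to_remove` stores offline.

    Assumes at most one replica of a region per store (the default), so at
    least `max_replicas` stores must remain after removal.
    """
    if stores_to_remove < 0 or stores_to_remove > up_stores:
        raise ValueError("stores_to_remove must be between 0 and up_stores")
    return up_stores - stores_to_remove >= max_replicas


if __name__ == "__main__":
    print(can_finish_scale_in(3, 1))  # False: the situation in this thread
    print(can_finish_scale_in(4, 1))  # True: with a fourth TiKV added
```

This is why the store stayed in Pending Offline for a week: 3 - 1 = 2 stores cannot hold 3 replicas.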

For now, with the current three nodes, first change the store's Offline state back to Up:
curl -X POST "http://${pd_ip}:2379/pd/api/v1/store/${store_id}/state?state=Up"
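The same request can be scripted. This sketch builds the URL used by the curl command above and sends it as a POST with Python's standard library; it only accepts the two states this thread uses, and the function names are hypothetical.

```python
import urllib.request

# States this thread uses with the store state endpoint.
_ALLOWED_STATES = ("Up", "Offline")


def store_state_url(pd_addr: str, store_id: int, state: str) -> str:
    """Build the URL the curl command posts to (pd_addr is host:port)."""
    if state not in _ALLOWED_STATES:
        raise ValueError(f"unsupported state {state!r}; expected one of {_ALLOWED_STATES}")
    return f"http://{pd_addr}/pd/api/v1/store/{store_id}/state?state={state}"


def set_store_state(pd_addr: str, store_id: int, state: str, timeout: float = 5.0) -> int:
    """POST the state change to PD and return the HTTP status code."""
    req = urllib.request.Request(store_state_url(pd_addr, store_id, state), method="POST")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

For store 4 here, set_store_state("192.168.5.43:2379", 4, "Up") would issue the same request as the curl command; urlopen raises urllib.error.HTTPError if PD rejects it.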

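While cleaning up data, I accidentally deleted the Alertmanager directory /data/tidb-deploy/alertmanager-9093, the Grafana directory /data/tidb-deploy/grafana-3000, and the Prometheus directory /data/tidb-deploy/prometheus-9090. None of these three components can start now. Is there any way to recover them or quickly reinstall them? Thanks.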

You can try removing these services with scale-in -N and then adding them back with scale-out.
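A sketch of the two steps, assuming the components used the ports in their deleted directory names (9093, 3000, 9090). The host and cluster name are hypothetical placeholders ("10.0.0.9", "my-cluster"), since the thread does not give them; the topology keys monitoring_servers, grafana_servers, and alertmanager_servers are the TiUP sections for Prometheus, Grafana, and Alertmanager. The code only builds the commands and the topology text; it does not run anything.

```python
from typing import List

# Ports taken from the deleted deploy directories
# (alertmanager-9093, grafana-3000, prometheus-9090).
MONITORING_PORTS = {"alertmanager": 9093, "grafana": 3000, "prometheus": 9090}


def monitoring_node_ids(host: str) -> List[str]:
    """Node IDs (host:port) in the form scale-in -N expects."""
    return [f"{host}:{port}" for port in MONITORING_PORTS.values()]


def scale_in_cmd(cluster: str, node_ids: List[str]) -> List[str]:
    return ["tiup", "cluster", "scale-in", cluster, "-N", ",".join(node_ids)]


def scale_out_topology(host: str) -> str:
    """Minimal scale-out topology that re-adds the three monitoring components."""
    return (
        "monitoring_servers:\n"
        f"  - host: {host}\n"
        "grafana_servers:\n"
        f"  - host: {host}\n"
        "alertmanager_servers:\n"
        f"  - host: {host}\n"
    )


def scale_out_cmd(cluster: str, topology_file: str) -> List[str]:
    return ["tiup", "cluster", "scale-out", cluster, topology_file]
```

Write scale_out_topology(host) to a file such as scale-out.yaml, then pass each command list to subprocess.run(cmd, check=True) in order: scale-in first, then scale-out.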

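After scaling those services in, scaling them out again worked fine. Following your suggestion, I scaled out node 46 with a different port and deploy directory, but it asked me to set labels, roughly as follows: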
192.168.5.46:20160:
multiple TiKV instances are deployed at the same host but location label missing
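The warning fires because two TiKV instances now share host 192.168.5.46. The sketch below is a hypothetical re-creation of that rule, not TiUP's actual code: a host is flagged when it runs more than one TiKV and either PD has no location labels or an instance lacks one of them. The usual fix for several TiKV instances on one host is to give each instance a server.labels entry (for example host: "...") in tikv_servers and to set replication.location-labels: ["host"] for PD; treat those key names as something to verify against your topology documentation.

```python
from collections import Counter
from typing import Dict, List, NamedTuple, Sequence


class TikvInstance(NamedTuple):
    host: str
    port: int
    labels: Dict[str, str]  # e.g. the instance's server.labels


def hosts_missing_labels(instances: Sequence[TikvInstance],
                         location_labels: Sequence[str]) -> List[str]:
    """Hosts that run more than one TiKV without usable location labels."""
    per_host = Counter(inst.host for inst in instances)
    flagged = set()
    for inst in instances:
        if per_host[inst.host] < 2:
            continue  # a single instance per host needs no extra label
        if not location_labels or any(k not in inst.labels for k in location_labels):
            flagged.add(inst.host)
    return sorted(flagged)
```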
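I searched online and edited the configuration file following other people's advice, but it did not take effect. When I tried to set the PD label manually, I got the following error: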

tiup ctl:v5.0.0 pd -u 192.168.5.44:2379 config set location-labels host

Starting component ctl: /root/.tiup/components/ctl/v5.0.0/ctl pd -u 192.168.5.44:2379 config set location-labels host
Failed to set config: [400] "cannot to update replication config, the default rules do not consistent with replication config, please update rule instead"

What causes this, and how should I handle it? Thanks.

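The error says the configuration does not match the expected rules: PD reports that the default placement rule is not consistent with the replication config and asks you to update the rule instead of the replication config. Review the location-labels configuration rules.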
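The message's own hint, "please update rule instead", points at Placement Rules, which are enabled by default in v5.0. One way to follow it, assuming the pd-ctl subcommands "config placement-rules show" and "config placement-rules save --in=rules.json" behave as documented for your version (check "config placement-rules --help"), is to dump the rules, set location_labels on the default rule (group_id "pd", id "default"), and save them back. The sketch below does only the JSON edit; the function name is hypothetical.

```python
import json
from typing import Sequence


def set_default_rule_labels(rules_json: str, labels: Sequence[str]) -> str:
    """Return the rules with location_labels set on the default rule."""
    rules = json.loads(rules_json)
    found = False
    for rule in rules:
        if rule.get("group_id") == "pd" and rule.get("id") == "default":
            rule["location_labels"] = list(labels)
            found = True
    if not found:
        raise ValueError('no rule with group_id "pd" and id "default"')
    return json.dumps(rules, indent=2)
```

Other fields of the default rule, such as count, are left untouched, so the replica count stays as it was.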