TiKV online restore

Does unsafe-recover remove-fail-stores in TiDB v6.1 support online execution yet?

I tried it and it still reports an error:
tikv-ctl --data-dir /DMBITXVC/tidb/tidb-data/tikv-20160/ unsafe-recover remove-fail-stores -s 2154,2155 --all-regions
[2022/07/19 10:41:31.546 +09:00] [WARN] [config.rs:604] ["compaction guard is disabled due to region info provider not available"]
[2022/07/19 10:41:31.546 +09:00] [WARN] [config.rs:712] ["compaction guard is disabled due to region info provider not available"]
[2022/07/19 10:41:31.549 +09:00] [ERROR] [executor.rs:1092] ["error while open kvdb: Storage Engine IO error: While lock file: /DMBITXVC/tidb/tidb-data/tikv-20160/db/LOCK: Resource temporarily unavailable"]
[2022/07/19 10:41:31.549 +09:00] [ERROR] [executor.rs:1095] ["LOCK file conflict indicates TiKV process is running. Do NOT delete the LOCK file and force the command to run. Doing so could cause data corruption."]

Environment:
A 5-node TiKV cluster. I deleted the data on the two TiKV nodes that held replicas of the target region:
/tidb/tidb-data/tikv-20160$ rm -rf ./*
The cluster status then became:
10.137.32.3:3000   grafana     10.137.32.3  3000         linux/x86_64  Up            -                                    /opt/tidb/tidb-deploy/grafana-3000
10.137.32.3:2379   pd          10.137.32.3  2379/2380    linux/x86_64  Up|L|UI       /opt/tidb/tidb-data/pd-2379          /opt/tidb/tidb-deploy/pd-2379
10.137.32.3:9090   prometheus  10.137.32.3  9090/12020   linux/x86_64  Up            /opt/tidb/tidb-data/prometheus-9090  /opt/tidb/tidb-deploy/prometheus-9090
10.137.32.3:4000   tidb        10.137.32.3  4000/10080   linux/x86_64  Up            -                                    /opt/tidb/tidb-deploy/tidb-4000
10.137.32.3:20160  tikv        10.137.32.3  20160/20180  linux/x86_64  Up            /opt/tidb/tidb-data/tikv-20160       /opt/tidb/tidb-deploy/tikv-20160
10.137.32.4:20160  tikv        10.137.32.4  20160/20180  linux/x86_64  Disconnected  /opt/tidb/tidb-data/tikv-20160       /opt/tidb/tidb-deploy/tikv-20160
10.137.32.4:20161  tikv        10.137.32.4  20161/20181  linux/x86_64  Up            /tidb-data/tikv-20161                /tidb-deploy/tikv-20161
10.137.32.5:20160  tikv        10.137.32.5  20160/20180  linux/x86_64  Disconnected  /opt/tidb/tidb-data/tikv-20160       /opt/tidb/tidb-deploy/tikv-20160
10.137.32.5:20161  tikv        10.137.32.5  20161/20181  linux/x86_64  Up            /tidb-data/tikv-20161                /tidb-deploy/tikv-20161

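At this point, running unsafe-recover remove-fail-stores on the node that still holds one replica fails with the error above, but it runs normally once that node is stopped. Is there something wrong with my procedure?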

The TiKV process is still running, right? This command most likely has to be run after the TiKV process is stopped.

Does "online" mean the TiKV process does not need to be stopped?

My mistake here. I looked again: the removal has to be done through pd-ctl, and that does not require stopping TiKV.
pd-ctl -u http://10.137.32.3:2379 unsafe remove-failed-stores 3138,3137
Starting component ctl: /home/tidb/.tiup/components/ctl/v6.1.0/ctl pd -u http://10.137.32.3:2379 unsafe remove-failed-stores 3138,3137
Success!
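
For anyone repeating this, the online flow above can be sketched as a small shell wrapper. This is only a sketch under stated assumptions: the function names `online_remove_failed_stores` and `run` and the `DRY_RUN` variable are hypothetical helpers, not part of pd-ctl; the PD address and store IDs in the comments are the ones from this thread; and the `show` subcommand used to check progress is assumed to be available in your pd-ctl version, so check it against the Online Unsafe Recovery documentation before relying on it.

```shell
#!/bin/sh
# Hypothetical wrapper around the pd-ctl commands used in this thread.
# With DRY_RUN=1 the commands are only printed, not executed.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

online_remove_failed_stores() {
  pd_addr="$1"     # PD endpoint, e.g. http://10.137.32.3:2379
  store_ids="$2"   # comma-separated IDs of the failed stores, e.g. 3138,3137

  if [ -z "$pd_addr" ] || [ -z "$store_ids" ]; then
    echo "usage: online_remove_failed_stores <pd_addr> <store_ids>" >&2
    return 1
  fi

  # Ask PD to remove the failed stores; the surviving TiKV processes keep running.
  run pd-ctl -u "$pd_addr" unsafe remove-failed-stores "$store_ids" || return 1

  # Check recovery progress (assumption: `show` exists in your pd-ctl version).
  run pd-ctl -u "$pd_addr" unsafe remove-failed-stores show
}
```

Running it first with `DRY_RUN=1` prints the two pd-ctl commands so the store IDs can be double-checked before anything is sent to PD. The store IDs passed here are PD store IDs of the failed nodes, not the IDs of the surviving ones.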

Sorry for the trouble, everyone.

Clearing the information is all that is needed.