During TiKV failure recovery, the failed nodes were started again and the replica state now looks wrong. How can this be fixed?

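To help us resolve this quickly, please provide the following information; a clear problem description gets a faster answer:
[TiDB environment]
5 TiKV nodes, 3 of them failed
[Overview] Scenario + problem summary
5 TiKV nodes, 3 of them failed; while simulating TiKV failure recovery, the failed nodes were started and rejoined the cluster
[Background] Operations performed
The detailed steps are listed below
[Symptoms] Application and database symptoms
[Business impact]
Replicas are inconsistent
[TiDB version]
5.1.1
[Attachments]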

Baseline information

MySQL [INFORMATION_SCHEMA]> select STORE_ID,ADDRESS,LEADER_COUNT,REGION_COUNT,STORE_STATE_NAME from information_schema.TIKV_STORE_STATUS order by STORE_STATE_NAME;
+----------+---------------------+--------------+--------------+------------------+
| STORE_ID | ADDRESS             | LEADER_COUNT | REGION_COUNT | STORE_STATE_NAME |
+----------+---------------------+--------------+--------------+------------------+
|        1 | 11.45.198.108:20160 |           14 |           33 | Up               |
|        4 | 11.45.199.107:20160 |            4 |           14 | Up               |
|        5 | 11.45.199.156:20160 |            8 |           29 | Up               |
|        6 | 11.45.234.8:20160   |            6 |           26 | Up               |
|        7 | 11.45.199.106:20160 |            3 |            3 | Up               |
+----------+---------------------+--------------+--------------+------------------+
5 rows in set (0.00 sec)

MySQL [INFORMATION_SCHEMA]> SELECT distinct a.TIDB_TABLE_ID,b.DB_NAME,b.TABLE_NAME,b.REGION_ID,b.APPROXIMATE_SIZE,c.PEER_ID,c.STORE_ID,c.IS_LEADER,c.STATUS ,d.ADDRESS,d.STORE_STATE_NAME,d.VERSION,d.CAPACITY,d.AVAILABLE,d.LABEL  FROM information_schema.TABLES as a  INNER JOIN TIKV_REGION_STATUS as b  INNER JOIN TIKV_REGION_PEERS as c  INNER JOIN TIKV_STORE_STATUS as d  WHERE a.TIDB_TABLE_ID = b.TABLE_ID AND b.REGION_ID = c.REGION_ID AND c.STORE_ID = d.STORE_ID AND a.TABLE_SCHEMA='test' order by a.TIDB_TABLE_ID,b.DB_NAME,b.TABLE_NAME,b.REGION_ID,b.APPROXIMATE_SIZE,c.PEER_ID,c.STORE_ID,c.IS_LEADER,c.STATUS ,d.ADDRESS,d.STORE_STATE_NAME,d.VERSION,d.CAPACITY,d.AVAILABLE,d.LABEL asc;
+---------------+---------+------------+-----------+------------------+---------+----------+-----------+--------+---------------------+------------------+---------+----------+-----------+-------+
| TIDB_TABLE_ID | DB_NAME | TABLE_NAME | REGION_ID | APPROXIMATE_SIZE | PEER_ID | STORE_ID | IS_LEADER | STATUS | ADDRESS             | STORE_STATE_NAME | VERSION | CAPACITY | AVAILABLE | LABEL |
+---------------+---------+------------+-----------+------------------+---------+----------+-----------+--------+---------------------+------------------+---------+----------+-----------+-------+
|            53 | test    | sbtest6    |       124 |                1 |     180 |        5 |         0 | NORMAL | 11.45.199.156:20160 | Up               | 5.1.1   | 503.8GiB | 452.3GiB  | null  |
|            53 | test    | sbtest6    |       124 |                1 |     255 |        1 |         1 | NORMAL | 11.45.198.108:20160 | Up               | 5.1.1   | 503.8GiB | 452.2GiB  | null  |
|            53 | test    | sbtest6    |       124 |                1 |     299 |        6 |         0 | NORMAL | 11.45.234.8:20160   | Up               | 5.1.1   | 503.8GiB | 452.3GiB  | null  |
|            54 | test    | sbtest3    |       140 |                1 |     273 |        6 |         0 | NORMAL | 11.45.234.8:20160   | Up               | 5.1.1   | 503.8GiB | 452.3GiB  | null  |
|            54 | test    | sbtest3    |       140 |                1 |     303 |        5 |         1 | NORMAL | 11.45.199.156:20160 | Up               | 5.1.1   | 503.8GiB | 452.3GiB  | null  |
|            54 | test    | sbtest3    |       140 |                1 |     315 |        1 |         0 | NORMAL | 11.45.198.108:20160 | Up               | 5.1.1   | 503.8GiB | 452.2GiB  | null  |
|            55 | test    | sbtest2    |       148 |                1 |     182 |        5 |         1 | NORMAL | 11.45.199.156:20160 | Up               | 5.1.1   | 503.8GiB | 452.3GiB  | null  |
|            55 | test    | sbtest2    |       148 |                1 |     191 |        6 |         0 | NORMAL | 11.45.234.8:20160   | Up               | 5.1.1   | 503.8GiB | 452.3GiB  | null  |
|            55 | test    | sbtest2    |       148 |                1 |     210 |        1 |         0 | NORMAL | 11.45.198.108:20160 | Up               | 5.1.1   | 503.8GiB | 452.2GiB  | null  |
|            56 | test    | sbtest7    |       152 |               27 |     154 |        7 |         1 | NORMAL | 11.45.199.106:20160 | Up               | 5.1.1   | 503.8GiB | 452.2GiB  | null  |
|            56 | test    | sbtest7    |       152 |               27 |     155 |        4 |         0 | NORMAL | 11.45.199.107:20160 | Up               | 5.1.1   | 503.8GiB | 452.2GiB  | null  |
|            56 | test    | sbtest7    |       152 |               27 |     211 |        6 |         0 | NORMAL | 11.45.234.8:20160   | Up               | 5.1.1   | 503.8GiB | 452.3GiB  | null  |
|            57 | test    | sbtest10   |       144 |                1 |     193 |        1 |         0 | NORMAL | 11.45.198.108:20160 | Up               | 5.1.1   | 503.8GiB | 452.2GiB  | null  |
|            57 | test    | sbtest10   |       144 |                1 |     290 |        4 |         1 | NORMAL | 11.45.199.107:20160 | Up               | 5.1.1   | 503.8GiB | 452.2GiB  | null  |
|            57 | test    | sbtest10   |       144 |                1 |     304 |        6 |         0 | NORMAL | 11.45.234.8:20160   | Up               | 5.1.1   | 503.8GiB | 452.3GiB  | null  |
|            58 | test    | sbtest8    |       168 |                1 |     169 |        1 |         1 | NORMAL | 11.45.198.108:20160 | Up               | 5.1.1   | 503.8GiB | 452.2GiB  | null  |
|            58 | test    | sbtest8    |       168 |                1 |     226 |        6 |         0 | NORMAL | 11.45.234.8:20160   | Up               | 5.1.1   | 503.8GiB | 452.3GiB  | null  |
|            58 | test    | sbtest8    |       168 |                1 |     251 |        5 |         0 | NORMAL | 11.45.199.156:20160 | Up               | 5.1.1   | 503.8GiB | 452.3GiB  | null  |
|            62 | test    | sbtest5    |       156 |                1 |     257 |        4 |         1 | NORMAL | 11.45.199.107:20160 | Up               | 5.1.1   | 503.8GiB | 452.2GiB  | null  |
|            62 | test    | sbtest5    |       156 |                1 |     279 |        6 |         0 | NORMAL | 11.45.234.8:20160   | Up               | 5.1.1   | 503.8GiB | 452.3GiB  | null  |
|            62 | test    | sbtest5    |       156 |                1 |     316 |        1 |         0 | NORMAL | 11.45.198.108:20160 | Up               | 5.1.1   | 503.8GiB | 452.2GiB  | null  |
|            63 | test    | sbtest1    |       176 |               14 |     178 |        7 |         1 | NORMAL | 11.45.199.106:20160 | Up               | 5.1.1   | 503.8GiB | 452.2GiB  | null  |
|            63 | test    | sbtest1    |       176 |               14 |     224 |        5 |         0 | NORMAL | 11.45.199.156:20160 | Up               | 5.1.1   | 503.8GiB | 452.3GiB  | null  |
|            63 | test    | sbtest1    |       176 |               14 |     289 |        1 |         0 | NORMAL | 11.45.198.108:20160 | Up               | 5.1.1   | 503.8GiB | 452.2GiB  | null  |
|            64 | test    | sbtest4    |       172 |                1 |     208 |        6 |         0 | NORMAL | 11.45.234.8:20160   | Up               | 5.1.1   | 503.8GiB | 452.3GiB  | null  |
|            64 | test    | sbtest4    |       172 |                1 |     248 |        1 |         1 | NORMAL | 11.45.198.108:20160 | Up               | 5.1.1   | 503.8GiB | 452.2GiB  | null  |
|            64 | test    | sbtest4    |       172 |                1 |     293 |        5 |         0 | NORMAL | 11.45.199.156:20160 | Up               | 5.1.1   | 503.8GiB | 452.3GiB  | null  |
|            71 | test    | sbtest9    |         2 |               14 |      58 |        7 |         1 | NORMAL | 11.45.199.106:20160 | Up               | 5.1.1   | 503.8GiB | 452.2GiB  | null  |
|            71 | test    | sbtest9    |         2 |               14 |      92 |        4 |         0 | NORMAL | 11.45.199.107:20160 | Up               | 5.1.1   | 503.8GiB | 452.2GiB  | null  |
|            71 | test    | sbtest9    |         2 |               14 |     185 |        5 |         0 | NORMAL | 11.45.199.156:20160 | Up               | 5.1.1   | 503.8GiB | 452.3GiB  | null  |
+---------------+---------+------------+-----------+------------------+---------+----------+-----------+--------+---------------------+------------------+---------+----------+-----------+-------+
30 rows in set (0.01 sec)

Simulate the failure (stop TiKV on the following three stores)

|        1 | 11.45.198.108:20160 |           14 |           33 | Up               |
|        4 | 11.45.199.107:20160 |            4 |           14 | Up               |
|        5 | 11.45.199.156:20160 |            8 |           29 | Up               |

systemctl stop tikv-20160.service

Check the data distribution

Check the Region distribution of the test tables
[root@pd1 pd-2379]# bin/pd-ctl -u http://11.45.242.127:2379 region --jq='.regions[] | {id: .id, peer_stores: [.peers[].store_id] | select(length==3) }'
{"id":22,"peer_stores":[6,4,1]}
{"id":52,"peer_stores":[5,6,1]}
{"id":54,"peer_stores":[1,5,4]}
{"id":2,"peer_stores":[7,4,5]}
{"id":24,"peer_stores":[6,1,7]}
{"id":26,"peer_stores":[1,6,5]}
{"id":30,"peer_stores":[5,6,1]}
{"id":48,"peer_stores":[1,5,4]}
{"id":36,"peer_stores":[1,6,5]}
{"id":172,"peer_stores":[6,1,5]}
{"id":38,"peer_stores":[5,1,4]}
{"id":120,"peer_stores":[1,6,5]}
{"id":176,"peer_stores":[7,5,1]}
{"id":152,"peer_stores":[7,4,6]}
{"id":156,"peer_stores":[4,6,1]}
{"id":16,"peer_stores":[6,5,1]}
{"id":32,"peer_stores":[1,6,5]}
{"id":42,"peer_stores":[1,6,4]}
{"id":148,"peer_stores":[6,1,7]}
{"id":14,"peer_stores":[5,1,4]}
{"id":40,"peer_stores":[5,1,7]}
{"id":144,"peer_stores":[1,4,6]}
{"id":168,"peer_stores":[1,6,5]}
{"id":8,"peer_stores":[1,4,6]}
{"id":44,"peer_stores":[5,4,1]}
{"id":46,"peer_stores":[6,4,1]}
{"id":140,"peer_stores":[5,1,7]}
{"id":18,"peer_stores":[1,5,4]}
{"id":20,"peer_stores":[6,5,1]}
{"id":34,"peer_stores":[5,1,4]}
{"id":124,"peer_stores":[5,1,6]}

Disable scheduling

bin/pd-ctl -u http://11.45.242.127:2379 config set region-schedule-limit 0
bin/pd-ctl -u http://11.45.242.127:2379 config set replica-schedule-limit 0
bin/pd-ctl -u http://11.45.242.127:2379 config set leader-schedule-limit 0
bin/pd-ctl -u http://11.45.242.127:2379 config set merge-schedule-limit 0
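
The four limits zeroed here are set back by hand later in this procedure, so it can help to keep the original values next to the commands. The following is a minimal sketch that only builds the pd-ctl command lines and prints them; it does not run anything. The helper names `config_set`, `disable_commands`, and `restore_commands` are hypothetical, and the saved values are simply the ones this post restores later, which may differ from your cluster's settings.

```python
PD_CTL = "bin/pd-ctl"
PD_URL = "http://11.45.242.127:2379"

# Values restored later in this post. Assumption: record your own cluster's
# current values (for example from `config show`) before zeroing them.
SAVED_LIMITS = {
    "region-schedule-limit": 2048,
    "replica-schedule-limit": 64,
    "leader-schedule-limit": 4,
    "merge-schedule-limit": 8,
}


def config_set(name, value):
    """Build one `pd-ctl config set` command line as a string."""
    return f"{PD_CTL} -u {PD_URL} config set {name} {value}"


def disable_commands(limits=SAVED_LIMITS):
    """Commands that set every scheduling limit to 0."""
    return [config_set(name, 0) for name in limits]


def restore_commands(limits=SAVED_LIMITS):
    """Commands that put the saved values back after recovery."""
    return [config_set(name, value) for name, value in limits.items()]


# Print the commands for review; nothing is executed here.
for cmd in disable_commands() + restore_commands():
    print(cmd)
```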

Check the store status

MySQL [INFORMATION_SCHEMA]> select STORE_ID,ADDRESS,LEADER_COUNT,REGION_COUNT,STORE_STATE_NAME from information_schema.TIKV_STORE_STATUS order by STORE_STATE_NAME;
+----------+---------------------+--------------+--------------+------------------+
| STORE_ID | ADDRESS             | LEADER_COUNT | REGION_COUNT | STORE_STATE_NAME |
+----------+---------------------+--------------+--------------+------------------+
|        4 | 11.45.199.107:20160 |            5 |           19 | Disconnected     |
|        5 | 11.45.199.156:20160 |            7 |           26 | Disconnected     |
|        1 | 11.45.198.108:20160 |           13 |           33 | Disconnected     |
|        6 | 11.45.234.8:20160   |            6 |           24 | Up               |
|        7 | 11.45.199.106:20160 |            4 |            7 | Up               |
+----------+---------------------+--------------+--------------+------------------+
5 rows in set (0.00 sec)

Find the Regions that have at least half of their replicas on the failed nodes

[root@pd1 pd-2379]# bin/pd-ctl -u http://11.45.242.127:2379 region --jq='.regions[] | {id: .id, peer_stores: [.peers[].store_id] | select(length as $total | map(if .==(1,4,5) then . else empty end) | length>=$total-length) }'
{"id":26,"peer_stores":[1,6,5]}
{"id":30,"peer_stores":[5,6,1]}
{"id":48,"peer_stores":[1,5,4]}
{"id":2,"peer_stores":[7,4,5]}
{"id":172,"peer_stores":[6,1,5]}
{"id":36,"peer_stores":[1,6,5]}
{"id":120,"peer_stores":[1,6,5]}
{"id":176,"peer_stores":[7,5,1]}
{"id":38,"peer_stores":[5,1,4]}
{"id":16,"peer_stores":[6,5,1]}
{"id":32,"peer_stores":[1,6,5]}
{"id":42,"peer_stores":[1,6,4]}
{"id":156,"peer_stores":[4,6,1]}
{"id":10,"peer_stores":[1,6,5,4]}
{"id":40,"peer_stores":[5,1,7]}
{"id":50,"peer_stores":[6,1,5,4]}
{"id":144,"peer_stores":[1,4,6]}
{"id":168,"peer_stores":[1,6,5]}
{"id":14,"peer_stores":[5,1,4]}
{"id":12,"peer_stores":[1,5,6,4]}
{"id":44,"peer_stores":[5,4,1]}
{"id":46,"peer_stores":[6,4,1]}
{"id":140,"peer_stores":[5,1,7]}
{"id":8,"peer_stores":[1,4,6]}
{"id":20,"peer_stores":[6,5,1]}
{"id":28,"peer_stores":[1,5,6,4]}
{"id":34,"peer_stores":[5,1,4]}
{"id":124,"peer_stores":[5,1,6]}
{"id":18,"peer_stores":[1,5,4]}
{"id":52,"peer_stores":[5,6,1]}
{"id":54,"peer_stores":[1,5,4]}
{"id":22,"peer_stores":[6,4,1]}
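
The jq filter above keeps a Region when the number of its peers on stores 1, 4, 5 is at least the number of its peers elsewhere. For any peer count that is the same as saying the healthy peers no longer form a Raft majority. A minimal Python sketch of the same test follows; the function name `at_least_half_failed` is hypothetical, and the sample Regions are copied from the listings in this post.

```python
FAILED_STORES = {1, 4, 5}


def at_least_half_failed(peer_stores, failed=FAILED_STORES):
    """True when peers on failed stores are at least as many as healthy peers."""
    failed_count = sum(1 for store in peer_stores if store in failed)
    healthy_count = len(peer_stores) - failed_count
    return failed_count >= healthy_count


# Region id -> store ids of its peers, taken from the pd-ctl output in this post.
regions = {
    26: [1, 6, 5],
    24: [6, 1, 7],
    10: [1, 6, 5, 4],
    152: [7, 4, 6],
    148: [6, 1, 7],
}

flagged = sorted(rid for rid, stores in regions.items() if at_least_half_failed(stores))
print(flagged)  # [10, 26]
```

Regions 24, 148, and 152 each keep two of three peers on stores 6 and 7, which is why they are absent from the filtered output.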

Remove the failed nodes


[root@pd1 pd-2379]# bin/pd-ctl -u http://11.45.242.127:2379 store delete 1
Success!
[root@pd1 pd-2379]# bin/pd-ctl -u http://11.45.242.127:2379 store delete 4
Success!
[root@pd1 pd-2379]# bin/pd-ctl -u http://11.45.242.127:2379 store delete 5
Success!

In this scenario the stores could not actually be removed:
[root@pd1 pd-2379]# bin/pd-ctl -u http://11.45.242.127:2379 store


MySQL [INFORMATION_SCHEMA]> select STORE_ID,ADDRESS,LEADER_COUNT,REGION_COUNT,STORE_STATE_NAME from information_schema.TIKV_STORE_STATUS order by STORE_STATE_NAME;
+----------+---------------------+--------------+--------------+------------------+
| STORE_ID | ADDRESS             | LEADER_COUNT | REGION_COUNT | STORE_STATE_NAME |
+----------+---------------------+--------------+--------------+------------------+
|        1 | 11.45.198.108:20160 |           13 |           33 | Offline          |
|        4 | 11.45.199.107:20160 |            5 |           19 | Offline          |
|        5 | 11.45.199.156:20160 |            7 |           26 | Offline          |
|        6 | 11.45.234.8:20160   |            6 |           24 | Up               |
|        7 | 11.45.199.106:20160 |            4 |            7 | Up               |
+----------+---------------------+--------------+--------------+------------------+
5 rows in set (0.01 sec)

Start the failure recovery

Stop the TiKV service on the surviving nodes

systemctl stop tikv-20160.service

MySQL [INFORMATION_SCHEMA]> select STORE_ID,ADDRESS,LEADER_COUNT,REGION_COUNT,STORE_STATE_NAME from information_schema.TIKV_STORE_STATUS order by STORE_STATE_NAME;
+----------+---------------------+--------------+--------------+------------------+
| STORE_ID | ADDRESS             | LEADER_COUNT | REGION_COUNT | STORE_STATE_NAME |
+----------+---------------------+--------------+--------------+------------------+
|        6 | 11.45.234.8:20160   |            6 |           24 | Disconnected     |
|        7 | 11.45.199.106:20160 |            4 |            7 | Disconnected     |
|        1 | 11.45.198.108:20160 |           13 |           33 | Offline          |
|        4 | 11.45.199.107:20160 |            5 |           19 | Offline          |
|        5 | 11.45.199.156:20160 |            7 |           26 | Offline          |
+----------+---------------------+--------------+--------------+------------------+
5 rows in set (0.01 sec)

Remove the peers located on the failed nodes

[root@tikv1 tikv-20160]# /export/tidb-deploy/tikv-20160/bin/tikv-ctl --data-dir "/export/tidb-data/tikv-20160" --config /export/tidb-deploy/tikv-20160/conf/tikv.toml unsafe-recover remove-fail-stores -s 1,4,5 --all-regions

Restart the PD service

systemctl restart pd-2379.service

Start the TiKV service on the surviving nodes

systemctl start tikv-20160.service

Re-enable scheduling

bin/pd-ctl -u http://11.45.242.127:2379 config set region-schedule-limit 2048
bin/pd-ctl -u http://11.45.242.127:2379 config set replica-schedule-limit 64
bin/pd-ctl -u http://11.45.242.127:2379 config set leader-schedule-limit 4
bin/pd-ctl -u http://11.45.242.127:2379 config set merge-schedule-limit 8

Verify the data

[root@pd1 pd-2379]# bin/pd-ctl -u http://11.45.242.127:2379 region --jq='.regions[] | {id: .id, peer_stores: [.peers[].store_id] | select(length as $total | map(if .==(1,4,5) then . else empty end)) }'
{"id":8,"peer_stores":[6]}
{"id":12,"peer_stores":[6]}
{"id":32,"peer_stores":[6]}
{"id":176,"peer_stores":[7]}
{"id":26,"peer_stores":[6]}
{"id":46,"peer_stores":[6]}
{"id":50,"peer_stores":[6]}
{"id":156,"peer_stores":[6]}
{"id":172,"peer_stores":[6]}
{"id":140,"peer_stores":[7]}
{"id":28,"peer_stores":[6]}
{"id":30,"peer_stores":[6]}
{"id":36,"peer_stores":[6]}
{"id":42,"peer_stores":[6]}
{"id":124,"peer_stores":[6]}
{"id":20,"peer_stores":[6]}
{"id":40,"peer_stores":[7]}
{"id":168,"peer_stores":[6]}
{"id":34,"peer_stores":[5,1,4]}
{"id":48,"peer_stores":[1,5,4]}
{"id":144,"peer_stores":[6]}
{"id":22,"peer_stores":[6]}
{"id":38,"peer_stores":[5,1,4]}
{"id":44,"peer_stores":[5,4,1]}
{"id":54,"peer_stores":[1,5,4]}
{"id":152,"peer_stores":[7,6]}
{"id":2,"peer_stores":[7]}
{"id":16,"peer_stores":[6]}
{"id":18,"peer_stores":[1,5,4]}
{"id":120,"peer_stores":[6]}
{"id":148,"peer_stores":[6,7]}
{"id":10,"peer_stores":[6]}
{"id":14,"peer_stores":[5,1,4]}
{"id":24,"peer_stores":[6,7]}
{"id":52,"peer_stores":[6]}

Verify data availability

MySQL [INFORMATION_SCHEMA]> SELECT distinct a.TIDB_TABLE_ID,b.DB_NAME,b.TABLE_NAME,b.REGION_ID,b.APPROXIMATE_SIZE,c.PEER_ID,c.STORE_ID,c.IS_LEADER,c.STATUS ,d.ADDRESS,d.STORE_STATE_NAME,d.VERSION,d.CAPACITY,d.AVAILABLE,d.LABEL  FROM information_schema.TABLES as a  INNER JOIN TIKV_REGION_STATUS as b  INNER JOIN TIKV_REGION_PEERS as c  INNER JOIN TIKV_STORE_STATUS as d  WHERE a.TIDB_TABLE_ID = b.TABLE_ID AND b.REGION_ID = c.REGION_ID AND c.STORE_ID = d.STORE_ID AND a.TABLE_SCHEMA='test' order by a.TIDB_TABLE_ID,b.DB_NAME,b.TABLE_NAME,b.REGION_ID,b.APPROXIMATE_SIZE,c.PEER_ID,c.STORE_ID,c.IS_LEADER,c.STATUS ,d.ADDRESS,d.STORE_STATE_NAME,d.VERSION,d.CAPACITY,d.AVAILABLE,d.LABEL asc;
+---------------+---------+------------+-----------+------------------+---------+----------+-----------+--------+---------------------+------------------+---------+----------+-----------+-------+
| TIDB_TABLE_ID | DB_NAME | TABLE_NAME | REGION_ID | APPROXIMATE_SIZE | PEER_ID | STORE_ID | IS_LEADER | STATUS | ADDRESS             | STORE_STATE_NAME | VERSION | CAPACITY | AVAILABLE | LABEL |
+---------------+---------+------------+-----------+------------------+---------+----------+-----------+--------+---------------------+------------------+---------+----------+-----------+-------+
|            53 | test    | sbtest6    |       124 |                1 |     299 |        6 |         1 | NORMAL | 11.45.234.8:20160   | Up               | 5.1.1   | 503.8GiB | 452.3GiB  | null  |
|            54 | test    | sbtest3    |       140 |                1 |     330 |        7 |         1 | NORMAL | 11.45.199.106:20160 | Up               | 5.1.1   | 503.8GiB | 452.3GiB  | null  |
|            55 | test    | sbtest2    |       148 |                1 |     191 |        6 |         1 | NORMAL | 11.45.234.8:20160   | Up               | 5.1.1   | 503.8GiB | 452.3GiB  | null  |
|            55 | test    | sbtest2    |       148 |                1 |     322 |        7 |         0 | NORMAL | 11.45.199.106:20160 | Up               | 5.1.1   | 503.8GiB | 452.3GiB  | null  |
|            56 | test    | sbtest7    |       152 |                1 |     154 |        7 |         0 | NORMAL | 11.45.199.106:20160 | Up               | 5.1.1   | 503.8GiB | 452.3GiB  | null  |
|            56 | test    | sbtest7    |       152 |                1 |     211 |        6 |         1 | NORMAL | 11.45.234.8:20160   | Up               | 5.1.1   | 503.8GiB | 452.3GiB  | null  |
|            57 | test    | sbtest10   |       144 |                1 |     304 |        6 |         1 | NORMAL | 11.45.234.8:20160   | Up               | 5.1.1   | 503.8GiB | 452.3GiB  | null  |
|            58 | test    | sbtest8    |       168 |                1 |     226 |        6 |         1 | NORMAL | 11.45.234.8:20160   | Up               | 5.1.1   | 503.8GiB | 452.3GiB  | null  |
|            62 | test    | sbtest5    |       156 |                1 |     279 |        6 |         1 | NORMAL | 11.45.234.8:20160   | Up               | 5.1.1   | 503.8GiB | 452.3GiB  | null  |
|            63 | test    | sbtest1    |       176 |                1 |     178 |        7 |         1 | NORMAL | 11.45.199.106:20160 | Up               | 5.1.1   | 503.8GiB | 452.3GiB  | null  |
|            64 | test    | sbtest4    |       172 |                1 |     208 |        6 |         1 | NORMAL | 11.45.234.8:20160   | Up               | 5.1.1   | 503.8GiB | 452.3GiB  | null  |
|            71 | test    | sbtest9    |         2 |                1 |      58 |        7 |         1 | NORMAL | 11.45.199.106:20160 | Up               | 5.1.1   | 503.8GiB | 452.3GiB  | null  |
+---------------+---------+------------+-----------+------------------+---------+----------+-----------+--------+---------------------+------------------+---------+----------+-----------+-------+
12 rows in set (0.01 sec)

MySQL [INFORMATION_SCHEMA]> select STORE_ID,ADDRESS,LEADER_COUNT,REGION_COUNT,STORE_STATE_NAME from information_schema.TIKV_STORE_STATUS order by STORE_STATE_NAME;
+----------+---------------------+--------------+--------------+------------------+
| STORE_ID | ADDRESS             | LEADER_COUNT | REGION_COUNT | STORE_STATE_NAME |
+----------+---------------------+--------------+--------------+------------------+
|        5 | 11.45.199.156:20160 |            0 |            7 | Offline          |
|        1 | 11.45.198.108:20160 |            0 |            7 | Offline          |
|        4 | 11.45.199.107:20160 |            0 |            7 | Offline          |
|        6 | 11.45.234.8:20160   |           24 |           24 | Up               |
|        7 | 11.45.199.106:20160 |            4 |            7 | Up               |
+----------+---------------------+--------------+--------------+------------------+
5 rows in set (0.00 sec)

At this point the failed stores came back up

MySQL [INFORMATION_SCHEMA]> SELECT distinct a.TIDB_TABLE_ID,b.DB_NAME,b.TABLE_NAME,b.REGION_ID,b.APPROXIMATE_SIZE,c.PEER_ID,c.STORE_ID,c.IS_LEADER,c.STATUS ,d.ADDRESS,d.STORE_STATE_NAME,d.VERSION,d.CAPACITY,d.AVAILABLE,d.LABEL  FROM information_schema.TABLES as a  INNER JOIN TIKV_REGION_STATUS as b  INNER JOIN TIKV_REGION_PEERS as c  INNER JOIN TIKV_STORE_STATUS as d  WHERE a.TIDB_TABLE_ID = b.TABLE_ID AND b.REGION_ID = c.REGION_ID AND c.STORE_ID = d.STORE_ID AND a.TABLE_SCHEMA='test' order by a.TIDB_TABLE_ID,b.DB_NAME,b.TABLE_NAME,b.REGION_ID,b.APPROXIMATE_SIZE,c.PEER_ID,c.STORE_ID,c.IS_LEADER,c.STATUS ,d.ADDRESS,d.STORE_STATE_NAME,d.VERSION,d.CAPACITY,d.AVAILABLE,d.LABEL asc;
+---------------+---------+------------+-----------+------------------+---------+----------+-----------+--------+---------------------+------------------+---------+----------+-----------+-------+
| TIDB_TABLE_ID | DB_NAME | TABLE_NAME | REGION_ID | APPROXIMATE_SIZE | PEER_ID | STORE_ID | IS_LEADER | STATUS | ADDRESS             | STORE_STATE_NAME | VERSION | CAPACITY | AVAILABLE | LABEL |
+---------------+---------+------------+-----------+------------------+---------+----------+-----------+--------+---------------------+------------------+---------+----------+-----------+-------+
|            53 | test    | sbtest6    |       124 |                3 |     299 |        6 |         0 | NORMAL | 11.45.234.8:20160   | Up               | 5.1.1   | 503.8GiB | 452.3GiB  | null  |
|            53 | test    | sbtest6    |       124 |                3 |   42007 |        7 |         1 | NORMAL | 11.45.199.106:20160 | Up               | 5.1.1   | 503.8GiB | 452.3GiB  | null  |
|            54 | test    | sbtest3    |       140 |                1 |     330 |        7 |         1 | NORMAL | 11.45.199.106:20160 | Up               | 5.1.1   | 503.8GiB | 452.3GiB  | null  |
|            54 | test    | sbtest3    |       140 |                1 |   45170 |        6 |         0 | NORMAL | 11.45.234.8:20160   | Up               | 5.1.1   | 503.8GiB | 452.3GiB  | null  |
|            55 | test    | sbtest2    |       148 |                4 |     191 |        6 |         0 | NORMAL | 11.45.234.8:20160   | Up               | 5.1.1   | 503.8GiB | 452.3GiB  | null  |
|            55 | test    | sbtest2    |       148 |                4 |     322 |        7 |         1 | NORMAL | 11.45.199.106:20160 | Up               | 5.1.1   | 503.8GiB | 452.3GiB  | null  |
|            56 | test    | sbtest7    |       152 |                4 |     154 |        7 |         1 | NORMAL | 11.45.199.106:20160 | Up               | 5.1.1   | 503.8GiB | 452.3GiB  | null  |
|            56 | test    | sbtest7    |       152 |                4 |     211 |        6 |         0 | NORMAL | 11.45.234.8:20160   | Up               | 5.1.1   | 503.8GiB | 452.3GiB  | null  |
|            57 | test    | sbtest10   |       144 |                1 |     193 |        1 |         0 | NORMAL | 11.45.198.108:20160 | Offline          | 5.1.1   | 503.8GiB | 452.3GiB  | null  |
|            57 | test    | sbtest10   |       144 |                1 |     290 |        4 |         1 | NORMAL | 11.45.199.107:20160 | Offline          | 5.1.1   | 503.8GiB | 452.3GiB  | null  |
|            57 | test    | sbtest10   |       144 |                1 |     304 |        6 |         0 | DOWN   | 11.45.234.8:20160   | Up               | 5.1.1   | 503.8GiB | 452.3GiB  | null  |
|            58 | test    | sbtest8    |       168 |                4 |     169 |        1 |         1 | NORMAL | 11.45.198.108:20160 | Offline          | 5.1.1   | 503.8GiB | 452.3GiB  | null  |
|            58 | test    | sbtest8    |       168 |                4 |     226 |        6 |         0 | DOWN   | 11.45.234.8:20160   | Up               | 5.1.1   | 503.8GiB | 452.3GiB  | null  |
|            58 | test    | sbtest8    |       168 |                4 |     251 |        5 |         0 | NORMAL | 11.45.199.156:20160 | Offline          | 5.1.1   | 503.8GiB | 452.3GiB  | null  |
|            62 | test    | sbtest5    |       156 |                3 |     257 |        4 |         0 | NORMAL | 11.45.199.107:20160 | Offline          | 5.1.1   | 503.8GiB | 452.3GiB  | null  |
|            62 | test    | sbtest5    |       156 |                3 |     279 |        6 |         0 | DOWN   | 11.45.234.8:20160   | Up               | 5.1.1   | 503.8GiB | 452.3GiB  | null  |
|            62 | test    | sbtest5    |       156 |                3 |     316 |        1 |         1 | NORMAL | 11.45.198.108:20160 | Offline          | 5.1.1   | 503.8GiB | 452.3GiB  | null  |
|            63 | test    | sbtest1    |       176 |                5 |     178 |        7 |         1 | NORMAL | 11.45.199.106:20160 | Up               | 5.1.1   | 503.8GiB | 452.3GiB  | null  |
|            63 | test    | sbtest1    |       176 |                5 |   45171 |        6 |         0 | NORMAL | 11.45.234.8:20160   | Up               | 5.1.1   | 503.8GiB | 452.3GiB  | null  |
|            64 | test    | sbtest4    |       172 |                3 |     208 |        6 |         0 | DOWN   | 11.45.234.8:20160   | Up               | 5.1.1   | 503.8GiB | 452.3GiB  | null  |
|            64 | test    | sbtest4    |       172 |                3 |     248 |        1 |         1 | NORMAL | 11.45.198.108:20160 | Offline          | 5.1.1   | 503.8GiB | 452.3GiB  | null  |
|            64 | test    | sbtest4    |       172 |                3 |     293 |        5 |         0 | NORMAL | 11.45.199.156:20160 | Offline          | 5.1.1   | 503.8GiB | 452.3GiB  | null  |
|            71 | test    | sbtest9    |         2 |                3 |      58 |        7 |         1 | NORMAL | 11.45.199.106:20160 | Up               | 5.1.1   | 503.8GiB | 452.3GiB  | null  |
|            71 | test    | sbtest9    |         2 |                3 |   45091 |        6 |         0 | NORMAL | 11.45.234.8:20160   | Up               | 5.1.1   | 503.8GiB | 452.3GiB  | null  |
+---------------+---------+------------+-----------+------------------+---------+----------+-----------+--------+---------------------+------------------+---------+----------+-----------+-------+
24 rows in set (0.01 sec)

Please refer to the SOP operations manual. If this is a test environment, you could rebuild it and try the experiment again…

If this situation does occur, how should it be repaired?

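With 5 nodes and 5 replicas, at most 2 nodes can be down; once 3 nodes are down the cluster can no longer serve requests normally.
With 3 nodes and 3 replicas, at most 1 node can be down; once 2 nodes are down the cluster can no longer serve requests normally.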
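
These limits follow from Raft's majority rule: a Region stays available only while more than half of its replicas are reachable. A minimal sketch of the arithmetic follows; the function names `majority` and `max_tolerable_failures` are hypothetical.

```python
def majority(replicas):
    """Smallest number of replicas that forms a Raft majority."""
    return replicas // 2 + 1


def max_tolerable_failures(replicas):
    """How many replicas can be lost while a majority still survives."""
    return replicas - majority(replicas)


for n in (3, 5):
    print(n, majority(n), max_tolerable_failures(n))
# 3 2 1
# 5 3 2
```

In the cluster from this thread most Regions show 3 peers spread over 5 stores, so stopping 3 stores can take 2 or all 3 replicas of a single Region offline, which is more than the 1 failure that 3 replicas tolerate.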

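If a node goes down, the first option to consider is adding replacement nodes; PD will then automatically schedule replicas to restore the configured replica count.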

If replicas have been lost, follow the procedure in the SOP to repair them. Does this answer make sense? :grinning:

How can we prevent the failed nodes from rejoining the cluster while the repair is in progress?


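Please refer to this handling strategy: if a failed node still cannot be started within an acceptable time window, take it offline directly, then obtain new resources and build a new node to join the cluster as a replacement.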
