TiKV node has no regions after scale-in and scale-out

【Overview】Scenario + problem summary
A TiKV node was abnormal, so it was first removed from the cluster (following the TiKV scale-in documentation) and then re-added (following the TiKV scale-out documentation). After rejoining, the process starts normally, but it cannot sync any data.

【Background】Operations performed
1. Removed the TiKV node from the cluster with the following commands:
/opt/tidb/deploy/bin/pd-ctl -u "http://10.18.62.104:2379" -d store
/opt/tidb/deploy/bin/pd-ctl -u "http://10.18.62.104:2379" -d store delete 7708753
/opt/tidb/deploy/bin/pd-ctl -u "http://10.18.62.104:2379" -d store 7708753
After confirming that its state was Tombstone (a polling sketch follows at the end of this step), ran:
ansible-playbook stop.yml -l TiKV3
Commented the node out of the inventory.ini file, then ran:
ansible-playbook rolling_update_monitor.yml --tags=prometheus
Cleared the data directory on the affected TiKV node.
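For reference, a minimal polling sketch of the "wait for Tombstone" step above. It assumes jq is installed on the control machine; the store ID 7708753 is taken from the commands above, and the 30-second interval is arbitrary:

while true; do
  # pd-ctl prints the store record as JSON; pull out store.state_name
  state=$(/opt/tidb/deploy/bin/pd-ctl -u "http://10.18.62.104:2379" -d store 7708753 | jq -r '.store.state_name')
  echo "store 7708753 state: $state"
  [ "$state" = "Tombstone" ] && break  # the offline process has finished
  sleep 30
done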

2. Re-added the node to the cluster:
Added the node back to the inventory.ini file, then ran:
ansible-playbook bootstrap.yml -l TiKV3
ansible-playbook deploy.yml -l TiKV3
ansible-playbook start.yml -l TiKV3
/opt/tidb/deploy/bin/pd-ctl -u "http://10.18.62.104:2379" -d store
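To check just the re-added store, something like the following should work (7709129 is the store ID the node received after rejoining, as shown later in this thread; jq is assumed):

/opt/tidb/deploy/bin/pd-ctl -u "http://10.18.62.104:2379" -d store 7709129 \
  | jq '{state: .store.state_name, regions: (.status.region_count // 0)}'
# pd-ctl omits region_count from the output while it is 0, hence the // 0 default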

【Symptoms】Business and database symptoms
Both removing the TiKV node and re-adding it to the cluster completed normally,
but the region count of the newly added node stays at 0.

【Business Impact】

【TiDB Version】
v3.0.0

【Attachments】


1. Please first confirm that the other nodes in the cluster are all in a normal state.
2. If the overall cluster state is normal, refer to the post below to troubleshoot the region balancing issue; a few typical checks from it are sketched after the link:
https://docs.pingcap.com/zh/tidb/v3.0/pd-scheduling-best-practices#leaderregion-分布不均衡
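A few typical checks from that page, expressed as pd-ctl commands (a sketch using pd-ctl's standard subcommands, not commands quoted from the thread):

# Is the balance-region scheduler running?
/opt/tidb/deploy/bin/pd-ctl -u "http://10.18.62.104:2379" -d scheduler show
# Are region-schedule-limit / replica-schedule-limit set to non-zero values?
/opt/tidb/deploy/bin/pd-ctl -u "http://10.18.62.104:2379" -d config show
# Are any scheduling operators pending or stuck?
/opt/tidb/deploy/bin/pd-ctl -u "http://10.18.62.104:2379" -d operator show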

The other nodes in the cluster are in a normal state:
{
  "count": 5,
  "stores": [
    {
      "store": {
        "id": 6494696,
        "address": "10.18.62.105:20160",
        "labels": [
          {
            "key": "host",
            "value": "tikv2"
          }
        ],
        "version": "3.0.0",
        "state_name": "Up"
      },
      "status": {
        "capacity": "3.6 TiB",
        "available": "2.5 TiB",
        "leader_count": 34805,
        "leader_weight": 1,
        "leader_score": 520435,
        "leader_size": 520435,
        "region_count": 78244,
        "region_weight": 1,
        "region_score": 1561109,
        "region_size": 1561109,
        "start_ts": "2021-05-06T16:38:34+08:00",
        "last_heartbeat_ts": "2021-06-29T13:12:22.117992506+08:00",
        "uptime": "1292h33m48.117992506s"
      }
    },
    {
      "store": {
        "id": 7709129,
        "address": "10.18.62.107:20160",
        "labels": [
          {
            "key": "host",
            "value": "tikv3"
          }
        ],
        "version": "3.0.0",
        "state_name": "Up"
      },
      "status": {
        "capacity": "3.6 TiB",
        "available": "1.4 TiB",
        "leader_weight": 1,
        "region_weight": 1,
        "region_score": 40599298.80440994,
        "start_ts": "2021-06-29T10:48:11+08:00",
        "last_heartbeat_ts": "2021-06-29T13:12:23.182619544+08:00",
        "uptime": "2h24m12.182619544s"
      }
    },
    {
      "store": {
        "id": 5428243,
        "address": "10.18.62.104:20160",
        "labels": [
          {
            "key": "host",
            "value": "tikv1"
          }
        ],
        "version": "3.0.0",
        "state_name": "Up"
      },
      "status": {
        "capacity": "3.6 TiB",
        "available": "3.3 TiB",
        "leader_count": 18262,
        "leader_weight": 1,
        "leader_score": 520366,
        "leader_size": 520366,
        "region_count": 74123,
        "region_weight": 1,
        "region_score": 1561290,
        "region_size": 1561290,
        "start_ts": "2021-04-09T09:24:07+08:00",
        "last_heartbeat_ts": "2021-06-29T13:12:22.032905467+08:00",
        "uptime": "1947h48m15.032905467s"
      }
    },
    {
      "store": {
        "id": 6209446,
        "address": "10.18.62.108:20160",
        "labels": [
          {
            "key": "host",
            "value": "tikv4"
          }
        ],
        "version": "3.0.0",
        "state_name": "Up"
      },
      "status": {
        "capacity": "5.8 TiB",
        "available": "5.5 TiB",
        "leader_count": 32807,
        "leader_weight": 1,
        "leader_score": 520409,
        "leader_size": 520409,
        "region_count": 76542,
        "region_weight": 1,
        "region_score": 1561237,
        "region_size": 1561237,
        "start_ts": "2021-04-26T15:36:14+08:00",
        "last_heartbeat_ts": "2021-06-29T13:12:20.273131326+08:00",
        "uptime": "1533h36m6.273131326s"
      }
    },
    {
      "store": {
        "id": 5516322,
        "address": "10.18.62.50:20160",
        "labels": [
          {
            "key": "host",
            "value": "tikv5"
          }
        ],
        "version": "3.0.0",
        "state_name": "Up"
      },
      "status": {
        "capacity": "3.6 TiB",
        "available": "2.2 TiB",
        "leader_count": 14638,
        "leader_weight": 1,
        "leader_score": 520434,
        "leader_size": 520434,
        "region_count": 72627,
        "region_weight": 1,
        "region_score": 1561296,
        "region_size": 1561296,
        "start_ts": "2021-01-06T14:55:29+08:00",
        "last_heartbeat_ts": "2021-06-29T13:12:23.285644696+08:00",
        "uptime": "4174h16m54.285644696s"
      }
    }
  ]
}
From the output, the distribution of region leaders looks fairly even.
The current anomaly is that the newly added tikv5 node has no regions.
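Rather than eyeballing the JSON, a one-liner like this (jq assumed; the // 0 defaults are illustrative) tabulates leader and region counts per store, so a store that omits the counters, such as the new one, prints 0:

/opt/tidb/deploy/bin/pd-ctl -u "http://10.18.62.104:2379" -d store \
  | jq -r '.stores[] | [.store.id, .store.labels[0].value, (.status.leader_count // 0), (.status.region_count // 0)] | @tsv'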

From the store output, tikv5 has 72627 regions and 14638 region leaders, so nothing looks wrong there.

Sorry, I misspoke: it is tikv3. It is the newly added node, and its region count keeps showing 0.
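To double-check that store (a sketch using pd-ctl's standard region subcommand; not a command quoted from the thread):

# List the regions that have a peer on the new store; a count of 0 confirms it holds none
/opt/tidb/deploy/bin/pd-ctl -u "http://10.18.62.104:2379" -d region store 7709129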

Please troubleshoot using the link provided above first; it lays out a detailed troubleshooting approach.
