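【TiDB environment】Test environment
【TiDB version】v5.4.0
【Problem】
A TiKV scale-in has been stuck in Pending Offline for two days.
【Reproduction steps】None
【Symptoms and impact】
Three new TiKV nodes were added to the cluster and three old TiKV nodes were taken offline, but the old nodes have stayed in Pending Offline. Their region_count and leader_count have stopped changing, which is rather strange.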
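There are many articles and posts in the community describing similar cases.
For reference: 专栏 - TiKV缩容不掉如何解决? | TiDB 社区 (a TiDB community column on what to do when a TiKV scale-in will not complete)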
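Run pd-ctl region store xxxx to look at the region information on those stores.
Let me analyze it and see which region is stuck.
It's in the screenshot above.
The command is pd-ctl region store xxxx, not pd-ctl store xxxx.
That article also covers the situation you describe; you could try following its steps.
Sorry, my mistake.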
» region store 688833
{
"count": 2,
"regions": [
{
"id": 3055790,
"start_key": "7480000000000001FF635F698000000000FF0000020381405A1AFF5BF342000141334FFF59524C4247FF4434FF58324D500000FD03FF800000006114CF40FF0141464E00000000FF00FA03D000000001FFC8033B0000000000FA",
"end_key": "7480000000000001FF635F698000000000FF000003038140332CFFC84C200001413248FF4335384B56FF5050FF354F4F480000FD01FF3131342D31393630FFFF3438382D323632FF31FF303239000000FF0000FA0000000000FA",
"epoch": {
"conf_ver": 76,
"version": 422
},
"peers": [
{
"id": 3178122,
"store_id": 688832,
"role_name": "Voter"
},
{
"id": 6023001,
"store_id": 688833,
"role_name": "Voter"
},
{
"id": 26062629,
"store_id": 4,
"role_name": "Voter"
},
{
"id": 26062679,
"store_id": 5,
"role_name": "Voter"
}
],
"leader": {
"id": 26062629,
"store_id": 4,
"role_name": "Voter"
},
"written_bytes": 0,
"read_bytes": 0,
"written_keys": 0,
"read_keys": 0,
"approximate_size": 95,
"approximate_keys": 889043
},
{
"id": 667841,
"start_key": "7480000000000001FF635F698000000000FF00000703A0000000FF0046A09003800000FF006101786303A000FF00000046A0900000FD",
"end_key": "7480000000000001FF635F698000000000FF00000703A0000000FF015403DD03800000FF00610B77FE03A000FF0000015403DD0000FD",
"epoch": {
"conf_ver": 92,
"version": 423
},
"peers": [
{
"id": 3377184,
"store_id": 688833,
"role_name": "Voter"
},
{
"id": 3476493,
"store_id": 688834,
"role_name": "Voter"
},
{
"id": 24145153,
"store_id": 5,
"role_name": "Voter"
},
{
"id": 26062760,
"store_id": 4,
"role_name": "Voter"
}
],
"leader": {
"role_name": "Voter"
},
"written_bytes": 0,
"read_bytes": 0,
"written_keys": 0,
"read_keys": 0,
"approximate_size": 0,
"approximate_keys": 0
}
]
}
» region store 688834
{
"count": 1,
"regions": [
{
"id": 667841,
"start_key": "7480000000000001FF635F698000000000FF00000703A0000000FF0046A09003800000FF006101786303A000FF00000046A0900000FD",
"end_key": "7480000000000001FF635F698000000000FF00000703A0000000FF015403DD03800000FF00610B77FE03A000FF0000015403DD0000FD",
"epoch": {
"conf_ver": 92,
"version": 423
},
"peers": [
{
"id": 3377184,
"store_id": 688833,
"role_name": "Voter"
},
{
"id": 3476493,
"store_id": 688834,
"role_name": "Voter"
},
{
"id": 24145153,
"store_id": 5,
"role_name": "Voter"
},
{
"id": 26062760,
"store_id": 4,
"role_name": "Voter"
}
],
"leader": {
"role_name": "Voter"
},
"written_bytes": 0,
"read_bytes": 0,
"written_keys": 0,
"read_keys": 0,
"approximate_size": 0,
"approximate_keys": 0
}
]
}
» region store 688835
{
"count": 1,
"regions": [
{
"id": 408655,
"start_key": "7480000000000001FF635F72D800000003FFAA10290000000000FA",
"end_key": "7480000000000001FF635F72E000000000FF3F1BCE0000000000FA",
"epoch": {
"conf_ver": 52,
"version": 384
},
"peers": [
{
"id": 3476343,
"store_id": 688835,
"role_name": "Voter"
},
{
"id": 3476542,
"store_id": 688832,
"role_name": "Voter"
},
{
"id": 23951787,
"store_id": 1,
"role_name": "Voter"
},
{
"id": 26063554,
"store_id": 4,
"role_name": "Voter"
}
],
"leader": {
"role_name": "Voter"
},
"written_bytes": 0,
"read_bytes": 0,
"written_keys": 0,
"read_keys": 0,
"approximate_size": 0,
"approximate_keys": 0
}
]
}
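Follow the article linked above. For regions whose leader has a store id, try adding operators in pd-ctl (operator add) and see whether they can remove the peer or transfer the leader. Regions whose leader has no store will probably need special handling.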
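As a rough illustration of the triage suggested here, the Python sketch below reads the JSON printed by `region store <id>` and separates regions that report a leader store from those that do not. The function name `plan_operators` and the `offline_stores` parameter are hypothetical helpers, not part of pd-ctl; the script only builds pd-ctl command strings (`operator add remove-peer` and `operator add transfer-leader`) for a human to review and does not run anything. The sample data is trimmed from the output posted in this thread.

```python
import json

# Trimmed from the `region store 688833` output posted above.
SAMPLE = """
{"count": 2, "regions": [
  {"id": 3055790,
   "peers": [{"store_id": 688832}, {"store_id": 688833},
             {"store_id": 4}, {"store_id": 5}],
   "leader": {"id": 26062629, "store_id": 4, "role_name": "Voter"}},
  {"id": 667841,
   "peers": [{"store_id": 688833}, {"store_id": 688834},
             {"store_id": 5}, {"store_id": 4}],
   "leader": {"role_name": "Voter"}}
]}
"""

def plan_operators(region_store_json, offline_stores):
    """Return (commands, leaderless_region_ids).

    commands: pd-ctl command strings for regions that report a leader store.
    leaderless_region_ids: regions whose "leader" has no store_id; these are
    only listed, because they likely need separate handling.
    """
    data = json.loads(region_store_json)
    commands, leaderless = [], []
    for region in data.get("regions", []):
        region_id = region["id"]
        peer_stores = [p["store_id"] for p in region.get("peers", [])]
        leader_store = region.get("leader", {}).get("store_id")
        if leader_store is None:
            leaderless.append(region_id)
            continue
        if leader_store in offline_stores:
            # Move the leader to a peer on a store that is staying, if any.
            targets = [s for s in peer_stores if s not in offline_stores]
            if targets:
                commands.append(
                    f"operator add transfer-leader {region_id} {targets[0]}")
        for store_id in peer_stores:
            if store_id in offline_stores:
                commands.append(
                    f"operator add remove-peer {region_id} {store_id}")
    return commands, leaderless

commands, leaderless = plan_operators(SAMPLE, {688833})
for cmd in commands:
    print(cmd)
print("regions without a leader store:", leaderless)
```

The generated commands are meant to be pasted into pd-ctl one at a time, so that each result can be checked with `operator check <region_id>` before continuing.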
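OK, I'll take a look. Thanks.
It's the same as in the article above: no leader can be elected, and these operators are stuck.
For the record: region 3055790 has four replicas. I tried to remove the replica on store 688833, but the command had no effect. Checking the operator for that region showed that it had timed out.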
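The before/after comparison described here can also be done mechanically. The sketch below is a minimal, hypothetical helper (`peer_removed` is not a pd-ctl feature) that takes two parsed `region <id>` outputs and reports whether the peer on a given store is gone. It assumes that a successful membership change raises `epoch.conf_ver`, so an unchanged `conf_ver` is treated as "nothing happened". The sample values are trimmed from this thread's output.

```python
def peer_removed(before, after, store_id):
    """True if `after` no longer has a peer on store_id and conf_ver grew.

    `before` and `after` are dicts parsed from pd-ctl `region <id>` output.
    """
    stores_after = {p["store_id"] for p in after["peers"]}
    conf_changed = after["epoch"]["conf_ver"] > before["epoch"]["conf_ver"]
    return store_id not in stores_after and conf_changed

# Region 3055790 as printed before and after `operator add remove-peer`.
BEFORE = {
    "id": 3055790,
    "epoch": {"conf_ver": 76, "version": 422},
    "peers": [{"store_id": 688832}, {"store_id": 688833},
              {"store_id": 4}, {"store_id": 5}],
}
AFTER = {
    "id": 3055790,
    "epoch": {"conf_ver": 76, "version": 422},
    "peers": [{"store_id": 688832}, {"store_id": 688833},
              {"store_id": 4}, {"store_id": 5}],
}

print(peer_removed(BEFORE, AFTER, 688833))
```

When this returns False even though `operator add` reported `Success!`, `operator check <region_id>` is the next thing to look at, as done in the output that follows.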
» region 3055790
{
"id": 3055790,
"start_key": "7480000000000001FF635F698000000000FF0000020381405A1AFF5BF342000141334FFF59524C4247FF4434FF58324D500000FD03FF800000006114CF40FF0141464E00000000FF00FA03D000000001FFC8033B0000000000FA",
"end_key": "7480000000000001FF635F698000000000FF000003038140332CFFC84C200001413248FF4335384B56FF5050FF354F4F480000FD01FF3131342D31393630FFFF3438382D323632FF31FF303239000000FF0000FA0000000000FA",
"epoch": {
"conf_ver": 76,
"version": 422
},
"peers": [
{
"id": 3178122,
"store_id": 688832,
"role_name": "Voter"
},
{
"id": 6023001,
"store_id": 688833,
"role_name": "Voter"
},
{
"id": 26062629,
"store_id": 4,
"role_name": "Voter"
},
{
"id": 26062679,
"store_id": 5,
"role_name": "Voter"
}
],
"leader": {
"id": 26062629,
"store_id": 4,
"role_name": "Voter"
},
"written_bytes": 0,
"read_bytes": 0,
"written_keys": 0,
"read_keys": 0,
"approximate_size": 95,
"approximate_keys": 889043
}
» operator add remove-peer 3055790 688833
Success!
» region 3055790
{
"id": 3055790,
"start_key": "7480000000000001FF635F698000000000FF0000020381405A1AFF5BF342000141334FFF59524C4247FF4434FF58324D500000FD03FF800000006114CF40FF0141464E00000000FF00FA03D000000001FFC8033B0000000000FA",
"end_key": "7480000000000001FF635F698000000000FF000003038140332CFFC84C200001413248FF4335384B56FF5050FF354F4F480000FD01FF3131342D31393630FFFF3438382D323632FF31FF303239000000FF0000FA0000000000FA",
"epoch": {
"conf_ver": 76,
"version": 422
},
"peers": [
{
"id": 3178122,
"store_id": 688832,
"role_name": "Voter"
},
{
"id": 6023001,
"store_id": 688833,
"role_name": "Voter"
},
{
"id": 26062629,
"store_id": 4,
"role_name": "Voter"
},
{
"id": 26062679,
"store_id": 5,
"role_name": "Voter"
}
],
"leader": {
"id": 26062629,
"store_id": 4,
"role_name": "Voter"
},
"written_bytes": 0,
"read_bytes": 0,
"written_keys": 0,
"read_keys": 0,
"approximate_size": 95,
"approximate_keys": 889043
}
» operator add remove-peer 3055790 688833
Failed! [500] "failed to add operator, maybe already have one"
» operator check 3055790
"status: TIMEOUT, operator: admin-remove-peer {rm peer: store [688833]} (kind:admin,region, region:3055790(422,76), createAt:2022-04-29 16:18:33.031434787 +0800 CST m=+1818139.834255777, startAt:2022-04-29 16:18:33.031564436 +0800 CST m=+1818139.834385426, currentStep:0, steps:[remove peer on store 688833]) timeout"
Look at the frames field: it is null. If the region were not empty, frames would show the database and table information stored in that region.
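I see, thanks.
Have you tried deleting the store directly with pd-ctl?
Not yet. A colleague ran this operation and it is not convenient for me to make changes right now, so I'll analyze it first.
Once the problem is resolved, remember to mark the answer.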