TiKV stuck in Pending Offline during scale-in

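【TiDB environment】Test environment
【TiDB version】v5.4.0
【Problem】
TiKV has been stuck in Pending Offline during scale-in for two days now.
【Reproduction steps】
【Symptoms and impact】
Three new TiKV nodes were added to the cluster and three old TiKV nodes were taken offline, but the old nodes have stayed in Pending Offline. Their region_count and leader_count have stopped changing, which is odd.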






There are many articles and threads in the community describing similar situations.

For reference, see the column article "TiKV缩容不掉如何解决?" (how to fix a TiKV scale-in that will not complete) on TiDB 社区.

Run pd-ctl region store xxxx to look at the region information on each of these stores.

Let me analyze it and see which region is stuck.

It is in the screenshot above.

The command is pd-ctl region store xxxx, not pd-ctl store xxxx.

That article also covers the situation you described, so you can try following its steps.


Sorry, my mistake.

» region store 688833
{
  "count": 2,
  "regions": [
    {
      "id": 3055790,
      "start_key": "7480000000000001FF635F698000000000FF0000020381405A1AFF5BF342000141334FFF59524C4247FF4434FF58324D500000FD03FF800000006114CF40FF0141464E00000000FF00FA03D000000001FFC8033B0000000000FA",
      "end_key": "7480000000000001FF635F698000000000FF000003038140332CFFC84C200001413248FF4335384B56FF5050FF354F4F480000FD01FF3131342D31393630FFFF3438382D323632FF31FF303239000000FF0000FA0000000000FA",
      "epoch": {
        "conf_ver": 76,
        "version": 422
      },
      "peers": [
        {
          "id": 3178122,
          "store_id": 688832,
          "role_name": "Voter"
        },
        {
          "id": 6023001,
          "store_id": 688833,
          "role_name": "Voter"
        },
        {
          "id": 26062629,
          "store_id": 4,
          "role_name": "Voter"
        },
        {
          "id": 26062679,
          "store_id": 5,
          "role_name": "Voter"
        }
      ],
      "leader": {
        "id": 26062629,
        "store_id": 4,
        "role_name": "Voter"
      },
      "written_bytes": 0,
      "read_bytes": 0,
      "written_keys": 0,
      "read_keys": 0,
      "approximate_size": 95,
      "approximate_keys": 889043
    },
    {
      "id": 667841,
      "start_key": "7480000000000001FF635F698000000000FF00000703A0000000FF0046A09003800000FF006101786303A000FF00000046A0900000FD",
      "end_key": "7480000000000001FF635F698000000000FF00000703A0000000FF015403DD03800000FF00610B77FE03A000FF0000015403DD0000FD",
      "epoch": {
        "conf_ver": 92,
        "version": 423
      },
      "peers": [
        {
          "id": 3377184,
          "store_id": 688833,
          "role_name": "Voter"
        },
        {
          "id": 3476493,
          "store_id": 688834,
          "role_name": "Voter"
        },
        {
          "id": 24145153,
          "store_id": 5,
          "role_name": "Voter"
        },
        {
          "id": 26062760,
          "store_id": 4,
          "role_name": "Voter"
        }
      ],
      "leader": {
        "role_name": "Voter"
      },
      "written_bytes": 0,
      "read_bytes": 0,
      "written_keys": 0,
      "read_keys": 0,
      "approximate_size": 0,
      "approximate_keys": 0
    }
  ]
}
» region store 688834
{
  "count": 1,
  "regions": [
    {
      "id": 667841,
      "start_key": "7480000000000001FF635F698000000000FF00000703A0000000FF0046A09003800000FF006101786303A000FF00000046A0900000FD",
      "end_key": "7480000000000001FF635F698000000000FF00000703A0000000FF015403DD03800000FF00610B77FE03A000FF0000015403DD0000FD",
      "epoch": {
        "conf_ver": 92,
        "version": 423
      },
      "peers": [
        {
          "id": 3377184,
          "store_id": 688833,
          "role_name": "Voter"
        },
        {
          "id": 3476493,
          "store_id": 688834,
          "role_name": "Voter"
        },
        {
          "id": 24145153,
          "store_id": 5,
          "role_name": "Voter"
        },
        {
          "id": 26062760,
          "store_id": 4,
          "role_name": "Voter"
        }
      ],
      "leader": {
        "role_name": "Voter"
      },
      "written_bytes": 0,
      "read_bytes": 0,
      "written_keys": 0,
      "read_keys": 0,
      "approximate_size": 0,
      "approximate_keys": 0
    }
  ]
}
» region store 688835
{
  "count": 1,
  "regions": [
    {
      "id": 408655,
      "start_key": "7480000000000001FF635F72D800000003FFAA10290000000000FA",
      "end_key": "7480000000000001FF635F72E000000000FF3F1BCE0000000000FA",
      "epoch": {
        "conf_ver": 52,
        "version": 384
      },
      "peers": [
        {
          "id": 3476343,
          "store_id": 688835,
          "role_name": "Voter"
        },
        {
          "id": 3476542,
          "store_id": 688832,
          "role_name": "Voter"
        },
        {
          "id": 23951787,
          "store_id": 1,
          "role_name": "Voter"
        },
        {
          "id": 26063554,
          "store_id": 4,
          "role_name": "Voter"
        }
      ],
      "leader": {
        "role_name": "Voter"
      },
      "written_bytes": 0,
      "read_bytes": 0,
      "written_keys": 0,
      "read_keys": 0,
      "approximate_size": 0,
      "approximate_keys": 0
    }
  ]
}

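Follow the article posted above. For regions whose leader has a store id, try pd-ctl operator add to schedule an operator and see whether it can remove the peer or transfer the leader. Regions whose leader has no store id will probably need special handling.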
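To make that split concrete, here is a minimal Python sketch that sorts the regions in a region store result into "has a leader" and "no leader". It assumes the JSON has the same shape as the output pasted above, where a leaderless region still carries a "leader" object but without a "store_id". The function name and the trimmed sample are made up for illustration.

```python
import json


def split_by_leader(region_store_json):
    """Return (ids_with_leader, ids_without_leader) from `region store` JSON."""
    data = json.loads(region_store_json)
    with_leader, without_leader = [], []
    for region in data.get("regions", []):
        leader = region.get("leader") or {}
        # Assumption: a region with no elected leader has no "store_id" here.
        if leader.get("store_id"):
            with_leader.append(region["id"])
        else:
            without_leader.append(region["id"])
    return with_leader, without_leader


# Trimmed copy of the output for store 688833 shown above.
SAMPLE = """
{
  "count": 2,
  "regions": [
    {"id": 3055790, "leader": {"id": 26062629, "store_id": 4, "role_name": "Voter"}},
    {"id": 667841, "leader": {"role_name": "Voter"}}
  ]
}
"""

if __name__ == "__main__":
    has_leader, no_leader = split_by_leader(SAMPLE)
    print("try operator add on:", has_leader)
    print("needs special handling:", no_leader)
```

For the three stores in this thread, only region 3055790 would land in the first group; 667841 and 408655 have no leader.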


OK, I will take a look. Thanks.


These regions have no leader. PD seems to keep generating this operator in the background, but it never succeeds.

This is the same as in the article above: no leader can be elected, so these operators are stuck.


How can you tell from this that the region is empty?

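For the record: region 3055790 has four replicas. I tried to remove the replica on store 688833, but after running the command nothing changed. Checking the region's operator showed that it had timed out.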

» region 3055790
{
  "id": 3055790,
  "start_key": "7480000000000001FF635F698000000000FF0000020381405A1AFF5BF342000141334FFF59524C4247FF4434FF58324D500000FD03FF800000006114CF40FF0141464E00000000FF00FA03D000000001FFC8033B0000000000FA",
  "end_key": "7480000000000001FF635F698000000000FF000003038140332CFFC84C200001413248FF4335384B56FF5050FF354F4F480000FD01FF3131342D31393630FFFF3438382D323632FF31FF303239000000FF0000FA0000000000FA",
  "epoch": {
    "conf_ver": 76,
    "version": 422
  },
  "peers": [
    {
      "id": 3178122,
      "store_id": 688832,
      "role_name": "Voter"
    },
    {
      "id": 6023001,
      "store_id": 688833,
      "role_name": "Voter"
    },
    {
      "id": 26062629,
      "store_id": 4,
      "role_name": "Voter"
    },
    {
      "id": 26062679,
      "store_id": 5,
      "role_name": "Voter"
    }
  ],
  "leader": {
    "id": 26062629,
    "store_id": 4,
    "role_name": "Voter"
  },
  "written_bytes": 0,
  "read_bytes": 0,
  "written_keys": 0,
  "read_keys": 0,
  "approximate_size": 95,
  "approximate_keys": 889043
}
» operator add remove-peer 3055790 688833
Success!
» region 3055790
{
  "id": 3055790,
  "start_key": "7480000000000001FF635F698000000000FF0000020381405A1AFF5BF342000141334FFF59524C4247FF4434FF58324D500000FD03FF800000006114CF40FF0141464E00000000FF00FA03D000000001FFC8033B0000000000FA",
  "end_key": "7480000000000001FF635F698000000000FF000003038140332CFFC84C200001413248FF4335384B56FF5050FF354F4F480000FD01FF3131342D31393630FFFF3438382D323632FF31FF303239000000FF0000FA0000000000FA",
  "epoch": {
    "conf_ver": 76,
    "version": 422
  },
  "peers": [
    {
      "id": 3178122,
      "store_id": 688832,
      "role_name": "Voter"
    },
    {
      "id": 6023001,
      "store_id": 688833,
      "role_name": "Voter"
    },
    {
      "id": 26062629,
      "store_id": 4,
      "role_name": "Voter"
    },
    {
      "id": 26062679,
      "store_id": 5,
      "role_name": "Voter"
    }
  ],
  "leader": {
    "id": 26062629,
    "store_id": 4,
    "role_name": "Voter"
  },
  "written_bytes": 0,
  "read_bytes": 0,
  "written_keys": 0,
  "read_keys": 0,
  "approximate_size": 95,
  "approximate_keys": 889043
}

» operator add remove-peer 3055790 688833
Failed! [500] "failed to add operator, maybe already have one"
» operator check 3055790
"status: TIMEOUT, operator: admin-remove-peer {rm peer: store [688833]} (kind:admin,region, region:3055790(422,76), createAt:2022-04-29 16:18:33.031434787 +0800 CST m=+1818139.834255777, startAt:2022-04-29 16:18:33.031564436 +0800 CST m=+1818139.834385426, currentStep:0, steps:[remove peer on store 688833]) timeout"

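If you need to check many regions this way, the status string can be picked apart with a short script. This is only a sketch that assumes the operator check output keeps the "status: ..., operator: ... steps:[...]" layout seen above. The helper name is hypothetical and the sample is a shortened copy of that output.

```python
import re


def parse_operator_check(output):
    """Extract the status and the step list from `operator check` output."""
    status = re.search(r"status:\s*([A-Z_]+)", output)
    steps = re.search(r"steps:\[(.*?)\]", output)
    return {
        "status": status.group(1) if status else None,
        "steps": steps.group(1) if steps else None,
    }


# Shortened copy of the `operator check 3055790` output above.
SAMPLE = (
    "status: TIMEOUT, operator: admin-remove-peer {rm peer: store [688833]} "
    "(kind:admin,region, region:3055790(422,76), currentStep:0, "
    "steps:[remove peer on store 688833]) timeout"
)

if __name__ == "__main__":
    print(parse_operator_check(SAMPLE))
```

A TIMEOUT status together with currentStep:0 matches what was observed here: the remove-peer step was created but never completed.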
Look at the frames field: it is null. If the region were not empty, frames would show the databases and tables stored in that region.
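The same check can be scripted. The sketch below assumes the region detail is JSON with a top-level "frames" field, as in the screenshot referred to here. The exact response shape, the function name, and both sample payloads are assumptions for illustration, not output captured from this cluster.

```python
import json


def looks_empty(region_detail_json):
    """Treat a region as empty when its "frames" field is null, missing, or []."""
    detail = json.loads(region_detail_json)
    return not detail.get("frames")


# Hypothetical payloads: one with frames set to null, one listing a table.
EMPTY_SAMPLE = '{"region_id": 667841, "frames": null}'
USED_SAMPLE = '{"region_id": 1, "frames": [{"db_name": "test", "table_name": "t"}]}'

if __name__ == "__main__":
    print(looks_empty(EMPTY_SAMPLE))
    print(looks_empty(USED_SAMPLE))
```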


I see, thanks.

Have you tried deleting the store directly with pd-ctl?

Not yet. A colleague is handling this and I can't easily run commands on the cluster right now, so I will analyze it first.

Once the problem is resolved, remember to mark the thread as solved.