How to clean up empty regions, and a region that has no leader

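[TiDB environment]
[Overview] Scenario and problem summary
We found some regions in production that do not belong to any table. These regions still contain keys, and some of them are fairly large, yet they do not map to any table.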


One of them, region 1179561, is empty and also has no leader, which is strange.

TiKV log:
[2021/12/28 14:42:01.894 +08:00] [INFO] [pd.rs:829] [“try to merge”] [merge=“target { id: 1179561 start_key: 7480000000000043FFCC5F728000000001FF40570F0000000000FA end_key: 7480000000000043FFCC5F728000000001FF49F0D80000000000FA region_epoch { conf_ver: 189 version: 35378 } peers { id: 1179564 store_id: 3 } peers { id: 1508809 store_id: 1498967 } peers { id: 1663031 store_id: 1 } }”] [region_id=1179609]
[2021/12/28 14:42:01.894 +08:00] [WARN] [peer.rs:2982] [“failed to propose merge”] [error_code=KV:Raftstore:Unknown] [err=“"[components/raftstore/src/store/fsm/peer.rs:2816]: target region not matched, skip proposing: id: 1179561 start_key: 7480000000000043FFCC5F728000000001FF40570F0000000000FA end_key: 7480000000000043FFCC5F728000000001FF49F0D80000000000FA region_epoch { conf_ver: 188 version: 35378 } peers { id: 1179564 store_id: 3 } peers { id: 1508809 store_id: 1498967 } peers { id: 1663031 store_id: 1 } != id: 1179561 start_key: 7480000000000043FFCC5F728000000001FF40570F0000000000FA end_key: 7480000000000043FFCC5F728000000001FF49F0D80000000000FA region_epoch { conf_ver: 189 version: 35378 } peers { id: 1179564 store_id: 3 } peers { id: 1508809 store_id: 1498967 } peers { id: 1663031 store_id: 1 }"”] [message=“header { region_id: 1179609 peer { id: 1520688 store_id: 1498967 } region_epoch { conf_ver: 10 version: 35924 } } admin_request { cmd_type: PrepareMerge prepare_merge { target { id: 1179561 start_key: 7480000000000043FFCC5F728000000001FF40570F0000000000FA end_key: 7480000000000043FFCC5F728000000001FF49F0D80000000000FA region_epoch { conf_ver: 189 version: 35378 } peers { id: 1179564 store_id: 3 } peers { id: 1508809 store_id: 1498967 } peers { id: 1663031 store_id: 1 } } } }”] [peer_id=1520688] [region_id=1179609]
[2021/12/28 14:42:07.074 +08:00] [INFO] [pd.rs:829] [“try to merge”] [merge=“target { id: 1179561 start_key: 7480000000000043FFCC5F728000000001FF40570F0000000000FA end_key: 7480000000000043FFCC5F728000000001FF49F0D80000000000FA region_epoch { conf_ver: 189 version: 35378 } peers { id: 1179564 store_id: 3 } peers { id: 1508809 store_id: 1498967 } peers { id: 1663031 store_id: 1 } }”] [region_id=1179609]
[2021/12/28 14:42:07.074 +08:00] [WARN] [peer.rs:2982] [“failed to propose merge”] [error_code=KV:Raftstore:Unknown] [err=“"[components/raftstore/src/store/fsm/peer.rs:2816]: target region not matched, skip proposing: id: 1179561 start_key: 7480000000000043FFCC5F728000000001FF40570F0000000000FA end_key: 7480000000000043FFCC5F728000000001FF49F0D80000000000FA region_epoch { conf_ver: 188 version: 35378 } peers { id: 1179564 store_id: 3 } peers { id: 1508809 store_id: 1498967 } peers { id: 1663031 store_id: 1 } != id: 1179561 start_key: 7480000000000043FFCC5F728000000001FF40570F0000000000FA end_key: 7480000000000043FFCC5F728000000001FF49F0D80000000000FA region_epoch { conf_ver: 189 version: 35378 } peers { id: 1179564 store_id: 3 } peers { id: 1508809 store_id: 1498967 } peers { id: 1663031 store_id: 1 }"”] [message=“header { region_id: 1179609 peer { id: 1520688 store_id: 1498967 } region_epoch { conf_ver: 10 version: 35924 } } admin_request { cmd_type: PrepareMerge prepare_merge { target { id: 1179561 start_key: 7480000000000043FFCC5F728000000001FF40570F0000000000FA end_key: 7480000000000043FFCC5F728000000001FF49F0D80000000000FA region_epoch { conf_ver: 189 version: 35378 } peers { id: 1179564 store_id: 3 } peers { id: 1508809 store_id: 1498967 } peers { id: 1663031 store_id: 1 } } } }”] [peer_id=1520688] [region_id=1179609]


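It looks like TiKV keeps trying to merge into this region, fails each time, and keeps printing the retry log. I have already enabled cross-table merge with config set enable-cross-table-merge true, but the merge still does not go through.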
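
To see exactly what "target region not matched" is complaining about, the two region descriptions on either side of the `!=` can be compared field by field. Below is a minimal Python sketch that does this for the region epoch. The `ERR` string is an abbreviated copy of the `err=` field from the log above (keys and some peers are replaced by `...` only to keep it short), and the function name `compare_epochs` is made up for this illustration.

```python
import re

# Abbreviated copy of the err= field from the "failed to propose merge" line.
# "..." stands for the start_key/end_key/peers text, which is identical on both sides.
ERR = (
    "target region not matched, skip proposing: "
    "id: 1179561 ... region_epoch { conf_ver: 188 version: 35378 } peers { id: 1179564 store_id: 3 } ... "
    "!= "
    "id: 1179561 ... region_epoch { conf_ver: 189 version: 35378 } peers { id: 1179564 store_id: 3 } ..."
)

EPOCH_RE = re.compile(r"region_epoch \{ conf_ver: (\d+) version: (\d+) \}")


def parse_epoch(text: str) -> dict:
    """Extract the first region_epoch found in a region description."""
    match = EPOCH_RE.search(text)
    if match is None:
        raise ValueError("no region_epoch found")
    return {"conf_ver": int(match.group(1)), "version": int(match.group(2))}


def compare_epochs(err: str) -> dict:
    """Split the message on ' != ' and return the epoch on each side."""
    left, right = err.split(" != ", 1)
    return {"left": parse_epoch(left), "right": parse_epoch(right)}


result = compare_epochs(ERR)
print(result)
```

The only difference is conf_ver: 188 on the left and 189 on the right. The right-hand value matches the target carried in the PrepareMerge request in the same log line, so the left-hand side is presumably the copy of region 1179561 held on this store, which seems to be one configuration change behind. That reading is an interpretation of the log text, not something confirmed in this thread.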

[2021/12/28 14:47:38.420 +08:00] [INFO] [peer.rs:2204] [“target region still not catch up, skip.”] [exist_region=“id: 1122545 start_key: 748000000000003EFFB000000000000000F8 end_key: 748000000000003EFFB300000000000000F8 region_epoch { conf_ver: 172 version: 33566 } peers { id: 1122548 store_id: 3 } peers { id: 1509273 store_id: 1498967 } peers { id: 1663023 store_id: 1 }”] [target_region=“id: 1122545 start_key: 748000000000003EFFB000000000000000F8 end_key: 748000000000003EFFB300000000000000F8 region_epoch { conf_ver: 173 version: 33566 } peers { id: 1122548 store_id: 3 } peers { id: 1509273 store_id: 1498967 } peers { id: 1663023 store_id: 1 }”] [peer_id=1904388] [region_id=1122533]
[2021/12/28 14:47:45.469 +08:00] [INFO] [pd.rs:829] [“try to merge”] [merge=“target { id: 1179561 start_key: 7480000000000043FFCC5F728000000001FF40570F0000000000FA end_key: 7480000000000043FFCC5F728000000001FF49F0D80000000000FA region_epoch { conf_ver: 189 version: 35378 } peers { id: 1179564 store_id: 3 } peers { id: 1508809 store_id: 1498967 } peers { id: 1663031 store_id: 1 } }”] [region_id=1179557]
[2021/12/28 14:47:45.469 +08:00] [WARN] [peer.rs:2982] [“failed to propose merge”] [error_code=KV:Raftstore:Unknown] [err=“"[components/raftstore/src/store/fsm/peer.rs:2816]: target region not matched, skip proposing: id: 1179561 start_key: 7480000000000043FFCC5F728000000001FF40570F0000000000FA end_key: 7480000000000043FFCC5F728000000001FF49F0D80000000000FA region_epoch { conf_ver: 188 version: 35378 } peers { id: 1179564 store_id: 3 } peers { id: 1508809 store_id: 1498967 } peers { id: 1663031 store_id: 1 } != id: 1179561 start_key: 7480000000000043FFCC5F728000000001FF40570F0000000000FA end_key: 7480000000000043FFCC5F728000000001FF49F0D80000000000FA region_epoch { conf_ver: 189 version: 35378 } peers { id: 1179564 store_id: 3 } peers { id: 1508809 store_id: 1498967 } peers { id: 1663031 store_id: 1 }"”] [message=“header { region_id: 1179557 peer { id: 1855133 store_id: 1498967 } region_epoch { conf_ver: 13 version: 35501 } } admin_request { cmd_type: PrepareMerge prepare_merge { target { id: 1179561 start_key: 7480000000000043FFCC5F728000000001FF40570F0000000000FA end_key: 7480000000000043FFCC5F728000000001FF49F0D80000000000FA region_epoch { conf_ver: 189 version: 35378 } peers { id: 1179564 store_id: 3 } peers { id: 1508809 store_id: 1498967 } peers { id: 1663031 store_id: 1 } } } }”] [peer_id=1855133] [region_id=1179557]

The alerting platform shows this alert:


I checked the official documentation.

The TiDB log has this error:

[2021/12/31 14:48:17.320 +08:00] [ERROR] [gc_worker.go:192] [“[gc worker] runGCJob”] [error=“[tikv:9005]Region is unavailable”]
[2021/12/31 14:54:54.159 +08:00] [INFO] [gc_worker.go:266] [“[gc worker] starts the whole job”] [uuid=5f7b89d47b40014] [safePoint=430160450449833984] [concurrency=4]
[2021/12/31 14:54:54.160 +08:00] [INFO] [gc_worker.go:956] [“[gc worker] start resolve locks”] [uuid=5f7b89d47b40014] [safePoint=430160450449833984] [concurrency=4]
[2021/12/31 14:55:54.142 +08:00] [INFO] [gc_worker.go:237] [“[gc worker] there’s already a gc job running, skipped”] [“leaderTick on”=5f7b89d47b40014]
[2021/12/31 14:56:54.145 +08:00] [INFO] [gc_worker.go:237] [“[gc worker] there’s already a gc job running, skipped”] [“leaderTick on”=5f7b89d47b40014]
[2021/12/31 14:57:54.142 +08:00] [INFO] [gc_worker.go:237] [“[gc worker] there’s already a gc job running, skipped”] [“leaderTick on”=5f7b89d47b40014]
[2021/12/31 14:58:17.244 +08:00] [ERROR] [gc_worker.go:970] [“[gc worker] resolve locks failed”] [uuid=5f7b89d47b40014] [safePoint=430160450449833984] [error=“[tikv:9005]Region is unavailable”]
[2021/12/31 14:58:17.244 +08:00] [ERROR] [gc_worker.go:546] [“[gc worker] resolve locks returns an error”] [uuid=5f7b89d47b40014] [error=“[tikv:9005]Region is unavailable”]
[2021/12/31 14:58:17.244 +08:00] [ERROR] [gc_worker.go:192] [“[gc worker] runGCJob”] [error=“[tikv:9005]Region is unavailable”]
[2021/12/31 15:04:54.165 +08:00] [INFO] [gc_worker.go:266] [“[gc worker] starts the whole job”] [uuid=5f7b89d47b40014] [safePoint=430160607736233984] [concurrency=4]
[2021/12/31 15:04:54.166 +08:00] [INFO] [gc_worker.go:956] [“[gc worker] start resolve locks”] [uuid=5f7b89d47b40014] [safePoint=430160607736233984] [concurrency=4]
[2021/12/31 15:05:54.147 +08:00] [INFO] [gc_worker.go:237] [“[gc worker] there’s already a gc job running, skipped”] [“leaderTick on”=5f7b89d47b40014]
[2021/12/31 15:06:54.145 +08:00] [INFO] [gc_worker.go:237] [“[gc worker] there’s already a gc job running, skipped”] [“leaderTick on”=5f7b89d47b40014]
[2021/12/31 15:07:54.145 +08:00] [INFO] [gc_worker.go:237] [“[gc worker] there’s already a gc job running, skipped”] [“leaderTick on”=5f7b89d47b40014]
[2021/12/31 15:08:17.215 +08:00] [ERROR] [gc_worker.go:970] [“[gc worker] resolve locks failed”] [uuid=5f7b89d47b40014] [safePoint=430160607736233984] [error=“[tikv:9005]Region is unavailable”]
[2021/12/31 15:08:17.215 +08:00] [ERROR] [gc_worker.go:546] [“[gc worker] resolve locks returns an error”] [uuid=5f7b89d47b40014] [error=“[tikv:9005]Region is unavailable”]
[2021/12/31 15:08:17.215 +08:00] [ERROR] [gc_worker.go:192] [“[gc worker] runGCJob”] [error=“[tikv:9005]Region is unavailable”]

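1. How should these regions be handled? The regions I found do not map to any table, and some of them are several hundred MB. I do not know what data is in them. Are these regions useless?
2. Why would a region end up with no leader? Can the empty region without a leader be deleted manually, and if so, what are the steps?
[Background] Operations performed
[Symptoms] Application and database behavior
[Business impact]
[TiDB version]
4.0.9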

Please log in to pd-ctl, run config show, and post the output.

Hello,
» config show
{
  "replication": {
    "enable-placement-rules": "true",
    "location-labels": "",
    "max-replicas": 3,
    "strictly-match-label": "false"
  },
  "schedule": {
    "enable-cross-table-merge": "false",
    "enable-debug-metrics": "false",
    "enable-location-replacement": "true",
    "enable-make-up-replica": "true",
    "enable-one-way-merge": "false",
    "enable-remove-down-replica": "true",
    "enable-remove-extra-replica": "true",
    "enable-replace-offline-replica": "true",
    "high-space-ratio": 0.7,
    "hot-region-cache-hits-threshold": 3,
    "hot-region-schedule-limit": 8,
    "leader-schedule-limit": 8,
    "leader-schedule-policy": "count",
    "low-space-ratio": 0.8,
    "max-merge-region-keys": 200000,
    "max-merge-region-size": 20,
    "max-pending-peer-count": 16,
    "max-snapshot-count": 32,
    "max-store-down-time": "30m0s",
    "merge-schedule-limit": 8,
    "patrol-region-interval": "100ms",
    "region-schedule-limit": 2048,
    "replica-schedule-limit": 64,
    "scheduler-max-waiting-operator": 5,
    "split-merge-interval": "1h0m0s",
    "store-limit-mode": "manual",
    "tolerant-size-ratio": 0
  }
}

Empty regions can be merged by enabling region merge:
https://docs.pingcap.com/zh/tidb/stable/pd-scheduling-best-practices/#region-merge

If empty regions remain after it is enabled, see the FAQ here.
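
As a rough illustration of how the two merge thresholds in the config show output interact, here is a small Python sketch. It only models the size and key-count limits (max-merge-region-size in MiB and max-merge-region-keys); the real PD merge checker applies further conditions, such as split-merge-interval, replica health, and the cross-table setting, which are not modelled here. The function name and the sample region sizes are hypothetical.

```python
import json

# Only the two merge thresholds, copied from the config show output above.
SCHEDULE = json.loads(
    '{"max-merge-region-size": 20, "max-merge-region-keys": 200000}'
)


def small_enough_to_merge(size_mib: int, keys: int, schedule: dict = SCHEDULE) -> bool:
    """Return True if a region is under both merge thresholds.

    Simplified: a region above either limit is not treated as a merge source.
    """
    return (
        size_mib <= schedule["max-merge-region-size"]
        and keys <= schedule["max-merge-region-keys"]
    )


# Hypothetical sample regions: an almost empty one and a large one.
print(small_enough_to_merge(size_mib=1, keys=0))
print(small_enough_to_merge(size_mib=300, keys=1_000_000))
```

Under these thresholds, a region of several hundred MB would not be picked as a merge source no matter which table it belongs to, so region merge is only expected to help with the small or empty ones.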

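Hello, I have already run empty-region merging several times, and the remaining regions cannot be merged. Some regions do not belong to any database or table but hold a lot of data, and they cannot be merged. There are also 5 empty regions that cannot be merged. One of them has no leader and keeps producing that error in the log.

Those empty regions seem to be useless. Is there a safe way to delete a region?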
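
One way to check which table a region key points at is to decode the start_key by hand. The Python sketch below assumes the usual TiKV/TiDB layout: the hex key is a memcomparable-encoded byte string (8-byte groups, each followed by a marker byte equal to 0xFF minus the number of padding bytes), and the decoded bytes look like `t{table_id}` optionally followed by `_r{handle}` or `_i{index_id}`. It also assumes integer row handles; tables with a clustered non-integer primary key would need different handling. The function names are made up for this example.

```python
def decode_memcomparable(raw: bytes) -> bytes:
    """Undo the 8-byte-group + marker-byte encoding of a region key."""
    if len(raw) % 9 != 0:
        raise ValueError("encoded key length must be a multiple of 9")
    out = bytearray()
    for i in range(0, len(raw), 9):
        group, marker = raw[i:i + 8], raw[i + 8]
        pad = 0xFF - marker  # number of zero padding bytes in this group
        if pad > 8:
            raise ValueError("invalid marker byte")
        out += group[:8 - pad]
        if pad > 0:  # a padded group is always the last one
            break
    return bytes(out)


def decode_int(b: bytes) -> int:
    """Decode an 8-byte big-endian integer stored with its sign bit flipped."""
    return int.from_bytes(b, "big") - (1 << 63)


def describe_region_key(hex_key: str) -> dict:
    """Return the table ID (and row handle, if present) encoded in a key."""
    key = decode_memcomparable(bytes.fromhex(hex_key))
    info = {}
    if len(key) >= 9 and key[:1] == b"t":
        info["table_id"] = decode_int(key[1:9])
        rest = key[9:]
        if rest[:2] == b"_r" and len(rest) >= 10:
            info["kind"] = "row"
            info["handle"] = decode_int(rest[2:10])
        elif rest[:2] == b"_i":
            info["kind"] = "index"
    return info


# start_key of region 1179561 and of region 1122545, taken from the TiKV log.
print(describe_region_key("7480000000000043FFCC5F728000000001FF40570F0000000000FA"))
print(describe_region_key("748000000000003EFFB000000000000000F8"))
```

The decoded table ID can then be compared with the TIDB_TABLE_ID column of information_schema.tables. If no table has that ID, one possible explanation is that the table was dropped or truncated and its data has not been garbage-collected yet, but that is a guess and not something this thread confirms.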

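Solved by the tried-and-true method of restarting TiKV. After the restart, the empty region without a leader was gone and GC started working normally.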

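However, there are still a few empty regions that cannot be merged. Is there any way to delete them?

Sure enough, a restart solves 99.99% of problems.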
