Region migration is very slow when taking a node offline

TiDB 4.0.8
We are taking a node offline and the region migration is too slow; the node has been stuck in the Offline state the whole time.

https://docs.pingcap.com/zh/tidb/stable/pd-scheduling-best-practices#节点下线速度慢
I applied the settings from this guide, but it is still just as slow.
The following parameters are already set:
"region-schedule-limit": 4096,
"replica-schedule-limit": 1024,
"max-pending-peer-count": 256,
"max-snapshot-count": 256
Please take a look.
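
For reference, values like the four listed above are normally applied through pd-ctl's `config set <name> <value>` command. The following is a minimal sketch that only builds those command strings from a dict; the helper name `pd_ctl_config_commands` is hypothetical, and nothing here is run against a cluster.

```python
def pd_ctl_config_commands(settings):
    """Build pd-ctl `config set` lines for each scheduling parameter.

    Hypothetical helper: it only formats strings and does not run pd-ctl.
    """
    return [f"config set {name} {value}" for name, value in settings.items()]


# Values copied from the post above.
settings = {
    "region-schedule-limit": 4096,
    "replica-schedule-limit": 1024,
    "max-pending-peer-count": 256,
    "max-snapshot-count": 256,
}

for line in pd_ctl_config_commands(settings):
    print(line)
```

Each printed line can be pasted into an interactive pd-ctl session; `config show` afterwards should confirm whether the values took effect.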

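1. What operation was used to take the node offline: a normal, planned removal, or did the node go down?
2. In pd-ctl, also check replica-schedule-limit and the other scheduling settings, such as those for hot regions.
3. Please provide the monitoring data: [FAQ] Exporting and importing Grafana Metrics pages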
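
As a quick check alongside the monitoring data, the JSON printed by pd-ctl's `store` command can be filtered for stores that are still Offline. This is a rough sketch: the field names (`stores`, `store.state_name`, `status.leader_count`, `status.region_count`) are assumed to match the usual `pd-ctl store` output, the function name is hypothetical, and the sample data is made up.

```python
import json


def offline_stores(store_json):
    """Return (store id, leader count, region count) for each Offline store."""
    data = json.loads(store_json)
    rows = []
    for item in data.get("stores", []):
        store = item.get("store", {})
        status = item.get("status", {})
        if store.get("state_name") == "Offline":
            rows.append((
                store.get("id"),
                status.get("leader_count", 0),
                status.get("region_count", 0),
            ))
    return rows


# Made-up sample shaped like `pd-ctl store` output (not data from this cluster).
sample = json.dumps({
    "count": 2,
    "stores": [
        {"store": {"id": 1, "state_name": "Up"},
         "status": {"leader_count": 900, "region_count": 2700}},
        {"store": {"id": 4, "state_name": "Offline"},
         "status": {"leader_count": 0, "region_count": 1800}},
    ],
})

for store_id, leaders, regions in offline_stores(sample):
    print(f"store {store_id}: {leaders} leaders, {regions} regions left")
```

A store that stays Offline with a region count that barely changes between runs is the symptom described in this thread.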

It was a normal, planned removal.
"schedule": {
  "enable-cross-table-merge": "false",
  "enable-debug-metrics": "false",
  "enable-location-replacement": "true",
  "enable-make-up-replica": "true",
  "enable-one-way-merge": "false",
  "enable-remove-down-replica": "true",
  "enable-remove-extra-replica": "true",
  "enable-replace-offline-replica": "true",
  "high-space-ratio": 0.7,
  "hot-region-cache-hits-threshold": 32,
  "hot-region-schedule-limit": 64,
  "leader-schedule-limit": 128,
  "leader-schedule-policy": "count",
  "low-space-ratio": 0.8,
  "max-merge-region-keys": 200000,
  "max-merge-region-size": 20,
  "max-pending-peer-count": 256,
  "max-snapshot-count": 256,
  "max-store-down-time": "30m0s",
  "merge-schedule-limit": 64,
  "patrol-region-interval": "100ms",
  "region-schedule-limit": 4096,
  "replica-schedule-limit": 1024,
  "scheduler-max-waiting-operator": 10,
  "split-merge-interval": "1h0m0s",
  "store-limit-mode": "manual",
  "tolerant-size-ratio": 0
}

There are too many monitoring dashboards. Which page do you need to see?

Take a look at this post; it lists many troubleshooting steps.

Are there many leaders on the node? Try evicting the leaders first.

Yes, I migrated all the leaders away first.
Region migration is still extremely slow.
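
To put a number on "extremely slow", one option is to read the offline store's region count twice (for example from pd-ctl `store <store_id>`) and compute the rate between the two readings. A minimal sketch, assuming the two counts and the interval are collected by hand; the function name and the numbers are illustrative only.

```python
def migration_eta(count_before, count_after, elapsed_seconds):
    """Return (regions moved per minute, estimated minutes remaining).

    The second value is None when no regions moved in the interval.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    moved = count_before - count_after
    per_minute = moved / (elapsed_seconds / 60)
    if per_minute <= 0:
        return per_minute, None
    return per_minute, count_after / per_minute


# Illustrative numbers only, not measurements from this cluster.
rate, eta = migration_eta(count_before=1800, count_after=1770, elapsed_seconds=120)
print(f"{rate:.1f} regions/min, about {eta:.0f} minutes remaining")
```

Posting a rate like this together with the Grafana export makes it easier for others to judge whether the scheduling limits are actually being reached.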
