How to handle a TiKV node that has been offline for a long time

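To speed things up, please provide the following information when asking a question. Clearly described problems are answered first.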

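  • [TiDB version]: 5.7.10-TiDB-v2.1.4
  • [Problem description]: The node was healthy before it went down. The cluster now shows its state as Down, and it was not brought back up in time. This TiKV node has now been down for nearly two months. How can it be recovered? The healthy nodes hold roughly 3 times as much data as the failed node.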

For performance tuning or troubleshooting questions, please download and run the script, then select all of the terminal output and paste it here.

Please provide the pd-ctl store output for each TiKV node.

{
  "count": 3,
  "stores": [
    {
      "store": {
        "id": 4,
        "address": "10.17.16.221:20160",
        "version": "2.1.3",
        "state_name": "Up"
      },
      "status": {
        "capacity": "640 GiB",
        "available": "145 GiB",
        "leader_count": 91164,
        "leader_weight": 1,
        "leader_score": 1824296,
        "leader_size": 1824296,
        "region_count": 183689,
        "region_weight": 1,
        "region_score": 928670821.6549089,
        "region_size": 3648456,
        "start_ts": "2019-05-19T03:34:59+08:00",
        "last_heartbeat_ts": "2019-12-06T15:59:24.960634328+08:00",
        "uptime": "4836h24m25.960634328s"
      }
    },
    {
      "store": {
        "id": 5,
        "address": "10.17.16.223:20160",
        "version": "2.1.3",
        "state_name": "Down"
      },
      "status": {
        "capacity": "443 GiB",
        "available": "291 GiB",
        "leader_weight": 1,
        "region_count": 15695,
        "region_weight": 1,
        "region_score": 295909,
        "region_size": 295909,
        "start_ts": "2019-05-19T03:34:59+08:00",
        "last_heartbeat_ts": "2019-08-03T18:08:41.722274372+08:00",
        "uptime": "1838h33m42.722274372s"
      }
    },
    {
      "store": {
        "id": 1,
        "address": "10.17.16.222:20160",
        "version": "2.1.3",
        "state_name": "Up"
      },
      "status": {
        "capacity": "640 GiB",
        "available": "155 GiB",
        "leader_count": 92525,
        "leader_weight": 1,
        "leader_score": 1824160,
        "leader_size": 1824160,
        "region_count": 183689,
        "region_weight": 1,
        "region_score": 843462227.7800903,
        "region_size": 3648456,
        "start_ts": "2019-05-19T03:34:58+08:00",
        "last_heartbeat_ts": "2019-12-06T15:59:26.662490145+08:00",
        "uptime": "4836h24m28.662490145s"
      }
    }
  ]
}
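For anyone comparing stores in output like the above, here is a minimal Python sketch that boils the JSON down to state, used space, and leader/region counts per store. The helper names (`gib`, `summarize`) and the trimmed `SAMPLE` string are made up for illustration, and it assumes every size is reported in GiB, as it is in this post.

```python
import json

# Trimmed copy of the store output posted above (only the fields used here).
SAMPLE = """
{"count": 3, "stores": [
  {"store": {"id": 4, "address": "10.17.16.221:20160", "state_name": "Up"},
   "status": {"capacity": "640 GiB", "available": "145 GiB",
              "leader_count": 91164, "region_count": 183689}},
  {"store": {"id": 5, "address": "10.17.16.223:20160", "state_name": "Down"},
   "status": {"capacity": "443 GiB", "available": "291 GiB",
              "region_count": 15695}},
  {"store": {"id": 1, "address": "10.17.16.222:20160", "state_name": "Up"},
   "status": {"capacity": "640 GiB", "available": "155 GiB",
              "leader_count": 92525, "region_count": 183689}}
]}
"""


def gib(text):
    """Parse a size such as '640 GiB' into a float (assumes the unit is GiB)."""
    value, unit = text.split()
    if unit != "GiB":
        raise ValueError(f"unexpected unit: {unit}")
    return float(value)


def summarize(payload):
    """Return one summary row per store from parsed store output."""
    rows = []
    for item in payload["stores"]:
        store, status = item["store"], item["status"]
        rows.append({
            "id": store["id"],
            "address": store["address"],
            "state": store["state_name"],
            "used_gib": gib(status["capacity"]) - gib(status["available"]),
            # leader_count is absent for the Down store, so default to 0.
            "leaders": status.get("leader_count", 0),
            "regions": status.get("region_count", 0),
        })
    return rows


if __name__ == "__main__":
    for row in summarize(json.loads(SAMPLE)):
        print(row)
```

On this data the two Up stores use about 495 GiB and 485 GiB while the Down store uses 152 GiB, which matches the "roughly 3 times" in the question. The Down store also holds 15695 regions against 183689 on each Up store, so most of its data would have to be caught up after a restart.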

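If you want to bring the Down node back into the TiKV cluster, you can simply start it again. Be aware that this will generate a large number of snapshots along with heavy leader and region scheduling, so it is best done during off-peak hours. How fast the scheduling proceeds depends on the data volume and the scheduling policy. The scheduling parameters to look at are as follows: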
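As a rough sketch of what adjusting such parameters could look like: the names below are PD scheduling settings that exist in 2.1, but whether they match the list this reply had in mind is an assumption, and the values and PD address are purely illustrative placeholders. The script only builds and prints pd-ctl command lines so they can be reviewed before anything is run.

```python
import shlex

# Illustrative values only (assumption); tune for your own cluster and workload.
EXAMPLE_LIMITS = {
    "leader-schedule-limit": 4,
    "region-schedule-limit": 4,
    "replica-schedule-limit": 8,
    "max-snapshot-count": 3,
    "max-pending-peer-count": 16,
}


def build_commands(pd_url, limits):
    """Build pd-ctl argv lists: show the current config, then set each limit.

    `-d` selects single-command mode in 2.x pd-ctl; newer releases may not
    need it (assumption, check your pd-ctl version).
    """
    base = ["pd-ctl", "-u", pd_url, "-d"]
    commands = [base + ["config", "show"]]
    for name, value in limits.items():
        commands.append(base + ["config", "set", name, str(value)])
    return commands


if __name__ == "__main__":
    # Placeholder PD endpoint; replace with a real PD address.
    for argv in build_commands("http://127.0.0.1:2379", EXAMPLE_LIMITS):
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Lower limits slow the catch-up but leave more headroom for business traffic; higher limits do the opposite. Running `config show` first makes it easy to restore the previous values afterwards.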

The PD scheduling documentation is as follows:

OK, thanks.

:+1::+1::+1: