How to rejoin a TiKV node to the cluster after it has been offline for 2 months?

【TiKV Environment】Production
【TiKV Version】v8.1.1
【Reproduction Steps】One TiKV node went offline for an unknown reason; two months later we noticed a large data-size gap between it and the other stores.
【Resource Configuration】
Three 48-core / 128 GB physical machines, each co-deploying one TiKV and one PD instance (on the affected machine only TiKV went down; PD is still healthy).
The two healthy TiKV nodes are fairly even at about 183 GB each, while the offline one sits at 393 GB.
【Store Info】

{
  "count": 3,
  "stores": [
    {
      "store": {
        "id": 1005,
        "address": "0.0.0.101:20160",
        "version": "8.1.1",
        "peer_address": "0.0.0.101:20160",
        "status_address": "0.0.0.101:20180",
        "git_hash": "7793f1d5dc40206fe406ca001be1e0d7f1b83a8f",
        "start_timestamp": 1751173377,
        "deploy_path": "/",
        "last_heartbeat": 1766821769530714325,
        "state_name": "Up"
      },
      "status": {
        "capacity": "1.345TiB",
        "available": "1.108TiB",
        "used_size": "139.5GiB",
        "leader_count": 3481,
        "leader_weight": 1,
        "leader_score": 3481,
        "leader_size": 7099016,
        "region_count": 8244,
        "region_weight": 1,
        "region_score": 21506574.567600116,
        "region_size": 17440914,
        "slow_score": 1,
        "slow_trend": {
          "cause_value": 250013.37751677854,
          "cause_rate": 0,
          "result_value": 35590.5,
          "result_rate": -55456.687326754385
        },
        "start_ts": "2025-06-29T13:02:57+08:00",
        "last_heartbeat_ts": "2025-12-27T15:49:29.530714325+08:00",
        "uptime": "4346h46m32.530714325s"
      }
    },
    {
      "store": {
        "id": 1004,
        "address": "0.0.0.102:20160",
        "version": "8.1.1",
        "peer_address": "0.0.0.102:20160",
        "status_address": "0.0.0.102:20180",
        "git_hash": "7793f1d5dc40206fe406ca001be1e0d7f1b83a8f",
        "start_timestamp": 1751173380,
        "deploy_path": "/",
        "last_heartbeat": 1760512413898838807,
        "state_name": "Down"
      },
      "status": {
        "capacity": "392.6GiB",
        "available": "0B",
        "used_size": "90.09GiB",
        "leader_count": 0,
        "leader_weight": 1,
        "leader_score": 0,
        "leader_size": 0,
        "region_count": 8244,
        "region_weight": 1,
        "region_score": 10085360052.449165,
        "region_size": 17440914,
        "slow_score": 75,
        "slow_trend": {
          "cause_value": 250011.18456375838,
          "cause_rate": 0,
          "result_value": 3.5,
          "result_rate": 0
        },
        "start_ts": "2025-06-29T13:03:00+08:00",
        "last_heartbeat_ts": "2025-10-15T15:13:33.898838807+08:00",
        "uptime": "2594h10m33.898838807s"
      }
    },
    {
      "store": {
        "id": 1001,
        "address": "0.0.0.103:20160",
        "version": "8.1.1",
        "peer_address": "0.0.0.103:20160",
        "status_address": "0.0.0.103:20180",
        "git_hash": "7793f1d5dc40206fe406ca001be1e0d7f1b83a8f",
        "start_timestamp": 1751173381,
        "deploy_path": "/",
        "last_heartbeat": 1766821770487281780,
        "state_name": "Up"
      },
      "status": {
        "capacity": "1.345TiB",
        "available": "1.109TiB",
        "used_size": "138.7GiB",
        "leader_count": 4763,
        "leader_weight": 1,
        "leader_score": 4763,
        "leader_size": 10341898,
        "region_count": 8244,
        "region_weight": 1,
        "region_score": 21504760.077404324,
        "region_size": 17440914,
        "slow_score": 1,
        "slow_trend": {
          "cause_value": 250018.2701342282,
          "cause_rate": 0,
          "result_value": 43229,
          "result_rate": -99249.85309282198
        },
        "start_ts": "2025-06-29T13:03:01+08:00",
        "last_heartbeat_ts": "2025-12-27T15:49:30.48728178+08:00",
        "uptime": "4346h46m29.48728178s"
      }
    }
  ]
}
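
For reference, the listing above is standard pd-ctl output; a minimal sketch of how to pull it, with the PD address as a placeholder:

# Query store status via pd-ctl (PD address is a placeholder; adjust to your cluster)
tiup ctl:v8.1.1 pd -u http://<pd-host>:2379 store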

【Region Info】

"regions": [
        {
            "id": 336442013,
            "start_key": "6A66732D70726F64FFFD4112470B010000FF0000430000000000FE",
            "end_key": "6A66732D70726F64FFFD41125B39050000FF0000430000000700FE",
            "epoch": {
                "conf_ver": 5,
                "version": 102
            },
            "peers": [
                {
                    "role_name": "Voter",
                    "id": 336442014,
                    "store_id": 1001
                },
                {
                    "role_name": "Voter",
                    "id": 336442015,
                    "store_id": 1004
                },
                {
                    "role_name": "Voter",
                    "id": 336442016,
                    "store_id": 1005
                }
            ],
            "leader": {
                "role_name": "Voter",
                "id": 336442014,
                "store_id": 1001
            },
            "down_peers": [
                {
                    "peer": {
                        "role_name": "Voter",
                        "id": 336442015,
                        "store_id": 1004
                    },
                    "down_seconds": 6296086
                }
            ],
            "pending_peers": [
                {
                    "role_name": "Voter",
                    "id": 336442015,
                    "store_id": 1004
                }
            ],
            "cpu_usage": 0,
            "written_bytes": 3398,
            "read_bytes": 88877,
            "written_keys": 60,
            "read_keys": 7,
            "approximate_size": 94,
            "approximate_keys": 609756
        },
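
A listing like this can be pulled straight from pd-ctl as well; a minimal sketch, again with the PD address as a placeholder:

# Regions that still report a down peer (PD address is a placeholder)
tiup ctl:v8.1.1 pd -u http://<pd-host>:2379 region check down-peer
# Region peers hosted on the down store (store id 1004)
tiup ctl:v8.1.1 pd -u http://<pd-host>:2379 region store 1004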

Is there a recommended way to recover this node?

Thanks for the reply. Could you describe the steps in more detail?

I came across the blog post below. My case should count as an unplanned outage where the Raft majority is still intact, right? Can I simply bring the TiKV process back up?
Blog - TiKV存储节点计划内外停机,如何去处理? | TiDB Community

Just bring it back up directly.
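
A minimal sketch of what "bringing it back up" could look like with tiup; the cluster name is a placeholder and 0.0.0.102:20160 is the down instance from the store listing above:

# Start only the down TiKV instance (cluster name is a placeholder)
tiup cluster start <cluster-name> --node 0.0.0.102:20160
# Confirm the store returns to Up
tiup cluster display <cluster-name>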

With exactly the same configuration?

Thanks for the answer. Is there any risk in doing that on a production cluster? Alternatively, could I use pd-ctl to delete the down store, back up its 393G of data, and then start a fresh TiKV node and add it to the cluster?

From what I can see the regions have already been rebalanced, so deleting the store shouldn't be much of a burden, right? Then I only need to think about the rebalancing after the new node joins?
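
If it helps, a quick way to double-check what the down store still holds before deleting it; a sketch, with the PD address as a placeholder:

# How many region peers store 1004 still reports
tiup ctl:v8.1.1 pd -u http://<pd-host>:2379 store 1004
# Regions that still carry down or pending peers
tiup ctl:v8.1.1 pd -u http://<pd-host>:2379 region check down-peer
tiup ctl:v8.1.1 pd -u http://<pd-host>:2379 region check pending-peer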

Yes, exactly the same.

How did it take two months to notice the node was down? That's a bit worrying.

The down store no longer holds any leaders, so you can simply scale it in with tiup and then scale this node back out.
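
A sketch of that scale-in / scale-out flow; the cluster name and topology file are placeholders:

# Scale in the down TiKV instance (cluster name is a placeholder)
tiup cluster scale-in <cluster-name> --node 0.0.0.102:20160
# The store turns Tombstone once its region peers have been relocated; then clean it up
tiup cluster prune <cluster-name>
# Scale the node back out with a topology file describing the new TiKV instance
tiup cluster scale-out <cluster-name> scale-out.yaml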

It may be that after the node went offline, PD could not promptly clean up the stale data on it. Have you considered backing up the data, wiping the original TiKV node, and then adding a new TiKV node?
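
If you want to keep a copy of the old data before wiping the node, a simple sketch; the data directory path is a placeholder (check the cluster topology for the real data_dir):

# On the failed host: archive the old TiKV data directory before clearing it
tar czf /backup/tikv-store-1004-$(date +%F).tar.gz <tikv-data-dir>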

It feels like some configuration problem caused this.