Help: how do I recover a cluster from a single TiKV node? Any help is appreciated, thanks everyone.

To speed things up, please provide the following information. A clearly described problem gets resolved faster:

【TiDB Version】
v4.0.11
【Problem Description】
I have recently been learning TiDB. How can I recover the cluster from a single surviving TiKV node?

Current cluster topology

# tiup  cluster display daddylab-tidb-cluster
Found cluster newer version:

    The latest version:         v1.4.1
    Local installed version:    v1.3.4
    Update current component:   tiup update cluster
    Update all components:      tiup update --all

Starting component `cluster`: /root/.tiup/components/cluster/v1.3.4/tiup-cluster display daddylab-tidb-cluster
Cluster type:       tidb
Cluster name:       daddylab-tidb-cluster
Cluster version:    v4.0.11
SSH type:           builtin
Dashboard URL:      http://172.16.12.128:2379/dashboard
ID                   Role          Host           Ports        OS/Arch       Status   Data Dir                      Deploy Dir
--                   ----          ----           -----        -------       ------   --------                      ----------
172.16.12.216:9093   alertmanager  172.16.12.216  9093/9094    linux/x86_64  Up       /tidb-data/alertmanager-9093  /tidb-deploy/alertmanager-9093
172.16.12.216:8249   drainer       172.16.12.216  8249         linux/x86_64  Down     /tidb-data/drainer-8249       /tidb-deploy/drainer-8249
172.16.12.216:3000   grafana       172.16.12.216  3000         linux/x86_64  Up       -                             /tidb-deploy/grafana-3000
172.16.12.128:2379   pd            172.16.12.128  2379/2380    linux/x86_64  Up|L|UI  /tidb-data/pd-2379            /tidb-deploy/pd-2379
172.16.12.150:2379   pd            172.16.12.150  2379/2380    linux/x86_64  Up       /tidb-data/pd-2379            /tidb-deploy/pd-2379
172.16.12.217:2379   pd            172.16.12.217  2379/2380    linux/x86_64  Up       /tidb-data/pd-2379            /tidb-deploy/pd-2379
172.16.12.216:9090   prometheus    172.16.12.216  9090         linux/x86_64  Up       /tidb-data/prometheus-9090    /tidb-deploy/prometheus-9090
172.16.12.123:8250   pump          172.16.12.123  8250         linux/x86_64  Up       /tidb-data/pump-8250          /tidb-deploy/pump-8250
172.16.12.142:8250   pump          172.16.12.142  8250         linux/x86_64  Up       /tidb-data/pump-8250          /tidb-deploy/pump-8250
172.16.12.161:8250   pump          172.16.12.161  8250         linux/x86_64  Up       /tidb-data/pump-8250          /tidb-deploy/pump-8250
172.16.12.171:4000   tidb          172.16.12.171  4000/10080   linux/x86_64  Up       -                             /tidb-deploy/tidb-4000
172.16.12.208:4000   tidb          172.16.12.208  4000/10080   linux/x86_64  Up       -                             /tidb-deploy/tidb-4000
172.16.12.213:4000   tidb          172.16.12.213  4000/10080   linux/x86_64  Up       -                             /tidb-deploy/tidb-4000
172.16.12.138:20160  tikv          172.16.12.138  20160/20180  linux/x86_64  Up       /tidb-data/tikv-20160         /tidb-deploy/tikv-20160
Total nodes: 14
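To cross-check the table above, the following sketch tallies components per role and lists any whose status is not `Up`. It assumes whitespace-separated columns in the order shown (ID, Role, Host, Ports, OS/Arch, Status, Data Dir, Deploy Dir). It is only a reading aid, not a tiup feature.

```python
from collections import Counter

# Rows copied verbatim from the `tiup cluster display` output above.
DISPLAY_ROWS = """\
172.16.12.216:9093   alertmanager  172.16.12.216  9093/9094    linux/x86_64  Up       /tidb-data/alertmanager-9093  /tidb-deploy/alertmanager-9093
172.16.12.216:8249   drainer       172.16.12.216  8249         linux/x86_64  Down     /tidb-data/drainer-8249       /tidb-deploy/drainer-8249
172.16.12.216:3000   grafana       172.16.12.216  3000         linux/x86_64  Up       -                             /tidb-deploy/grafana-3000
172.16.12.128:2379   pd            172.16.12.128  2379/2380    linux/x86_64  Up|L|UI  /tidb-data/pd-2379            /tidb-deploy/pd-2379
172.16.12.150:2379   pd            172.16.12.150  2379/2380    linux/x86_64  Up       /tidb-data/pd-2379            /tidb-deploy/pd-2379
172.16.12.217:2379   pd            172.16.12.217  2379/2380    linux/x86_64  Up       /tidb-data/pd-2379            /tidb-deploy/pd-2379
172.16.12.216:9090   prometheus    172.16.12.216  9090         linux/x86_64  Up       /tidb-data/prometheus-9090    /tidb-deploy/prometheus-9090
172.16.12.123:8250   pump          172.16.12.123  8250         linux/x86_64  Up       /tidb-data/pump-8250          /tidb-deploy/pump-8250
172.16.12.142:8250   pump          172.16.12.142  8250         linux/x86_64  Up       /tidb-data/pump-8250          /tidb-deploy/pump-8250
172.16.12.161:8250   pump          172.16.12.161  8250         linux/x86_64  Up       /tidb-data/pump-8250          /tidb-deploy/pump-8250
172.16.12.171:4000   tidb          172.16.12.171  4000/10080   linux/x86_64  Up       -                             /tidb-deploy/tidb-4000
172.16.12.208:4000   tidb          172.16.12.208  4000/10080   linux/x86_64  Up       -                             /tidb-deploy/tidb-4000
172.16.12.213:4000   tidb          172.16.12.213  4000/10080   linux/x86_64  Up       -                             /tidb-deploy/tidb-4000
172.16.12.138:20160  tikv          172.16.12.138  20160/20180  linux/x86_64  Up       /tidb-data/tikv-20160         /tidb-deploy/tikv-20160
"""


def summarize(rows):
    """Tally components per role and collect (ID, status) pairs not Up."""
    roles = Counter()
    not_up = []
    for line in rows.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        node_id, role, status = fields[0], fields[1], fields[5]
        roles[role] += 1
        # PD status may carry extra flags, e.g. "Up|L|UI" (leader, dashboard).
        if status.split("|")[0] != "Up":
            not_up.append((node_id, status))
    return roles, not_up


if __name__ == "__main__":
    roles, not_up = summarize(DISPLAY_ROWS)
    print(dict(roles))
    print(not_up)
```

On these rows it reports 14 components, a single `tikv` entry, and only the drainer as `Down`. The two other TiKV addresses that PD still knows about (172.16.12.190 and 172.16.12.176, `Offline` in the store output below) do not appear in the topology at all.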

Connection status

mysql> show databases;
ERROR 9005 (HY000): Region is unavailable
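Why every query fails with `Region is unavailable`: assuming the default of three replicas (consistent with all three stores reporting the same `region_count` of 11731 in the store output below), each Region has one peer on each store. Raft only makes progress while a strict majority of a Region's peers are alive:

$$
q(n) = \left\lfloor \frac{n}{2} \right\rfloor + 1, \qquad q(3) = 2 > 1 = \text{live peers (store 5 only)}
$$

With stores 1 and 4 gone, no Region can reach quorum, so the single surviving copy cannot serve reads or writes on its own.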

Store status

## ./pd-ctl  -u 172.16.12.217:2379 store
{
  "count": 3,
  "stores": [
    {
      "store": {
        "id": 5,
        "address": "172.16.12.138:20160",
        "version": "4.0.11",
        "status_address": "172.16.12.138:20180",
        "git_hash": "4ac5e7ea1839d63163e911e2e1164d663f49592b",
        "start_timestamp": 1616534161,
        "deploy_path": "/tidb-deploy/tikv-20160/bin",
        "last_heartbeat": 1619082008327760337,
        "state_name": "Up"
      },
      "status": {
        "capacity": "500GiB",
        "available": "429.5GiB",
        "used_size": "46.29GiB",
        "leader_count": 5702,
        "leader_weight": 1,
        "leader_score": 5702,
        "leader_size": 98886,
        "region_count": 11731,
        "region_weight": 1,
        "region_score": 204330,
        "region_size": 204330,
        "start_ts": "2021-03-24T05:16:01+08:00",
        "last_heartbeat_ts": "2021-04-22T17:00:08.327760337+08:00",
        "uptime": "707h44m7.327760337s"
      }
    },
    {
      "store": {
        "id": 1,
        "address": "172.16.12.190:20160",
        "state": 1,
        "version": "4.0.11",
        "status_address": "172.16.12.190:20180",
        "git_hash": "4ac5e7ea1839d63163e911e2e1164d663f49592b",
        "start_timestamp": 1615779803,
        "deploy_path": "/tidb-deploy/tikv-20160/bin",
        "last_heartbeat": 1619080649051625434,
        "state_name": "Offline"
      },
      "status": {
        "capacity": "500GiB",
        "available": "396.6GiB",
        "used_size": "45.8GiB",
        "leader_count": 6029,
        "leader_weight": 1,
        "leader_score": 6029,
        "leader_size": 105444,
        "region_count": 11731,
        "region_weight": 1,
        "region_score": 204330,
        "region_size": 204330,
        "start_ts": "2021-03-15T11:43:23+08:00",
        "last_heartbeat_ts": "2021-04-22T16:37:29.051625434+08:00",
        "uptime": "916h54m6.051625434s"
      }
    },
    {
      "store": {
        "id": 4,
        "address": "172.16.12.176:20160",
        "state": 1,
        "version": "4.0.11",
        "status_address": "172.16.12.176:20180",
        "git_hash": "4ac5e7ea1839d63163e911e2e1164d663f49592b",
        "start_timestamp": 1618884887,
        "deploy_path": "/tidb-deploy/tikv-20160/bin",
        "last_heartbeat": 1618986301191030219,
        "state_name": "Offline"
      },
      "status": {
        "capacity": "500GiB",
        "available": "443.5GiB",
        "used_size": "49.29GiB",
        "leader_count": 0,
        "leader_weight": 1,
        "leader_score": 0,
        "leader_size": 0,
        "region_count": 11731,
        "region_weight": 1,
        "region_score": 204330,
        "region_size": 204330,
        "start_ts": "2021-04-20T10:14:47+08:00",
        "last_heartbeat_ts": "2021-04-21T14:25:01.191030219+08:00",
        "uptime": "28h10m14.191030219s"
      }
    }
  ]
}
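The store list above can be reduced to the IDs that need attention. This sketch parses a trimmed copy of that JSON (only the fields it uses are kept) and returns the stores PD does not report as `Up`. It also checks that every store reports the same `region_count`, which is what one replica per store would look like.

```python
import json
from typing import List

# Trimmed copy of the `pd-ctl store` output above (only the fields used here).
STORE_JSON = """
{
  "count": 3,
  "stores": [
    {"store": {"id": 5, "address": "172.16.12.138:20160", "state_name": "Up"},
     "status": {"region_count": 11731}},
    {"store": {"id": 1, "address": "172.16.12.190:20160", "state": 1, "state_name": "Offline"},
     "status": {"region_count": 11731}},
    {"store": {"id": 4, "address": "172.16.12.176:20160", "state": 1, "state_name": "Offline"},
     "status": {"region_count": 11731}}
  ]
}
"""


def failed_store_ids(doc: dict) -> List[int]:
    """Sorted IDs of stores whose state_name is not "Up".

    In PD's store metadata, "state": 1 means Offline; Up (0) is the default
    value and is omitted from the JSON, which is why store 5 has no "state".
    """
    return sorted(s["store"]["id"] for s in doc["stores"]
                  if s["store"]["state_name"] != "Up")


def same_region_count(doc: dict) -> bool:
    """True if all stores report an identical region_count."""
    counts = {s["status"]["region_count"] for s in doc["stores"]}
    return len(counts) == 1


if __name__ == "__main__":
    doc = json.loads(STORE_JSON)
    print("failed stores:", ",".join(map(str, failed_store_ids(doc))))
    print("same region_count on all stores:", same_region_count(doc))
```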

2021-04-22 17:11 Region distribution

region.log.xz (30.7 KB)
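The attached `region.log` is not reproduced here. Assuming it is the JSON printed by `pd-ctl region` (a `regions` array whose entries list `peers`, each with a `store_id`), a short scan can list the Regions that have lost quorum when only store 5 is alive. The Region and peer IDs below are made up for illustration. Only the store IDs come from this post, and learner peers are not treated specially (a simplifying assumption).

```python
import json

# Hypothetical sample in the shape of `pd-ctl region` output. Region 1001 has
# peers on all three stores; Region 2001 has already had its failed peers
# removed and keeps only the peer on store 5.
SAMPLE = """
{"count": 2, "regions": [
  {"id": 1001, "peers": [{"id": 1002, "store_id": 1},
                         {"id": 1003, "store_id": 4},
                         {"id": 1004, "store_id": 5}]},
  {"id": 2001, "peers": [{"id": 2002, "store_id": 5}]}
]}
"""

ALIVE_STORES = {5}  # only 172.16.12.138:20160 is Up


def has_quorum(store_ids, alive):
    """A Raft group needs a strict majority of its peers alive to elect a leader."""
    live = sum(1 for s in store_ids if s in alive)
    return live >= len(store_ids) // 2 + 1


def unavailable_regions(doc, alive):
    """IDs of Regions whose live peers do not form a majority."""
    return [r["id"] for r in doc["regions"]
            if not has_quorum([p["store_id"] for p in r["peers"]], alive)]


if __name__ == "__main__":
    print(unavailable_regions(json.loads(SAMPLE), ALIVE_STORES))
```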

2021-04-22 17:15 tikv.log from the surviving node 172.16.12.138:20160

tikv.log.xz (5.9 MB)


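If the question is about performance tuning or troubleshooting, please download and run the script, then select all of the terminal output and copy-paste it into the post.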

https://asktug.com/t/topic/36199

You can refer to this document.
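The linked post is not reproduced here. What follows is a sketch of the procedure commonly described for v4.0 when a majority of replicas is lost, not necessarily what that post says. It only builds and prints the commands (a dry run). The `tikv-ctl` binary on the TiKV host and the `/db` subdirectory under the TiKV data dir are assumptions. Verify every flag against the pd-ctl and tikv-ctl docs for v4.0.11, back up `/tidb-data/tikv-20160` first, and expect that `unsafe-recover` can lose writes that never reached the surviving replica.

```python
from typing import List

# Values taken from this post, except TIKV_DB, which assumes the RocksDB
# data lives under <data dir>/db for the data dir /tidb-data/tikv-20160.
CLUSTER = "daddylab-tidb-cluster"
PD_ADDR = "172.16.12.217:2379"
TIKV_DB = "/tidb-data/tikv-20160/db"
FAILED_STORES = [1, 4]

# PD scheduling limits paused while Region membership is rewritten.
SCHEDULE_LIMITS = ("leader-schedule-limit", "region-schedule-limit",
                   "replica-schedule-limit", "merge-schedule-limit")


def recovery_plan(cluster: str, pd_addr: str, tikv_db: str,
                  failed: List[int]) -> List[str]:
    """Return the sketch's commands in order. Nothing is executed."""
    stores = ",".join(str(s) for s in failed)
    plan = [f"./pd-ctl -u {pd_addr} config set {limit} 0"
            for limit in SCHEDULE_LIMITS]
    plan += [
        # tikv-ctl --db opens RocksDB directly, so TiKV must be stopped.
        f"tiup cluster stop {cluster} -R tikv",
        # Run on 172.16.12.138: drop peers on the failed stores from every
        # Region's membership held by this TiKV.
        f"tikv-ctl --db {tikv_db} unsafe-recover remove-fail-stores "
        f"-s {stores} --all-regions",
        f"tiup cluster start {cluster} -R tikv",
        # Afterwards, set the schedule limits back to their previous values.
    ]
    return plan


if __name__ == "__main__":
    for cmd in recovery_plan(CLUSTER, PD_ADDR, TIKV_DB, FAILED_STORES):
        print(cmd)
```

Once TiKV is back, `show databases;` should succeed again if every Region now has its single surviving peer as the whole group. The cluster still has only one replica, so adding TiKV nodes afterwards is what restores redundancy.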

Thanks, the cluster has been recovered.

I followed this fellow's post:
