A node in a TiDB cluster went down; having problems getting it to rejoin the cluster

Um, experts, I have an awkward question: I didn't touch this during the 7-day Spring Festival holiday, and now that I've opened it up again it is still in this state: Pending Offline:



Errors in the log:

1. You can use --force to take the node offline forcibly. After that, check pd-ctl store to confirm whether the node is still listed; if it is, you need to use the prune command to remove nodes in Tombstone state (see the command sketch after this list).
2. On the server that was taken offline, check whether the original process ports are still in use. Normally they should not be; confirm they are gone before scaling out again.
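A minimal sketch of the steps above. The cluster name tidb-test is taken from later in the thread; the TiKV node address 172.20.146.145:20160 and the PD endpoint 172.20.146.144:2379 are assumptions, substitute your own:

# Force the stuck node offline (node address is an assumption)
tiup cluster scale-in tidb-test --node 172.20.146.145:20160 --force

# Check whether PD still lists the store, and in which state
pd-ctl -u http://172.20.146.144:2379 store

# If the store shows up as Tombstone, prune it from the topology
tiup cluster prune tidb-test

# On the offlined server, the TiKV ports should no longer be listening
ss -lntp | grep -E '20160|20180'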

OK, I'll give it a try. Thanks in advance.

OK :ok_hand:

Er, after forcing the node offline with --force, prune still can't clear the Offline node:
tiup cluster prune tidb-test
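For context, and consistent with the earlier reply: tiup cluster prune only cleans up instances that PD already reports as Tombstone, so a store still stuck in Offline state (one that has not finished moving its regions away) is left untouched. A quick way to see the per-store state, assuming the same PD endpoint as above and that jq is installed:

# List only id, address and state_name for each store
pd-ctl -u http://172.20.146.144:2379 store | jq '.stores[].store | {id, address, state_name}'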


  1. Check the current status shown by tiup cluster display.
  2. Check whether the cluster has any down regions; you can look with pd-ctl (a command sketch follows this list).
    https://docs.pingcap.com/zh/tidb/stable/pd-control#region-check-miss-peer--extra-peer--down-peer--pending-peer--offline-peer--empty-region--hist-size--hist-keys
  3. Also use pd-ctl to share the information for all stores, to see whether the store from the earlier scale-in was never actually removed.
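A rough sketch of those checks, again assuming the cluster name tidb-test and the PD endpoint used above:

# Overall topology and per-node status
tiup cluster display tidb-test

# Regions with down or missing peers (the checks from the linked doc)
pd-ctl -u http://172.20.146.144:2379 region check down-peer
pd-ctl -u http://172.20.146.144:2379 region check miss-peer

# All stores known to PD, including any left over from the scale-in
pd-ctl -u http://172.20.146.144:2379 store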

1. The tiup cluster display status:


2. Running the region check miss-peer command returns results, so there should indeed be down regions.
For example:
{
  "count": 23,
  "regions": [
    {
      "id": 10001,
      "start_key": "7480000000000000FF2F00000000000000F8",
      "end_key": "7480000000000000FF3200000000000000F8",
      "epoch": {
        "conf_ver": 5,
        "version": 23
      },
      "peers": [
        {
          "id": 10002,
          "store_id": 1
        },
        {
          "id": 10003,
          "store_id": 4
        },
        {
          "id": 10004,
          "store_id": 5
        }
      ],
      "leader": {
        "id": 10002,
        "store_id": 1
      },
      "down_peers": [
        {
          "peer": {
            "id": 10003,
            "store_id": 4
          },
          "down_seconds": 14040
        }
      ],
      "pending_peers": [
        {
          "id": 10003,
          "store_id": 4
        }
      ],
      "written_bytes": 0,
      "read_bytes": 0,
      "written_keys": 0,
      "read_keys": 0,
      "approximate_size": 1,
      "approximate_keys": 0
    },
3. Information for all stores:
{
  "count": 3,
  "stores": [
    {
      "store": {
        "id": 1,
        "address": "172.20.146.144:20160",
        "state": 1,
        "version": "4.0.10",
        "status_address": "172.20.146.144:20180",
        "git_hash": "2ea4e608509150f8110b16d6e8af39284ca6c93a",
        "start_timestamp": 1613610928,
        "deploy_path": "/tidata/tidb-deploy/tikv-20160/bin",
        "last_heartbeat": 1613613814164172934,
        "state_name": "Offline"
      },
      "status": {
        "capacity": "10.3GiB",
        "available": "7.445GiB",
        "used_size": "31.85MiB",
        "leader_count": 11,
        "leader_weight": 1,
        "leader_score": 11,
        "leader_size": 11,
        "region_count": 23,
        "region_weight": 1,
        "region_score": 23,
        "region_size": 23,
        "start_ts": "2021-02-18T09:15:28+08:00",
        "last_heartbeat_ts": "2021-02-18T10:03:34.164172934+08:00",
        "uptime": "48m6.164172934s"
      }
    },
    {
      "store": {
        "id": 4,
        "address": "172.20.146.145:20160",
        "state": 1,
        "version": "4.0.10",
        "status_address": "172.20.146.145:20180",
        "git_hash": "2ea4e608509150f8110b16d6e8af39284ca6c93a",
        "start_timestamp": 1612853601,
        "deploy_path": "/tidata/tidb-deploy/tikv-20160/bin",
        "last_heartbeat": 1612856011555390181,
        "state_name": "Offline"
      },
      "status": {
        "capacity": "0B",
        "available": "0B",
        "used_size": "0B",
        "leader_count": 0,
        "leader_weight": 1,
        "leader_score": 0,
        "leader_size": 0,
        "region_count": 23,
        "region_weight": 1,
        "region_score": 23,
        "region_size": 23,
        "start_ts": "2021-02-09T14:53:21+08:00",
        "last_heartbeat_ts": "2021-02-09T15:33:31.555390181+08:00",
        "uptime": "40m10.555390181s"
      }
    },
    {
      "store": {
        "id": 5,
        "address": "172.20.146.143:20160",
        "version": "4.0.10",
        "status_address": "172.20.146.143:20180",
        "git_hash": "2ea4e608509150f8110b16d6e8af39284ca6c93a",
        "start_timestamp": 1613610925,
        "deploy_path": "/tidata/tidb-deploy/tikv-20160/bin",
        "last_heartbeat": 1613632038142316460,
        "state_name": "Up"
      },
      "status": {
        "capacity": "10.3GiB",
        "available": "6.616GiB",
        "used_size": "32.19MiB",
        "leader_count": 12,
        "leader_weight": 1,
        "leader_score": 12,
        "leader_size": 12,
        "region_count": 23,
        "region_weight": 1,
        "region_score": 23,
        "region_size": 23,
        "start_ts": "2021-02-18T09:15:25+08:00",
        "last_heartbeat_ts": "2021-02-18T15:07:18.14231646+08:00",
        "uptime": "5h51m53.14231646s"
      }
    }
  ]
}
Do I need to delete all of the down regions? Er, I've never done that before...

Related post: "How to recover after manually deleting the pd and tikv directories under tidb-data". It is the same cluster; recover by following the method described in that post.