After rebuilding the PD cluster, we frequently get "pd server out" (PD server timeout) errors.

Please also post the region health, heartbeat, and operator panels below.

I don't see a region health panel.

You can find it on the Overview dashboard.

So can you tell what the problem is? I can't see where it is going wrong.

Please upload the output of pd-ctl store so we can take a look.

{
  "count": 8,
  "stores": [
    {
      "store": {
        "id": 104531,
        "address": "10.130.1.6:20161",
        "labels": [
          {
            "key": "host",
            "value": "10.130.1.6"
          }
        ],
        "version": "5.3.3",
        "status_address": "10.130.1.6:20181",
        "git_hash": "ba12036905558f0b490e7607e673e4e8875faa74",
        "start_timestamp": 1668480009,
        "deploy_path": "/tidb/tidb-deploy/tikv-20161/bin",
        "last_heartbeat": 1668509196772980239,
        "state_name": "Up"
      },
      "status": {
        "capacity": "878.9GiB",
        "available": "524.1GiB",
        "used_size": "228GiB",
        "leader_count": 8430,
        "leader_weight": 1,
        "leader_score": 8430,
        "leader_size": 739004,
        "region_count": 22358,
        "region_weight": 1,
        "region_score": 2865088.1364101213,
        "region_size": 1981348,
        "slow_score": 1,
        "start_ts": "2022-11-15T10:40:09+08:00",
        "last_heartbeat_ts": "2022-11-15T18:46:36.772980239+08:00",
        "uptime": "8h6m27.772980239s"
      }
    },
    {
      "store": {
        "id": 104532,
        "address": "10.130.1.4:20161",
        "labels": [
          {
            "key": "host",
            "value": "10.130.1.4"
          }
        ],
        "version": "5.3.3",
        "status_address": "10.130.1.4:20181",
        "git_hash": "ba12036905558f0b490e7607e673e4e8875faa74",
        "start_timestamp": 1668509022,
        "deploy_path": "/tidb/tidb-deploy/tikv-20161/bin",
        "last_heartbeat": 1668509210791894617,
        "state_name": "Up"
      },
      "status": {
        "capacity": "878.9GiB",
        "available": "527.2GiB",
        "used_size": "227.5GiB",
        "leader_count": 8419,
        "leader_weight": 1,
        "leader_score": 8419,
        "leader_size": 751554,
        "region_count": 21960,
        "region_weight": 1,
        "region_score": 2821817.0753636295,
        "region_size": 1953127,
        "slow_score": 1,
        "start_ts": "2022-11-15T18:43:42+08:00",
        "last_heartbeat_ts": "2022-11-15T18:46:50.791894617+08:00",
        "uptime": "3m8.791894617s"
      }
    },
    {
      "store": {
        "id": 104533,
        "address": "10.130.1.5:20161",
        "labels": [
          {
            "key": "host",
            "value": "10.130.1.5"
          }
        ],
        "version": "5.3.3",
        "status_address": "10.130.1.5:20181",
        "git_hash": "ba12036905558f0b490e7607e673e4e8875faa74",
        "start_timestamp": 1668480009,
        "deploy_path": "/tidb/tidb-deploy/tikv-20161/bin",
        "last_heartbeat": 1668509197876446650,
        "state_name": "Up"
      },
      "status": {
        "capacity": "878.9GiB",
        "available": "495.3GiB",
        "used_size": "227.8GiB",
        "leader_count": 8428,
        "leader_weight": 1,
        "leader_score": 8428,
        "leader_size": 739540,
        "region_count": 22159,
        "region_weight": 1,
        "region_score": 2869008.403657904,
        "region_size": 1961750,
        "slow_score": 1,
        "start_ts": "2022-11-15T10:40:09+08:00",
        "last_heartbeat_ts": "2022-11-15T18:46:37.87644665+08:00",
        "uptime": "8h6m28.87644665s"
      }
    },
    {
      "store": {
        "id": 4442329,
        "address": "10.130.1.5:20162",
        "labels": [
          {
            "key": "host",
            "value": "10.130.1.5"
          }
        ],
        "version": "5.3.3",
        "status_address": "10.130.1.5:20182",
        "git_hash": "ba12036905558f0b490e7607e673e4e8875faa74",
        "start_timestamp": 1668480010,
        "deploy_path": "/data/node1/tidb-deploy/tikv-20162/bin",
        "last_heartbeat": 1668509194915533338,
        "state_name": "Up"
      },
      "status": {
        "capacity": "1.718TiB",
        "available": "1.315TiB",
        "used_size": "268GiB",
        "leader_count": 8428,
        "leader_weight": 1,
        "leader_score": 8428,
        "leader_size": 722046,
        "region_count": 28415,
        "region_weight": 1,
        "region_score": 2890567.528681414,
        "region_size": 2427307,
        "slow_score": 1,
        "start_ts": "2022-11-15T10:40:10+08:00",
        "last_heartbeat_ts": "2022-11-15T18:46:34.915533338+08:00",
        "uptime": "8h6m24.915533338s"
      }
    },
    {
      "store": {
        "id": 4442330,
        "address": "10.130.1.6:20162",
        "labels": [
          {
            "key": "host",
            "value": "10.130.1.6"
          }
        ],
        "version": "5.3.3",
        "status_address": "10.130.1.6:20182",
        "git_hash": "ba12036905558f0b490e7607e673e4e8875faa74",
        "start_timestamp": 1668480009,
        "deploy_path": "/data/node1/tidb-deploy/tikv-20162/bin",
        "last_heartbeat": 1668509193130497501,
        "state_name": "Up"
      },
      "status": {
        "capacity": "1.718TiB",
        "available": "1.318TiB",
        "used_size": "266GiB",
        "leader_count": 8429,
        "leader_weight": 1,
        "leader_score": 8429,
        "leader_size": 722015,
        "region_count": 28218,
        "region_weight": 1,
        "region_score": 2866576.1057386314,
        "region_size": 2407716,
        "slow_score": 1,
        "start_ts": "2022-11-15T10:40:09+08:00",
        "last_heartbeat_ts": "2022-11-15T18:46:33.130497501+08:00",
        "uptime": "8h6m24.130497501s"
      }
    },
    {
      "store": {
        "id": 4442335,
        "address": "10.130.1.4:20162",
        "labels": [
          {
            "key": "host",
            "value": "10.130.1.4"
          }
        ],
        "version": "5.3.3",
        "status_address": "10.130.1.4:20182",
        "git_hash": "ba12036905558f0b490e7607e673e4e8875faa74",
        "start_timestamp": 1668480010,
        "deploy_path": "/data/node1/tidb-deploy/tikv-20162/bin",
        "last_heartbeat": 1668509195777341804,
        "state_name": "Up"
      },
      "status": {
        "capacity": "1.718TiB",
        "available": "1.316TiB",
        "used_size": "268.6GiB",
        "leader_count": 8439,
        "leader_weight": 1,
        "leader_score": 8439,
        "leader_size": 714897,
        "region_count": 28617,
        "region_weight": 1,
        "region_score": 2900805.4917382784,
        "region_size": 2436118,
        "slow_score": 1,
        "start_ts": "2022-11-15T10:40:10+08:00",
        "last_heartbeat_ts": "2022-11-15T18:46:35.777341804+08:00",
        "uptime": "8h6m25.777341804s"
      }
    },
    {
      "store": {
        "id": 87144,
        "address": "10.130.1.7:3930",
        "labels": [
          {
            "key": "engine",
            "value": "tiflash"
          }
        ],
        "version": "v5.3.3",
        "peer_address": "10.130.1.7:20170",
        "status_address": "10.130.1.7:20292",
        "git_hash": "a8597944da3965cc3252b229d3766343b6234674",
        "start_timestamp": 1668480080,
        "deploy_path": "/tidb/tidb-deploy/tiflash-9000/bin/tiflash",
        "last_heartbeat": 1668509204238479799,
        "state_name": "Up"
      },
      "status": {
        "capacity": "878.9GiB",
        "available": "821.4GiB",
        "used_size": "36.62MiB",
        "leader_count": 0,
        "leader_weight": 1,
        "leader_score": 0,
        "leader_size": 0,
        "region_count": 0,
        "region_weight": 1,
        "region_score": 0,
        "region_size": 0,
        "slow_score": 0,
        "start_ts": "2022-11-15T10:41:20+08:00",
        "last_heartbeat_ts": "2022-11-15T18:46:44.238479799+08:00",
        "uptime": "8h5m24.238479799s"
      }
    },
    {
      "store": {
        "id": 98570,
        "address": "10.130.1.8:3930",
        "labels": [
          {
            "key": "engine",
            "value": "tiflash"
          }
        ],
        "version": "v5.3.3",
        "peer_address": "10.130.1.8:20170",
        "status_address": "10.130.1.8:20292",
        "git_hash": "a8597944da3965cc3252b229d3766343b6234674",
        "start_timestamp": 1668480080,
        "deploy_path": "/tidb/tidb-deploy/tiflash-9000/bin/tiflash",
        "last_heartbeat": 1668509203870179080,
        "state_name": "Up"
      },
      "status": {
        "capacity": "878.9GiB",
        "available": "820.9GiB",
        "used_size": "36.61MiB",
        "leader_count": 0,
        "leader_weight": 1,
        "leader_score": 0,
        "leader_size": 0,
        "region_count": 0,
        "region_weight": 1,
        "region_score": 0,
        "region_size": 0,
        "slow_score": 0,
        "start_ts": "2022-11-15T10:41:20+08:00",
        "last_heartbeat_ts": "2022-11-15T18:46:43.87017908+08:00",
        "uptime": "8h5m23.87017908s"
      }
    }
  ]
}
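
For anyone reading a store dump like this one, a quick way to spot a store that restarted recently is to compare start_timestamp with last_heartbeat. The following is only a rough sketch: the function name, the 600-second threshold, and the trimmed SAMPLE (two stores from the output above, reduced to the fields used) are assumptions for illustration. It also converts curly quotes back to straight ones, since output pasted through a forum often stops being valid JSON.

```python
import json

# Pasted output may contain curly quotes; map them back to plain ASCII quotes.
QUOTE_FIX = str.maketrans({"\u201c": '"', "\u201d": '"'})


def recently_restarted(store_json, max_uptime_s=600):
    """Return (store_id, address, uptime_seconds) for stores up less than max_uptime_s.

    max_uptime_s is an arbitrary illustrative threshold, not a PD setting.
    """
    data = json.loads(store_json.translate(QUOTE_FIX))
    hits = []
    for item in data["stores"]:
        store = item["store"]
        # last_heartbeat is in nanoseconds, start_timestamp is in seconds.
        uptime = store["last_heartbeat"] // 1_000_000_000 - store["start_timestamp"]
        if uptime < max_uptime_s:
            hits.append((store["id"], store["address"], uptime))
    return hits


# Two stores from the output above, trimmed to the fields this sketch reads.
SAMPLE = """
{
  "count": 2,
  "stores": [
    {"store": {"id": 104531, "address": "10.130.1.6:20161",
               "start_timestamp": 1668480009,
               "last_heartbeat": 1668509196772980239}},
    {"store": {"id": 104532, "address": "10.130.1.4:20161",
               "start_timestamp": 1668509022,
               "last_heartbeat": 1668509210791894617}}
  ]
}
"""

print(recently_restarted(SAMPLE))
```

On the trimmed sample this flags only store 104532, whose uptime field in the full output reads 3m8s while the other stores show about 8 hours.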

There are 22 down regions that never go away. Can they be deleted?

This one was just restarted. Check the corresponding TiKV log.

That was me; I restarted it, and it is running normally.

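To get rid of the down regions, try pd-ctl operator add remove-peer to remove the peer from the store that holds the down peer, and see whether the operator runs. Check help for the exact syntax.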
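
With 22 down regions, typing each operator by hand is tedious, so here is a hedged sketch that turns a list of down peers into pd-ctl command lines of the form operator add remove-peer <region_id> <store_id>. The input shape is an assumption modeled on what pd-ctl region check down-peer prints (a "regions" list whose entries carry "id" and "down_peers" with a nested "peer.store_id"); verify it against your own output. The IDs in SAMPLE are made up, and the script only prints commands; it does not run them.

```python
import json


def remove_peer_commands(down_peer_json):
    """Build one 'operator add remove-peer' line per down peer.

    Assumes the JSON has a top-level "regions" list, each region with an "id"
    and a "down_peers" list of {"peer": {"store_id": ...}} entries.
    """
    data = json.loads(down_peer_json)
    commands = []
    for region in data.get("regions", []):
        for down in region.get("down_peers", []):
            store_id = down["peer"]["store_id"]
            commands.append(f"operator add remove-peer {region['id']} {store_id}")
    return commands


# Hypothetical input: the region, peer, and store IDs are invented for illustration.
SAMPLE = """
{
  "count": 1,
  "regions": [
    {"id": 1001,
     "down_peers": [{"peer": {"id": 1002, "store_id": 7}, "down_seconds": 300}]}
  ]
}
"""

for line in remove_peer_commands(SAMPLE):
    print(line)
```

Review the generated lines before pasting them into pd-ctl, and confirm that each region still has enough healthy replicas once the down peer is removed.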

I removed them, but things have not improved. I don't know where the problem is.

It feels like a bug.

Is there a way to fix it? You could also connect to my environment remotely and check.

Download Clinic, collect diagnostic data covering a period of time, and upload the resulting file.

It seems only 6.0+ has that. I'm on 5.3.3.

Can it be installed in an offline environment?

Download the offline package, then use tiup mirror set to temporarily point at the extracted package.

Will the version be a problem? It is a whole major version apart.

From what I can see, it also seems to require logging in or something.