Why do SQL queries still access decommissioned TiFlash nodes?

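TiDB 5.0.3
We used to have 6 TiFlash nodes and took 2 of them offline. Every day some SQL statements still send requests to the decommissioned TiFlash nodes, which produces roughly 160 failed requests per day.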
execute bg-bigdata fail. Cause by: get store failed: 2: invalid store ID 33139110, not found
com.netflix.hystrix.exception.HystrixBadRequestException: get store failed: 2: invalid store ID 33139110, not found
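
To confirm which store IDs the failures point at and how often, the application log can be tallied. A minimal sketch, assuming the errors keep the exact wording shown above; the function name and the sample list are only illustrative:

```python
import re
from collections import Counter

# Matches the error text quoted in this thread.
INVALID_STORE = re.compile(r"invalid store ID (\d+), not found")


def count_invalid_store_errors(lines):
    """Return a Counter mapping store ID -> number of matching log lines."""
    counts = Counter()
    for line in lines:
        match = INVALID_STORE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


# The two log lines from the post, used here as sample input.
sample = [
    "execute bg-bigdata fail. Cause by: get store failed: 2: invalid store ID 33139110, not found",
    "com.netflix.hystrix.exception.HystrixBadRequestException: get store failed: 2: invalid store ID 33139110, not found",
]
print(count_invalid_store_errors(sample))
```

If every failure names a store ID that was decommissioned, that supports the idea that something on the query path still holds a stale reference to it.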

The decommissioned store_id no longer appears in the system tables, and pd-ctl does not show it either.

Information from PD:
Starting component ctl: /root/.tiup/components/ctl/v5.0.3/ctl pd -u http://xxx.xxx.xxx.93:2379 -i
» store
{
  "count": 8,
  "stores": [
    {
      "store": {
        "id": 11585778,
        "address": "xxx.xxx.xxx.193:20160",
        "version": "5.0.3",
        "status_address": "xxx.xxx.xxx.193:20180",
        "git_hash": "63b63edfbb9bbf8aeb875aad28c59f082eeb55d4",
        "start_timestamp": 1645158861,
        "deploy_path": "/data1/tidb-rpt/tikv/bin",
        "last_heartbeat": 1654591776605213867,
        "state_name": "Up"
      },
      "status": {
        "capacity": "2.861TiB",
        "available": "2.175TiB",
        "used_size": "225.7GiB",
        "leader_count": 6863,
        "leader_weight": 1,
        "leader_score": 6863,
        "leader_size": 500993,
        "region_count": 20201,
        "region_weight": 1,
        "region_score": 1687100.1989707288,
        "region_size": 1512975,
        "start_ts": "2022-02-18T12:34:21+08:00",
        "last_heartbeat_ts": "2022-06-07T16:49:36.605213867+08:00",
        "uptime": "2620h15m15.605213867s"
      }
    },
    {
      "store": {
        "id": 290158285,
        "address": "xxx.xxx.xxx.128:3930",
        "labels": [
          {
            "key": "engine",
            "value": "tiflash"
          }
        ],
        "version": "v5.0.3",
        "peer_address": "xxx.xxx.xxx.128:20170",
        "status_address": "xxx.xxx.xxx.128:20292",
        "git_hash": "0194cb4b59438d8d46fc05a4b1abd85eeb69972f",
        "start_timestamp": 1645193909,
        "deploy_path": "/data0/tidb-rpt/tiflash/bin/tiflash",
        "last_heartbeat": 1654591769169989147,
        "state_name": "Up"
      },
      "status": {
        "capacity": "5.722TiB",
        "available": "5.556TiB",
        "used_size": "169.9GiB",
        "leader_count": 0,
        "leader_weight": 1,
        "leader_score": 0,
        "leader_size": 0,
        "region_count": 9347,
        "region_weight": 1,
        "region_score": 694902.3730481513,
        "region_size": 662056,
        "start_ts": "2022-02-18T22:18:29+08:00",
        "last_heartbeat_ts": "2022-06-07T16:49:29.169989147+08:00",
        "uptime": "2610h31m0.169989147s"
      }
    },
    {
      "store": {
        "id": 290158286,
        "address": "xxx.xxx.xxx.44:3930",
        "labels": [
          {
            "key": "engine",
            "value": "tiflash"
          }
        ],
        "version": "v5.0.3",
        "peer_address": "xxx.xxx.xxx.44:20170",
        "status_address": "xxx.xxx.xxx.44:20292",
        "git_hash": "0194cb4b59438d8d46fc05a4b1abd85eeb69972f",
        "start_timestamp": 1645193909,
        "deploy_path": "/data0/tidb-rpt/tiflash/bin/tiflash",
        "last_heartbeat": 1654591775503098763,
        "state_name": "Up"
      },
      "status": {
        "capacity": "5.722TiB",
        "available": "5.544TiB",
        "used_size": "181.9GiB",
        "leader_count": 0,
        "leader_weight": 1,
        "leader_score": 0,
        "leader_size": 0,
        "region_count": 8545,
        "region_weight": 1,
        "region_score": 695177.6023334209,
        "region_size": 662278,
        "start_ts": "2022-02-18T22:18:29+08:00",
        "last_heartbeat_ts": "2022-06-07T16:49:35.503098763+08:00",
        "uptime": "2610h31m6.503098763s"
      }
    },
    {
      "store": {
        "id": 1,
        "address": "xxx.xxx.xxx.227:20160",
        "version": "5.0.3",
        "status_address": "xxx.xxx.xxx.227:20180",
        "git_hash": "63b63edfbb9bbf8aeb875aad28c59f082eeb55d4",
        "start_timestamp": 1645158536,
        "deploy_path": "/data1/tidb-rpt/tikv/bin",
        "last_heartbeat": 1654591769207904668,
        "state_name": "Up"
      },
      "status": {
        "capacity": "2.861TiB",
        "available": "2.105TiB",
        "used_size": "208GiB",
        "leader_count": 6877,
        "leader_weight": 1,
        "leader_score": 6877,
        "leader_size": 516838,
        "region_count": 20616,
        "region_weight": 1,
        "region_score": 1686009.8195962925,
        "region_size": 1508810,
        "start_ts": "2022-02-18T12:28:56+08:00",
        "last_heartbeat_ts": "2022-06-07T16:49:29.207904668+08:00",
        "uptime": "2620h20m33.207904668s"
      }
    },
    {
      "store": {
        "id": 5,
        "address": "xxx.xxx.xxx.186:20160",
        "version": "5.0.3",
        "status_address": "xxx.xxx.xxx.186:20180",
        "git_hash": "63b63edfbb9bbf8aeb875aad28c59f082eeb55d4",
        "start_timestamp": 1645158190,
        "deploy_path": "/data1/tidb-rpt/tikv/bin",
        "last_heartbeat": 1654591777635239952,
        "state_name": "Up"
      },
      "status": {
        "capacity": "2.861TiB",
        "available": "2.083TiB",
        "used_size": "216.4GiB",
        "leader_count": 6875,
        "leader_weight": 1,
        "leader_score": 6875,
        "leader_size": 492692,
        "region_count": 20873,
        "region_weight": 1,
        "region_score": 1686544.965102952,
        "region_size": 1508275,
        "start_ts": "2022-02-18T12:23:10+08:00",
        "last_heartbeat_ts": "2022-06-07T16:49:37.635239952+08:00",
        "uptime": "2620h26m27.635239952s"
      }
    },
    {
      "store": {
        "id": 2811498,
        "address": "xxx.xxx.xxx.215:3930",
        "labels": [
          {
            "key": "engine",
            "value": "tiflash"
          }
        ],
        "version": "v5.0.3",
        "peer_address": "xxx.xxx.xxx.215:20170",
        "status_address": "xxx.xxx.xxx.215:20292",
        "git_hash": "0194cb4b59438d8d46fc05a4b1abd85eeb69972f",
        "start_timestamp": 1645171582,
        "deploy_path": "/data0/tidb-rpt/tiflash/bin/tiflash",
        "last_heartbeat": 1654591775924216205,
        "state_name": "Up"
      },
      "status": {
        "capacity": "5.722TiB",
        "available": "5.549TiB",
        "used_size": "177.4GiB",
        "leader_count": 0,
        "leader_weight": 1,
        "leader_score": 0,
        "leader_size": 0,
        "region_count": 8854,
        "region_weight": 1,
        "region_score": 696819.0356598455,
        "region_size": 663857,
        "start_ts": "2022-02-18T16:06:22+08:00",
        "last_heartbeat_ts": "2022-06-07T16:49:35.924216205+08:00",
        "uptime": "2616h43m13.924216205s"
      }
    },
    {
      "store": {
        "id": 60800510,
        "address": "xxx.xxx.xxx.183:3930",
        "labels": [
          {
            "key": "engine",
            "value": "tiflash"
          }
        ],
        "version": "v5.0.3",
        "peer_address": "xxx.xxx.xxx.183:20170",
        "status_address": "xxx.xxx.xxx.183:20292",
        "git_hash": "0194cb4b59438d8d46fc05a4b1abd85eeb69972f",
        "start_timestamp": 1645157500,
        "deploy_path": "/data0/tidb-rpt/tiflash/bin/tiflash",
        "last_heartbeat": 1654591772520421030,
        "state_name": "Up"
      },
      "status": {
        "capacity": "5.722TiB",
        "available": "5.562TiB",
        "used_size": "163.4GiB",
        "leader_count": 0,
        "leader_weight": 1,
        "leader_score": 0,
        "leader_size": 0,
        "region_count": 9602,
        "region_weight": 1,
        "region_score": 695118.5272697657,
        "region_size": 662283,
        "start_ts": "2022-02-18T12:11:40+08:00",
        "last_heartbeat_ts": "2022-06-07T16:49:32.52042103+08:00",
        "uptime": "2620h37m52.52042103s"
      }
    },
    {
      "store": {
        "id": 4,
        "address": "xxx.xxx.xxx.137:20160",
        "version": "5.0.3",
        "status_address": "xxx.xxx.xxx.137:20180",
        "git_hash": "63b63edfbb9bbf8aeb875aad28c59f082eeb55d4",
        "start_timestamp": 1645157854,
        "deploy_path": "/data1/tidb-rpt/tikv/bin",
        "last_heartbeat": 1654591776122431535,
        "state_name": "Up"
      },
      "status": {
        "capacity": "2.861TiB",
        "available": "2.079TiB",
        "used_size": "215.6GiB",
        "leader_count": 6869,
        "leader_weight": 1,
        "leader_score": 6869,
        "leader_size": 501669,
        "region_count": 20762,
        "region_weight": 1,
        "region_score": 1684791.9694096753,
        "region_size": 1506516,
        "start_ts": "2022-02-18T12:17:34+08:00",
        "last_heartbeat_ts": "2022-06-07T16:49:36.122431535+08:00",
        "uptime": "2620h32m2.122431535s"
      }
    }
  ]
}
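
Reading a long store listing by eye is error-prone, so the check can be scripted. A rough sketch, assuming the store output has been saved as JSON text; the helper names and the trimmed sample are hypothetical, and the quote normalization is only there because output pasted through a forum may contain typographic quotes:

```python
import json


def parse_store_output(text):
    """Parse the store JSON, tolerating typographic quotes from copy/paste."""
    normalized = text.replace("\u201c", '"').replace("\u201d", '"')
    return json.loads(normalized)


def engine_of(entry):
    """Return 'tiflash' if the store has the engine=tiflash label, else 'tikv'."""
    for label in entry["store"].get("labels", []):
        if label.get("key") == "engine" and label.get("value") == "tiflash":
            return "tiflash"
    return "tikv"


def summarize(text, suspect_id):
    """Group store IDs by engine and report whether suspect_id is listed."""
    stores = parse_store_output(text)["stores"]
    by_engine = {}
    for entry in stores:
        by_engine.setdefault(engine_of(entry), []).append(entry["store"]["id"])
    found = any(entry["store"]["id"] == suspect_id for entry in stores)
    return by_engine, found


# Trimmed, hypothetical sample with the same shape as the output above.
sample = """
{
  "count": 2,
  "stores": [
    {"store": {"id": 1, "address": "xxx.xxx.xxx.227:20160", "state_name": "Up"}},
    {"store": {"id": 2811498, "address": "xxx.xxx.xxx.215:3930",
               "labels": [{"key": "engine", "value": "tiflash"}],
               "state_name": "Up"}}
  ]
}
"""
by_engine, found = summarize(sample, 33139110)
print(by_engine, found)
```

Run against the full listing above, this grouping should show four TiKV stores and four TiFlash stores, with store 33139110 absent, which matches what the post describes.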

Could you share the cluster status information? The kind that tiup shows.

A question: the region location should be metadata collected by PD, right?

Most likely TiDB's region cache was not cleaned up correctly. As far as I recall, this version does have that problem. You can try restarting the TiDB nodes.
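
To illustrate why a stale cache would produce this error and why a restart would clear it, here is a toy model. It is purely illustrative and is not TiDB's or PD's actual code; the classes ToyPD and ToyClient, the host names, and the IDs are all made up:

```python
class ToyPD:
    """Stand-in for PD: the source of truth for stores and region placement."""

    def __init__(self, stores, regions):
        self.stores = dict(stores)    # store_id -> address
        self.regions = dict(regions)  # region_id -> store_id

    def locate_region(self, region_id):
        return self.regions[region_id]

    def get_store(self, store_id):
        if store_id not in self.stores:
            raise LookupError(f"invalid store ID {store_id}, not found")
        return self.stores[store_id]

    def remove_store(self, store_id, move_to):
        """Drop a store and reassign its regions to another store."""
        del self.stores[store_id]
        for region_id, owner in self.regions.items():
            if owner == store_id:
                self.regions[region_id] = move_to


class ToyClient:
    """Stand-in for a SQL node that caches region -> store_id locally."""

    def __init__(self, pd):
        self.pd = pd
        self.region_cache = {}

    def address_for(self, region_id):
        # The cached store ID is trusted without re-checking placement.
        if region_id not in self.region_cache:
            self.region_cache[region_id] = self.pd.locate_region(region_id)
        return self.pd.get_store(self.region_cache[region_id])

    def restart(self):
        # A restart starts with an empty cache.
        self.region_cache.clear()


pd = ToyPD(stores={1: "host-a:3930", 2: "host-b:3930"}, regions={100: 2})
client = ToyClient(pd)
before = client.address_for(100)   # caches region 100 -> store 2
pd.remove_store(2, move_to=1)      # store 2 is decommissioned
try:
    client.address_for(100)
    error = None
except LookupError as exc:
    error = str(exc)               # stale entry still points at store 2
client.restart()
after = client.address_for(100)    # fresh lookup resolves to store 1
print(before, error, after)
```

In this toy, the lookup keeps failing for as long as the stale entry survives, which would also explain why only queries touching certain regions are affected.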

OK, I'll find a time to restart the TiDB nodes.

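Also, if you want to prevent similar problems in the future, consider upgrading to the latest 5.4.x release. The newer versions are considerably better than 5.0.3 in both stability and performance.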

Learned something new.

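Hello, one more question. Suppose I know that queries on a particular table hit this error. Would running ANALYZE to collect statistics for that table fix the problem?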

Sorry, no, it won't. This is a product bug, and only a restart or an upgrade will resolve it. :upside_down_face:

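I asked the developers to send me the tables whose queries were failing, and the errors looked to be concentrated on two tables. Yesterday I recreated both tables, which seems to have fixed it. I have not seen any errors since. :grinning: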

:+1:
