Can TiDB be upgraded directly from 4.0.4 to 6.1? After the upgrade we get Region is unavailable errors. What could be the cause?
[2023/03/08 13:55:49.765 +08:00] [WARN] [endpoint.rs:621] [error-response] [err="Region error (will back off and retry) message: "peer is not leader for region 1600789, leader may Some(id: 1600791 store_id: 411332)" not_leader { region_id: 1600789 leader { id: 1600791 store_id: 411332 } }"]
[2023/03/08 13:55:49.765 +08:00] [INFO] [scheduler.rs:596] ["get snapshot failed"] [err="Error(Request(message: "peer is not leader for region 1944033, leader may Some(id: 1944035 store_id: 411332)" not_leader { region_id: 1944033 leader { id: 1944035 store_id: 411332 } }))"] [cid=4120783]
[2023/03/08 13:55:49.765 +08:00] [INFO] [scheduler.rs:596] ["get snapshot failed"] [err="Error(Request(message: "peer is not leader for region 1944033, leader may Some(id: 1944035 store_id: 411332)" not_leader { region_id: 1944033 leader { id: 1944035 store_id: 411332 } }))"] [cid=4120782]
[2023/03/08 13:55:49.765 +08:00] [INFO] [scheduler.rs:596] ["get snapshot failed"] [err="Error(Request(message: "peer is not leader for region 1924577, leader may Some(id: 1924579 store_id: 411332)" not_leader { region_id: 1924577 leader { id: 1924579 store_id: 411332 } }))"] [cid=4120784]
[2023/03/08 13:55:49.765 +08:00] [INFO] [scheduler.rs:596] ["get snapshot failed"] [err="Error(Request(message: "peer is not leader for region 1924577, leader may Some(id: 1924579 store_id: 411332)" not_leader { region_id: 1924577 leader { id: 1924579 store_id: 411332 } }))"] [cid=4120785]
[2023/03/08 13:55:49.876 +08:00] [INFO] [scheduler.rs:596] ["get snapshot failed"] [err="Error(Request(message: "peer is not leader for region 1924577, leader may Some(id: 1924579 store_id: 411332)" not_leader { region_id: 1924577 leader { id: 1924579 store_id: 411332 } }))"] [cid=4120786]
[2023/03/08 13:55:49.876 +08:00] [INFO] [scheduler.rs:596] ["get snapshot failed"] [err="Error(Request(message: "peer is not leader for region 1924577, leader may Some(id: 1924579 store_id: 411332)" not_leader { region_id: 1924577 leader { id: 1924579 store_id: 411332 } }))"] [cid=4120787]
[2023/03/08 13:55:49.936 +08:00] [INFO] [scheduler.rs:596] ["get snapshot failed"] [err="Error(Request(message: "peer is not leader for region 1944033, leader may Some(id: 1944035 store_id: 411332)" not_leader { region_id: 1944033 leader { id: 1944035 store_id: 411332 } }))"] [cid=4120789]
[2023/03/08 13:55:49.985 +08:00] [INFO] [scheduler.rs:596] ["get snapshot failed"] [err="Error(Request(message: "peer is not leader for region 1903877, leader may Some(id: 1903879 store_id: 411332)" not_leader { region_id: 1903877 leader { id: 1903879 store_id: 411332 } }))"] [cid=4120793]
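When a TiKV log contains many of these lines, it can help to tally them by region and by the store the error names as the likely leader. Below is a minimal Python sketch, assuming the message text has exactly the shape seen in the excerpt above. The function name tally_not_leader and the abbreviated sample lines are made up for illustration and are not part of any TiDB tooling.

```python
import re
from collections import Counter

# Pattern taken from the message text in the log excerpt:
# "peer is not leader for region <id>, leader may Some(id: <peer> store_id: <store>)"
NOT_LEADER = re.compile(
    r"peer is not leader for region (\d+), "
    r"leader may Some\(id: (\d+) store_id: (\d+)\)"
)


def tally_not_leader(lines):
    """Count not-leader errors per region and per hinted leader store."""
    by_region = Counter()
    by_store = Counter()
    for line in lines:
        match = NOT_LEADER.search(line)
        if match is None:
            continue  # not a not-leader line, skip it
        region_id, _peer_id, store_id = (int(g) for g in match.groups())
        by_region[region_id] += 1
        by_store[store_id] += 1
    return by_region, by_store


# Abbreviated sample lines (hypothetical shortening of the real log lines).
SAMPLE_LINES = [
    '[WARN] message: "peer is not leader for region 1600789, leader may Some(id: 1600791 store_id: 411332)"',
    '[INFO] message: "peer is not leader for region 1944033, leader may Some(id: 1944035 store_id: 411332)"',
    '[INFO] message: "peer is not leader for region 1944033, leader may Some(id: 1944035 store_id: 411332)"',
    '[INFO] message: "peer is not leader for region 1924577, leader may Some(id: 1924579 store_id: 411332)"',
]

if __name__ == "__main__":
    regions, stores = tally_not_leader(SAMPLE_LINES)
    print("errors per region:", dict(regions))
    print("errors per hinted leader store:", dict(stores))
```

In the excerpt above every hint names store_id 411332, so a tally like this mainly shows which store the requests should have gone to; it does not by itself say why they were sent elsewhere.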
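Will this not leader error fix itself automatically?
Wasn't a similar thread posted before? From TiDB's point of view, a not leader error is normal: when TiDB uses the information in its region cache to read data from TiKV, the region leader may already have moved to another TiKV node, so TiKV returns this error and TiDB retries against the correct TiKV. For Region is unavailable, you can start by checking the following: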
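1. Run tiup cluster display <cluster-name> and check whether any TiKV node is in an abnormal state.
2. Run pd-ctl config show and check that the replica count setting max-replicas is >= 3.
3. Check the regions:
(1) Regions without a leader:
pd-ctl region --jq='.regions[]|select(has("leader")|not)|{id: .id,peer_stores: [.peers[].store_id]}'
(2) Regions with fewer peers than expected (here, regions with only 1 peer):
pd-ctl region --jq='.regions[] | {id: .id, peer_stores: [.peers[].store_id] | select(length==1) }'
4. Check the regions of the affected table or index:
show table xx regions, to get the ID of the region that reports the error
pd-ctl region xxx, to check the state of that region
5. Some troubleshooting scenarios from the official documentation: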
https://docs.pingcap.com/zh/tidb/stable/tidb-troubleshooting-map#11-客户端报-region-is-unavailable-错误
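The two jq filters in step 3 can also be reproduced offline on a saved copy of the pd-ctl region output. Below is a minimal Python sketch, assuming the JSON has the shape the jq filters imply: a top-level regions array whose entries carry id, peers[].store_id and an optional leader. The function names and the sample data are hypothetical and use made-up IDs.

```python
import json


def regions_without_leader(dump):
    """Mirror of the first jq filter: entries that have no "leader" key."""
    return [
        {"id": r["id"], "peer_stores": [p["store_id"] for p in r.get("peers", [])]}
        for r in dump.get("regions", [])
        if "leader" not in r
    ]


def regions_with_peer_count(dump, count=1):
    """Mirror of the second jq filter: entries with exactly `count` peers."""
    result = []
    for r in dump.get("regions", []):
        stores = [p["store_id"] for p in r.get("peers", [])]
        if len(stores) == count:
            result.append({"id": r["id"], "peer_stores": stores})
    return result


# Hypothetical sample, not taken from the cluster in this thread.
SAMPLE = json.loads("""
{
  "count": 3,
  "regions": [
    {"id": 1,
     "peers": [{"id": 11, "store_id": 101}, {"id": 12, "store_id": 102}, {"id": 13, "store_id": 103}],
     "leader": {"id": 11, "store_id": 101}},
    {"id": 2,
     "peers": [{"id": 21, "store_id": 101}, {"id": 22, "store_id": 102}, {"id": 23, "store_id": 103}]},
    {"id": 3,
     "peers": [{"id": 31, "store_id": 102}],
     "leader": {"id": 31, "store_id": 102}}
  ]
}
""")

if __name__ == "__main__":
    print("no leader:", regions_without_leader(SAMPLE))
    print("single peer:", regions_with_peer_count(SAMPLE, count=1))
```

To use it on real data, load the saved output with json.load instead of the inline sample. Changing count makes it easy to look for regions with 2 peers as well when max-replicas is 3.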
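Thanks!
Some data can't be queried right now.
Which data can't be queried?
The application reports this error and the data can't be queried.
Check the cluster monitoring to see whether it is constantly in backoff or seeing leader drops.
If so, that has to be resolved first.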
[2023/03/08 15:47:27.453 +08:00] [WARN] [raft_client.rs:296] ["RPC batch_raft fail"] [err="Some(RpcFailure(RpcStatus { status: 14-UNAVAILABLE, details: Some("failed to connect to all addresses") }))"] [sink_err="Some(RpcFinished(Some(RpcStatus { status: 14-UNAVAILABLE, details: Some("failed to connect to all addresses") })))"] [to_addr=192.168.3.64:20170]
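Since this line says the connection to to_addr failed, one quick check is whether that address accepts TCP connections from the host that logged the error. Below is a minimal Python sketch, assuming the [to_addr=host:port] field has the shape shown in the line above. The helper names and the abbreviated sample line are made up for illustration, and a successful TCP connect only shows the port is reachable, not that TiKV on it is healthy.

```python
import re
import socket

# Field shape taken from the log line: [to_addr=<host>:<port>]
TO_ADDR = re.compile(r"\[to_addr=([^\]:]+):(\d+)\]")


def extract_targets(lines):
    """Collect the distinct (host, port) pairs named in to_addr fields."""
    targets = set()
    for line in lines:
        match = TO_ADDR.search(line)
        if match:
            targets.add((match.group(1), int(match.group(2))))
    return sorted(targets)


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Abbreviated sample line (hypothetical shortening of the real log line).
SAMPLE_LINES = [
    '[WARN] [raft_client.rs:296] ["RPC batch_raft fail"] [to_addr=192.168.3.64:20170]',
]

if __name__ == "__main__":
    for host, port in extract_targets(SAMPLE_LINES):
        print(f"peer address named in the log: {host}:{port}")
```

Run on the TiKV host that wrote the log, calling can_connect(host, port) for each extracted pair gives a rough reachability check to compare against what tiup cluster display reports for that node.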
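That's probably not the only problem.
Check the network and the server load.
Take an overall look at the monitoring first and see what is actually going on.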
[ERROR] [transport.rs:163] ["send raft msg err"] [err="Other("[src/server/raft_client.rs:208]: RaftClient send fail")"]
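The network and the load are both normal.
Please paste the output of tiup cluster display.
Will this problem recover on its own?
Is this related to upgrading directly from v4 to v6?
Yesterday another cluster was upgraded from v5 to v6, and this did not happen there.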