SELECT fails with error 9005 - Region is unavailable

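  1. Could you run show again and share the result? If it differs, the cause is probably that a split left a temporary hole in PD's routing information.
  2. Does the SQL query fail every time?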

1. I ran show several times and the result was the same every time.
tableregion2.xls (53 KB)

2. Any SQL query that touches that id range of that table fails every time.

MySQL [jinfan]> select count(1) from  stepdata where id<3887094183457980434;
+----------+
| count(1) |
+----------+
| 31625031 |
+----------+
1 row in set (2.22 sec)

MySQL [jinfan]> select count(1) from  stepdata where id>=3890916794226966566;
+----------+
| count(1) |
+----------+
|  2214354 |
+----------+
1 row in set (0.61 sec)

MySQL [jinfan]> select count(1) from  stepdata where id>=3887094183457980434 and id<3890916794226966566;
ERROR 9005 (HY000): Region is unavailable
MySQL [jinfan]> select * from  stepdata where id=3887678264692441088
    -> ;
ERROR 9005 (HY000): Region is unavailable
MySQL [jinfan]> delete from  stepdata where id=3887678264692441088;
ERROR 9005 (HY000): Region is unavailable
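  1. Did anything abnormal happen before the problem appeared, such as a power failure or running unsafe-recover?
  2. Please provide the logs of stores 1, 7001, and 7002. If they are large, you can grep for 4800001.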

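1. For a while before this, the cluster was often unreachable. We usually recovered it with PD Recover and never ran unsafe-recover.
2. Do you mean the logs shown by kubectl logs? The pods were deleted earlier, so only recently generated logs are available. The tikv-2 log still has a lot of lines after grep 4800001, so I captured only part of it.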
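
A bare grep 4800001 can also match lines about other regions, for example entries where 4800001 is a Raft log index rather than a region ID. The sketch below is one possible way to keep only lines that name the region explicitly. The function name `mentions_region` and the sample lines (shortened from the log format in this thread) are illustrative assumptions, not part of any TiKV tooling.

```python
import re


def mentions_region(line, region_id):
    """Return True if the line refers to region_id as a region.

    Matches forms such as "region_id=4800001", "region_id: 4800001"
    and "region 4800001", but not "index: 4800001" or "log_index=4800001".
    """
    pattern = re.compile(rf"\bregion(?:_id)?[=:\s]+{region_id}\b")
    return pattern.search(line) is not None


# Shortened sample lines in the same format as the TiKV log output.
sample_lines = [
    '[INFO] [peer.rs:159] ["create peer"] [peer_id=4800002] [region_id=4800001]',
    '[INFO] [raft.rs:923] ["[logterm: 15202, index: 4800001] sent request to 8628"] '
    '[log_index=4800001] [raft_id=4172] [region_id=4171]',
    '[WARN] [endpoint.rs:527] [error-response] '
    '[err="peer is not leader for region 4800001, leader may Some(id: 4800004 store_id: 7002)"]',
]

relevant = [line for line in sample_lines if mentions_region(line, 4800001)]
for line in relevant:
    print(line)
```

The second sample line is dropped because it belongs to region 4171 and only contains 4800001 as a log index.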

[root@k8s-master01 bin]# kubectl logs -njinfan tidb-cluster-1605234515-tikv-0 |grep 4800001
[2021/04/13 14:10:41.240 +00:00] [INFO] [peer.rs:159] ["create peer"] [peer_id=4800002] [region_id=4800001]
[2021/04/13 14:10:41.240 +00:00] [INFO] [raft.rs:783] ["became follower at term 35"] [term=35] [raft_id=4800002] [region_id=4800001]
[2021/04/13 14:10:41.240 +00:00] [INFO] [raft.rs:285] [newRaft] [peers="[(4800004, Progress { matched: 0, next_idx: 36, state: Probe, paused: false, pending_snapshot: 0, pending_request_snapshot: 0, recent_active: false, ins: Inflights { start: 0, count: 0, buffer: [] } }), (4800002, Progress { matched: 35, next_idx: 36, state: Probe, paused: false, pending_snapshot: 0, pending_request_snapshot: 0, recent_active: false, ins: Inflights { start: 0, count: 0, buffer: [] } }), (4800003, Progress { matched: 0, next_idx: 36, state: Probe, paused: false, pending_snapshot: 0, pending_request_snapshot: 0, recent_active: false, ins: Inflights { start: 0, count: 0, buffer: [] } })]"] ["last term"=35] ["last index"=35] [applied=35] [commit=35] [term=35] [raft_id=4800002] [region_id=4800001]
[2021/04/13 14:10:41.240 +00:00] [INFO] [raw_node.rs:222] ["RawNode created with id 4800002."] [id=4800002] [raft_id=4800002] [region_id=4800001]
[2021/04/13 14:10:55.357 +00:00] [INFO] [raft.rs:1192] ["[logterm: 35, index: 35, vote: 4800003] cast vote for 4800004 [logterm: 35, index: 35] at term 35"] ["msg type"=MsgRequestPreVote] [term=35] [msg_index=35] [msg_term=35] [from=4800004] [vote=4800003] [log_index=35] [log_term=35] [raft_id=4800002] [region_id=4800001]
[2021/04/13 14:10:55.367 +00:00] [INFO] [raft.rs:1003] ["received a message with higher term from 4800004"] ["msg type"=MsgRequestVote] [message_term=36] [term=35] [from=4800004] [raft_id=4800002] [region_id=4800001]
[2021/04/13 14:10:55.367 +00:00] [INFO] [raft.rs:783] ["became follower at term 36"] [term=36] [raft_id=4800002] [region_id=4800001]
[2021/04/13 14:10:55.367 +00:00] [INFO] [raft.rs:1192] ["[logterm: 35, index: 35, vote: 0] cast vote for 4800004 [logterm: 35, index: 35] at term 36"] ["msg type"=MsgRequestVote] [term=36] [msg_index=35] [msg_term=35] [from=4800004] [vote=0] [log_index=35] [log_term=35] [raft_id=4800002] [region_id=4800001]
[2021/04/13 14:21:02.511 +00:00] [INFO] [process.rs:145] ["get snapshot failed"] [err="Request(message: \"peer is not leader for region 4800001, leader may Some(id: 4800004 store_id: 7002)\" not_leader { region_id: 4800001 leader { id: 4800004 store_id: 7002 } })"] [cid=1457]
[2021/04/14 00:59:49.023 +00:00] [WARN] [endpoint.rs:527] [error-response] [err="Region error (will back off and retry) message: \"peer is not leader for region 4800001, leader may Some(id: 4800004 store_id: 7002)\" not_leader { region_id: 4800001 leader { id: 4800004 store_id: 7002 } }"]
[2021/04/14 01:26:16.727 +00:00] [WARN] [endpoint.rs:527] [error-response] [err="Region error (will back off and retry) message: \"peer is not leader for region 4800001, leader may Some(id: 4800004 store_id: 7002)\" not_leader { region_id: 4800001 leader { id: 4800004 store_id: 7002 } }"]
[2021/04/14 02:13:03.719 +00:00] [WARN] [endpoint.rs:527] [error-response] [err="Region error (will back off and retry) message: \"peer is not leader for region 4800001, leader may Some(id: 4800004 store_id: 7002)\" not_leader { region_id: 4800001 leader { id: 4800004 store_id: 7002 } }"]
[2021/04/14 07:18:04.140 +00:00] [WARN] [endpoint.rs:527] [error-response] [err="Region error (will back off and retry) message: \"peer is not leader for region 4800001, leader may Some(id: 4800004 store_id: 7002)\" not_leader { region_id: 4800001 leader { id: 4800004 store_id: 7002 } }"]
[2021/04/14 07:29:57.617 +00:00] [WARN] [endpoint.rs:527] [error-response] [err="Region error (will back off and retry) message: \"peer is not leader for region 4800001, leader may Some(id: 4800004 store_id: 7002)\" not_leader { region_id: 4800001 leader { id: 4800004 store_id: 7002 } }"]
[2021/04/15 00:35:38.058 +00:00] [WARN] [endpoint.rs:527] [error-response] [err="Region error (will back off and retry) message: \"peer is not leader for region 4800001, leader may Some(id: 4800004 store_id: 7002)\" not_leader { region_id: 4800001 leader { id: 4800004 store_id: 7002 } }"]
[2021/04/15 00:36:09.996 +00:00] [WARN] [endpoint.rs:527] [error-response] [err="Region error (will back off and retry) message: \"peer is not leader for region 4800001, leader may Some(id: 4800004 store_id: 7002)\" not_leader { region_id: 4800001 leader { id: 4800004 store_id: 7002 } }"]
[2021/04/15 05:33:44.272 +00:00] [WARN] [endpoint.rs:527] [error-response] [err="Region error (will back off and retry) message: \"peer is not leader for region 4800001, leader may Some(id: 4800004 store_id: 7002)\" not_leader { region_id: 4800001 leader { id: 4800004 store_id: 7002 } }"]
[2021/04/15 05:34:03.934 +00:00] [WARN] [endpoint.rs:527] [error-response] [err="Region error (will back off and retry) message: \"peer is not leader for region 4800001, leader may Some(id: 4800004 store_id: 7002)\" not_leader { region_id: 4800001 leader { id: 4800004 store_id: 7002 } }"]
[2021/04/15 11:53:10.161 +00:00] [WARN] [endpoint.rs:527] [error-response] [err="Region error (will back off and retry) message: \"peer is not leader for region 4800001, leader may Some(id: 4800004 store_id: 7002)\" not_leader { region_id: 4800001 leader { id: 4800004 store_id: 7002 } }"]
[2021/04/15 14:09:12.984 +00:00] [WARN] [endpoint.rs:527] [error-response] [err="Region error (will back off and retry) message: \"peer is not leader for region 4800001, leader may Some(id: 4800004 store_id: 7002)\" not_leader { region_id: 4800001 leader { id: 4800004 store_id: 7002 } }"]
[2021/04/16 05:10:22.581 +00:00] [WARN] [endpoint.rs:527] [error-response] [err="Region error (will back off and retry) message: \"peer is not leader for region 4800001, leader may Some(id: 4800004 store_id: 7002)\" not_leader { region_id: 4800001 leader { id: 4800004 store_id: 7002 } }"]
[2021/04/16 05:49:12.159 +00:00] [WARN] [endpoint.rs:527] [error-response] [err="Region error (will back off and retry) message: \"peer is not leader for region 4800001, leader may Some(id: 4800004 store_id: 7002)\" not_leader { region_id: 4800001 leader { id: 4800004 store_id: 7002 } }"]
[2021/04/16 05:51:56.404 +00:00] [WARN] [endpoint.rs:527] [error-response] [err="Region error (will back off and retry) message: \"peer is not leader for region 4800001, leader may Some(id: 4800004 store_id: 7002)\" not_leader { region_id: 4800001 leader { id: 4800004 store_id: 7002 } }"]
[2021/04/16 05:52:27.220 +00:00] [WARN] [endpoint.rs:527] [error-response] [err="Region error (will back off and retry) message: \"peer is not leader for region 4800001, leader may Some(id: 4800004 store_id: 7002)\" not_leader { region_id: 4800001 leader { id: 4800004 store_id: 7002 } }"]
[2021/04/16 07:10:18.924 +00:00] [INFO] [raft.rs:923] ["[logterm: 15202, index: 4800001] sent request to 8628"] [msg=MsgRequestVote] [term=15203] [id=8628] [log_index=4800001] [log_term=15202] [raft_id=4172] [region_id=4171]
[2021/04/16 07:10:18.924 +00:00] [INFO] [raft.rs:923] ["[logterm: 15202, index: 4800001] sent request to 8570"] [msg=MsgRequestVote] [term=15203] [id=8570] [log_index=4800001] [log_term=15202] [raft_id=4172] [region_id=4171]
[2021/04/16 13:44:01.559 +00:00] [WARN] [endpoint.rs:527] [error-response] [err="Region error (will back off and retry) message: \"peer is not leader for region 4800001, leader may Some(id: 4800004 store_id: 7002)\" not_leader { region_id: 4800001 leader { id: 4800004 store_id: 7002 } }"]
[2021/04/17 00:36:39.177 +00:00] [WARN] [endpoint.rs:527] [error-response] [err="Region error (will back off and retry) message: \"peer is not leader for region 4800001, leader may Some(id: 4800004 store_id: 7002)\" not_leader { region_id: 4800001 leader { id: 4800004 store_id: 7002 } }"]
[root@k8s-master01 bin]# kubectl logs -njinfan tidb-cluster-1605234515-tikv-1 |grep 4800001
[2021/04/13 14:10:50.011 +00:00] [INFO] [peer.rs:159] ["create peer"] [peer_id=4800003] [region_id=4800001]
[2021/04/13 14:10:50.011 +00:00] [INFO] [raft.rs:783] ["became follower at term 35"] [term=35] [raft_id=4800003] [region_id=4800001]
[2021/04/13 14:10:50.011 +00:00] [INFO] [raft.rs:285] [newRaft] [peers="[(4800004, Progress { matched: 0, next_idx: 36, state: Probe, paused: false, pending_snapshot: 0, pending_request_snapshot: 0, recent_active: false, ins: Inflights { start: 0, count: 0, buffer: [] } }), (4800002, Progress { matched: 0, next_idx: 36, state: Probe, paused: false, pending_snapshot: 0, pending_request_snapshot: 0, recent_active: false, ins: Inflights { start: 0, count: 0, buffer: [] } }), (4800003, Progress { matched: 35, next_idx: 36, state: Probe, paused: false, pending_snapshot: 0, pending_request_snapshot: 0, recent_active: false, ins: Inflights { start: 0, count: 0, buffer: [] } })]"] ["last term"=35] ["last index"=35] [applied=35] [commit=35] [term=35] [raft_id=4800003] [region_id=4800001]
[2021/04/13 14:10:50.011 +00:00] [INFO] [raw_node.rs:222] ["RawNode created with id 4800003."] [id=4800003] [raft_id=4800003] [region_id=4800001]
[2021/04/13 14:10:55.281 +00:00] [INFO] [raft.rs:1192] ["[logterm: 35, index: 35, vote: 4800003] cast vote for 4800004 [logterm: 35, index: 35] at term 35"] ["msg type"=MsgRequestPreVote] [term=35] [msg_index=35] [msg_term=35] [from=4800004] [vote=4800003] [log_index=35] [log_term=35] [raft_id=4800003] [region_id=4800001]
[2021/04/13 14:10:55.288 +00:00] [INFO] [raft.rs:1003] ["received a message with higher term from 4800004"] ["msg type"=MsgRequestVote] [message_term=36] [term=35] [from=4800004] [raft_id=4800003] [region_id=4800001]
[2021/04/13 14:10:55.288 +00:00] [INFO] [raft.rs:783] ["became follower at term 36"] [term=36] [raft_id=4800003] [region_id=4800001]
[2021/04/13 14:10:55.289 +00:00] [INFO] [raft.rs:1192] ["[logterm: 35, index: 35, vote: 0] cast vote for 4800004 [logterm: 35, index: 35] at term 36"] ["msg type"=MsgRequestVote] [term=36] [msg_index=35] [msg_term=35] [from=4800004] [vote=0] [log_index=35] [log_term=35] [raft_id=4800003] [region_id=4800001]
[2021/04/16 07:10:18.926 +00:00] [INFO] [raft.rs:1192] ["[logterm: 15202, index: 4800001, vote: 0] cast vote for 4172 [logterm: 15202, index: 4800001] at term 15203"] ["msg type"=MsgRequestVote] [term=15203] [msg_index=4800001] [msg_term=15202] [from=4172] [vote=0] [log_index=4800001] [log_term=15202] [raft_id=8570] [region_id=4171]
kubectl logs -njinfan tidb-cluster-1605234515-tikv-2 |grep 4800001
......
......
......
[2021/04/17 02:28:18.405 +00:00] [INFO] [process.rs:145] ["get snapshot failed"] [err="Request(message: \"EpochNotMatch current epoch of region 4800001 is conf_ver: 5 version: 791, but you sent conf_ver: 11 version: 801\" epoch_not_match { current_regions { id: 4800001 start_key: 7480000000000006FF9C5F698000000000FF0000040419A8BC54FF7300000003B5C9ACFF1F2E801000000000FC end_key: 7480000000000006FF9C5F698000000000FF0000040419A922D2FF5B00000003B60551FF2EE6006000000000FC region_epoch { conf_ver: 5 version: 791 } peers { id: 4800002 store_id: 1 } peers { id: 4800003 store_id: 7001 } peers { id: 4800004 store_id: 7002 } } current_regions { id: 4602013 start_key: 7480000000000006FF9C5F698000000000FF0000040419A82AE1FF0400000003B596A4FFF70B806000000000FC end_key: 7480000000000006FF9C5F698000000000FF0000040419A8BC54FF7300000003B5C9ACFF1F2E801000000000FC region_epoch { conf_ver: 5 version: 790 } peers { id: 4602014 store_id: 1 } peers { id: 4602015 store_id: 7001 } peers { id: 4602016 store_id: 7002 } } })"] [cid=1734620]
[2021/04/17 02:28:19.407 +00:00] [INFO] [process.rs:145] ["get snapshot failed"] [err="Request(message: \"EpochNotMatch current epoch of region 4800001 is conf_ver: 5 version: 791, but you sent conf_ver: 11 version: 801\" epoch_not_match { current_regions { id: 4800001 start_key: 7480000000000006FF9C5F698000000000FF0000040419A8BC54FF7300000003B5C9ACFF1F2E801000000000FC end_key: 7480000000000006FF9C5F698000000000FF0000040419A922D2FF5B00000003B60551FF2EE6006000000000FC region_epoch { conf_ver: 5 version: 791 } peers { id: 4800002 store_id: 1 } peers { id: 4800003 store_id: 7001 } peers { id: 4800004 store_id: 7002 } } current_regions { id: 4602013 start_key: 7480000000000006FF9C5F698000000000FF0000040419A82AE1FF0400000003B596A4FFF70B806000000000FC end_key: 7480000000000006FF9C5F698000000000FF0000040419A8BC54FF7300000003B5C9ACFF1F2E801000000000FC region_epoch { conf_ver: 5 version: 790 } peers { id: 4602014 store_id: 1 } peers { id: 4602015 store_id: 7001 } peers { id: 4602016 store_id: 7002 } } })"] [cid=1734621]
[2021/04/17 02:28:20.409 +00:00] [INFO] [process.rs:145] ["get snapshot failed"] [err="Request(message: \"EpochNotMatch current epoch of region 4800001 is conf_ver: 5 version: 791, but you sent conf_ver: 11 version: 801\" epoch_not_match { current_regions { id: 4800001 start_key: 7480000000000006FF9C5F698000000000FF0000040419A8BC54FF7300000003B5C9ACFF1F2E801000000000FC end_key: 7480000000000006FF9C5F698000000000FF0000040419A922D2FF5B00000003B60551FF2EE6006000000000FC region_epoch { conf_ver: 5 version: 791 } peers { id: 4800002 store_id: 1 } peers { id: 4800003 store_id: 7001 } peers { id: 4800004 store_id: 7002 } } current_regions { id: 4602013 start_key: 7480000000000006FF9C5F698000000000FF0000040419A82AE1FF0400000003B596A4FFF70B806000000000FC end_key: 7480000000000006FF9C5F698000000000FF0000040419A8BC54FF7300000003B5C9ACFF1F2E801000000000FC region_epoch { conf_ver: 5 version: 790 } peers { id: 4602014 store_id: 1 } peers { id: 4602015 store_id: 7001 } peers { id: 4602016 store_id: 7002 } } })"] [cid=1734622]
[2021/04/17 02:28:21.411 +00:00] [INFO] [process.rs:145] ["get snapshot failed"] [err="Request(message: \"EpochNotMatch current epoch of region 4800001 is conf_ver: 5 version: 791, but you sent conf_ver: 11 version: 801\" epoch_not_match { current_regions { id: 4800001 start_key: 7480000000000006FF9C5F698000000000FF0000040419A8BC54FF7300000003B5C9ACFF1F2E801000000000FC end_key: 7480000000000006FF9C5F698000000000FF0000040419A922D2FF5B00000003B60551FF2EE6006000000000FC region_epoch { conf_ver: 5 version: 791 } peers { id: 4800002 store_id: 1 } peers { id: 4800003 store_id: 7001 } peers { id: 4800004 store_id: 7002 } } current_regions { id: 4602013 start_key: 7480000000000006FF9C5F698000000000FF0000040419A82AE1FF0400000003B596A4FFF70B806000000000FC end_key: 7480000000000006FF9C5F698000000000FF0000040419A8BC54FF7300000003B5C9ACFF1F2E801000000000FC region_epoch { conf_ver: 5 version: 790 } peers { id: 4602014 store_id: 1 } peers { id: 4602015 store_id: 7001 } peers { id: 4602016 store_id: 7002 } } })"] [cid=1734623]
[2021/04/17 02:28:22.418 +00:00] [INFO] [process.rs:145] ["get snapshot failed"] [err="Request(message: \"EpochNotMatch current epoch of region 4800001 is conf_ver: 5 version: 791, but you sent conf_ver: 11 version: 801\" epoch_not_match { current_regions { id: 4800001 start_key: 7480000000000006FF9C5F698000000000FF0000040419A8BC54FF7300000003B5C9ACFF1F2E801000000000FC end_key: 7480000000000006FF9C5F698000000000FF0000040419A922D2FF5B00000003B60551FF2EE6006000000000FC region_epoch { conf_ver: 5 version: 791 } peers { id: 4800002 store_id: 1 } peers { id: 4800003 store_id: 7001 } peers { id: 4800004 store_id: 7002 } } current_regions { id: 4602013 start_key: 7480000000000006FF9C5F698000000000FF0000040419A82AE1FF0400000003B596A4FFF70B806000000000FC end_key: 7480000000000006FF9C5F698000000000FF0000040419A8BC54FF7300000003B5C9ACFF1F2E801000000000FC region_epoch { conf_ver: 5 version: 790 } peers { id: 4602014 store_id: 1 } peers { id: 4602015 store_id: 7001 } peers { id: 4602016 store_id: 7002 } } })"] [cid=1734624]
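
In the EpochNotMatch lines above, tikv-2 reports its local epoch for region 4800001 as conf_ver 5 / version 791, while the request carried conf_ver 11 / version 801. The sketch below pulls those two epochs out of such a message so they can be compared. The names `parse_epoch_mismatch` and `peer_is_behind` are hypothetical helpers, and reading "the local epoch is lower" as "this peer is stale" is an assumption, not a confirmed diagnosis.

```python
import re

# Pattern for the message text as it appears in the log lines above.
EPOCH_RE = re.compile(
    r"current epoch of region (?P<region>\d+) is "
    r"conf_ver: (?P<cur_conf>\d+) version: (?P<cur_ver>\d+), "
    r"but you sent conf_ver: (?P<sent_conf>\d+) version: (?P<sent_ver>\d+)"
)


def parse_epoch_mismatch(line):
    """Extract (conf_ver, version) pairs from an EpochNotMatch message."""
    m = EPOCH_RE.search(line)
    if m is None:
        return None
    return {
        "region_id": int(m["region"]),
        "current": (int(m["cur_conf"]), int(m["cur_ver"])),
        "sent": (int(m["sent_conf"]), int(m["sent_ver"])),
    }


def peer_is_behind(info):
    """True if the store's epoch is not ahead in either field and differs
    from the epoch carried by the request (assumed to mean a stale peer)."""
    cur_conf, cur_ver = info["current"]
    sent_conf, sent_ver = info["sent"]
    return cur_conf <= sent_conf and cur_ver <= sent_ver and info["current"] != info["sent"]


sample = (
    "EpochNotMatch current epoch of region 4800001 is conf_ver: 5 version: 791, "
    "but you sent conf_ver: 11 version: 801"
)
info = parse_epoch_mismatch(sample)
print(info, peer_is_behind(info))
```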
  1. I don't think pd-recover alone would cause this.
  2. Please restart tikv-2 and then check the information for region 4800001 again. Thanks.

Hello, here is the tikv-2 log after the restart:

[root@k8s-master01 ~]# kubectl logs -njinfan tidb-cluster-1605234515-tikv-2 |grep 4800001
[2021/04/20 00:52:13.041 +00:00] [INFO] [peer.rs:159] ["create peer"] [peer_id=4800004] [region_id=4800001]
[2021/04/20 00:52:13.041 +00:00] [INFO] [raft.rs:783] ["became follower at term 36"] [term=36] [raft_id=4800004] [region_id=4800001]
[2021/04/20 00:52:13.041 +00:00] [INFO] [raft.rs:285] [newRaft] [peers="[(4800004, Progress { matched: 36, next_idx: 37, state: Probe, paused: false, pending_snapshot: 0, pending_request_snapshot: 0, recent_active: false, ins: Inflights { start: 0, count: 0, buffer: [] } }), (4800002, Progress { matched: 0, next_idx: 37, state: Probe, paused: false, pending_snapshot: 0, pending_request_snapshot: 0, recent_active: false, ins: Inflights { start: 0, count: 0, buffer: [] } }), (4800003, Progress { matched: 0, next_idx: 37, state: Probe, paused: false, pending_snapshot: 0, pending_request_snapshot: 0, recent_active: false, ins: Inflights { start: 0, count: 0, buffer: [] } })]"] ["last term"=36] ["last index"=36] [applied=36] [commit=36] [term=36] [raft_id=4800004] [region_id=4800001]
[2021/04/20 00:52:13.041 +00:00] [INFO] [raw_node.rs:222] ["RawNode created with id 4800004."] [id=4800004] [raft_id=4800004] [region_id=4800001]
[2021/04/20 00:52:20.191 +00:00] [INFO] [raft.rs:1003] ["received a message with higher term from 4800002"] ["msg type"=MsgHeartbeat] [message_term=37] [term=36] [from=4800002] [raft_id=4800004] [region_id=4800001]
[2021/04/20 00:52:20.191 +00:00] [INFO] [raft.rs:783] ["became follower at term 37"] [term=37] [raft_id=4800004] [region_id=4800001]

One thing to confirm: are you still seeing the region is unavailable error?

Hello, yes, the region is unavailable error still occurs.

Is the environment still in the failed state? If so, please run a few commands:

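  1. In pd-ctl, run region --jq='.regions[]|select(has("leader")|not)|{id: .id, peer_stores: [.peers[].store_id]}' to list the regions in the cluster that currently have no leader.
    This command requires jq, so install it on the machine first with yum install jq.
  2. Does region 4800001 still have no leader? If so, run ./tikv-ctl --host ${store_ip}:${tikv_port} raft region -r 4800001 against the store of each peer and share the output. For example, according to the information above, the three peers of region 4800001 are on stores 1, 7001, and 7002, so store_ip and tikv_port are the address and port of each of those three stores.
  3. On every node, run grep -i 'region_id=4800001' tikv.log* > 4800001.log in the TiKV log directory, then compress and upload the result so we can see why the region cannot elect a leader.
  4. Please also share the output of the store command in pd-ctl.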
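
If jq cannot be installed, the filter in step 1 can be approximated in Python against the JSON that the region command prints. This is a minimal sketch: `regions_without_leader` is a hypothetical helper, and the sample JSON is hand-written for illustration (the peer and store IDs are taken from the logs above, but which region has a leader is made up), not real cluster output.

```python
import json


def regions_without_leader(pd_region_json):
    """Mirror of the jq filter: keep regions that lack a "leader" key and
    report each region's id together with the store IDs of its peers."""
    data = json.loads(pd_region_json)
    result = []
    for region in data.get("regions", []):
        if "leader" not in region:
            result.append(
                {
                    "id": region["id"],
                    # Unlike the jq filter, a missing "peers" list yields []
                    # instead of an error.
                    "peer_stores": [p["store_id"] for p in region.get("peers", [])],
                }
            )
    return result


# Hand-written sample in the assumed shape {"regions": [...]}.
sample = json.dumps(
    {
        "regions": [
            {
                "id": 4800001,
                "peers": [
                    {"id": 4800002, "store_id": 1},
                    {"id": 4800003, "store_id": 7001},
                    {"id": 4800004, "store_id": 7002},
                ],
            },
            {
                "id": 4602013,
                "peers": [
                    {"id": 4602014, "store_id": 1},
                    {"id": 4602015, "store_id": 7001},
                    {"id": 4602016, "store_id": 7002},
                ],
                "leader": {"id": 4602014, "store_id": 1},
            },
        ]
    }
)

print(regions_without_leader(sample))
```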