【 TiDB 使用环境】生产环境
【 TiDB 版本】v5.4.2
【复现路径】无
【遇到的问题:问题现象及影响】从监控上看schedule operator一直在timeout,尝试着扩容发现leader
也不进行调度,region
也不做均衡,业务上一直超时
【部分监控及日志】
2023-05-22 11:36:02 (GMT+8)TiKV 10.200.132.98:20163[<unknown>] ["New connected subchannel at 0x7f1ce50757f0 for subchannel 0x7f30c3e22540"]
2023-05-22 11:36:02 (GMT+8)TiKV 10.200.132.98:20163[<unknown>] ["New connected subchannel at 0x7f217d76be00 for subchannel 0x7f30c3e22fc0"]
2023-05-22 11:36:02 (GMT+8)TiKV 10.200.132.98:20163[<unknown>] ["New connected subchannel at 0x7f217dc405c0 for subchannel 0x7f30c3e23340"]
2023-05-22 11:36:02 (GMT+8)TiKV 10.200.132.98:20163[advance.rs:295] ["check leader failed"] [to_store=420634] [error="\"[rpc failed] RpcFailure: 12-UNIMPLEMENTED\""]
2023-05-22 11:36:02 (GMT+8)TiKV 10.200.132.98:20163[advance.rs:295] ["check leader failed"] [to_store=1580508] [error="\"[rpc failed] RpcFailure: 12-UNIMPLEMENTED\""]
2023-05-22 11:36:03 (GMT+8)TiKV 10.200.132.98:20163[<unknown>] ["New connected subchannel at 0x7f21188a14d0 for subchannel 0x7f30c3e22540"]
2023-05-22 11:36:03 (GMT+8)TiKV 10.200.132.98:20163[<unknown>] ["New connected subchannel at 0x7f216e40c4e0 for subchannel 0x7f30c3e22fc0"]
2023-05-22 11:36:03 (GMT+8)TiKV 10.200.132.98:20163[advance.rs:295] ["check leader failed"] [to_store=420634] [error="\"[rpc failed] RpcFailure: 12-UNIMPLEMENTED\""]
2023-05-22 11:36:03 (GMT+8)TiKV 10.200.132.98:20163[advance.rs:295] ["check leader failed"] [to_store=420633] [error="\"[rpc failed] RpcFailure: 12-UNIMPLEMENTED\""]
2023-05-22 11:36:03 (GMT+8)TiKV 10.200.132.98:20163[advance.rs:295] ["check leader failed"] [to_store=1580508] [error="\"[rpc failed] RpcFailure: 12-UNIMPLEMENTED\""]
从监控上看leader、region均未就行balance,是否需要做pd切换?
从tikv
日志来看,出现大量的check leader failed
,这些错误都是tikv
到tiflash
[2023/05/22 16:16:33.795 +08:00] [INFO] [advance.rs:295] ["check leader failed"] [to_store=1580508] [error="\"[rpc failed] RpcFailure: 12-UNIMPLEMENTED\""]
[2023/05/22 16:16:34.800 +08:00] [INFO] [advance.rs:295] ["check leader failed"] [to_store=420634] [error="\"[rpc failed] RpcFailure: 12-UNIMPLEMENTED\""]
[2023/05/22 16:16:34.801 +08:00] [INFO] [advance.rs:295] ["check leader failed"] [to_store=420633] [error="\"[rpc failed] RpcFailure: 12-UNIMPLEMENTED\""]
[2023/05/22 16:16:34.802 +08:00] [INFO] [advance.rs:295] ["check leader failed"] [to_store=1580508] [error="\"[rpc failed] RpcFailure: 12-UNIMPLEMENTED\""]
[2023/05/22 16:16:35.803 +08:00] [INFO] [advance.rs:295] ["check leader failed"] [to_store=420634] [error="\"[rpc failed] RpcFailure: 12-UNIMPLEMENTED\""]
[2023/05/22 16:16:35.804 +08:00] [INFO] [advance.rs:295] ["check leader failed"] [to_store=420633] [error="\"[rpc failed] RpcFailure: 12-UNIMPLEMENTED\""]
[2023/05/22 16:16:35.805 +08:00] [INFO] [advance.rs:295] ["check leader failed"] [to_store=1580508] [error="\"[rpc failed] RpcFailure: 12-UNIMPLEMENTED\""]
[2023/05/22 16:16:36.800 +08:00] [INFO] [advance.rs:295] ["check leader failed"] [to_store=420634] [error="\"[rpc failed] RpcFailure: 12-UNIMPLEMENTED\""]
[2023/05/22 16:16:36.801 +08:00] [INFO] [advance.rs:295] ["check leader failed"] [to_store=420633] [error="\"[rpc failed] RpcFailure: 12-UNIMPLEMENTED\""]
[2023/05/22 16:16:36.802 +08:00] [INFO] [advance.rs:295] ["check leader failed"] [to_store=1580508] [error="\"[rpc failed] RpcFailure: 12-UNIMPLEMENTED\""]
[2023/05/22 16:16:37.811 +08:00] [INFO] [advance.rs:295] ["check leader failed"] [to_store=420634] [error="\"[rpc failed] RpcFailure: 12-UNIMPLEMENTED\""]
[2023/05/22 16:16:37.812 +08:00] [INFO] [advance.rs:295] ["check leader failed"] [to_store=420633] [error="\"[rpc failed] RpcFailure: 12-UNIMPLEMENTED\""]
[2023/05/22 16:16:37.813 +08:00] [INFO] [advance.rs:295] ["check leader failed"] [to_store=1580508] [error="\"[rpc failed] RpcFailure: 12-UNIMPLEMENTED\""]
[2023/05/22 16:16:38.803 +08:00] [INFO] [advance.rs:295] ["check leader failed"] [to_store=420634] [error="\"[rpc failed] RpcFailure: 12-UNIMPLEMENTED\""]
[2023/05/22 16:16:38.805 +08:00] [INFO] [advance.rs:295] ["check leader failed"] [to_store=420633] [error="\"[rpc failed] RpcFailure: 12-UNIMPLEMENTED\""]
[2023/05/22 16:16:38.806 +08:00] [INFO] [advance.rs:295] ["check leader failed"] [to_store=1580508] [error="\"[rpc failed] RpcFailure: 12-UNIMPLEMENTED\""]
[2023/05/22 16:16:39.832 +08:00] [INFO] [advance.rs:295] ["check leader failed"] [to_store=420634] [error="\"[rpc failed] RpcFailure: 12-UNIMPLEMENTED\""]
[2023/05/22 16:16:39.833 +08:00] [INFO] [advance.rs:295] ["check leader failed"] [to_store=420633] [error="\"[rpc failed] RpcFailure: 12-UNIMPLEMENTED\""]
[2023/05/22 16:16:39.834 +08:00] [INFO] [advance.rs:295] ["check leader failed"] [to_store=1580508] [error="\"[rpc failed] RpcFailure: 12-UNIMPLEMENTED\""]
[2023/05/22 16:16:40.843 +08:00] [INFO] [advance.rs:295] ["check leader failed"] [to_store=420634] [error="\"[rpc failed] RpcFailure: 12-UNIMPLEMENTED\""]
[2023/05/22 16:16:40.845 +08:00] [INFO] [advance.rs:295] ["check leader failed"] [to_store=420633] [error="\"[rpc failed] RpcFailure: 12-UNIMPLEMENTED\""]
[2023/05/22 16:16:40.846 +08:00] [INFO] [advance.rs:295] ["check leader failed"] [to_store=1580508] [error="\"[rpc failed] RpcFailure: 12-UNIMPLEMENTED\""]
[2023/05/22 16:16:41.842 +08:00] [INFO] [advance.rs:295] ["check leader failed"] [to_store=420634] [error="\"[rpc failed] RpcFailure: 12-UNIMPLEMENTED\""]
[2023/05/22 16:16:41.844 +08:00] [INFO] [advance.rs:295] ["check leader failed"] [to_store=420633] [error="\"[rpc failed] RpcFailure: 12-UNIMPLEMENTED\""]
[2023/05/22 16:16:41.845 +08:00] [INFO] [advance.rs:295] ["check leader failed"] [to_store=1580508] [error="\"[rpc failed] RpcFailure: 12-UNIMPLEMENTED\""]
[2023/05/22 16:16:42.837 +08:00] [INFO] [advance.rs:295] ["check leader failed"] [to_store=420634] [error="\"[rpc failed] RpcFailure: 12-UNIMPLEMENTED\""]
所有节点日志都看了一遍,其中有一个节点日志报错不太一样get snapshot failed
[2023/05/22 19:29:29.641 +08:00] [INFO] [scheduler.rs:517] ["get snapshot failed"] [err="Error(Request(message: \"EpochNotMatch current epoch of region 233969 is conf_ver: 209 version: 19544, but you sent conf_ver: 209 version: 19315\" epoch_not_match { current_regions { id: 233969 start_key: 7480000000000000FF495F698000000000FF0000030380000188FF431D172C03EC0000FF04144F527B000000FC end_key: 7480000000000000FF495F698000000000FF0000040125303025FF30302530FF302530FF3025303025FF3030FF253030253030FF25FF30302530302530FFFF3025303025303025FFFF30302530302530FF30FF253030253030FF2530FF3025303025FF303025FF30302530FF30253030FF253030FF2530302530FF3025FF303025303025FF30FF30253030253030FFFF2530302530302530FFFF30253030253030FF25FF303025303025FF3030FF2530302530FF302530FF30253030FF25303025FF303025FF3030253030FF2530FF302530302530FF30FF25303000000000FBFF038C000000DCA922FF5400000000000000F8 region_epoch { conf_ver: 209 version: 19544 } peers { id: 920709 store_id: 918725 } peers { id: 1233844 store_id: 2 } peers { id: 1452699 store_id: 1441108 } } current_regions { id: 2204582 start_key: 7480000000000000FF495F698000000000FF0000030380000188FF43099EE403B00000FF041440B0F8000000FC end_key: 7480000000000000FF495F698000000000FF0000030380000188FF431D172C03EC0000FF04144F527B000000FC region_epoch { conf_ver: 209 version: 19544 } peers { id: 2204583 store_id: 918725 } peers { id: 2204584 store_id: 2 } peers { id: 2204585 store_id: 1441108 } } current_regions { id: 2203237 start_key: 7480000000000000FF495F698000000000FF0000030380000188FF42F61AF103D40000FF041432227B000000FC end_key: 7480000000000000FF495F698000000000FF0000030380000188FF43099EE403B00000FF041440B0F8000000FC region_epoch { conf_ver: 209 version: 19543 } peers { id: 2203238 store_id: 918725 } peers { id: 2203239 store_id: 2 } peers { id: 2203240 store_id: 1441108 } } current_regions { id: 2201777 start_key: 7480000000000000FF495F698000000000FF0000030380000188FF42E3AE7403CC0000FF041423C243000000FC end_key: 7480000000000000FF495F698000000000FF0000030380000188FF42F61AF103D40000FF041432227B000000FC region_epoch { conf_ver: 209 version: 19542 } peers { id: 2201778 store_id: 918725 } peers { id: 2201779 store_id: 2 } peers { id: 2201780 store_id: 1441108 } } current_regions { id: 2200253 start_key: 7480000000000000FF495F698000000000FF0000030380000188FF42CF694503A00000FF041414D5F4000000FC end_key: 7480000000000000FF495F698000000000FF0000030380000188FF42E3AE7403CC0000FF041423C243000000FC region_epoch { conf_ver: 209 version: 19541 } peers { id: 2200254 store_id: 918725 } peers { id: 2200255 store_id: 2 } peers { id: 2200256 store_id: 1441108 } } current_regions { id: 2198838 start_key: 7480000000000000FF495F698000000000FF0000030380000188FF42B94BE703C40000FF041405DF65000000FC end_key: 7480000000000000FF495F698000000000FF0000030380000188FF42CF694503A00000FF041414D5F4000000FC region_epoch { conf_ver: 209 version: 19540 } peers { id: 2198839 store_id: 918725 } peers { id: 2198840 store_id: 2 } peers { id: 2198841 store_id: 1441108 } } current_regions { id: 2197255 start_key: 7480000000000000FF495F698000000000FF0000030380000188FF42A1B2AF03900000FF0413F75E9C000000FC end_key: 7480000000000000FF495F698000000000FF0000030380000188FF42B94BE703C40000FF041405DF65000000FC region_epoch { conf_ver: 209 version: 19539 } peers { id: 2197256 store_id: 918725 } peers { id: 2197257 store_id: 2 } peers { id: 2197258 store_id: 1441108 } } current_regions { id: 2195432 start_key: 7480000000000000FF495F698000000000FF0000030380000188FF42885F55038C0000FF0413E8906D000000FC end_key: 7480000000000000FF495F698000000000FF0000030380000188FF42A1B2AF03900000FF0413F75E9C000000FC region_epoch { conf_ver: 209 version: 19538 } peers { id: 2195433 store_id: 918725 } peers { id: 2195434 store_id: 2 } peers { id: 2195435 store_id: 1441108 } } current_regions { id: 2193606 start_key: 7480000000000000FF495F698000000000FF0000030380000188FF426E7D0503D80000FF0413DA3915000000FC end_key: 7480000000000000FF495F698000000000FF0000030380000188FF42885F55038C0000FF0413E8906D000000FC region_epoch { conf_ver: 209 version: 19537 } peers { id: 2193607 store_id: 918725 } peers { id: 2193608 store_id: 2 } peers { id: 2193609 store_id: 1441108 } } current_regions { id: 2191754 start_key: 7480000000000000FF495F698000000000FF0000030380000188FF4253FCC203B00000FF0413CBDE69000000FC end_key: 7480000000000000FF495F698000000000FF0000030380000188FF426E7D0503D80000FF0413DA3915000000FC region_epoch { conf_ver: 209 version: 19536 } peers { id: 2191755 store_id: 918725 } peers { id: 2191756 store_id: 2 } peers { id: 2191757 store_id: 1441108 } } current_regions { id: 2189965 start_key: 7480000000000000FF495F698000000000FF0000030380000188FF4239A1FE03D40000FF0413BCE6AC000000FC end_key: 7480000000000000FF495F698000000000FF0000030380000188FF4253FCC203B00000FF0413CBDE69000000FC region_epoch { conf_ver: 209 version: 19535 } peers { id: 2189966 store_id: 918725 } peers { id: 2189967 store_id: 2 } peers { id: 2189968 store_id: 1441108 } } }))"] [cid=5999975400]
排查到原因:
tikv region不上报心跳到pd,在v5.4.2版本中,有已知的bug:https://github.com/tikv/tikv/issues/12934
- 尝试着重启pd,重新选择pd leader后,scheduler 开始工作了
- 集群升级至v5.4.3后,暂时无发现异常出现
system
(system)
关闭
9
此话题已在最后回复的 60 天后被自动关闭。不再允许新回复。