Within a little over a day, 3 of the 4 TiKV nodes in the cluster restarted; TiKV FATAL error: index out of bounds: the len is 6 but the index is 6

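[TiDB environment] Production
[TiDB version] 5.0.1
[Problem: symptoms and impact]
Within a little over a day, 3 of the 4 TiKV nodes in the cluster restarted, each with the TiKV FATAL error "index out of bounds: the len is 6 but the index is 6". I found an issue, https://github.com/pingcap/tidb/issues/39188, but it does not explain the cause. What is going on here?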

All 3 restarted TiKV nodes logged the same FATAL entry:
[2023/08/19 15:56:40.857 +08:00] [FATAL] [lib.rs:465] [“index out of bounds: the len is 6 but the index is 6”] [backtrace=“stack backtrace:\n 0: tikv_util::set_panic_hook::{{closure}}\n at /home/jenkins/agent/workspace/build_tikv_multi_branch_v5.0.1/tikv/components/tikv_util/src/lib.rs:464\n 1: std::panicking::rust_panic_with_hook\n at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35//library/std/src/panicking.rs:595\n 2: std::panicking::begin_panic_handler::{{closure}}\n at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35//library/std/src/panicking.rs:497\n 3: std::sys_common::backtrace::__rust_end_short_backtrace\n at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35//library/std/src/sys_common/backtrace.rs:141\n 4: rust_begin_unwind\n at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35//library/std/src/panicking.rs:493\n 5: core::panicking::panic_fmt\n at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35//library/core/src/panicking.rs:92\n 6: core::panicking::panic_bounds_check\n at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35//library/core/src/panicking.rs:69\n 7: <usize as core::slice::index::SliceIndex<[T]>>::index_mut\n at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35/library/core/src/slice/index.rs:188\n core::slice::index::<impl core::ops::index::IndexMut for [T]>::index_mut\n at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35/library/core/src/slice/index.rs:26\n <alloc::vec::Vec<T,A> as core::ops::index::IndexMut>::index_mut\n at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35/library/alloc/src/vec/mod.rs:2054\n tokio_timer::wheel::Wheel::insert\n at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-timer-0.2.13/src/wheel/mod.rs:114\n tokio_timer::timer::Timer<T,N>::add_entry\n at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-timer-0.2.13/src/timer/mod.rs:324\n 8: tokio_timer::timer::Timer<T,N>::process_queue\n at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-timer-0.2.13/src/timer/mod.rs:301\n 9: <tokio_timer::timer::Timer<T,N> as 
tokio_executor::park::Park>::park\n at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-timer-0.2.13/src/timer/mod.rs:361\n tokio_timer::timer::Timer<T,N>::turn\n at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-timer-0.2.13/src/timer/mod.rs:256\n 10: tikv_util::timer::start_global_timer::{{closure}}\n at /home/jenkins/agent/workspace/build_tikv_multi_branch_v5.0.1/tikv/components/tikv_util/src/timer.rs:95\n 11: std::sys_common::backtrace::__rust_begin_short_backtrace\n at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35/library/std/src/sys_common/backtrace.rs:125\n 12: std::thread::Builder::spawn_unchecked::{{closure}}::{{closure}}\n at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35/library/std/src/thread/mod.rs:474\n 13: <std::panic::AssertUnwindSafe as core::ops::function::FnOnce<()>>::call_once\n at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35/library/std/src/panic.rs:322\n 14: std::panicking::try::do_call\n at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35/library/std/src/panicking.rs:379\n std::panicking::try\n at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35/library/std/src/panicking.rs:343\n std::panic::catch_unwind\n at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35/library/std/src/panic.rs:396\n std::thread::Builder::spawn_unchecked::{{closure}}\n at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35/library/std/src/thread/mod.rs:473\n core::ops::function::FnOnce::call_once{{vtable.shim}}\n at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35/library/core/src/ops/function.rs:227\n 15: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce>::call_once\n at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35/library/alloc/src/boxed.rs:1484\n <alloc::boxed::Box<F,A> as core::ops::function::FnOnce>::call_once\n at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35/library/alloc/src/boxed.rs:1484\n std::sys::unix::thread::thread::new::thread_start\n at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35//library/std/src/sys/unix/thread.rs:71\n 16: 
start_thread\n 17: clone\n”] [location=/rust/registry/src/github.com-1ecc6299db9ec823/tokio-timer-0.2.13/src/wheel/mod.rs:114] [thread_name=timer]

Check the monitoring dashboards for the resource usage of the restarted TiKV nodes.

Are there any memory-hungry SQL statements? Also check the system logs for an OOM or something similar.

index out of bounds: the len is 6 but the index is 6
There was a bug behind this error in the past.

TiKV running over 2 years may panic · Issue #11940 · tikv/tikv (github.com)
Has this TiKV been running for 2 years without a restart? This is a known bug.
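To illustrate why "over 2 years" and "len is 6 but the index is 6" fit together, here is a minimal sketch of how a hierarchical timing wheel picks a level. It is a simplified model written for this thread, not code copied from tokio-timer; it assumes 6 levels of 64 slots each and a 1 ms tick, and the names `checked_level_for` and `max_days` are hypothetical helpers.

```rust
/// Number of levels in the wheel; each level has 64 slots (6 bits).
const NUM_LEVELS: usize = 6;

/// Largest tick (in ms, assuming a 1 ms tick) the wheel can address: 64^6 = 2^36.
const MAX_MS: u64 = 1 << (6 * NUM_LEVELS);

/// Picks the level from the highest bit in which `elapsed` and `when` differ.
/// Nothing here caps the result at NUM_LEVELS - 1.
fn level_for(elapsed: u64, when: u64) -> usize {
    let masked = elapsed ^ when;
    assert!(masked != 0, "deadline equals the current tick");
    let significant = 63 - masked.leading_zeros() as usize;
    significant / 6
}

/// Hypothetical guard: returns None where indexing `levels[level]` would panic.
fn checked_level_for(elapsed: u64, when: u64) -> Option<usize> {
    let level = level_for(elapsed, when);
    if level < NUM_LEVELS {
        Some(level)
    } else {
        None
    }
}

/// Whole days covered by the wheel before a deadline lands on level 6.
fn max_days() -> u64 {
    MAX_MS / 86_400_000
}
```

Under these assumptions the wheel covers 2^36 ms, roughly 795 days (about 2.2 years). Once a deadline crosses that boundary, `level_for` returns 6, and indexing a 6-element `Vec` of levels with it would produce exactly this panic message.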


No, traffic on this cluster is very low.

Yes, it has gone more than 2 years without a restart, so that is most likely the cause.

I recommend upgrading to version 5.0.6.

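There probably won't be any upgrade plan going forward. We'll run an inspection first, then agree on a time with the business teams and do a rolling restart.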

I ran a scan: there are still more than 20 TiKV nodes that have been up for over 2 years :joy:
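A scan like this could be sketched as follows. It is only an illustration: the `Node` struct, the `nodes_to_restart` function and the safety margin are hypothetical, the uptime values are assumed to be collected elsewhere (for example from process start times), and the threshold assumes the timer range is 2^36 ms counted from process start.

```rust
/// Assumed timer range in seconds: 2^36 ms, about 795 days.
const WHEEL_RANGE_SECS: u64 = (1u64 << 36) / 1000;

/// Hypothetical record for one TiKV instance.
struct Node {
    name: String,
    uptime_secs: u64,
}

/// Returns the names of nodes whose uptime is within `margin_days`
/// of the assumed timer range, or already past it.
fn nodes_to_restart(nodes: &[Node], margin_days: u64) -> Vec<&str> {
    let threshold = WHEEL_RANGE_SECS.saturating_sub(margin_days * 86_400);
    nodes
        .iter()
        .filter(|n| n.uptime_secs >= threshold)
        .map(|n| n.name.as_str())
        .collect()
}
```

With a 30-day margin the threshold works out to about 765 days of uptime, which leaves some room to schedule the rolling restart before a node reaches the limit.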

Is the cluster large? How many TiDB, TiKV, PD and TiFlash nodes are there in total?

5.5 TB, 4 TiKV, 5 TiDB, 3 PD, no TiFlash

That's a small cluster; you could at least upgrade to version 5.0.6.

Looks like an occasional restart is good for hiding bugs :smile:
