5.0.1 cluster: 3 TiKV nodes restarted at the same time (not OOM)

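【TiDB environment】Production
【TiDB version】5.0.1
【Reproduction steps】Not reproduced
【Problem: symptoms and impact】
The cluster has 8 TiKV nodes in total. 3 of them restarted, which made the cluster unavailable.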

Error log:

[2023/07/29 07:10:02.331 +08:00] [FATAL] [lib.rs:465] ["index out of bounds: the len is 6 but the index is 6"] [backtrace="stack backtrace:\n   0: tikv_util::set_panic_hook:
:{{closure}}\n             at /home/jenkins/agent/workspace/build_tikv_multi_branch_v5.0.1/tikv/components/tikv_util/src/lib.rs:464\n   1: std::panicking::rust_panic_with_ho
ok\n             at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35//library/std/src/panicking.rs:595\n   2: std::panicking::begin_panic_handler::{{closure}}\n             a
t /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35//library/std/src/panicking.rs:497\n   3: std::sys_common::backtrace::__rust_end_short_backtrace\n             at /rustc/bc3
9d4d9c514e5fdb40a5782e6ca08924f979c35//library/std/src/sys_common/backtrace.rs:141\n   4: rust_begin_unwind\n             at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35/
/library/std/src/panicking.rs:493\n   5: core::panicking::panic_fmt\n             at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35//library/core/src/panicking.rs:92\n   6:
 core::panicking::panic_bounds_check\n             at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35//library/core/src/panicking.rs:69\n   7: <usize as core::slice::index::
SliceIndex<[T]>>::index_mut\n             at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35/library/core/src/slice/index.rs:188\n      core::slice::index::<impl core::ops::
index::IndexMut<I> for [T]>::index_mut\n             at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35/library/core/src/slice/index.rs:26\n      <alloc::vec::Vec<T,A> as co
re::ops::index::IndexMut<I>>::index_mut\n             at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35/library/alloc/src/vec/mod.rs:2054\n      tokio_timer::wheel::Wheel<T
>::insert\n             at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-timer-0.2.13/src/wheel/mod.rs:114\n      tokio_timer::timer::Timer<T,N>::add_entry\n
    at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-timer-0.2.13/src/timer/mod.rs:324\n   8: tokio_timer::timer::Timer<T,N>::process_queue\n             at /rust/reg
istry/src/github.com-1ecc6299db9ec823/tokio-timer-0.2.13/src/timer/mod.rs:301\n   9: <tokio_timer::timer::Timer<T,N> as tokio_executor::park::Park>::park\n             at /r
ust/registry/src/github.com-1ecc6299db9ec823/tokio-timer-0.2.13/src/timer/mod.rs:361\n      tokio_timer::timer::Timer<T,N>::turn\n             at /rust/registry/src/github.c
om-1ecc6299db9ec823/tokio-timer-0.2.13/src/timer/mod.rs:256\n  10: tikv_util::timer::start_global_timer::{{closure}}\n             at /home/jenkins/agent/workspace/build_tik
v_multi_branch_v5.0.1/tikv/components/tikv_util/src/timer.rs:95\n  11: std::sys_common::backtrace::__rust_begin_short_backtrace\n             at /rustc/bc39d4d9c514e5fdb40a5
782e6ca08924f979c35/library/std/src/sys_common/backtrace.rs:125\n  12: std::thread::Builder::spawn_unchecked::{{closure}}::{{closure}}\n             at /rustc/bc39d4d9c514e5
fdb40a5782e6ca08924f979c35/library/std/src/thread/mod.rs:474\n  13: <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once\n             at /rustc/b
c39d4d9c514e5fdb40a5782e6ca08924f979c35/library/std/src/panic.rs:322\n  14: std::panicking::try::do_call\n             at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35/lib
rary/std/src/panicking.rs:379\n      std::panicking::try\n             at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35/library/std/src/panicking.rs:343\n      std::panic:
:catch_unwind\n             at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35/library/std/src/panic.rs:396\n      std::thread::Builder::spawn_unchecked::{{closure}}\n
       at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35/library/std/src/thread/mod.rs:473\n      core::ops::function::FnOnce::call_once{{vtable.shim}}\n             at /ru
stc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35/library/core/src/ops/function.rs:227\n  15: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once\n
   at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35/library/alloc/src/boxed.rs:1484\n      <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once\n
       at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35/library/alloc/src/boxed.rs:1484\n      std::sys::unix::thread::Thread::new::thread_start\n             at /rustc/bc
39d4d9c514e5fdb40a5782e6ca08924f979c35//library/std/src/sys/unix/thread.rs:71\n  16: start_thread\n  17: __clone\n"] [location=/rust/registry/src/github.com-1ecc6299db9ec823
/tokio-timer-0.2.13/src/wheel/mod.rs:114] [thread_name=timer]

Are there no other error logs? And why does this contain Jenkins logs?

Have these three nodes been running for about 2 years without a restart?
TiKV running over 2 years may panic · Issue #11940 · tikv/tikv (github.com)
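
For context, the `the len is 6 but the index is 6` message in the log is consistent with how a six-level hashed timer wheel picks a level. The sketch below is a Python approximation, assuming tokio-timer 0.2 uses 6 levels of 64 slots at 1 ms resolution and selects the level from the highest differing bit between the elapsed time and the deadline. The constants and the `level_for` name are written from memory of that crate rather than copied from it, so treat them as assumptions.

```python
# Assumed wheel layout (not confirmed against the tokio-timer source):
NUM_LEVELS = 6       # the "len is 6" in the panic message
BITS_PER_LEVEL = 6   # 64 slots per level
MS_PER_DAY = 86_400_000

# Largest span the wheel can address, in milliseconds: 64**6 == 2**36.
MAX_MS = 1 << (NUM_LEVELS * BITS_PER_LEVEL)


def level_for(elapsed_ms: int, when_ms: int) -> int:
    """Return the wheel level for a deadline, based on the highest differing bit."""
    masked = elapsed_ms ^ when_ms
    assert masked != 0, "deadline must differ from the current time"
    significant = masked.bit_length() - 1  # index of the highest set bit
    return significant // BITS_PER_LEVEL


# A deadline just inside the range still maps to the last valid level.
print(level_for(0, MAX_MS - 1))
# A deadline that crosses 2**36 ms maps to a level one past the end.
print(level_for(MAX_MS - 1, MAX_MS))
# Wheel range expressed in days of uptime.
print(MAX_MS / MS_PER_DAY)
```

Under these assumptions the wheel covers 2^36 ms, roughly 795 days (about 2.2 years) of process uptime. Nodes started at around the same time would reach that limit at around the same time, which would fit three TiKV instances restarting together.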


Possibly. I am not sure whether TiKV has an uptime monitoring panel.
Found it: https://docs.pingcap.com/zh/tidb/stable/grafana-tikv-dashboard
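
Once the uptime value is known, the remaining headroom for the other nodes can be estimated. This is a minimal sketch, assuming the panic threshold is 2^36 ms (about 795 days) of process uptime, which is my reading of the "over 2 years" in the linked issue. `days_until_wheel_overflow` is a hypothetical helper, and the PromQL expression in the comment is a guess at what backs the uptime panel, not something checked against the dashboard definition.

```python
# Assumed threshold: 2**36 ms of process uptime, expressed in seconds.
WHEEL_RANGE_SECONDS = 2**36 / 1000
SECONDS_PER_DAY = 86_400


def days_until_wheel_overflow(uptime_seconds: float) -> float:
    """Days left before uptime reaches the assumed 2**36 ms limit.

    uptime_seconds could come from an expression such as
    `time() - process_start_time_seconds` (assumed, not verified).
    A negative result means the limit has already been passed.
    """
    return (WHEEL_RANGE_SECONDS - uptime_seconds) / SECONDS_PER_DAY


# Example: a node that has been up for 700 days.
print(round(days_until_wheel_overflow(700 * SECONDS_PER_DAY), 1))
```

A small positive result for the remaining five nodes would suggest scheduling a rolling restart or an upgrade before they reach the same limit.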

This isn't a TiDB log, is it? Why does it contain Jenkins build information?

It is the TiKV log.
