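【TiDB Environment】Production
【TiDB Version】5.0.6
【Reproduction Path】What operations were performed before the problem appeared
【Problem: Symptoms and Impact】On version 5.0.6, multiple TiKV nodes in one cluster restarted abnormally, one after another
【Resource Configuration】Go to TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page
【Attachments: Screenshots/Logs/Monitoring】
Starting at around 8:17, multiple TiKV nodes in one cluster restarted abnormally.
Cluster status:
The machines did not run out of memory (no OOM):
message log:
TiKV log: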
More detailed logs:
[2024/08/12 08:17:40.043 +08:00] [INFO] [util.rs:544] ["connecting to PD endpoint"] [endpoints=http://10.27.74.138:2379]
[2024/08/12 08:17:40.044 +08:00] [INFO] [] ["New connected subchannel at 0x7fa41f480fa0 for subchannel 0x7fb77ba5c540"]
[2024/08/12 08:17:40.058 +08:00] [FATAL] [lib.rs:464] ["index out of bounds: the len is 6 but the index is 6"] [backtrace="stack backtrace:\n 0: tikv_util::set_panic_hook::{{closure}}\n at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tikv/components/tikv_util/src/lib.rs:463\n 1: std::panicking::rust_panic_with_hook\n at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35//library/std/src/panicking.rs:595\n 2: std::panicking::begin_panic_handler::{{closure}}\n at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35//library/std/src/panicking.rs:497\n 3: std::sys_common::backtrace::__rust_end_short_backtrace\n at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35//library/std/src/sys_common/backtrace.rs:141\n 4: rust_begin_unwind\n at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35//library/std/src/panicking.rs:493\n 5: core::panicking::panic_fmt\n at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35//library/core/src/panicking.rs:92\n 6: core::panicking::panic_bounds_check\n at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35//library/core/src/panicking.rs:69\n 7: <usize as core::slice::index::SliceIndex<[T]>>::index_mut\n at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35/library/core/src/slice/index.rs:188\n core::slice::index::<impl core::ops::index::IndexMut for [T]>::index_mut\n at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35/library/core/src/slice/index.rs:26\n <alloc::vec::Vec<T,A> as core::ops::index::IndexMut>::index_mut\n at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35/library/alloc/src/vec/mod.rs:2054\n tokio_timer::Wheel::insert\n at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-timer-0.2.13/src/wheel/mod.rs:114\n tokio_timer::timer::Timer<T,N>::add_entry\n at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-timer-0.2.13/src/timer/mod.rs:324\n 8: tokio_timer::timer::Timer<T,N>::process_queue\n at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-timer-0.2.13/src/timer/mod.rs:301\n 9: <tokio_timer::timer::Timer<T,N> as tokio_executor::park::Park>::park\n at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-timer-0.2.13/src/timer/mod.rs:361\n tokio_timer::timer::Timer<T,N>::turn\n at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-timer-0.2.13/src/timer/mod.rs:256\n 10: tikv_util::timer::start_global_timer::{{closure}}\n at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tikv/components/tikv_util/src/timer.rs:97\n 11: std::sys_common::backtrace::__rust_begin_short_backtrace\n at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35/library/std/src/sys_common/backtrace.rs:125\n 12: std::thread::Builder::spawn_unchecked::{{closure}}::{{closure}}\n at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35/library/std/src/thread/mod.rs:474\n 13: <std::panic::AssertUnwindSafe as core::ops::function::FnOnce<()>>::call_once\n at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35/library/std/src/panic.rs:322\n 14: std::panicking::try::do_call\n at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35/library/std/src/panicking.rs:379\n std::panicking::try\n at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35/library/std/src/panicking.rs:343\n std::panic::catch_unwind\n at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35/library/std/src/panic.rs:396\n std::thread::Builder::spawn_unchecked::{{closure}}\n at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35/library/std/src/thread/mod.rs:473\n core::ops::function::FnOnce::call_once{{vtable.shim}}\n at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35/library/core/src/ops/function.rs:227\n 15: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce>::call_once\n at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35/library/alloc/src/boxed.rs:1484\n <alloc::boxed::Box<F,A> as core::ops::function::FnOnce>::call_once\n at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35/library/alloc/src/boxed.rs:1484\n std::sys::unix::thread::Thread::new::thread_start\n at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35//library/std/src/sys/unix/thread.rs:71\n 16: start_thread\n 17: __clone\n"] [location=/rust/registry/src/github.com-1ecc6299db9ec823/tokio-timer-0.2.13/src/wheel/mod.rs:114] [thread_name=timer]
[2024/08/12 08:18:14.336 +08:00] [INFO] [lib.rs:90] ["Welcome to TiKV"]
tikv-0812-small-log.log (66.0 KB)
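Check the monitoring and see whether the TiKV uptime is around 2 years.
There are no logs going back that far :( but the nodes have indeed been running for a long time.
So we need to upgrade to 5.4.2. I found this note: "This is a known issue that is already fix in v5.4.2. Please either update your cluster to a version >= v5.4.2 or you can just workaround it by restarting the tikv within 795 days."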
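For anyone wondering where 795 days comes from, here is a rough sketch of the arithmetic. It assumes (this is not stated in the thread) that the tokio-timer 0.2 wheel has 6 levels of 64 slots each at 1 ms resolution, which would match the "len is 6 but the index is 6" panic in the log above. The function names are made up for illustration.

```python
from datetime import datetime, timedelta

# Assumption: 6 wheel levels, 64 slots per level, 1 ms per tick.
LEVELS = 6
SLOTS_PER_LEVEL = 64
WHEEL_RANGE_MS = SLOTS_PER_LEVEL ** LEVELS  # 64**6 == 2**36 milliseconds


def wheel_range_days() -> float:
    """Total time span the wheel can represent, in days."""
    return WHEEL_RANGE_MS / (1000 * 60 * 60 * 24)


def restart_deadline(process_start: datetime) -> datetime:
    """Latest time by which a TiKV process started at `process_start`
    would need a restart under the assumption above."""
    return process_start + timedelta(milliseconds=WHEEL_RANGE_MS)


if __name__ == "__main__":
    print(f"wheel range: {wheel_range_days():.2f} days")
    # Restart time taken from the "Welcome to TiKV" log line above.
    print("next deadline:", restart_deadline(datetime(2024, 8, 12, 8, 18, 14)))
```

If that assumption holds, any TiKV instance on an affected version that has been up for a little over 795 days would hit the same panic, which fits nodes started at slightly different times restarting one after another.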
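tikv detail >> cluster >> uptime
In Grafana, just select a time range and take a look.
With a version this old, you have most likely hit a bug.
It's not too bad, a bug that only shows up once every 2 years.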