TiKV panics while running

To help us work efficiently, please provide the following information. A clear problem description gets resolved faster:

[TiDB version] 4.0.11

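[Problem description] During a routine inspection, we found that at around 16:12 the system's p99.9 latency spiked to 11.1 s (normally about 200 ms), and a large number of slow query log entries were generated around that time. Investigation showed that one TiKV node restarted at 16:11:54. The panic log is as follows: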

[2021/03/13 16:11:54.301 +08:00] [FATAL] [lib.rs:482] ["Uniform::sample_single called with low >= high"] [backtrace="stack backtrace:
0: tikv_util::set_panic_hook::{{closure}}
at components/tikv_util/src/lib.rs:481
1: std::panicking::rust_panic_with_hook
at src/libstd/panicking.rs:475
2: std::panicking::begin_panic
at /rustc/0de96d37fbcc54978458c18f5067cd9817669bc8/src/libstd/panicking.rs:404
3: <rand::distributions::uniform::UniformInt as rand::distributions::uniform::UniformSampler>::sample_single
at /home/jenkins/agent/workspace/build_tikv_multi_branch_v4.0.11/tikv/<::std::macros::panic macros>:3
rand::Rng::gen_range
at /rust/registry/src/github.com-1ecc6299db9ec823/rand-0.6.5/src/lib.rs:245
4: raftstore::store::worker::split_controller::sample
at components/raftstore/src/store/worker/split_controller.rs:86
raftstore::store::worker::split_controller::AutoSplitController::flush
at components/raftstore/src/store/worker/split_controller.rs:375
5: raftstore::store::worker::pd::StatsMonitor::start::{{closure}}
at components/raftstore/src/store/worker/pd.rs:342
std::sys_common::backtrace::__rust_begin_short_backtrace
at /rustc/0de96d37fbcc54978458c18f5067cd9817669bc8/src/libstd/sys_common/backtrace.rs:136
6: std::thread::Builder::spawn_unchecked::{{closure}}::{{closure}}
at /rustc/0de96d37fbcc54978458c18f5067cd9817669bc8/src/libstd/thread/mod.rs:469
<std::panic::AssertUnwindSafe as core::ops::function::FnOnce<()>>::call_once
at /rustc/0de96d37fbcc54978458c18f5067cd9817669bc8/src/libstd/panic.rs:318
std::panicking::try::do_call
at /rustc/0de96d37fbcc54978458c18f5067cd9817669bc8/src/libstd/panicking.rs:292
std::panicking::try
at /rustc/0de96d37fbcc54978458c18f5067cd9817669bc8//src/libpanic_unwind/lib.rs:78
std::panic::catch_unwind
at /rustc/0de96d37fbcc54978458c18f5067cd9817669bc8/src/libstd/panic.rs:394
std::thread::Builder::spawn_unchecked::{{closure}}
at /rustc/0de96d37fbcc54978458c18f5067cd9817669bc8/src/libstd/thread/mod.rs:468
core::ops::function::FnOnce::call_once{{vtable.shim}}
at /rustc/0de96d37fbcc54978458c18f5067cd9817669bc8/src/libcore/ops/function.rs:232
7: <alloc::boxed::Box as core::ops::function::FnOnce>::call_once
at /rustc/0de96d37fbcc54978458c18f5067cd9817669bc8/src/liballoc/boxed.rs:1022
8: <alloc::boxed::Box as core::ops::function::FnOnce>::call_once
at /rustc/0de96d37fbcc54978458c18f5067cd9817669bc8/src/liballoc/boxed.rs:1022
std::sys_common::thread::start_thread
at src/libstd/sys_common/thread.rs:13
std::sys::unix::thread::thread::new::thread_start
at src/libstd/sys/unix/thread.rs:80
9: start_thread
10: __clone
"] [location=/rust/registry/src/github.com-1ecc6299db9ec823/rand-0.6.5/src/distributions/uniform.rs:473] [thread_name=stats-monitor]
[2021/03/13 16:12:13.100 +08:00] [INFO] [lib.rs:92] ["Welcome to TiKV"]
[2021/03/13 16:12:13.101 +08:00] [INFO] [lib.rs:94] []
[2021/03/13 16:12:13.101 +08:00] [INFO] [lib.rs:94] ["Release Version: 4.0.11"]
[2021/03/13 16:12:13.101 +08:00] [INFO] [lib.rs:94] ["Edition: Community"]
[2021/03/13 16:12:13.101 +08:00] [INFO] [lib.rs:94] ["Git Commit Hash: 4ac5e7ea1839d63163e911e2e1164d663f49592b"]
[2021/03/13 16:12:13.101 +08:00] [INFO] [lib.rs:94] ["Git Commit Branch: heads/refs/tags/v4.0.11"]
[2021/03/13 16:12:13.101 +08:00] [INFO] [lib.rs:94] ["UTC Build Time: 2021-02-26 07:44:32"]
[2021/03/13 16:12:13.101 +08:00] [INFO] [lib.rs:94] ["Rust Version: rustc 1.42.0-nightly (0de96d37f 2019-12-19)"]
[2021/03/13 16:12:13.101 +08:00] [INFO] [lib.rs:94] ["Enable Features: jemalloc mem-profiling portable sse protobuf-codec"]
[2021/03/13 16:12:13.101 +08:00] [INFO] [lib.rs:94] ["Profile: dist_release"]
[2021/03/13 16:12:13.101 +08:00] [INFO] [mod.rs:58] ["memory limit in bytes: 269903224832, cpu cores quota: 20"]
[2021/03/13 16:12:13.101 +08:00] [WARN] [lib.rs:529] ["environment variable TZ is missing, using /etc/localtime"]
[2021/03/13 16:12:13.101 +08:00] [INFO] [config.rs:572] ["kernel parameters"] [value=65535] [param=net.core.somaxconn]
[2021/03/13 16:12:13.101 +08:00] [INFO] [config.rs:572] ["kernel parameters"] [value=0] [param=net.ipv4.tcp_syncookies]
[2021/03/13 16:12:13.101 +08:00] [INFO] [config.rs:572] ["kernel parameters"] [value=0] [param=vm.swappiness]
[2021/03/13 16:12:13.101 +08:00] [INFO] [util.rs:419] ["connecting to PD endpoint"] [endpoints=xxx.xxx.xxx.xxx:2379]
[2021/03/13 16:12:13.101 +08:00] [INFO] [] ["Disabling AF_INET6 sockets because ::1 is not available."]
[2021/03/13 16:12:13.102 +08:00] [INFO] [] ["New connected subchannel at 0x7f1d5ea3a180 for subchannel 0x7f1d61e19a00"]
[2021/03/13 16:12:13.102 +08:00] [INFO] [util.rs:419] ["connecting to PD endpoint"] [endpoints=xxx.xxx.xxx.xxx:2379]
[2021/03/13 16:12:13.102 +08:00] [INFO] [] ["New connected subchannel at 0x7f1d5ea3a240 for subchannel 0x7f1d61e19a00"]
[2021/03/13 16:12:13.103 +08:00] [INFO] [util.rs:419] ["connecting to PD endpoint"] [endpoints=xxx.xxx.xxx.xxx:2379]
[2021/03/13 16:12:13.103 +08:00] [INFO] [] ["New connected subchannel at 0x7f1d5ea3a300 for subchannel 0x7f1d61e19a00"]
[2021/03/13 16:12:13.105 +08:00] [INFO] [util.rs:419] ["connecting to PD endpoint"] [endpoints=http://xxx.xxx.xxx.xxx:2379]
[2021/03/13 16:12:13.105 +08:00] [INFO] [] ["New connected subchannel at 0x7f1d5ea3a3c0 for subchannel 0x7f1d61e19a00"]
[2021/03/13 16:12:13.105 +08:00] [INFO] [util.rs:419] ["connecting to PD endpoint"] [endpoints=http://xxx.xxx.xxx.xxx:2379]
[2021/03/13 16:12:13.106 +08:00] [INFO] [] ["New connected subchannel at 0x7f1d5ea3a480 for subchannel 0x7f1d61e19a00"]
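
For anyone checking their own logs for the same pattern, here is a rough Python sketch that pairs each FATAL entry with the next "Welcome to TiKV" line and reports the gap between them. The function name `restart_gaps` and the regular expression are my own, not part of any TiKV tooling, and the sketch assumes every line uses the same timezone offset, so the offset is ignored. For the two timestamps above (16:11:54.301 and 16:12:13.100) the gap is about 18.8 s.

```python
import re
from datetime import datetime

# Matches the leading "[timestamp offset] [LEVEL]" fields of a log line
# in the format shown above. The pattern is an assumption based on this log.
LINE_RE = re.compile(
    r'^\[(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) [+-]\d{2}:\d{2}\] \[(\w+)\]'
)

def restart_gaps(lines):
    """Return (fatal_time, welcome_time, gap_seconds) for each FATAL entry
    that is later followed by a 'Welcome to TiKV' line."""
    gaps = []
    fatal_at = None
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # backtrace continuation lines carry no timestamp
        ts = datetime.strptime(m.group(1), "%Y/%m/%d %H:%M:%S.%f")
        if m.group(2) == "FATAL":
            fatal_at = ts
        elif fatal_at is not None and "Welcome to TiKV" in line:
            gaps.append((fatal_at, ts, (ts - fatal_at).total_seconds()))
            fatal_at = None
    return gaps

# Three lines taken from the log in this post.
sample = [
    '[2021/03/13 16:11:54.301 +08:00] [FATAL] [lib.rs:482] '
    '["Uniform::sample_single called with low >= high"] [backtrace="stack backtrace:',
    '0: tikv_util::set_panic_hook::{{closure}}',
    '[2021/03/13 16:12:13.100 +08:00] [INFO] [lib.rs:92] ["Welcome to TiKV"]',
]

for fatal_at, welcome_at, gap in restart_gaps(sample):
    print(f"panic at {fatal_at.time()}, restarted at {welcome_at.time()}, gap {gap:.3f}s")
```

To run it against a real file, pass the open file object instead of `sample`, for example `restart_gaps(open("tikv.log", encoding="utf-8"))`, where the path is whatever your deployment uses.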


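If your question is about performance tuning or troubleshooting, please download and run the script. Be sure to select all of the terminal output, then copy, paste, and upload it.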

This is a known issue. You can refer to this thread: "tikv-server 重启故障排查" (tikv-server restart troubleshooting)

Thanks!
