TiKV restarts unexpectedly

[TiDB environment] Production
[TiDB version] v7.5.5

[Problem: symptoms and impact]
TiKV restarted unexpectedly. The log shows the following:
2025-10-14 15:36:40 (UTC+08:00) TiKV 10.55.170.48:30160 [lib.rs:512] ["region 58346100 commit_ts: TimeStamp(461485480471167221), resolved_ts: TimeStamp(461485480523596607)"] [backtrace="
   0: tikv_util::set_panic_hook::{{closure}}
             at /workspace/source/tikv/components/tikv_util/src/lib.rs:511:18
   1: <alloc::boxed::Box<F,A> as core::ops::function::Fn>::call
             at /root/.rustup/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/boxed.rs:2032:9
      std::panicking::rust_panic_with_hook
             at /root/.rustup/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:692:13
   2: std::panicking::begin_panic_handler::{{closure}}
             at /root/.rustup/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:579:13
   3: std::sys_common::backtrace::__rust_end_short_backtrace
             at /root/.rustup/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys_common/backtrace.rs:137:18
   4: rust_begin_unwind
             at /root/.rustup/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:575:5
   5: core::panicking::panic_fmt
             at /root/.rustup/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/panicking.rs:65:14
   6: cdc::delegate::Delegate::sink_txn_put
             at /workspace/source/tikv/components/cdc/src/delegate.rs:929:21
      cdc::delegate::Delegate::sink_put
             at /workspace/source/tikv/components/cdc/src/delegate.rs:889:13
      cdc::delegate::Delegate::sink_data
             at /workspace/source/tikv/components/cdc/src/delegate.rs:694:21
   7: cdc::delegate::Delegate::on_batch
             at /workspace/source/tikv/components/cdc/src/delegate.rs:561:17
   8: cdc::endpoint::Endpoint<T,E,S>::on_multi_batch
             at /workspace/source/tikv/components/cdc/src/endpoint.rs:889:33
      <cdc::endpoint::Endpoint<T,E,S> as tikv_util::worker::pool::Runnable>::run
             at /workspace/source/tikv/components/cdc/src/endpoint.rs:1283:18
   9: tikv_util::worker::pool::Worker::start_with_timer_impl::{{closure}}
             at /workspace/source/tikv/components/tikv_util/src/worker/pool.rs:506:25
      <core::future::from_generator::GenFuture as core::future::future::Future>::poll
             at /root/.rustup/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/future/mod.rs:91:19
      <tracker::tls::TrackedFuture as core::future::future::Future>::poll::{{closure}}
             at /workspace/source/tikv/components/tracker/src/tls.rs:64:23
      std::thread::local::LocalKey::try_with
             at /root/.rustup/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/local.rs:446:16
      std::thread::local::LocalKey::with
             at /root/.rustup/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/local.rs:422:9
      <tracker::tls::TrackedFuture as core::future::future::Future>::poll
             at /workspace/source/tikv/components/tracker/src/tls.rs:62:9
      <futures_util::future::future::map::Map<Fut,F> as core::future::future::Future>::poll
             at /workspace/.cargo/registry/src/mirrors.tuna.tsinghua.edu.cn-df7c3c540f42cdbd/futures-util-0.3.31/src/future/future/map.rs:55:37
      <futures_util::future::future::Map<Fut,F> as core::future::future::Future>::poll
             at /workspace/.cargo/registry/src/mirrors.tuna.tsinghua.edu.cn-df7c3c540f42cdbd/futures-util-0.3.31/src/lib.rs:86:13
      yatp::task::future::RawTask::poll
             at /workspace/.cargo/git/checkouts/yatp-e704b73c3ee279b6/5572a78/src/task/future.rs:59:9
  10: yatp::task::future::TaskCell::poll
             at /workspace/.cargo/git/checkouts/yatp-e704b73c3ee279b6/5572a78/src/task/future.rs:103:9
      <yatp::task::future::Runner as yatp::pool::runner::Runner>::handle
             at /workspace/.cargo/git/checkouts/yatp-e704b73c3ee279b6/5572a78/src/task/future.rs:387:20
  11: <tikv_util::yatp_pool::YatpPoolRunner as yatp::pool::runner::Runner>::handle
             at /workspace/source/tikv/components/tikv_util/src/yatp_pool/mod.rs:199:24
      yatp::pool::worker::WorkerThread<T,R>::run
             at /workspace/.cargo/git/checkouts/yatp-e704b73c3ee279b6/5572a78/src/pool/worker.rs:48:13
      yatp::pool::builder::LazyBuilder::build::{{closure}}
             at /workspace/.cargo/git/checkouts/yatp-e704b73c3ee279b6/5572a78/src/pool/builder.rs:114:25
      std::sys_common::backtrace::rust_begin_short_backtrace
             at /root/.rustup/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys_common/backtrace.rs:121:18
  12: std::thread::Builder::spawn_unchecked::{{closure}}::{{closure}}
             at /root/.rustup/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/mod.rs:551:17
      <core::panic::unwind_safe::AssertUnwindSafe as core::ops::function::FnOnce<()>>::call_once
             at /root/.rustup/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/panic/unwind_safe.rs:271:9
      std::panicking::try::do_call
             at /root/.rustup/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:483:40
      std::panicking::try
             at /root/.rustup/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:447:19
      std::panic::catch_unwind
             at /root/.rustup/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panic.rs:137:14
      std::thread::Builder::spawn_unchecked::{{closure}}
             at /root/.rustup/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/mod.rs:550:30
      core::ops::function::FnOnce::call_once{{vtable.shim}}
             at /root/.rustup/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:513:5
  13: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce>::call_once
             at /root/.rustup/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/boxed.rs:2000:9
      <alloc::boxed::Box<F,A> as core::ops::function::FnOnce>::call_once
             at /root/.rustup/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/boxed.rs:2000:9
      std::sys::unix::thread::thread::new::thread_start
             at /root/.rustup/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys/unix/thread.rs:108:17
  14: start_thread
  15: __clone
"] [location=components/cdc/src/delegate.rs:929] [thread_name=cdc-0] [thread_id=34]

How can I tell from the log why TiKV restarted? The log does not seem to reveal anything.
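As a rough sketch of how to read such a line: the fields at the end of the panic record already name the source location and the thread that panicked, and the quoted message carries the values involved. The Python below pulls them out. `summarize_panic` and the shortened `LINE` sample are illustrative names, not TiKV tooling, and the backtrace body is deliberately elided in the sample.

```python
import re

# Shortened copy of the panic record from the post; the backtrace body is elided.
LINE = (
    '[lib.rs:512] ["region 58346100 commit_ts: TimeStamp(461485480471167221), '
    'resolved_ts: TimeStamp(461485480523596607)"] '
    '[backtrace="(elided)"] '
    '[location=components/cdc/src/delegate.rs:929] [thread_name=cdc-0] [thread_id=34]'
)


def summarize_panic(line: str) -> dict:
    """Extract the panic message and the trailing key=value fields."""
    # Trailing fields look like [location=...] [thread_name=...] [thread_id=...].
    fields = dict(
        re.findall(r"\[(location|thread_name|thread_id)=([^\]]+)\]", line)
    )
    # The message is the first bracketed, quoted string. Curly quotes are
    # accepted too, because forum software often rewrites straight quotes.
    msg = re.search(r'\[["“](.*?)["”]\]', line)
    fields["message"] = msg.group(1) if msg else None
    return fields


if __name__ == "__main__":
    for key, value in summarize_panic(LINE).items():
        print(f"{key}: {value}")
```

Here the location (components/cdc/src/delegate.rs:929) and the thread name (cdc-0) both point at the CDC component, which is what the replies below pick up on.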

The operating system log shows no OOM messages either. It only records:
Oct 14 15:36:55 db098-prod-maintain systemd[1]: tikv-30160.service: Main process exited, code=exited, status=1/FAILURE
Oct 14 15:36:55 db098-prod-maintain systemd[1]: tikv-30160.service: Failed with result 'exit-code'.

Is TiCDC enabled? If so, check whether any of the change data subscription (changefeed) tasks are in an abnormal state.

This is probably a bug. Let me look for it.

https://github.com/tikv/tikv/issues/18142 Upgrading to v7.5.6 should fix it.

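I have been browsing the TiDB forum lately, and quite a few of the questions people post turn out to be bugs.
It is shaking my confidence.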

Use a newer version whenever you can. Newer versions fix performance problems and bugs.

Time to go bug hunting~

Upgrading is a good remedy too.

What kind of bug is it?

The problem is probably here.

Yes, it is enabled, and the CDC tasks all look normal.

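What conditions trigger this bug? We have hit it twice so far, both times while the cluster was handling a large volume of operations, yet the overall cluster load did not look especially high.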

File a bug report, or schedule an upgrade. Upgrading is fairly simple.

For self-hosted deployments, I recommend running the latest version. It avoids a lot of problems.

It looks very likely that the CDC module hit a timestamp check failure while processing data for region 58346100.
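To make the timestamp reading concrete, here is a minimal sketch that decodes the two values from the panic message. It assumes the standard TiDB TSO layout (physical milliseconds in the high bits, an 18-bit logical counter in the low bits). The helper names are hypothetical, and the idea that CDC expects commit_ts to be greater than resolved_ts is inferred from the panic message and the linked issue, not quoted from the TiKV source.

```python
from datetime import datetime, timedelta, timezone

LOGICAL_BITS = 18  # assumed TSO layout: (physical_ms << 18) | logical

# Values copied from the panic message for region 58346100.
COMMIT_TS = 461485480471167221
RESOLVED_TS = 461485480523596607


def decode_tso(ts: int) -> tuple[int, int]:
    """Split a TSO into (physical milliseconds since epoch, logical counter)."""
    return ts >> LOGICAL_BITS, ts & ((1 << LOGICAL_BITS) - 1)


def to_wall_clock(physical_ms: int, offset_hours: int = 8) -> datetime:
    """Convert the physical part to a datetime in the given UTC offset."""
    tz = timezone(timedelta(hours=offset_hours))
    epoch = datetime.fromtimestamp(0, tz)
    return epoch + timedelta(milliseconds=physical_ms)


def commit_is_behind_resolved(commit_ts: int, resolved_ts: int) -> bool:
    """True when the event violates the assumed commit_ts > resolved_ts rule."""
    return commit_ts <= resolved_ts


if __name__ == "__main__":
    commit_ms, _ = decode_tso(COMMIT_TS)
    resolved_ms, _ = decode_tso(RESOLVED_TS)
    print("commit_ts   :", to_wall_clock(commit_ms))
    print("resolved_ts :", to_wall_clock(resolved_ms))
    print("resolved ahead by (ms):", resolved_ms - commit_ms)
    print("violates rule:", commit_is_behind_resolved(COMMIT_TS, RESOLVED_TS))
```

With the values from this log, commit_ts falls about 200 ms behind resolved_ts, and its physical part lands on 15:36:40 (UTC+08:00), the moment of the panic. That is consistent with a write arriving for a region whose resolved timestamp had already advanced past it.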

First check the status of the CDC tasks, and pause any task that is under heavy downstream pressure.

If the problem recurs frequently, upgrade TiDB to the latest stable version.

The main process exited, I suppose.

It looks like an anomaly in timestamp handling. Restart first, then upgrade.
