tikv报错:assertion failed: (left >= right)

【 TiDB 使用环境】 生产集群

由于集群压力比较大,导致kakfa同步任务中断。 当启动其中两个同步任务时,tikv发生panic。

【TiDB 版本】4.0.11

已经设置:set config tikv split.qps-threshold=3000000

FATAL:
[2021/10/27 17:39:38.968 +08:00] [FATAL] [lib.rs:482] [“assertion failed: (left >= right) (left: TimeStamp(428682051579544801), right: TimeStamp(428682057084567595))”] [backtrace="stack backtrace:
0: tikv_util::set_panic_hook::{{closure}}
at components/tikv_util/src/lib.rs:481
1: std::panicking::rust_panic_with_hook
at src/libstd/panicking.rs:475
2: rust_begin_unwind
at src/libstd/panicking.rs:375
3: std::panicking::begin_panic_fmt
at src/libstd/panicking.rs:326
4: tikv::storage::mvcc::reader::scanner::seek_for_valid_write
at /home/jenkins/agent/workspace/build_tikv_multi_branch_v4.0.11/tikv/<::std::macros::panic macros>:9
tikv::storage::mvcc::reader::scanner::seek_for_valid_value
at /home/jenkins/agent/workspace/build_tikv_multi_branch_v4.0.11/tikv/src/storage/mvcc/reader/scanner/mod.rs:418
5: <tikv::storage::mvcc::reader::scanner::forward::DeltaEntryPolicy as tikv::storage::mvcc::reader::scanner::forward::ScanPolicy>::handle_lock
at /home/jenkins/agent/workspace/build_tikv_multi_branch_v4.0.11/tikv/src/storage/mvcc/reader/scanner/forward.rs:609
tikv::storage::mvcc::reader::scanner::forward::ForwardScanner<S,P>::read_next
at /home/jenkins/agent/workspace/build_tikv_multi_branch_v4.0.11/tikv/src/storage/mvcc/reader/scanner/forward.rs:258
<tikv::storage::mvcc::reader::scanner::forward::ForwardScanner<S,P> as tikv::storage::txn::store::TxnEntryScanner>::next_entry
at /home/jenkins/agent/workspace/build_tikv_multi_branch_v4.0.11/tikv/src/storage/mvcc/reader/scanner/forward.rs:744
cdc::endpoint::Initializer::scan_batch
at components/cdc/src/endpoint.rs:876
6: cdc::endpoint::Endpoint::on_register::{{closure}}
at components/cdc/src/endpoint.rs:827
<futures::future::then::Then<A,B,F> as futures::future::Future>::poll::{{closure}}
at /rust/registry/src/github.com-1ecc6299db9ec823/futures-0.1.29/src/future/then.rs:33
futures::future::chain::Chain<A,B,C>::poll
at /rust/registry/src/github.com-1ecc6299db9ec823/futures-0.1.29/src/future/chain.rs:39
<futures::future::then::Then<A,B,F> as futures::future::Future>::poll
at /rust/registry/src/github.com-1ecc6299db9ec823/futures-0.1.29/src/future/then.rs:32
7: <alloc::boxed::Box as futures::future::Future>::poll
at /rust/registry/src/github.com-1ecc6299db9ec823/futures-0.1.29/src/future/mod.rs:113
futures::task_impl::Spawn::poll_future_notify::{{closure}}
at /rust/registry/src/github.com-1ecc6299db9ec823/futures-0.1.29/src/task_impl/mod.rs:329
futures::task_impl::Spawn::enter::{{closure}}
at /rust/registry/src/github.com-1ecc6299db9ec823/futures-0.1.29/src/task_impl/mod.rs:399
futures::task_impl::std::set
at /rust/registry/src/github.com-1ecc6299db9ec823/futures-0.1.29/src/task_impl/std/mod.rs:83
futures::task_impl::Spawn::enter
at /rust/registry/src/github.com-1ecc6299db9ec823/futures-0.1.29/src/task_impl/mod.rs:399
futures::task_impl::Spawn::poll_fn_notify
at /rust/registry/src/github.com-1ecc6299db9ec823/futures-0.1.29/src/task_impl/mod.rs:291
futures::task_impl::Spawn::poll_future_notify
at /rust/registry/src/github.com-1ecc6299db9ec823/futures-0.1.29/src/task_impl/mod.rs:329
tokio_threadpool::task::Task::run::{{closure}}
at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-threadpool-0.1.16/src/task/mod.rs:145
core::ops::function::FnOnce::call_once
at /rustc/0de96d37fbcc54978458c18f5067cd9817669bc8/src/libcore/ops/function.rs:232
<std::panic::AssertUnwindSafe as core::ops::function::FnOnce<()>>::call_once
at /rustc/0de96d37fbcc54978458c18f5067cd9817669bc8/src/libstd/panic.rs:318
std::panicking::try::do_call
at /rustc/0de96d37fbcc54978458c18f5067cd9817669bc8/src/libstd/panicking.rs:292
std::panicking::try
at /rustc/0de96d37fbcc54978458c18f5067cd9817669bc8//src/libpanic_unwind/lib.rs:78
std::panic::catch_unwind
at /rustc/0de96d37fbcc54978458c18f5067cd9817669bc8/src/libstd/panic.rs:394
tokio_threadpool::task::Task::run
at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-threadpool-0.1.16/src/task/mod.rs:130
tokio_threadpool::worker::Worker::run_task2
at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-threadpool-0.1.16/src/worker/mod.rs:567
tokio_threadpool::worker::Worker::run_task
at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-threadpool-0.1.16/src/worker/mod.rs:459
8: tokio_threadpool::worker::Worker::try_run_owned_task
at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-threadpool-0.1.16/src/worker/mod.rs:390
tokio_threadpool::worker::Worker::try_run_task
at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-threadpool-0.1.16/src/worker/mod.rs:297
tokio_threadpool::worker::Worker::run
at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-threadpool-0.1.16/src/worker/mod.rs:241
9: tokio_threadpool::worker::Worker::do_run::{{closure}}::{{closure}}
at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-threadpool-0.1.16/src/worker/mod.rs:129
tokio_executor::global::with_default::{{closure}}
at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-executor-0.1.8/src/global.rs:209
std::thread::local::LocalKey::try_with
at /rustc/0de96d37fbcc54978458c18f5067cd9817669bc8/src/libstd/thread/local.rs:262
std::thread::local::LocalKey::with
at /rustc/0de96d37fbcc54978458c18f5067cd9817669bc8/src/libstd/thread/local.rs:239
tokio_executor::global::with_default
at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-executor-0.1.8/src/global.rs:178
tokio_threadpool::worker::Worker::do_run::{{closure}}
at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-threadpool-0.1.16/src/worker/mod.rs:125
std::thread::local::LocalKey::try_with
at /rustc/0de96d37fbcc54978458c18f5067cd9817669bc8/src/libstd/thread/local.rs:262
std::thread::local::LocalKey::with
at /rustc/0de96d37fbcc54978458c18f5067cd9817669bc8/src/libstd/thread/local.rs:239
tokio_threadpool::worker::Worker::do_run
at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-threadpool-0.1.16/src/worker/mod.rs:116
tokio_threadpool::pool::Pool::spawn_thread::{{closure}}
at /rust/registry/src/github.com-1ecc6299db9ec823/tokio-threadpool-0.1.16/src/pool/mod.rs:345
std::sys_common::backtrace::__rust_begin_short_backtrace
at /rustc/0de96d37fbcc54978458c18f5067cd9817669bc8/src/libstd/sys_common/backtrace.rs:136
10: std::thread::Builder::spawn_unchecked::{{closure}}::{{closure}}
at /rustc/0de96d37fbcc54978458c18f5067cd9817669bc8/src/libstd/thread/mod.rs:469
<std::panic::AssertUnwindSafe as core::ops::function::FnOnce<()>>::call_once
at /rustc/0de96d37fbcc54978458c18f5067cd9817669bc8/src/libstd/panic.rs:318
std::panicking::try::do_call
at /rustc/0de96d37fbcc54978458c18f5067cd9817669bc8/src/libstd/panicking.rs:292
std::panicking::try
at /rustc/0de96d37fbcc54978458c18f5067cd9817669bc8//src/libpanic_unwind/lib.rs:78
std::panic::catch_unwind
at /rustc/0de96d37fbcc54978458c18f5067cd9817669bc8/src/libstd/panic.rs:394
std::thread::Builder::spawn_unchecked::{{closure}}
at /rustc/0de96d37fbcc54978458c18f5067cd9817669bc8/src/libstd/thread/mod.rs:468
core::ops::function::FnOnce::call_once{{vtable.shim}}
at /rustc/0de96d37fbcc54978458c18f5067cd9817669bc8/src/libcore/ops/function.rs:232
11: <alloc::boxed::Box as core::ops::function::FnOnce>::call_once
at /rustc/0de96d37fbcc54978458c18f5067cd9817669bc8/src/liballoc/boxed.rs:1022
12: <alloc::boxed::Box as core::ops::function::FnOnce
>::call_once
at /rustc/0de96d37fbcc54978458c18f5067cd9817669bc8/src/liballoc/boxed.rs:1022
std::sys_common::thread::start_thread
at src/libstd/sys_common/thread.rs:13
std::sys::unix::thread::thread::new::thread_start
at src/libstd/sys/unix/thread.rs:80
13: start_thread
14: __clone
"] [location=src/storage/mvcc/reader/scanner/mod.rs:389] [thread_name=cdcwkr0]
[2021/10/27 17:40:17.819 +08:00] [INFO] [lib.rs:92] [“Welcome to TiKV”]

截图:
image

tikv日志:fatal.log.gz (103.4 KB)

1 个赞

这个集群的 tiup cluster display {cluster_name} 拓扑烦请提供下,看日志是有 CDC 的吗?

1 个赞

有CDC组件的

1 个赞

收到,这边先看下,有进展会跟帖回复 ~

1 个赞

好的,感谢

你好,在 https://github.com/tikv/tikv/pull/9567 中我们已经有类似的修复,但 v4.0.11 已经携带有这个修复,因此这可能是一个新的问题。如果方便的话,我们可以基于 4.0.11 为你提供一个能够打印问题信息的 patch 版本的 tikv,然后排查复现。

可以的,请问我需要怎么操作。

麻烦通过 tiup cluster patch 部署一下这个 patch 版的 TiKV。
https://pingcap.feishu.cn/file/boxcnTCWgOP6TcNV8eOE2ArJlyf

问题复现的话,日志中会有 “encounter error in seek_for_valid_value” 和 “unexpected write record” 等错误。届时麻烦提供一下相关错误日志。

另外,如果可以麻烦同时获取一下 “encounter error in seek_for_valid_value” 日志中的 current_user_key 的 MVCC 信息。可以通过 TiDB 的 HTTP 接口获取 curl http://{TiDBIP}:10080/mvcc/hex/{hexKey}
(current_user_key 可能是 TiKV encoded 格式的,可以通过 mok 工具 decode 一下)

如果 TiKV 日志持续打印过多的报错信息,请暂时关闭 CDC.

收到。非常感谢及时响应。

你好,有替换 hotfix 版本,复现问题吗?

还没有,目前这边还在进行硬件扩容,等完成之后才会做替换。

更换patch版本之后,这个问题解决了吗?我这边3.0的集群也遇到了同样的问题,在考虑要不要升级

这边还没有更换,这边计划升级到4.0.15,如果还有这个问题再进行替换。

此话题已在最后回复的 1 分钟后被自动关闭。不再允许新回复。