TiKV panic_mark_file

Bug Report

【Bug impact】
Version: v4.0.5

[2021/07/16 09:20:19.882 +08:00] [FATAL] [lib.rs:482] [“rocksdb background error. db: kv, reason: compaction, error: Corruption: L3 have overlapping ranges ‘7A7480000000000005FF345F698000000000FF0000010163343465FF63613331FF2D3231FF63612D3434FF3165FF2D623433332DFF64FF31616434633830FFFF3930643900000000FFFB00000000000000F8FA1B5F83F483FFF1’ seq:59783224186, type:0 vs. ‘7A7480000000000005FF345F698000000000FF0000010161663636FF37313166FF2D3530FF64372D3462FF3062FF2D613162332DFF63FF33373138316534FFFF3230386300000000FFFB00000000000000F8FA1AF9B804C3FE87’ seq:59669990815, type:1”] [backtrace=“stack backtrace:\n 0: tikv_util::set_panic_hook::{{closure}}\n at components/tikv_util/src/lib.rs:481\n 1: std::panicking::rust_panic_with_hook\n at src/libstd/panicking.rs:475\n 2: rust_begin_unwind\n at src/libstd/panicking.rs:375\n 3: std::panicking::begin_panic_fmt\n at src/libstd/panicking.rs:326\n 4: <engine_rocks::event_listener::RocksEventListener as rocksdb::event_listener::EventListener>::on_background_error\n at components/engine_rocks/src/event_listener.rs:66\n 5: rocksdb::event_listener::on_background_error\n at /rust/git/checkouts/rust-rocksdb-a9a28e74c6ead8ef/d472363/src/event_listener.rs:254\n 6: _ZN24crocksdb_eventlistener_t17OnBackgroundErrorEN7rocksdb21BackgroundErrorReasonEPNS0_6StatusE\n at crocksdb/c.cc:2140\n 7: _ZN7rocksdb12EventHelpers23NotifyOnBackgroundErrorERKSt6vectorISt10shared_ptrINS_13EventListenerEESaIS4_EENS_21BackgroundErrorReasonEPNS_6StatusEPNS_17InstrumentedMutexEPb\n at /rust/git/checkouts/rust-rocksdb-a9a28e74c6ead8ef/d472363/librocksdb_sys/rocksdb/db/event_helpers.cc:53\n 8: _ZN7rocksdb12ErrorHandler10SetBGErrorERKNS_6StatusENS_21BackgroundErrorReasonE\n at /rust/git/checkouts/rust-rocksdb-a9a28e74c6ead8ef/d472363/librocksdb_sys/rocksdb/db/error_handler.cc:220\n 9: _ZN7rocksdb6DBImpl20BackgroundCompactionEPbPNS_10JobContextEPNS_9LogBufferEPNS0_19PrepickedCompactionENS_3Env8PriorityE\n at 
/rust/git/checkouts/rust-rocksdb-a9a28e74c6ead8ef/d472363/librocksdb_sys/rocksdb/db/db_impl/db_impl_compaction_flush.cc:2797\n 10: _ZN7rocksdb6DBImpl24BackgroundCallCompactionEPNS0_19PrepickedCompactionENS_3Env8PriorityE\n at /rust/git/checkouts/rust-rocksdb-a9a28e74c6ead8ef/d472363/librocksdb_sys/rocksdb/db/db_impl/db_impl_compaction_flush.cc:2317\n 11: _ZN7rocksdb6DBImpl16BGWorkCompactionEPv\n at /rust/git/checkouts/rust-rocksdb-a9a28e74c6ead8ef/d472363/librocksdb_sys/rocksdb/db/db_impl/db_impl_compaction_flush.cc:2092\n 12: _ZN7rocksdb14ThreadPoolImpl4Impl8BGThreadEm\n at /rust/git/checkouts/rust-rocksdb-a9a28e74c6ead8ef/d472363/librocksdb_sys/rocksdb/util/threadpool_imp.cc:266\n 13: _ZN7rocksdb14ThreadPoolImpl4Impl15BGThreadWrapperEPv\n at /rust/git/checkouts/rust-rocksdb-a9a28e74c6ead8ef/d472363/librocksdb_sys/rocksdb/util/threadpool_imp.cc:307\n 14: execute_native_thread_routine\n 15: start_thread\n 16: __clone\n”] [location=components/engine_rocks/src/event_listener.rs:66] [thread_name=]
【Possible steps to reproduce】

【Unexpected behavior observed】
TiKV panics.
【Expected behavior】

【Related components and versions】

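【Other background information or screenshots】
For example: cluster topology, OS and kernel version, and application information. If the problem is related to SQL, provide the SQL statements and the schema of the tables involved. If a node's log contains key errors, provide the log content or file from that node. If some business-sensitive information cannot be shared publicly, leave your contact details and we will follow up with you privately.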

[2021/07/16 09:20:56.419 +08:00] [INFO] [mod.rs:335] [“starting working thread”] [worker=addr-resolver]
[2021/07/16 09:20:56.419 +08:00] [INFO] [mod.rs:335] [“starting working thread”] [worker=region-collector-worker]
[2021/07/16 09:20:56.419 +08:00] [FATAL] [server.rs:303] [“panic_mark_file /tidb/data/storage/tikv/panic_mark_file exists, there must be something wrong with the db.”]
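
The FATAL line above shows that TiKV aborts startup when it finds the marker file left by the earlier panic. As a minimal sketch, assuming the log layout shown in this post, the marker path can be pulled out of a log like this. The function name `find_panic_mark_paths` and the regular expression are illustrative and not part of TiKV.

```python
import re

# Hypothetical pattern: matches the path in a
# "panic_mark_file <path> exists" FATAL message.
PANIC_MARK_RE = re.compile(r"\[FATAL\].*?panic_mark_file (\S+) exists")


def find_panic_mark_paths(log_lines):
    """Return the marker-file paths named in FATAL startup lines."""
    paths = []
    for line in log_lines:
        match = PANIC_MARK_RE.search(line)
        if match:
            paths.append(match.group(1))
    return paths


# Two lines copied from the log above (with plain ASCII quotes).
sample = [
    '[2021/07/16 09:20:56.419 +08:00] [INFO] [mod.rs:335] '
    '["starting working thread"] [worker=region-collector-worker]',
    '[2021/07/16 09:20:56.419 +08:00] [FATAL] [server.rs:303] '
    '["panic_mark_file /tidb/data/storage/tikv/panic_mark_file exists, '
    'there must be something wrong with the db."]',
]

print(find_panic_mark_paths(sample))
```

The sketch only locates the marker file. It does not say whether the underlying data is healthy.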

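This is most likely caused by a known issue: a cache key conflict in RocksDB leaves the data out of order. To avoid hitting it again, upgrade to v4.0.9 or later. See https://github.com/tikv/tikv/pull/9029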
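
The out-of-order data shows up in the corruption message itself. The sketch below compares the two hex keys from the `L3 have overlapping ranges` error byte by byte. It rests on two assumptions: that the keys are ordered bytewise, and that the message prints the largest key of one L3 file followed by the smallest key of the next file. The keys are truncated just after the first bytes that differ, and the helper `first_difference` is illustrative.

```python
# Prefix shared by both keys in the corruption message.
COMMON = "7A7480000000000005FF345F698000000000FF00000101"

# Truncated keys: the shared prefix plus the first four differing bytes.
first_key = bytes.fromhex(COMMON + "63343465")
second_key = bytes.fromhex(COMMON + "61663636")


def first_difference(a, b):
    """Return the index of the first byte where a and b differ, or None."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return None


i = first_difference(first_key, second_key)

# The differing tails are the ASCII bytes b'c44e' and b'af66'.
print(first_key[i:], second_key[i:])

# In a sorted level the first key should be smaller than the second.
# Here it is larger, which is what the consistency check rejects.
print(first_key > second_key)
```

Under those assumptions, a key starting with `c44e` is recorded before a key starting with `af66` within the same level, which matches the data disorder described above.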