TiKV panic_mark_file

Bug Report

【Impact of the bug】
Version: v4.0.5

[2021/07/16 09:20:19.882 +08:00] [FATAL] [lib.rs:482] ["rocksdb background error. db: kv, reason: compaction, error: Corruption: L3 have overlapping ranges '7A7480000000000005FF345F698000000000FF0000010163343465FF63613331FF2D3231FF63612D3434FF3165FF2D623433332DFF64FF31616434633830FFFF3930643900000000FFFB00000000000000F8FA1B5F83F483FFF1' seq:59783224186, type:0 vs. '7A7480000000000005FF345F698000000000FF0000010161663636FF37313166FF2D3530FF64372D3462FF3062FF2D613162332DFF63FF33373138316534FFFF3230386300000000FFFB00000000000000F8FA1AF9B804C3FE87' seq:59669990815, type:1"] [backtrace="stack backtrace:
   0: tikv_util::set_panic_hook::{{closure}}
             at components/tikv_util/src/lib.rs:481
   1: std::panicking::rust_panic_with_hook
             at src/libstd/panicking.rs:475
   2: rust_begin_unwind
             at src/libstd/panicking.rs:375
   3: std::panicking::begin_panic_fmt
             at src/libstd/panicking.rs:326
   4: <engine_rocks::event_listener::RocksEventListener as rocksdb::event_listener::EventListener>::on_background_error
             at components/engine_rocks/src/event_listener.rs:66
   5: rocksdb::event_listener::on_background_error
             at /rust/git/checkouts/rust-rocksdb-a9a28e74c6ead8ef/d472363/src/event_listener.rs:254
   6: _ZN24crocksdb_eventlistener_t17OnBackgroundErrorEN7rocksdb21BackgroundErrorReasonEPNS0_6StatusE
             at crocksdb/c.cc:2140
   7: _ZN7rocksdb12EventHelpers23NotifyOnBackgroundErrorERKSt6vectorISt10shared_ptrINS_13EventListenerEESaIS4_EENS_21BackgroundErrorReasonEPNS_6StatusEPNS_17InstrumentedMutexEPb
             at /rust/git/checkouts/rust-rocksdb-a9a28e74c6ead8ef/d472363/librocksdb_sys/rocksdb/db/event_helpers.cc:53
   8: _ZN7rocksdb12ErrorHandler10SetBGErrorERKNS_6StatusENS_21BackgroundErrorReasonE
             at /rust/git/checkouts/rust-rocksdb-a9a28e74c6ead8ef/d472363/librocksdb_sys/rocksdb/db/error_handler.cc:220
   9: _ZN7rocksdb6DBImpl20BackgroundCompactionEPbPNS_10JobContextEPNS_9LogBufferEPNS0_19PrepickedCompactionENS_3Env8PriorityE
             at /rust/git/checkouts/rust-rocksdb-a9a28e74c6ead8ef/d472363/librocksdb_sys/rocksdb/db/db_impl/db_impl_compaction_flush.cc:2797
  10: _ZN7rocksdb6DBImpl24BackgroundCallCompactionEPNS0_19PrepickedCompactionENS_3Env8PriorityE
             at /rust/git/checkouts/rust-rocksdb-a9a28e74c6ead8ef/d472363/librocksdb_sys/rocksdb/db/db_impl/db_impl_compaction_flush.cc:2317
  11: _ZN7rocksdb6DBImpl16BGWorkCompactionEPv
             at /rust/git/checkouts/rust-rocksdb-a9a28e74c6ead8ef/d472363/librocksdb_sys/rocksdb/db/db_impl/db_impl_compaction_flush.cc:2092
  12: _ZN7rocksdb14ThreadPoolImpl4Impl8BGThreadEm
             at /rust/git/checkouts/rust-rocksdb-a9a28e74c6ead8ef/d472363/librocksdb_sys/rocksdb/util/threadpool_imp.cc:266
  13: _ZN7rocksdb14ThreadPoolImpl4Impl15BGThreadWrapperEPv
             at /rust/git/checkouts/rust-rocksdb-a9a28e74c6ead8ef/d472363/librocksdb_sys/rocksdb/util/threadpool_imp.cc:307
  14: execute_native_thread_routine
  15: start_thread
  16: __clone
"] [location=components/engine_rocks/src/event_listener.rs:66] [thread_name=]
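The two hex strings in the "overlapping ranges" message are RocksDB user keys, and decoding them shows which table and index the corrupted SST range belongs to. The following is a minimal sketch, not TiKV's own tooling. It assumes the usual TiKV layout: a `z` data prefix, a memcomparable-encoded TiDB key (8 data bytes followed by a marker byte equal to `0xFF` minus the padding count), and an 8-byte bitwise-inverted timestamp suffix. The helper names `decode_memcomparable` and `decode_index_key` are hypothetical.

```python
import struct
from typing import Dict, Tuple

# The two user keys from the error message, copied verbatim.
KEY_A = "7A7480000000000005FF345F698000000000FF0000010163343465FF63613331FF2D3231FF63612D3434FF3165FF2D623433332DFF64FF31616434633830FFFF3930643900000000FFFB00000000000000F8FA1B5F83F483FFF1"
KEY_B = "7A7480000000000005FF345F698000000000FF0000010161663636FF37313166FF2D3530FF64372D3462FF3062FF2D613162332DFF63FF33373138316534FFFF3230386300000000FFFB00000000000000F8FA1AF9B804C3FE87"

SIGN_MASK = 0x8000000000000000  # TiDB flips the sign bit of encoded int64 IDs
U64_MAX = 0xFFFFFFFFFFFFFFFF
BYTES_FLAG = 0x01               # assumed TiDB codec flag for a bytes column


def decode_memcomparable(buf: bytes) -> Tuple[bytes, bytes]:
    """Decode 9-byte groups (8 data bytes + marker); return (decoded, rest)."""
    out = bytearray()
    pos = 0
    while True:
        group = buf[pos:pos + 9]
        if len(group) < 9:
            raise ValueError("truncated memcomparable group")
        pad = 0xFF - group[8]
        if pad > 8:
            raise ValueError("invalid marker byte")
        out += group[:8 - pad]
        pos += 9
        if pad:  # a marker below 0xFF closes the byte string
            return bytes(out), buf[pos:]


def decode_index_key(hex_key: str) -> Dict[str, object]:
    """Split a TiKV data key for an index entry into its parts."""
    raw = bytes.fromhex(hex_key)
    if raw[:1] != b"z":
        raise ValueError("expected the 'z' data-key prefix")
    user_key, rest = decode_memcomparable(raw[1:])
    if len(rest) != 8:
        raise ValueError("expected an 8-byte timestamp suffix")
    ts = U64_MAX ^ struct.unpack(">Q", rest)[0]  # stored bitwise-inverted
    if user_key[:1] != b"t" or user_key[9:11] != b"_i":
        raise ValueError("not a table index key")
    table_id = struct.unpack(">Q", user_key[1:9])[0] ^ SIGN_MASK
    index_id = struct.unpack(">Q", user_key[11:19])[0] ^ SIGN_MASK
    first_col = None
    if user_key[19:20] == bytes([BYTES_FLAG]):
        first_col, _ = decode_memcomparable(user_key[20:])
    return {"table_id": table_id, "index_id": index_id,
            "first_col": first_col, "ts": ts}


if __name__ == "__main__":
    for name, key in (("A", KEY_A), ("B", KEY_B)):
        print(name, decode_index_key(key))
```

Under these assumptions both keys decode to index 1 of table 1332, and the first key sorts after the second byte for byte, which is the ordering violation RocksDB reports for L3.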
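【Possible steps to reproduce】

【Unexpected behavior observed】
TiKV panicked.
【Expected behavior】

【Related components and versions】

【Other background information or screenshots】
For example: cluster topology, OS and kernel versions, and application information. If the problem is related to SQL, provide the SQL statements and the schemas of the relevant tables. If node logs contain key errors, provide the log content or files from those nodes. If some business-sensitive information cannot be shared publicly, leave your contact details and we will follow up with you privately.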

[2021/07/16 09:20:56.419 +08:00] [INFO] [mod.rs:335] ["starting working thread"] [worker=addr-resolver]
[2021/07/16 09:20:56.419 +08:00] [INFO] [mod.rs:335] ["starting working thread"] [worker=region-collector-worker]
[2021/07/16 09:20:56.419 +08:00] [FATAL] [server.rs:303] ["panic_mark_file /tidb/data/storage/tikv/panic_mark_file exists, there must be something wrong with the db."]
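
When the TiKV log is large, it can help to pull out only the FATAL entries, such as the panic_mark_file message on restart. The following is a rough sketch that assumes every entry starts with the `[time] [LEVEL] [source]` header seen in the lines above and that messages use plain double quotes. The function name `fatal_entries` is hypothetical, and the sample lines are copied from this post.

```python
import re
from typing import Iterable, Iterator, Tuple

# Header shared by the log lines in this post: [time] [LEVEL] [file:line] ...
HEADER = re.compile(
    r"^\[(?P<time>[^\]]+)\] \[(?P<level>[A-Z]+)\] \[(?P<source>[^\]]+)\] (?P<rest>.*)$"
)

SAMPLE = [
    '[2021/07/16 09:20:56.419 +08:00] [INFO] [mod.rs:335] ["starting working thread"] [worker=addr-resolver]',
    '[2021/07/16 09:20:56.419 +08:00] [INFO] [mod.rs:335] ["starting working thread"] [worker=region-collector-worker]',
    '[2021/07/16 09:20:56.419 +08:00] [FATAL] [server.rs:303] ["panic_mark_file /tidb/data/storage/tikv/panic_mark_file exists, there must be something wrong with the db."]',
]


def fatal_entries(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (time, source, rest) for every FATAL entry; skip other lines."""
    for line in lines:
        match = HEADER.match(line.strip())
        if match and match.group("level") == "FATAL":
            yield match.group("time"), match.group("source"), match.group("rest")


if __name__ == "__main__":
    for time, source, rest in fatal_entries(SAMPLE):
        print(time, source, rest)
```

To scan a real file, pass an open file object instead of `SAMPLE`. Continuation lines of a multi-line backtrace do not match the header and are skipped.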

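This is most likely caused by a known issue: a cache key collision in RocksDB causes data to end up out of order. To avoid hitting the problem again, upgrade to v4.0.9 or later. See https://github.com/tikv/tikv/pull/9029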