SST file corrupted on a TiKV node: how do I recover the node?


  • [TiDB version]: v4.0.0
  • [Problem description]:

I deliberately copied TiKV's metadata while TiKV was running, and this ended up corrupting its SST files.

The error is as follows:

[2020/06/10 18:33:39.654 +08:00] [WARN] [store.rs:595] ["[store 457262] handle 256 pending peers include 256 ready, 20 entries, 21866 messages and 0 snapshots"] [takes=3824]
[2020/06/10 18:33:39.858 +08:00] [FATAL] [lib.rs:481] ["rocksdb background error. db: kv, reason: compaction, error: Corruption: block checksum mismatch: expected 601837316, got 2029237636  in /data1/deploy/tidb/data/db/4329273.sst offset 997433 size 19653"] [backtrace="stack backtrace:\
   0: tikv_util::set_panic_hook::{{closure}}\
             at components/tikv_util/src/lib.rs:480\
   1: std::panicking::rust_panic_with_hook\
             at src/libstd/panicking.rs:475\
   2: rust_begin_unwind\
             at src/libstd/panicking.rs:375\
   3: std::panicking::begin_panic_fmt\
             at src/libstd/panicking.rs:326\
   4: <engine_rocks::event_listener::RocksEventListener as rocksdb::event_listener::EventListener>::on_background_error\
             at components/engine_rocks/src/event_listener.rs:66\
   5: rocksdb::event_listener::on_background_error\
             at /rust/git/checkouts/rust-rocksdb-a9a28e74c6ead8ef/98aea25/src/event_listener.rs:254\
   6: _ZN24crocksdb_eventlistener_t17OnBackgroundErrorEN7rocksdb21BackgroundErrorReasonEPNS0_6StatusE\
             at crocksdb/c.cc:2140\
   7: _ZN7rocksdb12EventHelpers23NotifyOnBackgroundErrorERKSt6vectorISt10shared_ptrINS_13EventListenerEESaIS4_EENS_21BackgroundErrorReasonEPNS_6StatusEPNS_17InstrumentedMutexEPb\
             at /rust/git/checkouts/rust-rocksdb-a9a28e74c6ead8ef/98aea25/librocksdb_sys/rocksdb/db/event_helpers.cc:53\
   8: _ZN7rocksdb12ErrorHandler10SetBGErrorERKNS_6StatusENS_21BackgroundErrorReasonE\
             at /rust/git/checkouts/rust-rocksdb-a9a28e74c6ead8ef/98aea25/librocksdb_sys/rocksdb/db/error_handler.cc:220\
   9: _ZN7rocksdb6DBImpl20BackgroundCompactionEPbPNS_10JobContextEPNS_9LogBufferEPNS0_19PrepickedCompactionENS_3Env8PriorityE\
             at /rust/git/checkouts/rust-rocksdb-a9a28e74c6ead8ef/98aea25/librocksdb_sys/rocksdb/db/db_impl/db_impl_compaction_flush.cc:2797\
  10: _ZN7rocksdb6DBImpl24BackgroundCallCompactionEPNS0_19PrepickedCompactionENS_3Env8PriorityE\
             at /rust/git/checkouts/rust-rocksdb-a9a28e74c6ead8ef/98aea25/librocksdb_sys/rocksdb/db/db_impl/db_impl_compaction_flush.cc:2317\
  11: _ZN7rocksdb6DBImpl16BGWorkCompactionEPv\
             at /rust/git/checkouts/rust-rocksdb-a9a28e74c6ead8ef/98aea25/librocksdb_sys/rocksdb/db/db_impl/db_impl_compaction_flush.cc:2092\
  12: _ZN7rocksdb14ThreadPoolImpl4Impl8BGThreadEm\
             at /rust/git/checkouts/rust-rocksdb-a9a28e74c6ead8ef/98aea25/librocksdb_sys/rocksdb/util/threadpool_imp.cc:266\
  13: _ZN7rocksdb14ThreadPoolImpl4Impl15BGThreadWrapperEPv\
             at /rust/git/checkouts/rust-rocksdb-a9a28e74c6ead8ef/98aea25/librocksdb_sys/rocksdb/util/threadpool_imp.cc:307\
  14: execute_native_thread_routine\
  15: start_thread\
  16: __clone\
"] [location=components/engine_rocks/src/event_listener.rs:66] [thread_name=<unnamed>]
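The FATAL line already names the damaged file and the exact block that failed its checksum. As a rough sketch (the regular expression is my own, written against the message quoted above, and is not something TiKV or RocksDB ships), those fields can be pulled out like this:

```python
import re

# Pattern written by hand against the RocksDB message in the log above.
# \s+ before "in" tolerates the double space that appears in the first log.
CORRUPTION_RE = re.compile(
    r"block checksum mismatch: expected (?P<expected>\d+), got (?P<got>\d+)\s+"
    r"in (?P<path>\S+\.sst) offset (?P<offset>\d+) size (?P<size>\d+)"
)


def parse_corruption(line):
    """Return the damaged SST path and block location, or None if not found."""
    m = CORRUPTION_RE.search(line)
    if m is None:
        return None
    return {
        "path": m.group("path"),
        "expected": int(m.group("expected")),
        "got": int(m.group("got")),
        "offset": int(m.group("offset")),
        "size": int(m.group("size")),
    }


# Message text copied from the FATAL line above (backtrace omitted).
sample = (
    '[FATAL] [lib.rs:481] ["rocksdb background error. db: kv, reason: compaction, '
    "error: Corruption: block checksum mismatch: expected 601837316, got 2029237636  "
    'in /data1/deploy/tidb/data/db/4329273.sst offset 997433 size 19653"]'
)

info = parse_corruption(sample)
print(info["path"], info["offset"], info["size"])
```

This only identifies which file and block were reported; it does not repair anything.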

In pd-ctl the store is now already in the "Down" state; its leader count is 0 and its region_count is 25602.
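For reference, here is a small sketch of reading those three values from pd-ctl's JSON output. The JSON layout (`store.state_name`, `status.leader_count`, `status.region_count`) is my assumption about what the `store` command prints; the address is a made-up placeholder, and the store id is taken from the `[store 457262]` log line on the assumption that it belongs to the same node.

```python
import json

# Illustrative sample only: the state and the two counts come from the post,
# the address is hypothetical, and the layout is an assumed pd-ctl shape.
SAMPLE = """
{
  "store": {"id": 457262, "address": "tikv-host:20160", "state_name": "Down"},
  "status": {"leader_count": 0, "region_count": 25602}
}
"""


def summarize_store(raw):
    """Extract the fields discussed in the post from one store's JSON."""
    doc = json.loads(raw)
    store, status = doc["store"], doc["status"]
    return {
        "id": store["id"],
        "state": store["state_name"],
        "leaders": status.get("leader_count", 0),
        "regions": status.get("region_count", 0),
    }


def still_holds_replicas(summary):
    """True when the store is not Up but PD still counts region replicas on it."""
    return summary["state"] != "Up" and summary["regions"] > 0


summary = summarize_store(SAMPLE)
print(summary, still_holds_replicas(summary))
```

A store in this condition has lost all its leaders but PD still records region replicas on it, which matches what was observed here.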

I would like to recover this node.

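If the cluster uses 3 replicas, has multiple nodes, and all the other nodes are healthy, you can consider scaling this node in and then scaling it back out.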
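A minimal sketch of that suggestion, assuming the cluster is managed with TiUP (the thread does not say how it was deployed). The cluster name, node address, and topology file name below are hypothetical placeholders. The helper only builds the command lines and checks the replica arithmetic; it does not run anything.

```python
def quorum(replicas):
    """Raft majority for a region with the given number of replicas."""
    return replicas // 2 + 1


def survives_losing_one(replicas):
    """True if every region keeps a majority after losing one replica."""
    return replicas - 1 >= quorum(replicas)


def scale_in_cmd(cluster, node):
    # `tiup cluster scale-in <cluster> --node <host:port>`
    return ["tiup", "cluster", "scale-in", cluster, "--node", node]


def scale_out_cmd(cluster, topology_file):
    # `tiup cluster scale-out <cluster> <topology.yaml>`
    return ["tiup", "cluster", "scale-out", cluster, topology_file]


# Hypothetical values, for illustration only.
CLUSTER = "my-cluster"
BAD_NODE = "10.0.0.5:20160"
TOPOLOGY = "scale-out.yaml"

if survives_losing_one(3):
    print(" ".join(scale_in_cmd(CLUSTER, BAD_NODE)))
    print(" ".join(scale_out_cmd(CLUSTER, TOPOLOGY)))
```

The arithmetic is why the 3-replica condition matters: with one replica gone, two of three remain, which is still a majority, so the healthy nodes can keep serving while the damaged store is replaced.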

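We are seeing exactly the same log. Could you tell us what can cause this problem? Ours is a production cluster in normal use, and the error appeared suddenly.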

[2021/01/11 01:57:45.577 +08:00] [FATAL] [lib.rs:481] ["rocksdb background error. db: kv, reason: compaction, error: Corruption: block checksum mismatch: expected 823179595, got 2540679874 in /data5/deploy1/data/db/32209549.sst offset 4077077 size 19454"] [backtrace="stack backtrace:\ 0: tikv_util::set_panic_hook::{{closure}}\ at components/tikv_util/src/lib.rs:480\ 1: std::panicking::rust_panic_with_hook\ at src/libstd/panicking.rs:475\ 2: rust_begin_unwind\ at src/libstd/panicking.rs:375\ 3: std::panicking::begin_panic_fmt\ at src/libstd/panicking.rs:326\ 4: <engine_rocks::event_listener::RocksEventListener as rocksdb::event_listener::EventListener>::on_background_error\ at components/engine_rocks/src/event_listener.rs:66\ 5: rocksdb::event_listener::on_background_error\ at /rust/git/checkouts/rust-rocksdb-a9a28e74c6ead8ef/98aea25/src/event_listener.rs:254\ 6: _ZN24crocksdb_eventlistener_t17OnBackgroundErrorEN7rocksdb21BackgroundErrorReasonEPNS0_6StatusE\ at crocksdb/c.cc:2140\ 7: _ZN7rocksdb12EventHelpers23NotifyOnBackgroundErrorERKSt6vectorISt10shared_ptrINS_13EventListenerEESaIS4_EENS_21BackgroundErrorReasonEPNS_6StatusEPNS_17InstrumentedMutexEPb\ at /rust/git/checkouts/rust-rocksdb-a9a28e74c6ead8ef/98aea25/librocksdb_sys/rocksdb/db/event_helpers.cc:53\ 8: _ZN7rocksdb12ErrorHandler10SetBGErrorERKNS_6StatusENS_21BackgroundErrorReasonE\ at /rust/git/checkouts/rust-rocksdb-a9a28e74c6ead8ef/98aea25/librocksdb_sys/rocksdb/db/error_handler.cc:220\ 9: _ZN7rocksdb6DBImpl20BackgroundCompactionEPbPNS_10JobContextEPNS_9LogBufferEPNS0_19PrepickedCompactionENS_3Env8PriorityE\ at /rust/git/checkouts/rust-rocksdb-a9a28e74c6ead8ef/98aea25/librocksdb_sys/rocksdb/db/db_impl/db_impl_compaction_flush.cc:2797\ 10: _ZN7rocksdb6DBImpl24BackgroundCallCompactionEPNS0_19PrepickedCompactionENS_3Env8PriorityE\ at /rust/git/checkouts/rust-rocksdb-a9a28e74c6ead8ef/98aea25/librocksdb_sys/rocksdb/db/db_impl/db_impl_compaction_flush.cc:2317\ 11: _ZN7rocksdb6DBImpl16BGWorkCompactionEPv\ at /rust/git/checkouts/rust-rocksdb-a9a28e74c6ead8ef/98aea25/librocksdb_sys/rocksdb/db/db_impl/db_impl_compaction_flush.cc:2092\ 12: _ZN7rocksdb14ThreadPoolImpl4Impl8BGThreadEm\ at /rust/git/checkouts/rust-rocksdb-a9a28e74c6ead8ef/98aea25/librocksdb_sys/rocksdb/util/threadpool_imp.cc:266\ 13: _ZN7rocksdb14ThreadPoolImpl4Impl15BGThreadWrapperEPv\ at /rust/git/checkouts/rust-rocksdb-a9a28e74c6ead8ef/98aea25/librocksdb_sys/rocksdb/util/threadpool_imp.cc:307\ 14: execute_native_thread_routine\ 15: start_thread\ 16: clone\ "] [location=components/engine_rocks/src/event_listener.rs:66] [thread_name=]