TiKV node fails to start

The TiKV log shows that the first panic was caused by a corrupted SST file:

[2021/04/21 14:25:10.151 +08:00] [FATAL] [lib.rs:482] ["rocksdb background error. db: kv, reason: compaction, error: Corruption: block checksum mismatch: expected 3552756717, got 2675153938  in /tidb-data/tikv-20160/db/091787.sst offset 29994297 size 29942"] [backtrace="stack backtrace:\
   0: tikv_util::set_panic_hook::{{closure}}\
             at components/tikv_util/src/lib.rs:481\
   1: std::panicking::rust_panic_with_hook\
             at src/libstd/panicking.rs:475\
   2: rust_begin_unwind\
             at src/libstd/panicking.rs:375\
   3: std::panicking::begin_panic_fmt\
             at src/libstd/panicking.rs:326\
   4: <engine_rocks::event_listener::RocksEventListener as rocksdb::event_listener::EventListener>::on_background_error\
             at components/engine_rocks/src/event_listener.rs:66\
   5: rocksdb::event_listener::on_background_error\
             at /rust/git/checkouts/rust-rocksdb-a9a28e74c6ead8ef/5345344/src/event_listener.rs:254\
   6: _ZN24crocksdb_eventlistener_t17OnBackgroundErrorEN7rocksdb21BackgroundErrorReasonEPNS0_6StatusE\
             at crocksdb/c.cc:2140\
   7: _ZN7rocksdb12EventHelpers23NotifyOnBackgroundErrorERKSt6vectorISt10shared_ptrINS_13EventListenerEESaIS4_EENS_21BackgroundErrorReasonEPNS_6StatusEPNS_17InstrumentedMutexEPb\
             at /rust/git/checkouts/rust-rocksdb-a9a28e74c6ead8ef/5345344/librocksdb_sys/rocksdb/db/event_helpers.cc:53\
   8: _ZN7rocksdb12ErrorHandler10SetBGErrorERKNS_6StatusENS_21BackgroundErrorReasonE\
             at /rust/git/checkouts/rust-rocksdb-a9a28e74c6ead8ef/5345344/librocksdb_sys/rocksdb/db/error_handler.cc:220\
   9: _ZN7rocksdb6DBImpl20BackgroundCompactionEPbPNS_10JobContextEPNS_9LogBufferEPNS0_19PrepickedCompactionENS_3Env8PriorityE\
             at /rust/git/checkouts/rust-rocksdb-a9a28e74c6ead8ef/5345344/librocksdb_sys/rocksdb/db/db_impl/db_impl_compaction_flush.cc:2797\
  10: _ZN7rocksdb6DBImpl24BackgroundCallCompactionEPNS0_19PrepickedCompactionENS_3Env8PriorityE\
             at /rust/git/checkouts/rust-rocksdb-a9a28e74c6ead8ef/5345344/librocksdb_sys/rocksdb/db/db_impl/db_impl_compaction_flush.cc:2317\
  11: _ZN7rocksdb6DBImpl16BGWorkCompactionEPv\
             at /rust/git/checkouts/rust-rocksdb-a9a28e74c6ead8ef/5345344/librocksdb_sys/rocksdb/db/db_impl/db_impl_compaction_flush.cc:2092\
  12: _ZN7rocksdb14ThreadPoolImpl4Impl8BGThreadEm\
             at /rust/git/checkouts/rust-rocksdb-a9a28e74c6ead8ef/5345344/librocksdb_sys/rocksdb/util/threadpool_imp.cc:266\
  13: _ZN7rocksdb14ThreadPoolImpl4Impl15BGThreadWrapperEPv\
             at /rust/git/checkouts/rust-rocksdb-a9a28e74c6ead8ef/5345344/librocksdb_sys/rocksdb/util/threadpool_imp.cc:307\
  14: execute_native_thread_routine\
  15: start_thread\
  16: clone\
"] [location=components/engine_rocks/src/event_listener.rs:66] [thread_name=<unnamed>]
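
To locate the damaged file quickly in a long log, the corruption message can be parsed mechanically. The following is only a minimal sketch in Python: the function name `parse_corruption` and the regular expression are assumptions made for illustration, written against the single message format quoted above, and are not part of TiKV or RocksDB.

```python
import re
from typing import Optional

# Assumed pattern, modelled on the one FATAL message quoted in this thread.
CORRUPTION_RE = re.compile(
    r"Corruption: block checksum mismatch: expected (?P<expected>\d+), "
    r"got (?P<got>\d+)\s+in (?P<path>\S+) offset (?P<offset>\d+) size (?P<size>\d+)"
)


def parse_corruption(line: str) -> Optional[dict]:
    """Return the SST path, checksums, offset and size, or None if no match."""
    match = CORRUPTION_RE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    # Convert the numeric fields from strings to integers.
    for key in ("expected", "got", "offset", "size"):
        info[key] = int(info[key])
    return info


# The message portion of the FATAL line from the TiKV log above.
sample = (
    '[2021/04/21 14:25:10.151 +08:00] [FATAL] [lib.rs:482] '
    '["rocksdb background error. db: kv, reason: compaction, error: Corruption: '
    'block checksum mismatch: expected 3552756717, got 2675153938  in '
    '/tidb-data/tikv-20160/db/091787.sst offset 29994297 size 29942"]'
)

if __name__ == "__main__":
    print(parse_corruption(sample))
```

Running it over the whole TiKV log line by line would show whether more than one SST file is affected.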

The system log shows signs of disk corruption:

Apr 21 13:39:58 tidb-cluster-tidb kernel: ffff8c2cf6bbac00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Apr 21 13:39:58 tidb-cluster-tidb kernel: ffff8c2cf6bbac10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Apr 21 13:39:58 tidb-cluster-tidb kernel: ffff8c2cf6bbac20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Apr 21 13:39:58 tidb-cluster-tidb kernel: ffff8c2cf6bbac30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Apr 21 13:39:58 tidb-cluster-tidb kernel: ffff8c2cf6bbac40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Apr 21 13:39:58 tidb-cluster-tidb kernel: ffff8c2cf6bbac50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Apr 21 13:39:58 tidb-cluster-tidb kernel: ffff8c2cf6bbac60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Apr 21 13:39:58 tidb-cluster-tidb kernel: ffff8c2cf6bbac70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Apr 21 13:39:58 tidb-cluster-tidb kernel: XFS (vda2): metadata I/O error in "xfs_trans_read_buf_map" at daddr 0x12cdfc01 len 1 error 74
Apr 21 13:39:58 tidb-cluster-tidb kernel: XFS (vda2): page discard on page ffffd2e48e334380, inode 0x1501d40d, offset 202346496.
Apr 21 13:40:28 tidb-cluster-tidb kernel: XFS (vda2): Metadata CRC error detected at xfs_agf_read_verify+0xde/0x100 [xfs], xfs_agf block 0x12cdfc01
Apr 21 13:40:28 tidb-cluster-tidb kernel: XFS (vda2): Unmount and run xfs_repair
Apr 21 13:40:28 tidb-cluster-tidb kernel: XFS (vda2): First 128 bytes of corrupted metadata buffer:
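
The same check can be scripted when the system log is large. The sketch below is an illustration only: the helper `xfs_errors` and its keyword list are assumptions chosen to match the kernel messages quoted above, not an exhaustive list of XFS error messages.

```python
import re
from collections import defaultdict

# Matches kernel lines of the form "kernel: XFS (<device>): <message>".
XFS_RE = re.compile(r"kernel: XFS \((?P<dev>[^)]+)\): (?P<msg>.*)")

# Assumed keywords, taken from the messages seen in this thread.
KEYWORDS = ("I/O error", "CRC error", "corrupt", "xfs_repair")


def xfs_errors(lines):
    """Group suspicious XFS kernel messages by device name."""
    hits = defaultdict(list)
    for line in lines:
        match = XFS_RE.search(line)
        if match and any(word in match.group("msg") for word in KEYWORDS):
            hits[match.group("dev")].append(match.group("msg"))
    return dict(hits)


# XFS lines copied from the system log above.
sample_lines = [
    'Apr 21 13:39:58 tidb-cluster-tidb kernel: XFS (vda2): metadata I/O error in "xfs_trans_read_buf_map" at daddr 0x12cdfc01 len 1 error 74',
    "Apr 21 13:39:58 tidb-cluster-tidb kernel: XFS (vda2): page discard on page ffffd2e48e334380, inode 0x1501d40d, offset 202346496.",
    "Apr 21 13:40:28 tidb-cluster-tidb kernel: XFS (vda2): Metadata CRC error detected at xfs_agf_read_verify+0xde/0x100 [xfs], xfs_agf block 0x12cdfc01",
    "Apr 21 13:40:28 tidb-cluster-tidb kernel: XFS (vda2): Unmount and run xfs_repair",
    "Apr 21 13:40:28 tidb-cluster-tidb kernel: XFS (vda2): First 128 bytes of corrupted metadata buffer:",
]

if __name__ == "__main__":
    for device, messages in xfs_errors(sample_lines).items():
        print(device, len(messages))
```

Note that the XFS errors were logged at 13:39, before the TiKV panic at 14:25, which is consistent with the file system damage being the cause rather than a consequence.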

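So the problem is most likely caused by disk corruption. You can try repairing the file system to see whether the node recovers (the kernel log itself suggests unmounting and running xfs_repair). If it cannot be recovered, you will need to scale this node in and then scale it out again.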
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/managing_file_systems/checking-and-repairing-a-file-system_managing-file-systems

It was indeed a disk problem. Thanks a lot.

Hello, I ran into the same problem, and now TiKV will not start.
The error message is: [2021/11/24 14:17:05.055 +08:00] [FATAL] [server.rs:312] ["panic_mark_file /data/tidb-data/tikv-20160/panic_mark_file exists, there must be something wrong with the db."]

Do you know how to fix this?
