TiKV restarts unexpectedly

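[TiDB environment]
[Overview] Scenario and problem summary
[Background] Operations performed
[Symptoms] Business and database symptoms
[Business impact]
[TiDB version] TiDB v5.0.5
[Attachments]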

TiKV restarted unexpectedly. The log is as follows:
[2021/12/10 18:00:47.158 +08:00] [FATAL] [lib.rs:464] [“rocksdb background error. db: kv, reason: compaction, error: IO error: While sync_file_range returned -1: /data//tidb-data/tikv-21162/db/3946046.sst: Structure needs cleaning”] [backtrace=“stack backtrace:\ 0: tikv_util::set_panic_hook::{{closure}}\ at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tikv/components/tikv_util/src/lib.rs:463\ 1: std::panicking::rust_panic_with_hook\ at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35//library/std/src/panicking.rs:595\ 2: std::panicking::begin_panic_handler::{{closure}}\ at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35//library/std/src/panicking.rs:497\ 3: std::sys_common::backtrace::__rust_end_short_backtrace\ at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35//library/std/src/sys_common/backtrace.rs:141\ 4: rust_begin_unwind\ at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35//library/std/src/panicking.rs:493\ 5: std::panicking::begin_panic_fmt\ at /rustc/bc39d4d9c514e5fdb40a5782e6ca08924f979c35//library/std/src/panicking.rs:435\ 6: <engine_rocks::event_listener::RocksEventListener as rocksdb::event_listener::EventListener>::on_background_error\ at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tikv/components/engine_rocks/src/event_listener.rs:93\ 7: rocksdb::event_listener::on_background_error\ at /rust/git/checkouts/rust-rocksdb-a9a28e74c6ead8ef/53ff7e7/src/event_listener.rs:332\ 8: _ZN24crocksdb_eventlistener_t17OnBackgroundErrorEN7rocksdb21BackgroundErrorReasonEPNS0_6StatusE\ at crocksdb/c.cc:2330\ 9: _ZN7rocksdb12EventHelpers23NotifyOnBackgroundErrorERKSt6vectorISt10shared_ptrINS_13EventListenerEESaIS4_EENS_21BackgroundErrorReasonEPNS_6StatusEPNS_17InstrumentedMutexEPb\ at /rust/git/checkouts/rust-rocksdb-a9a28e74c6ead8ef/53ff7e7/librocksdb_sys/rocksdb/db/event_helpers.cc:53\ 10: _ZN7rocksdb12ErrorHandler10SetBGErrorERKNS_6StatusENS_21BackgroundErrorReasonE\ at 
/rust/git/checkouts/rust-rocksdb-a9a28e74c6ead8ef/53ff7e7/librocksdb_sys/rocksdb/db/error_handler.cc:219\ 11: _ZN7rocksdb6DBImpl20BackgroundCompactionEPbPNS_10JobContextEPNS_9LogBufferEPNS0_19PrepickedCompactionENS_3Env8PriorityE\ at /rust/git/checkouts/rust-rocksdb-a9a28e74c6ead8ef/53ff7e7/librocksdb_sys/rocksdb/db/db_impl/db_impl_compaction_flush.cc:2797\ 12: _ZN7rocksdb6DBImpl24BackgroundCallCompactionEPNS0_19PrepickedCompactionENS_3Env8PriorityE\ at /rust/git/checkouts/rust-rocksdb-a9a28e74c6ead8ef/53ff7e7/librocksdb_sys/rocksdb/db/db_impl/db_impl_compaction_flush.cc:2317\ 13: _ZN7rocksdb6DBImpl16BGWorkCompactionEPv\ at /rust/git/checkouts/rust-rocksdb-a9a28e74c6ead8ef/53ff7e7/librocksdb_sys/rocksdb/db/db_impl/db_impl_compaction_flush.cc:2091\ 14: _ZNKSt8functionIFvvEEclEv\ at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/std_function.h:687\ _ZN7rocksdb14ThreadPoolImpl4Impl8BGThreadEm\ at /rust/git/checkouts/rust-rocksdb-a9a28e74c6ead8ef/53ff7e7/librocksdb_sys/rocksdb/util/threadpool_imp.cc:266\ 15: _ZN7rocksdb14ThreadPoolImpl4Impl15BGThreadWrapperEPv\ at /rust/git/checkouts/rust-rocksdb-a9a28e74c6ead8ef/53ff7e7/librocksdb_sys/rocksdb/util/threadpool_imp.cc:307\ 16: execute_native_thread_routine\ 17: start_thread\ 18: __clone\ ”] [location=components/engine_rocks/src/event_listener.rs:93] [thread_name=]
[2021/12/10 18:01:03.245 +08:00] [INFO] [lib.rs:90] [“Welcome to TiKV”]

  1. TiUP Cluster Display output

  2. TiUP Cluster Edit Config output

  3. TiDB Overview monitoring

  • Logs of the relevant components (covering one hour before and after the problem)

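This panic means that RocksDB hit an I/O error while reading or writing, and TiKV shut itself down to protect itself. The error message "Structure needs cleaning" indicates that the file system is very likely already corrupted. We recommend taking the node offline and checking the file system.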
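To make the diagnosis above concrete, here is a minimal Python sketch that pulls the relevant fields out of the FATAL message quoted in this thread. On Linux, "Structure needs cleaning" is the error text for `EUCLEAN`, which file systems return when they detect corrupted on-disk structures. The function name `parse_bg_error` and the regular expression are assumptions made for illustration. They are not part of TiKV or RocksDB, and the message layout is taken only from the single log line in this post.

```python
import re

# Pattern for the message body of the FATAL line quoted in this thread.
# Assumption: the "db: ..., reason: ..., error: ..." layout seen here is stable.
BG_ERROR_RE = re.compile(
    r"rocksdb background error\. "
    r"db: (?P<db>\w+), reason: (?P<reason>\w+), error: (?P<error>.*)"
)


def parse_bg_error(message):
    """Return the fields of a RocksDB background error, or None if no match."""
    m = BG_ERROR_RE.search(message)
    if m is None:
        return None
    info = m.groupdict()
    # The OS error text is whatever follows the last ": " in the error string.
    info["os_error"] = info["error"].rsplit(": ", 1)[-1]
    # "Structure needs cleaning" points at the file system, not at TiKV itself.
    info["suspect_fs_corruption"] = info["os_error"] == "Structure needs cleaning"
    return info


# Message text copied from the log in this thread (quotes and backtrace removed).
sample = (
    "rocksdb background error. db: kv, reason: compaction, "
    "error: IO error: While sync_file_range returned -1: "
    "/data//tidb-data/tikv-21162/db/3946046.sst: Structure needs cleaning"
)
info = parse_bg_error(sample)
print(info["db"], info["reason"], info["os_error"])
```

The sketch expects only the message text. If you feed it a raw log line, strip the surrounding brackets, quotes and backtrace first.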


Judging from the log, the corrupted structure was detected while compaction was running.

The system log (messages) probably contains related error messages.
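One way to check the system log is to filter it for lines that usually accompany file system or disk trouble. The sketch below is a rough illustration only. The keyword list is an assumption and is not exhaustive, the function name `find_fs_errors` is hypothetical, and the sample lines are made up for the demonstration rather than taken from the affected machine.

```python
# Substrings that often appear in Linux kernel messages about file system
# or disk trouble. Illustrative assumption, not an exhaustive list.
KEYWORDS = (
    "EXT4-fs error",
    "Metadata corruption detected",
    "I/O error",
    "Structure needs cleaning",
)


def find_fs_errors(lines):
    """Return the log lines that contain any of the suspicious keywords."""
    return [line.rstrip("\n") for line in lines if any(k in line for k in KEYWORDS)]


# Hypothetical sample lines, invented for this example.
sample_log = [
    "Dec 10 17:59:58 host kernel: EXT4-fs error (device sdX): <details>\n",
    "Dec 10 18:00:01 host systemd: Started Session 42 of user tidb.\n",
]
hits = find_fs_errors(sample_log)
for line in hits:
    print(line)
```

To use it on a real machine, pass an open log file instead of `sample_log`, for example `open("/var/log/messages")`. That path is an assumption: the location of the system log depends on the distribution.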

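If replicas were lost on only a single node, you can refer to the following.

If it turns out that replicas were lost on multiple nodes, lossy recovery is the only option: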
https://docs.pingcap.com/zh/tidb/stable/tikv-control#强制-region-从多副本失败状态恢复服务慎用
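The difference between the single-node and multi-node cases comes down to Raft majority: a Region can keep serving as long as more than half of its replicas are healthy. The sketch below assumes 3 replicas per Region, which is the TiDB default but is not confirmed for this cluster. The function name `has_quorum` is hypothetical.

```python
def has_quorum(total_replicas, lost_replicas):
    """Return True if the surviving replicas still form a Raft majority."""
    healthy = total_replicas - lost_replicas
    return healthy > total_replicas // 2


# Assumption: 3 replicas per Region (the default, not confirmed in this thread).
print(has_quorum(3, 1))  # True: one replica lost, the other two still agree
print(has_quorum(3, 2))  # False: majority lost, only lossy recovery remains
```

With one damaged TiKV node out of three replicas, the remaining two can elect a leader and the lost replica can be rebuilt from them. Once two replicas of the same Region are gone, no majority exists, which is why the linked procedure is marked "use with caution".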
