TiKV won't start because of corrupted SST files; after scaling out, scaling in the broken node stays stuck in Pending Offline

[TiDB environment] Production
753
[Reproduction path] What operations led to the problem
A stress test crashed TiKV.
[Problem encountered: symptoms and impact]
Listing the files on disk returns an error:
/data1/tidb/tidb-data# du -sh *
0 LOCK
du: cannot access ‘db/028696.sst’: Structure needs cleaning
12K import
24K last_tikv.toml
11G raft-engine
0 raftdb.info
162M rocksdb.info
36K snap
8.7G space_placeholder_file
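`du` printing "Structure needs cleaning" means the kernel returned EUCLEAN when stat-ing the file, which on ext4 or XFS usually indicates filesystem-level corruption rather than a TiKV bug. A minimal Python sketch for listing every entry under the data directory that cannot be stat'ed, assuming a Linux host; `find_unreadable_files` is a hypothetical helper, and it only catches metadata errors like the one above, not RocksDB block checksum mismatches inside files that still stat fine.

```python
import errno
import os

# "Structure needs cleaning" is strerror(EUCLEAN) on Linux; fall back to
# its Linux value (117) if this platform's errno module lacks the name.
EUCLEAN = getattr(errno, "EUCLEAN", 117)


def find_unreadable_files(root):
    """Walk `root` and return (path, errno, message) for every directory
    that cannot be listed and every file that cannot be stat'ed."""
    problems = []

    def on_walk_error(err):
        # Called by os.walk when a directory cannot be scanned.
        problems.append((err.filename, err.errno, err.strerror))

    for dirpath, _dirnames, filenames in os.walk(root, onerror=on_walk_error):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                os.lstat(path)  # same metadata lookup that made du fail
            except OSError as e:
                problems.append((path, e.errno, e.strerror))
    return problems


if __name__ == "__main__":
    for path, code, msg in find_unreadable_files("/data1/tidb/tidb-data"):
        tag = " (filesystem corruption?)" if code == EUCLEAN else ""
        print(f"{path}: {msg}{tag}")
```

If this reports EUCLEAN entries, the filesystem itself needs checking (and the disk, as suggested later in the thread) before the node is reused.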
[Resource configuration] Go to TiDB Dashboard > Cluster Info > Hosts and take a screenshot of this page
The log is also full of errors:
["Invalid Iterator: Error(Request(message: "Engine Engine(Status { code: IoError, sub_code: None, sev: NoError, state: \"Corruption: block checksum mismatch: stored = 2427634033, computed = 3414811807, type = 1 in /data1/tidb/tidb-data/db/080037.sst offset 4477725 size 4864\" })"))"] [backtrace=" 0: tikv_util::set_panic_hook::{{closure}}\n at /workspace/source/tikv/components/tikv_util/src/lib.rs:509:18\n 1: <alloc::boxed::Box<F,A> as core::ops::function::Fn>::call\n at /root/.rustup/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/boxed.rs:2032:9\n std::panicking::rust_panic_with_hook\n at /root/.rustup/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:692:13\n 2: std::panicking::begin_panic_handler::{{closure}}\n at /root/.rustup/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:579:13\n 3: std::sys_common::backtrace::__rust_end_short_backtrace\n at /root/.rustup/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys_common/backtrace.rs:137:18\n 4: rust_begin_unwind\n at /root/.rustup/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:575:5\n 5: core::panicking::panic_fmt\n at /root/.rustup/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/panicking.rs:65:14\n 6: core::result::unwrap_failed\n at /root/.rustup/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/result.rs:1791:5\n 7: core::result::Result<T,E>::expect\n at /root/.rustup/toolchains/nightly-2022-11-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/result.rs:1070:23\n tikv_kv::cursor::Cursor::next\n at /workspace/source/tikv/components/tikv_kv/src/cursor.rs:370:9\n 8: tikv::storage::mvcc::reader::scanner::forward::Cursors::move_write_cursor_to_next_user_key\n at /workspace/source/tikv/src/storage/mvcc/reader/scanner/forward.rs:80:17\n 9: <tikv::storage::mvcc::reader::scanner::forward::LatestKvPolicy as tikv::storage::mvcc::reader::scanner::forward::ScanPolicy>::handle_write\n at /workspace/source/tikv/src/storage/mvcc/reader/scanner/forward.rs:505:9\n tikv::storage::mvcc::reader::scanner::forward::ForwardScanner<S,P>::read_next\n at /workspace/source/tikv/src/storage/mvcc/reader/scanner/forward.rs:290:56\n <tikv::storage::mvcc::reader::scanner::Scanner as tikv::storage::txn::store::Scanner>::next\n at /workspace/source/tikv/src/storage/mvcc/reader/scanner/mod.rs:229:45\n 10: <tikv::coprocessor::dag::storage_impl::TikvStorage as tidb_query_common::storage::Storage>::scan_next\n
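The key part of this panic is the RocksDB `Corruption: block checksum mismatch` message, which names the damaged SST file, the block offset and size, and the stored vs. computed checksums (in RocksDB's `ChecksumType` enum, type 1 is CRC32c). A small sketch, assuming the log text format shown above, that extracts those fields so every affected file can be listed; `parse_corruptions` and `SstCorruption` are hypothetical names.

```python
import re
from dataclasses import dataclass

# Matches the RocksDB corruption text as it appears in the TiKV log above.
_CORRUPTION_RE = re.compile(
    r"block checksum mismatch: stored = (?P<stored>\d+), "
    r"computed = (?P<computed>\d+), type = (?P<type>\d+) "
    r"in (?P<path>\S+\.sst) offset (?P<offset>\d+) size (?P<size>\d+)"
)


@dataclass
class SstCorruption:
    path: str
    offset: int         # byte offset of the bad block inside the SST file
    size: int           # size of the bad block in bytes
    stored: int         # checksum recorded in the file
    computed: int       # checksum recomputed on read
    checksum_type: int  # RocksDB ChecksumType (1 = CRC32c)


def parse_corruptions(log_text):
    """Return one SstCorruption per checksum-mismatch message in log_text."""
    return [
        SstCorruption(
            path=m["path"],
            offset=int(m["offset"]),
            size=int(m["size"]),
            stored=int(m["stored"]),
            computed=int(m["computed"]),
            checksum_type=int(m["type"]),
        )
        for m in _CORRUPTION_RE.finditer(log_text)
    ]


def distinct_files(log_text):
    """Sorted list of SST files that reported at least one bad block."""
    return sorted({c.path for c in parse_corruptions(log_text)})
```

Running `distinct_files` over the whole tikv.log shows whether the damage is limited to a few SST files or spread across the store.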

Your SST file is corrupted.
Destroy this node and rebuild it.

Have you checked the disk with a tool to see whether it's a hardware problem?

I've already scaled out one new node and then scaled in this one. This node can't start, and the scale-in is now stuck in Pending Offline.

What tool?

MegaCli64

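Normally, during this stage you mainly watch whether the leader/Region count on the store keeps decreasing (pd-ctl store xx or information_schema.tikv_store_status). If you only have 3 TiKV nodes, the offline condition can't be met, so you have to scale out one node first.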
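A sketch of that check, assuming you save the JSON output of `pd-ctl store <id>` at intervals and that it carries `store.state_name`, `status.leader_count` and `status.region_count` (verify the field names against your PD version; missing counts are treated as 0 here). `summarize_drain` is a hypothetical helper that reports whether the store is still shedding Regions, has stalled, or has reached Tombstone.

```python
import json


def summarize_drain(snapshots):
    """snapshots: pd-ctl `store <id>` JSON strings, oldest first.

    Returns "tombstone", "drained", "draining" or "stalled".
    """
    if not snapshots:
        raise ValueError("need at least one snapshot")
    parsed = [json.loads(s) for s in snapshots]
    states = [p["store"].get("state_name") for p in parsed]
    # Assumption: zero counts may be omitted from the JSON, so default to 0.
    regions = [p["status"].get("region_count", 0) for p in parsed]
    leaders = [p["status"].get("leader_count", 0) for p in parsed]

    if states[-1] == "Tombstone":
        return "tombstone"
    if regions[-1] == 0:
        # All Regions moved off; PD should mark the store Tombstone soon.
        return "drained"
    if regions[-1] < regions[0] or leaders[-1] < leaders[0]:
        return "draining"
    # Counts did not move between the first and last snapshot.
    return "stalled"
```

A "stalled" result while the store is Offline usually means PD has nowhere to move the replicas, which is the "only 3 TiKV nodes" case described above.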

I already scaled out first; otherwise it wouldn't have let me scale in.

If the newly added node is already healthy and you now have three healthy nodes, you can force the scale-in with --force.

A forced scale-in won't lose data, will it?

You've already scaled out, haven't you? If you have 3 healthy nodes, you can just force scale-in this abnormal node. No data will be lost.
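The reasoning behind "no data will be lost" can be written as a check. A simplified sketch, assuming PD's default of 3 replicas per Region (`max-replicas`) and no placement rules or location labels: forcing out one failed store is safe when every other store is healthy (each affected Region still has 2 of its 3 replicas, a Raft majority) and at least `max_replicas` healthy stores remain so PD can rebuild the missing replica elsewhere. `safe_to_force_scale_in` is a hypothetical helper; the command itself is along the lines of `tiup cluster scale-in <cluster-name> --node <ip:port> --force`.

```python
def safe_to_force_scale_in(healthy, target, max_replicas=3):
    """healthy: mapping of store id -> True if the store is Up and serving.

    Returns (ok, reason). Simplified model: ignores placement rules,
    location labels, learners and TiFlash replicas.
    """
    if target not in healthy:
        return False, f"store {target} not found"
    others = {sid: ok for sid, ok in healthy.items() if sid != target}
    unhealthy = sorted(sid for sid, ok in others.items() if not ok)
    if unhealthy:
        # A Region with replicas on two failed stores would lose its majority.
        return False, f"other unhealthy stores {unhealthy}: Regions may lose quorum"
    if len(others) < max_replicas:
        # PD needs max_replicas distinct stores to place every replica.
        return False, f"only {len(others)} healthy stores, need {max_replicas}"
    return True, "every affected Region keeps a majority and can be re-replicated"
```

For this thread's situation (one broken store plus three healthy ones after the scale-out), `safe_to_force_scale_in({1: True, 2: True, 4: True, 5: False}, target=5)` returns `ok == True`.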

If only this one node is broken, it's fine. Just wait; after a while it will become Tombstone.

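With corrupted SST files, this node is basically dead. It won't necessarily make the whole cluster unavailable, but can the node itself repair automatically? Could you explain in detail?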

It doesn't seem repairable. The only option is to scale out and then scale in the broken node.