TiFlash Compute node reports errors on startup after modifying the global placement rules with ALTER RANGE

After scaling out one TiKV node and using ALTER RANGE to change the global placement policy, TiFlash immediately started failing with the following errors:

[2024/12/28 12:52:05.786 +08:00] [ERROR] [BaseDaemon.cpp:370] [########################################] [source=BaseDaemon] [thread_id=41]
[2024/12/28 12:52:05.788 +08:00] [ERROR] [BaseDaemon.cpp:371] ["(from thread 42) Received signal Segmentation fault(11)."] [source=BaseDaemon] [thread_id=41]
[2024/12/28 12:52:05.788 +08:00] [ERROR] [BaseDaemon.cpp:401] ["Address: 0x10"] [source=BaseDaemon] [thread_id=41]
[2024/12/28 12:52:05.788 +08:00] [ERROR] [BaseDaemon.cpp:407] ["Access: read."] [source=BaseDaemon] [thread_id=41]
[2024/12/28 12:52:05.788 +08:00] [ERROR] [BaseDaemon.cpp:416] ["Address not mapped to object."] [source=BaseDaemon] [thread_id=41]
[2024/12/28 12:52:05.788 +08:00] [ERROR] [BaseDaemon.cpp:563] ["\n       0x7772a31\tfaultSignalHandler(int, siginfo_t*, void*) [tiflash+125250097]\n                \tlibs/libdaemon/src/BaseDaemon.cpp:214\n  0x7f65aff03520\t<unknown symbol> [libc.so.6+271648]\n  0x7f65aff58ef4\t__pthread_mutex_lock [libc.so.6+622324]\n  0x7f65b4af0f46\tstd::__1::mutex::lock() [libc++.so.1+421702]\n  0x7f65b4af192a\tstd::__1::__shared_mutex_base::lock_shared() [libc++.so.1+424234]\n       0x1f9eaa6\tDB::TiDBSchemaSyncerManager::getOrCreateSchemaSyncer(unsigned int) [tiflash+33155750]\n                \tdbms/src/TiDB/Schema/TiDBSchemaManager.h:120\n       0x8b1d78b\tDB::AtomicGetStorageSchema(std::__1::shared_ptr<DB::Region> const&, DB::TMTContext&) [tiflash+145872779]\n                \tdbms/src/Storages/KVStore/Decode/PartitionStreams.cpp:473\n       0x8a9e243\tDB::KVStore::preHandleSSTsToDTFiles(std::__1::shared_ptr<DB::Region>, DB::SSTViewVec, unsigned long, unsigned long, DB::DM::FileConvertJobType, DB::TMTContext&) [tiflash+145351235]\n                \tdbms/src/Storages/KVStore/MultiRaft/PrehandleSnapshot.cpp:588\n       0x8a9d625\tDB::KVStore::preHandleSnapshotToFiles(std::__1::shared_ptr<DB::Region>, DB::SSTViewVec, unsigned long, unsigned long, std::__1::optional<unsigned long>, DB::TMTContext&) [tiflash+145348133]\n                \tdbms/src/Storages/KVStore/MultiRaft/PrehandleSnapshot.cpp:206\n       0x8a80843\tPreHandleSnapshot [tiflash+145229891]\n                \tdbms/src/Storages/KVStore/FFI/ProxyFFI.cpp:672\n  0x7f65b1ba571c\tproxy_ffi::engine_store_helper_impls::_$LT$impl$u20$proxy_ffi..interfaces..root..DB..EngineStoreServerHelper$GT$::pre_handle_snapshot::h791b034f987b027e [libtiflash_proxy.so+26081052]\n  0x7f65b16c9f2e\tengine_store_ffi::core::forward_raft::snapshot::pre_handle_snapshot_impl::hc02c06315952cb47 [libtiflash_proxy.so+20987694]\n  0x7f65b22b41f6\tyatp::task::future::RawTask$LT$F$GT$::poll::h1156ccd37a9b2f70 [libtiflash_proxy.so+33481206]\n  
0x7f65b3e2143b\t_$LT$yatp..task..future..Runner$u20$as$u20$yatp..pool..runner..Runner$GT$::handle::h879bddc8d67b170f [libtiflash_proxy.so+62239803]\n  0x7f65b3e11e8c\tstd::sys_common::backtrace::__rust_begin_short_backtrace::h17fc1674134e9d3e [libtiflash_proxy.so+62176908]\n  0x7f65b3e1290c\tcore::ops::function::FnOnce::call_once$u7b$$u7b$vtable.shim$u7d$$u7d$::hc086da2e6c76c260 [libtiflash_proxy.so+62179596]\n  0x7f65b350d015\tstd::sys::unix::thread::Thread::new::thread_start::hd2791a9cabec1fda [libtiflash_proxy.so+52719637]\n                \t/rustc/96ddd32c4bfb1d78f0cd03eb068b1710a8cebeef/library/std/src/sys/unix/thread.rs:108\n  0x7f65aff55ac3\t<unknown symbol> [libc.so.6+608963]"] [source=BaseDaemon] [thread_id=41]
[2024/12/28 12:52:25.517 +08:00] [WARN] [S3Common.cpp:145] ["tag=ClientConfiguration message=User specified profile: [] is not found, will use the SDK resolved one."] [source=AWSClient] [thread_id=1]
[2024/12/28 12:52:28.495 +08:00] [ERROR] [BaseDaemon.cpp:370] [########################################] [source=BaseDaemon] [thread_id=41]
[2024/12/28 12:52:28.495 +08:00] [ERROR] [BaseDaemon.cpp:371] ["(from thread 40) Received signal Segmentation fault(11)."] [source=BaseDaemon] [thread_id=41]
[2024/12/28 12:52:28.495 +08:00] [ERROR] [BaseDaemon.cpp:401] ["Address: 0x10"] [source=BaseDaemon] [thread_id=41]
[2024/12/28 12:52:28.495 +08:00] [ERROR] [BaseDaemon.cpp:407] ["Access: read."] [source=BaseDaemon] [thread_id=41]
[2024/12/28 12:52:28.495 +08:00] [ERROR] [BaseDaemon.cpp:416] ["Address not mapped to object."] [source=BaseDaemon] [thread_id=41]
[2024/12/28 12:52:31.051 +08:00] [FATAL] [Exception.cpp:106] ["Code: 49, e.displayText() = DB::Exception: Illegal region range, should not happen, start_key=748000000000000CFF535F698000000000FF0000020130633162FF30383137FF2D3437FF62392D3436FF3130FF2D613135612DFF32FF62303163633065FFFF3364333000000000FFFB01363232363163FF3638FF2D35323961FF2D3431FF39622D39FF6434622DFF336134FF3633333239FF3565FF326300000000FB00FE end_key=748000000000000CFF7B5F728000000000FF0007310000000000FA, e.what() = DB::Exception, Stack trace:\n\n\n       0x8aed691\tDB::Exception::Exception<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >(int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >&&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >&&) [tiflash+145675921]\n                \tdbms/src/Common/Exception.h:53\n       0x8aed2a5\tDB::RegionRangeKeys::RegionRangeKeys(DB::StringObject<true>&&, DB::StringObject<true>&&) [tiflash+145674917]\n                \tdbms/src/Storages/KVStore/MultiRaft/RegionState.cpp:149\n       0x8aec7c1\tDB::RegionState::updateRegionRange() [tiflash+145672129]\n                \tdbms/src/Storages/KVStore/MultiRaft/RegionState.cpp:83\n       0x8ade8b2\tDB::RegionMeta::RegionMeta(metapb::Peer, metapb::Region, raft_serverpb::RaftApplyState) [tiflash+145615026]\n                \tdbms/src/Storages/KVStore/MultiRaft/RegionMeta.cpp:489\n       0x8a4ab97\tDB::KVStore::genRegionPtr(metapb::Region&&, unsigned long, unsigned long, unsigned long) [tiflash+145009559]\n                \tdbms/src/Storages/KVStore/KVStore.cpp:685\n       0x8a807d5\tPreHandleSnapshot [tiflash+145229781]\n                \tdbms/src/Storages/KVStore/FFI/ProxyFFI.cpp:660\n  
0x7f971619871c\tproxy_ffi::engine_store_helper_impls::_$LT$impl$u20$proxy_ffi..interfaces..root..DB..EngineStoreServerHelper$GT$::pre_handle_snapshot::h791b034f987b027e [libtiflash_proxy.so+26081052]\n  0x7f9715cbcf2e\tengine_store_ffi::core::forward_raft::snapshot::pre_handle_snapshot_impl::hc02c06315952cb47 [libtiflash_proxy.so+20987694]\n  0x7f97168a71f6\tyatp::task::future::RawTask$LT$F$GT$::poll::h1156ccd37a9b2f70 [libtiflash_proxy.so+33481206]\n  0x7f971841443b\t_$LT$yatp..task..future..Runner$u20$as$u20$yatp..pool..runner..Runner$GT$::handle::h879bddc8d67b170f [libtiflash_proxy.so+62239803]\n  0x7f9718404e8c\tstd::sys_common::backtrace::__rust_begin_short_backtrace::h17fc1674134e9d3e [libtiflash_proxy.so+62176908]\n  0x7f971840590c\tcore::ops::function::FnOnce::call_once$u7b$$u7b$vtable.shim$u7d$$u7d$::hc086da2e6c76c260 [libtiflash_proxy.so+62179596]\n  0x7f9717b00015\tstd::sys::unix::thread::Thread::new::thread_start::hd2791a9cabec1fda [libtiflash_proxy.so+52719637]\n                \t/rustc/96ddd32c4bfb1d78f0cd03eb068b1710a8cebeef/library/std/src/sys/unix/thread.rs:108\n  0x7f9714548ac3\t<unknown symbol> [libc.so.6+608963]\n  0x7f97145da850\t<unknown symbol> [libc.so.6+1206352]"] [source="DB::RawCppPtr DB::PreHandleSnapshot(DB::EngineStoreServerWrap *, DB::BaseBuffView, uint64_t, DB::SSTViewVec, uint64_t, uint64_t)"] [thread_id=44]
[2024/12/28 12:52:31.051 +08:00] [FATAL] [Exception.cpp:106] ["Code: 49, e.displayText() = DB::Exception: Illegal region range, should not happen, start_key=7480000000000137FF1A5F698000000000FF000002010FA70F82FF02211002FF0FC00EFF330E6D0E8BFF0E8BFF0E310E2D1044FF0EFF600E2F10510E6DFFFF0F2E0E290E300E2FFFFF00000000000000FF00F7010E2A0E290EFF2D0E32FF00000000FF00000000F7010E2BFF0E310E2B0E29FF0EFF2C0E2F0E320000FDFF0419B53600000000FF00010E2B0E290E2BFF0E2DFF0E2A0E2B0EFF2B0E30FF0E290E2BFF0E2F0E31FF0E2A0EFF2F0E290E2EFF0000FF000000000000F700FE end_key=7480000000000137FF1A5F698000000000FF000002010FB40F5BFF0F100FEAFF021B0EFF33101F1002FF0F82FF021B0E300E33FF0EFF300E310E310E8BFFFF0E2F0E3202210E2CFFFF0E2B0E320E3102FF21FF0E2D0E2C0E2AFF0E2AFF02210E310EFF2B0E4AFF0E2B0221FF0E310E32FF000000FF0000000000F7010EFF290E290E290E31FFFF0000000000000000FFF7010E2F0E2D0E2DFF0E2AFF0E310E3000FF000000FB0419B512FF0000000000010E2BFF0E290E2B0E2DFF0EFF2A0E2B0E290E32FFFF0E290E2E0E2B0E29FFFF0E2F0E2E0E2B0EFF32FF000000000000FF0000F70000000000FA, e.what() = DB::Exception, Stack trace:\n\n\n       0x8aed691\tDB::Exception::Exception<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >(int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >&&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >&&) [tiflash+145675921]\n                \tdbms/src/Common/Exception.h:53\n       0x8aed2a5\tDB::RegionRangeKeys::RegionRangeKeys(DB::StringObject<true>&&, DB::StringObject<true>&&) [tiflash+145674917]\n                \tdbms/src/Storages/KVStore/MultiRaft/RegionState.cpp:149\n       0x8aec7c1\tDB::RegionState::updateRegionRange() [tiflash+145672129]\n                \tdbms/src/Storages/KVStore/MultiRaft/RegionState.cpp:83\n       
0x8ade8b2\tDB::RegionMeta::RegionMeta(metapb::Peer, metapb::Region, raft_serverpb::RaftApplyState) [tiflash+145615026]\n                \tdbms/src/Storages/KVStore/MultiRaft/RegionMeta.cpp:489\n       0x8a4ab97\tDB::KVStore::genRegionPtr(metapb::Region&&, unsigned long, unsigned long, unsigned long) [tiflash+145009559]\n                \tdbms/src/Storages/KVStore/KVStore.cpp:685\n       0x8a807d5\tPreHandleSnapshot [tiflash+145229781]\n                \tdbms/src/Storages/KVStore/FFI/ProxyFFI.cpp:660\n  0x7f971619871c\tproxy_ffi::engine_store_helper_impls::_$LT$impl$u20$proxy_ffi..interfaces..root..DB..EngineStoreServerHelper$GT$::pre_handle_snapshot::h791b034f987b027e [libtiflash_proxy.so+26081052]\n  0x7f9715cbcf2e\tengine_store_ffi::core::forward_raft::snapshot::pre_handle_snapshot_impl::hc02c06315952cb47 [libtiflash_proxy.so+20987694]\n  0x7f97168a71f6\tyatp::task::future::RawTask$LT$F$GT$::poll::h1156ccd37a9b2f70 [libtiflash_proxy.so+33481206]\n  0x7f971841443b\t_$LT$yatp..task..future..Runner$u20$as$u20$yatp..pool..runner..Runner$GT$::handle::h879bddc8d67b170f [libtiflash_proxy.so+62239803]\n  0x7f9718404e8c\tstd::sys_common::backtrace::__rust_begin_short_backtrace::h17fc1674134e9d3e [libtiflash_proxy.so+62176908]\n  0x7f971840590c\tcore::ops::function::FnOnce::call_once$u7b$$u7b$vtable.shim$u7d$$u7d$::hc086da2e6c76c260 [libtiflash_proxy.so+62179596]\n  0x7f9717b00015\tstd::sys::unix::thread::Thread::new::thread_start::hd2791a9cabec1fda [libtiflash_proxy.so+52719637]\n                \t/rustc/96ddd32c4bfb1d78f0cd03eb068b1710a8cebeef/library/std/src/sys/unix/thread.rs:108\n  0x7f9714548ac3\t<unknown symbol> [libc.so.6+608963]\n  0x7f97145da850\t<unknown symbol> [libc.so.6+1206352]"] [source="DB::RawCppPtr DB::PreHandleSnapshot(DB::EngineStoreServerWrap *, DB::BaseBuffView, uint64_t, DB::SSTViewVec, uint64_t, uint64_t)"] [thread_id=45]
[2024/12/28 12:52:31.051 +08:00] [ERROR] [BaseDaemon.cpp:563] ["\n       0x7772a31\tfaultSignalHandler(int, siginfo_t*, void*) [tiflash+125250097]\n                \tlibs/libdaemon/src/BaseDaemon.cpp:214\n  0x7f97144f6520\t<unknown symbol> [libc.so.6+271648]\n  0x7f971454bef4\t__pthread_mutex_lock [libc.so.6+622324]\n  0x7f97190e3f46\tstd::__1::mutex::lock() [libc++.so.1+421702]\n  0x7f97190e492a\tstd::__1::__shared_mutex_base::lock_shared() [libc++.so.1+424234]\n       0x1f9eaa6\tDB::TiDBSchemaSyncerManager::getOrCreateSchemaSyncer(unsigned int) [tiflash+33155750]\n                \tdbms/src/TiDB/Schema/TiDBSchemaManager.h:120\n       0x8b1d78b\tDB::AtomicGetStorageSchema(std::__1::shared_ptr<DB::Region> const&, DB::TMTContext&) [tiflash+145872779]\n                \tdbms/src/Storages/KVStore/Decode/PartitionStreams.cpp:473\n       0x8a9e243\tDB::KVStore::preHandleSSTsToDTFiles(std::__1::shared_ptr<DB::Region>, DB::SSTViewVec, unsigned long, unsigned long, DB::DM::FileConvertJobType, DB::TMTContext&) [tiflash+145351235]\n                \tdbms/src/Storages/KVStore/MultiRaft/PrehandleSnapshot.cpp:588\n       0x8a9d625\tDB::KVStore::preHandleSnapshotToFiles(std::__1::shared_ptr<DB::Region>, DB::SSTViewVec, unsigned long, unsigned long, std::__1::optional<unsigned long>, DB::TMTContext&) [tiflash+145348133]\n                \tdbms/src/Storages/KVStore/MultiRaft/PrehandleSnapshot.cpp:206\n       0x8a80843\tPreHandleSnapshot [tiflash+145229891]\n                \tdbms/src/Storages/KVStore/FFI/ProxyFFI.cpp:672\n  0x7f971619871c\tproxy_ffi::engine_store_helper_impls::_$LT$impl$u20$proxy_ffi..interfaces..root..DB..EngineStoreServerHelper$GT$::pre_handle_snapshot::h791b034f987b027e [libtiflash_proxy.so+26081052]\n  0x7f9715cbcf2e\tengine_store_ffi::core::forward_raft::snapshot::pre_handle_snapshot_impl::hc02c06315952cb47 [libtiflash_proxy.so+20987694]\n  0x7f97168a71f6\tyatp::task::future::RawTask$LT$F$GT$::poll::h1156ccd37a9b2f70 [libtiflash_proxy.so+33481206]\n  
0x7f971841443b\t_$LT$yatp..task..future..Runner$u20$as$u20$yatp..pool..runner..Runner$GT$::handle::h879bddc8d67b170f [libtiflash_proxy.so+62239803]\n  0x7f9718404e8c\tstd::sys_common::backtrace::__rust_begin_short_backtrace::h17fc1674134e9d3e [libtiflash_proxy.so+62176908]\n  0x7f971840590c\tcore::ops::function::FnOnce::call_once$u7b$$u7b$vtable.shim$u7d$$u7d$::hc086da2e6c76c260 [libtiflash_proxy.so+62179596]\n  0x7f9717b00015\tstd::sys::unix::thread::Thread::new::thread_start::hd2791a9cabec1fda [libtiflash_proxy.so+52719637]\n                \t/rustc/96ddd32c4bfb1d78f0cd03eb068b1710a8cebeef/library/std/src/sys/unix/thread.rs:108\n  0x7f9714548ac3\t<unknown symbol> [libc.so.6+608963]"] [source=BaseDaemon] [thread_id=41]

The key error is: Illegal region range, should not happen, start_key=7480000000000137FF1A5F698000000000FF000002010FA70F82FF02211002FF0FC00EFF330E6D0E8BFF0E8BFF0E310E2D1044FF0EFF600E2F10510E6DFFFF0F2E0E290E300E2FFFFF00000000000000FF00F7010E2A0E290EFF2D0E32FF00000000FF00000000F7010E2BFF0E310E2B0E29FF0EFF2C0E2F0E320000FDFF0419B53600000000FF00010E2B0E290E2BFF0E2DFF0E2A0E2B0EFF2B0E30FF0E290E2BFF0E2F0E31FF0E2A0EFF2F0E290E2EFF0000FF000000000000F700FE end_key=7480000000000137FF1A5F698000000000FF000002010FB40F5BFF0F100FEAFF021B0EFF33101F1002FF0F82FF021B0E300E33FF0EFF300E310E310E8BFFFF0E2F0E3202210E2CFFFF0E2B0E320E3102FF21FF0E2D0E2C0E2AFF0E2AFF02210E310EFF2B0E4AFF0E2B0221FF0E310E32FF000000FF0000000000F7010EFF290E290E290E31FFFF0000000000000000FFF7010E2F0E2D0E2DFF0E2AFF0E310E3000FF000000FB0419B512FF0000000000010E2BFF0E290E2B0E2DFF0EFF2A0E2B0E290E32FFFF0E290E2E0E2B0E29FFFF0E2F0E2E0E2B0EFF32FF000000000000FF0000F70000000000FA

» region key 7480000000000137FF1A5F698000000000FF000002010FA70F82FF02211002FF0FC00EFF330E6D0E8BFF0E8BFF0E310E2D1044FF0EFF600E2F10510E6DFFFF0F2E0E290E300E2FFFFF00000000000000FF00F7010E2A0E290EFF2D0E32FF00000000FF00000000F7010E2BFF0E310E2B0E29FF0EFF2C0E2F0E320000FDFF0419B53600000000FF00010E2B0E290E2BFF0E2DFF0E2A0E2B0EFF2B0E30FF0E290E2BFF0E2F0E31FF0E2A0EFF2F0E290E2EFF0000FF000000000000F700FE
{
  "id": 22230341614,
  "start_key": "7480000000000137FF1A5F698000000000FF000002010FA70F82FF02211002FF0FC00EFF330E6D0E8BFF0E8BFF0E310E2D1044FF0EFF600E2F10510E6DFFFF0F2E0E290E300E2FFFFF00000000000000FF00F7010E2A0E290EFF2D0E32FF00000000FF00000000F7010E2BFF0E310E2B0E29FF0EFF2C0E2F0E320000FDFF0419B53600000000FF00010E2B0E290E2BFF0E2DFF0E2A0E2B0EFF2B0E30FF0E290E2BFF0E2F0E31FF0E2A0EFF2F0E290E2EFF0000FF000000000000F700FE",
  "end_key": "7480000000000137FF1A5F698000000000FF000002010FB40F5BFF0F100FEAFF021B0EFF33101F1002FF0F82FF021B0E300E33FF0EFF300E310E310E8BFFFF0E2F0E3202210E2CFFFF0E2B0E320E3102FF21FF0E2D0E2C0E2AFF0E2AFF02210E310EFF2B0E4AFF0E2B0221FF0E310E32FF000000FF0000000000F7010EFF290E290E290E31FFFF0000000000000000FFF7010E2F0E2D0E2DFF0E2AFF0E310E3000FF000000FB0419B512FF0000000000010E2BFF0E290E2B0E2DFF0EFF2A0E2B0E290E32FFFF0E290E2E0E2B0E29FFFF0E2F0E2E0E2B0EFF32FF000000000000FF0000F70000000000FA",
  "epoch": {
    "conf_ver": 5795,
    "version": 72168
  },
  "peers": [
    {
      "role_name": "Voter",
      "id": 22230341615,
      "store_id": 10213540256
    },
    {
      "role_name": "Voter",
      "id": 22639190811,
      "store_id": 10213540721
    },
    {
      "role_name": "Voter",
      "id": 22640776832,
      "store_id": 8
    }
  ],
  "leader": {
    "role_name": "Voter",
    "id": 22639190811,
    "store_id": 10213540721
  },
  "cpu_usage": 0,
  "written_bytes": 14855,
  "read_bytes": 0,
  "written_keys": 40,
  "read_keys": 0,
  "approximate_size": 105,
  "approximate_keys": 382494
}

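Decoding the key with mok shows that this Region holds index data for table_id=79642. The Region was unexpectedly scheduled to TiFlash, which cannot handle it and therefore reports the error.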

>  mok 7480000000000137FF1A5F698000000000FF000002010FA70F82FF02211002FF0FC00EFF330E6D0E8BFF0E8BFF0E310E2D1044FF0EFF600E2F10510E6DFFFF0F2E0E290E300E2FFFFF00000000000000FF00F7010E2A0E290EFF2D0E32FF00000000FF00000000F7010E2BFF0E310E2B0E29FF0EFF2C0E2F0E320000FDFF0419B53600000000FF00010E2B0E290E2BFF0E2DFF0E2A0E2B0EFF2B0E30FF0E290E2BFF0E2F0E31FF0E2A0EFF2F0E290E2EFF0000FF000000000000F700FE
"7480000000000137FF1A5F698000000000FF000002010FA70F82FF02211002FF0FC00EFF330E6D0E8BFF0E8BFF0E310E2D1044FF0EFF600E2F10510E6DFFFF0F2E0E290E300E2FFFFF00000000000000FF00F7010E2A0E290EFF2D0E32FF00000000FF00000000F7010E2BFF0E310E2B0E29FF0EFF2C0E2F0E320000FDFF0419B53600000000FF00010E2B0E290E2BFF0E2DFF0E2A0E2B0EFF2B0E30FF0E290E2BFF0E2F0E31FF0E2A0EFF2F0E290E2EFF0000FF000000000000F700FE"
└─## decode hex key
  └─"t\200\000\000\000\000\0017\377\032_i\200\000\000\000\000\377\000\000\002\001\017\247\017\202\377\002!\020\002\377\017\300\016\3773\016m\016\213\377\016\213\377\0161\016-\020D\377\016\377`\016/\020Q\016m\377\377\017.\016)\0160\016/\377\377\000\000\000\000\000\000\000\377\000\367\001\016*\016)\016\377-\0162\377\000\000\000\000\377\000\000\000\000\367\001\016+\377\0161\016+\016)\377\016\377,\016/\0162\000\000\375\377\004\031\2656\000\000\000\000\377\000\001\016+\016)\016+\377\016-\377\016*\016+\016\377+\0160\377\016)\016+\377\016/\0161\377\016*\016\377/\016)\016.\377\000\000\377\000\000\000\000\000\000\367\000\376"
    ├─## decode mvcc key
    │ └─"t\200\000\000\000\000\0017\032_i\200\000\000\000\000\000\000\002\001\017\247\017\202\002!\020\002\377\017\300\0163\016m\016\213\377\016\213\0161\016-\020D\377\016`\016/\020Q\016m\377\017.\016)\0160\016/\377\000\000\000\000\000\000\000\000\367\001\016*\016)\016-\0162\377\000\000\000\000\000\000\000\000\367\001\016+\0161\016+\016)\377\016,\016/\0162\000\000\375\004\031\2656\000\000\000\000\000\001\016+\016)\016+\016-\377\016*\016+\016+\0160\377\016)\016+\016/\0161\377\016*\016/\016)\016.\377\000\000\000\000\000\000\000\000\367"
    │   ├─## table prefix
    │   │ └─table: 79642
    │   └─## table index key
    │     ├─table: 79642
    │     ├─index: 2
    │     └─"\001\017\247\017\202\002!\020\002\377\017\300\0163\016m\016\213\377\016\213\0161\016-\020D\377\016`\016/\020Q\016m\377\017.\016)\0160\016/\377\000\000\000\000\000\000\000\000\367\001\016*\016)\016-\0162\377\000\000\000\000\000\000\000\000\367\001\016+\0161\016+\016)\377\016,\016/\0162\000\000\375\004\031\2656\000\000\000\000\000\001\016+\016)\016+\016-\377\016*\016+\016+\0160\377\016)\016+\016/\0161\377\016*\016/\016)\016.\377\000\000\000\000\000\000\000\000\367"
    │       └─## decode index values
    │         ├─kind: Bytes, value: ��!�3m��1-D`/Qm.)0/
    │         ├─kind: Bytes, value: *)-2
    │         ├─kind: Bytes, value: +1+),/2
    │         ├─kind: Uint64, value: 1852446195360727040
    │         └─kind: Bytes, value: +)+-*++0)+/1*/).
    └─## table prefix
      └─table: 79871
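For readers without mok at hand, here is a minimal Python sketch of the same decoding. It assumes the standard TiDB/TiKV key layout: the Region key is memcomparable-encoded in 8-byte groups, each followed by a marker byte (0xFF for a full group, 0xFF minus the padding length for the last group), and the decoded key is `t` + table ID + `_i` + index ID (or `_r` + handle), with IDs stored as big-endian int64 with the sign bit flipped. The helper names are hypothetical, and `_r` handles are assumed to be integers (common-handle tables are not covered).

```python
SIGN_BIT = 1 << 63
GROUP = 8       # data bytes per memcomparable group
MARKER = 0xFF   # marker byte of a full group (no padding)


def decode_memcomparable(encoded: bytes) -> bytes:
    """Decode memcomparable 'encode bytes' groups from the start of `encoded`.

    Stops after the terminal group (marker != 0xFF). If `encoded` is only a
    prefix of a key, it stops at the last complete group instead of failing.
    """
    out = bytearray()
    pos = 0
    while pos + GROUP + 1 <= len(encoded):
        chunk = encoded[pos:pos + GROUP]
        pad = MARKER - encoded[pos + GROUP]
        if pad > GROUP:
            raise ValueError(f"bad marker byte at offset {pos + GROUP}")
        out += chunk[:GROUP - pad]
        if pad:  # a padded group is the last one
            break
        pos += GROUP + 1
    return bytes(out)


def decode_comparable_int64(b: bytes) -> int:
    """Big-endian int64 with the sign bit flipped (comparable int encoding)."""
    u = int.from_bytes(b, "big") ^ SIGN_BIT
    return u - (1 << 64) if u >= SIGN_BIT else u


def parse_table_key(hex_key: str) -> tuple[int, str, int]:
    """Return (table_id, kind, id); kind is 'index' or 'record'."""
    raw = decode_memcomparable(bytes.fromhex(hex_key))
    if len(raw) < 19 or raw[:1] != b"t":
        raise ValueError("not a table index/record key, or prefix too short")
    table_id = decode_comparable_int64(raw[1:9])
    kind = {b"_i": "index", b"_r": "record"}.get(raw[9:11])
    if kind is None:
        raise ValueError(f"unknown key type {raw[9:11]!r}")
    # Index ID for '_i' keys; for '_r' keys this assumes an int handle.
    return table_id, kind, decode_comparable_int64(raw[11:19])


if __name__ == "__main__":
    # First three groups (27 bytes) of the Region start_key from the error.
    prefix = "7480000000000137FF1A5F698000000000FF000002010FA70F82FF"
    print(parse_table_key(prefix))  # (79642, 'index', 2)
```

This matches mok's `table: 79642`, `index: 2`. Reading the raw bytes `80 00 00 00 00 01 37 FF` without stripping the 0xFF group marker gives 0x137FF = 79871, which is where the alternative `table: 79871` reading at the end of mok's output comes from.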

Could you share the exact CREATE PLACEMENT POLICY and ALTER RANGE statements you ran?

I changed the SURVIVAL_PREFERENCES of evict_sata_dw from "[zone, dc, host]" to "[host]":

ALTER PLACEMENT POLICY evict_sata_dw CONSTRAINTS="[-disk=sata-new, -disk=dw-ssd]" SURVIVAL_PREFERENCES="[host]";
ALTER RANGE global PLACEMENT POLICY evict_sata_dw;

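If only the TiFlash compute nodes hit this error and the TiFlash write nodes are fine, try reverting the global placement policy to the default, then take all TiFlash compute nodes offline, clear their cached data, and redeploy them.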

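There may be an unhandled compatibility issue between the ALTER RANGE GLOBAL feature and TiFlash's disaggregated storage and compute architecture, which causes Regions to be unexpectedly scheduled to TiFlash compute nodes.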


I tried that, but got the same error.

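It turned out I had to wait until replica balancing finished completely, then scale the compute nodes in and out again; after that it worked. Balancing took 3 hours :mask: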


Is there an issue tracking this? I'll try it in my test environment to see whether it can be reproduced easily.

https://github.com/pingcap/tiflash/issues/9750
I opened an issue and reproduced the problem, and I'm now looking into how to fix it. The reproduction steps are in the issue.

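Confirmed: the rules generated by placement-rule-in-SQL can cause Region peers to be unexpectedly scheduled to TiFlash compute nodes in the disaggregated storage and compute architecture, which leads to this problem.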

As a temporary workaround, explicitly exclude TiFlash and TiFlash compute nodes as scheduling targets in CONSTRAINTS, for example:

CONSTRAINTS="[-engine=tiflash, -engine=tiflash_compute]"
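The workaround relies on how exclusion constraints filter stores: each `-key=value` entry rejects any store that carries that label, while a store lacking the label entirely is not rejected. The following Python sketch is a simplified model of that matching, not PD's actual rule checker; it ignores any constraints TiDB or PD may add implicitly, and the store names and labels are hypothetical. It shows that `[-disk=sata-new, -disk=dw-ssd]` alone does not exclude a store labeled `engine=tiflash_compute`, whereas adding the two `engine` exclusions does.

```python
def parse_constraints(spec: str) -> list[tuple[str, str, str]]:
    """Parse a list-format CONSTRAINTS string such as "[-disk=ssd, +zone=z1]"."""
    parsed = []
    for item in spec.strip().strip("[]").split(","):
        item = item.strip()
        if not item:
            continue
        op, body = item[0], item[1:]
        if op not in ("+", "-") or "=" not in body:
            raise ValueError(f"unsupported constraint: {item!r}")
        key, value = (part.strip() for part in body.split("=", 1))
        parsed.append((op, key, value))
    return parsed


def store_allowed(labels: dict[str, str],
                  constraints: list[tuple[str, str, str]]) -> bool:
    """'+k=v' requires label k=v; '-k=v' forbids it (simplified model)."""
    for op, key, value in constraints:
        has = labels.get(key) == value
        if (op == "+" and not has) or (op == "-" and has):
            return False
    return True


if __name__ == "__main__":
    # Hypothetical stores and labels, for illustration only.
    stores = {
        "tikv-ssd": {"disk": "nvme"},
        "tikv-sata": {"disk": "sata-new"},
        "tiflash-write": {"engine": "tiflash"},
        "tiflash-compute": {"engine": "tiflash_compute"},
    }
    original = parse_constraints("[-disk=sata-new, -disk=dw-ssd]")
    workaround = parse_constraints(
        "[-disk=sata-new, -disk=dw-ssd, -engine=tiflash, -engine=tiflash_compute]"
    )
    for name, cons in (("original", original), ("workaround", workaround)):
        ok = sorted(s for s, labels in stores.items() if store_allowed(labels, cons))
        print(name, ok)
    # original ['tiflash-compute', 'tiflash-write', 'tikv-ssd']
    # workaround ['tikv-ssd']
```

In practice the `engine` exclusions would be appended to whatever CONSTRAINTS the policy already has, as in the combined string above, so the existing disk exclusions keep working.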

For the follow-up fix on the TiDB side, track this issue: TiFlash compute node crashes after executing `ALTER RANGE` in TiDB (pingcap/tiflash issue #9750).
