TiFlash data replication is stuck; the only error in the log is "Modifications in meta haven't persisted"

【TiDB Environment】Production
【TiDB Version】7.1.0
【Reproduction Steps】None
【Problem: symptoms and impact】
TiFlash is replicating a large amount of data and appears stuck; there has been almost no change for a whole day.
One day ago:

Now:

Disk usage: four 1.6 TB disks configured as RAID 0.

All four disks show write I/O.
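
A minimal sketch of how the replication progress can be watched from the TiDB side (information_schema.tiflash_replica is the standard system table; the WHERE filter below is only an example):

-- List TiFlash replicas that are not yet available, with their sync progress.
-- AVAILABLE = 0 means the replica is not ready to serve queries; PROGRESS is the synced fraction.
SELECT TABLE_SCHEMA, TABLE_NAME, REPLICA_COUNT, AVAILABLE, PROGRESS
FROM information_schema.tiflash_replica
WHERE AVAILABLE = 0
ORDER BY PROGRESS;

Running this periodically (e.g. one day apart, as above) makes a "stuck" state visible as an unchanged PROGRESS value.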

【Attachments: screenshots/logs/monitoring】
The error is sporadic; it has appeared twice in total.

[2024/09/07 04:28:02.471 +08:00] [ERROR] [WriteBatchesImpl.h:72] ["!!!=========================Modifications in meta haven't persisted=========================!!! Stack trace: \n       0x735d82c\tDB::DM::WriteBatches::~WriteBatches()::'lambda'(DB::WriteBatchWrapper const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&)::operator()(DB::WriteBatchWrapper const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) const [tiflash+120969260]\n                \tdbms/src/Storages/DeltaMerge/WriteBatchesImpl.h:68\n       0x734bbe8\tDB::DM::WriteBatches::~WriteBatches() [tiflash+120896488]\n                \tdbms/src/Storages/DeltaMerge/WriteBatchesImpl.h:77\n       0x738498e\tDB::DM::DeltaMergeStore::segmentMerge(DB::DM::DMContext&, std::__1::vector<std::__1::shared_ptr<DB::DM::Segment>, std::__1::allocator<std::__1::shared_ptr<DB::DM::Segment> > > const&, DB::DM::DeltaMergeStore::SegmentMergeReason) [tiflash+121129358]\n                \tdbms/src/Storages/DeltaMerge/DeltaMergeStore_InternalSegment.cpp:344\n       0x737812c\tDB::DM::DeltaMergeStore::onSyncGc(long, DB::DM::GCOptions const&) [tiflash+121078060]\n                \tdbms/src/Storages/DeltaMerge/DeltaMergeStore_InternalBg.cpp:889\n       0x8001376\tDB::GCManager::work() [tiflash+134222710]\n                \tdbms/src/Storages/GCManager.cpp:105\n       0x7e20bab\tvoid* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, DB::BackgroundProcessingPool::BackgroundProcessingPool(int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >)::$_1> >(void*) [tiflash+132254635]\n                \t/usr/local/bin/../include/c++/v1/thread:291\n  0x7fa43a8fbea5\tstart_thread [libpthread.so.0+32421]\n  0x7fa43a20ab0d\tclone [libc.so.6+1043213]"] [thread_id=19]
[2024/09/07 12:40:12.862 +08:00] [ERROR] [WriteBatchesImpl.h:72] ["!!!=========================Modifications in meta haven't persisted=========================!!! Stack trace: \n       0x735d82c\tDB::DM::WriteBatches::~WriteBatches()::'lambda'(DB::WriteBatchWrapper const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&)::operator()(DB::WriteBatchWrapper const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) const [tiflash+120969260]\n                \tdbms/src/Storages/DeltaMerge/WriteBatchesImpl.h:68\n       0x734bbe8\tDB::DM::WriteBatches::~WriteBatches() [tiflash+120896488]\n                \tdbms/src/Storages/DeltaMerge/WriteBatchesImpl.h:77\n       0x738498e\tDB::DM::DeltaMergeStore::segmentMerge(DB::DM::DMContext&, std::__1::vector<std::__1::shared_ptr<DB::DM::Segment>, std::__1::allocator<std::__1::shared_ptr<DB::DM::Segment> > > const&, DB::DM::DeltaMergeStore::SegmentMergeReason) [tiflash+121129358]\n                \tdbms/src/Storages/DeltaMerge/DeltaMergeStore_InternalSegment.cpp:344\n       0x737812c\tDB::DM::DeltaMergeStore::onSyncGc(long, DB::DM::GCOptions const&) [tiflash+121078060]\n                \tdbms/src/Storages/DeltaMerge/DeltaMergeStore_InternalBg.cpp:889\n       0x8001376\tDB::GCManager::work() [tiflash+134222710]\n                \tdbms/src/Storages/GCManager.cpp:105\n       0x7e20bab\tvoid* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, DB::BackgroundProcessingPool::BackgroundProcessingPool(int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >)::$_1> >(void*) [tiflash+132254635]\n                \t/usr/local/bin/../include/c++/v1/thread:291\n  0x7fa43a8fbea5\tstart_thread [libpthread.so.0+32421]\n  0x7fa43a20ab0d\tclone [libc.so.6+1043213]"] [thread_id=10]

「Important finding」
After creating a new test table, PROGRESS=1 but AVAILABLE=0, which is not what we expect.
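
A minimal sketch of such a test (the table name test.t_check is hypothetical; for a freshly created empty table, AVAILABLE would normally be expected to become 1 shortly after the replica is added):

-- Hypothetical empty test table used to verify replica creation end to end.
CREATE TABLE test.t_check (id INT PRIMARY KEY, v VARCHAR(32));
ALTER TABLE test.t_check SET TIFLASH REPLICA 1;

-- Check the replica status of the new table.
SELECT TABLE_NAME, REPLICA_COUNT, AVAILABLE, PROGRESS
FROM information_schema.tiflash_replica
WHERE TABLE_SCHEMA = 'test' AND TABLE_NAME = 't_check';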

A timeout-related entry was found in the PD node logs:

[2024/09/08 07:00:28.910 +08:00] [INFO] [rule_manager.go:281] ["placement rule updated"] [rule="{\"group_id\":\"tiflash\",\"id\":\"table-24956-r\",\"index\":120,\"start_key\":\"7480000000000061ff7c5f720000000000fa\",\"end_key\":\"7480000000000061ff7d00000000000000f8\",\"role\":\"learner\",\"is_witness\":false,\"count\":1,\"label_constraints\":[{\"key\":\"engine\",\"op\":\"in\",\"values\":[\"tiflash\"]}],\"create_timestamp\":1725750028}"]

[2024/09/08 07:00:28.989 +08:00] [INFO] [operator_controller.go:443] ["add operator"] [region-id=1696322398] [operator="\"rule-split-region {split: region 1696322398 use policy USEKEY and keys [7480000000000061FF7D00000000000000F8]} (kind:split, region:1696322398(41432, 3764), createAt:2024-09-08 07:00:28.989561267 +0800 CST m=+309921.719750998, startAt:0001-01-01 00:00:00 +0000 UTC, currentStep:0, size:71, steps:[split region with policy USEKEY],timeout:[1m0s])\""] [additional-info="{\"region-end-key\":\"7480000000000061FF7E00000000000000F8\",\"region-start-key\":\"7480000000000061FF7C5F728000000000FF007DDB0000000000FA\"}"]

[2024/09/08 07:00:29.004 +08:00] [INFO] [operator_controller.go:571] ["operator finish"] [region-id=1696322398] [takes=14.345354ms] [operator="\"rule-split-region {split: region 1696322398 use policy USEKEY and keys [7480000000000061FF7D00000000000000F8]} (kind:split, region:1696322398(41432, 3764), createAt:2024-09-08 07:00:28.989561267 +0800 CST m=+309921.719750998, startAt:2024-09-08 07:00:28.989658683 +0800 CST m=+309921.719848420, currentStep:1, size:71, steps:[split region with policy USEKEY],timeout:[1m0s]) finished\""] [additional-info="{\"region-end-key\":\"7480000000000061FF7E00000000000000F8\",\"region-start-key\":\"7480000000000061FF7C5F728000000000FF007DDB0000000000FA\"}"]

How many tables did you add TiFlash replicas to in one go, and is the data volume of those tables very large?

The TiFlash log reports a "Modifications in meta haven't persisted" error, which suggests that metadata modifications were not persisted while TiFlash was replicating data. You may have hit a bug.

How much is "a large amount of data"? We are running 7.5 and it has been quite stable. Could this really be a bug? Hoping an expert can shed some light on this.

Cause:
The DDL that updates the TiFlash cache was blocked.
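
A minimal sketch of how to look for the blocked DDL (assuming the stuck job is visible in the regular DDL job queue; job type names vary by version):

-- Recent and pending DDL jobs; a TiFlash-replica-related job that stays in a
-- queued/running state for a long time is consistent with this cause.
ADMIN SHOW DDL JOBS;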

Solution:
Find the DDL owner node and restart it; that resolves the issue.
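
A minimal sketch of locating the owner (the OWNER_ADDRESS column identifies the TiDB instance currently holding the DDL owner role):

-- Show the current DDL owner and its address.
ADMIN SHOW DDL;

That instance can then be restarted, for example with tiup cluster restart <cluster-name> -N <owner-address> (placeholders; adjust to the actual deployment), after which a new owner is elected and the blocked DDL can proceed.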
