TiFlash disconnected while building replicas, then its state changed to Down, and it cannot be started manually

[2023/07/22 20:35:23.029 +08:00] [WARN] [StorageConfigParser.cpp:241] [“The configuration "path" is deprecated. Check [storage] section for new style.”] [thread_id=1]
[2023/07/22 20:38:24.833 +08:00] [WARN] [] [“region {529048463,132823,2058} find error: region 529048463 is missing”] [source=pingcap.tikv] [thread_id=11]
[2023/07/22 20:41:30.792 +08:00] [WARN] [] [“region {529048463,132829,2058} find error: peer is not leader for region 529048463, leader may Some(id: 579882685 store_id: 106658737)”] [source=pingcap.tikv] [thread_id=15]
[2023/07/22 20:44:25.770 +08:00] [WARN] [] [“region {529048463,132829,2058} find error: region 529048463 is missing”] [source=pingcap.tikv] [thread_id=32]
[2023/07/22 20:47:47.455 +08:00] [WARN] [] [“region {529048463,132835,2058} find error: region 529048463 is missing”] [source=pingcap.tikv] [thread_id=18]
[2023/07/22 20:51:28.816 +08:00] [WARN] [] [“region {529048463,132841,2058} find error: peer is not leader for region 529048463, leader may Some(id: 579912912 store_id: 106658739)”] [source=pingcap.tikv] [thread_id=16]
[2023/07/22 20:51:35.070 +08:00] [ERROR] [WriteBatches.h:69] [“!!!=========================Modifications in meta haven’t persisted=========================!!! Stack trace: \n 0x68f4176\tDB::DM::WriteBatches::~WriteBatches()::‘lambda’(DB::WriteBatch const&, std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator > const&)::operator()(DB::WriteBatch const&, std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator > const&) const [tiflash+110051702]\n \tdbms/src/Storages/DeltaMerge/WriteBatches.h:65\n 0x68e4f42\tDB::DM::WriteBatches::~WriteBatches() [tiflash+109989698]\n \tdbms/src/Storages/DeltaMerge/WriteBatches.h:74\n 0x692815e\tDB::DM::DeltaMergeStore::segmentMerge(DB::DM::DMContext&, std::__1::vector<std::__1::shared_ptrDB::DM::Segment, std::__1::allocator<std::__1::shared_ptrDB::DM::Segment > > const&, DB::DM::DeltaMergeStore::SegmentMergeReason) [tiflash+110264670]\n \tdbms/src/Storages/DeltaMerge/DeltaMergeStore_InternalSegment.cpp:343\n 0x691c87e\tDB::DM::DeltaMergeStore::onSyncGc(long, DB::DM::GCOptions const&) [tiflash+110217342]\n \tdbms/src/Storages/DeltaMerge/DeltaMergeStore_InternalBg.cpp:752\n 0x6a9a8b7\tDB::GCManager::work() [tiflash+111782071]\n \tdbms/src/Storages/GCManager.cpp:80\n 0x683746c\tvoid* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_deletestd::__1::__thread_struct >, DB::BackgroundProcessingPool::BackgroundProcessingPool(int, std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator >)::$_1> >(void*) [tiflash+109278316]\n \t/usr/local/bin/…/include/c++/v1/thread:291\n 0x7f1ea0d25f2b\t [libpthread.so.0+36651]\n 0x7f1ea09f16bf\t__clone [libc.so.6+1017535]”] [source=WriteBatches] [thread_id=33]
[2023/07/22 20:54:45.340 +08:00] [WARN] [] [“region {529048463,132841,2058} find error: peer is not leader for region 529048463, leader may Some(id: 579913620 store_id: 559680727)”] [source=pingcap.tikv] [thread_id=16]
[2023/07/22 21:03:21.571 +08:00] [WARN] [] [“region {529048463,132841,2058} find error: region 529048463 is missing”] [source=pingcap.tikv] [thread_id=36]
[2023/07/22 21:14:41.161 +08:00] [WARN] [SchemaGetter.cpp:208] [“The schema diff for version 46528, key Diff:46528 is empty.”] [source=SchemaGetter] [thread_id=14]
[2023/07/22 21:25:40.763 +08:00] [WARN] [SchemaGetter.cpp:208] [“The schema diff for version 46541, key Diff:46541 is empty.”] [source=SchemaGetter] [thread_id=5]
[2023/07/22 21:30:29.393 +08:00] [WARN] [] [“region {529048463,132847,2058} find error: peer is not leader for region 529048463, leader may Some(id: 579880119 store_id: 559680726)”] [source=pingcap.tikv] [thread_id=15]
[2023/07/22 21:39:00.084 +08:00] [WARN] [] [“region {529048463,132847,2058} find error: peer is not leader for region 529048463, leader may Some(id: 579912912 store_id: 106658739)”] [source=pingcap.tikv] [thread_id=36]
[2023/07/22 21:44:18.153 +08:00] [WARN] [] [“region {529048463,132847,2058} find error: region 529048463 is missing”] [source=pingcap.tikv] [thread_id=14]
[2023/07/22 21:45:43.995 +08:00] [WARN] [StorageConfigParser.cpp:241] [“The configuration "path" is deprecated. Check [storage] section for new style.”] [thread_id=1]
[2023/07/22 21:46:02.997 +08:00] [WARN] [StorageConfigParser.cpp:241] [“The configuration "path" is deprecated. Check [storage] section for new style.”] [thread_id=1]
[2023/07/22 21:46:05.588 +08:00] [WARN] [DMFile.cpp:732] [“Existing temporary or dropped dmfile, removed: /u01/tidb/tidb-data/tiflash-19000/data/t_4614/stable/.tmp.dmf_6664”] [source=DMFile] [thread_id=15]
[2023/07/22 21:46:21.492 +08:00] [WARN] [StorageConfigParser.cpp:241] [“The configuration "path" is deprecated. Check [storage] section for new style.”] [thread_id=1]
[2023/07/22 21:46:40.738 +08:00] [WARN] [StorageConfigParser.cpp:241] [“The configuration "path" is deprecated. Check [storage] section for new style.”] [thread_id=1]
[2023/07/22 21:46:59.240 +08:00] [WARN] [StorageConfigParser.cpp:241] [“The configuration "path" is deprecated. Check [storage] section for new style.”] [thread_id=1]
[2023/07/22 21:47:17.506 +08:00] [WARN] [StorageConfigParser.cpp:241] [“The configuration "path" is deprecated. Check [storage] section for new style.”] [thread_id=1]
[2023/07/22 21:47:35.760 +08:00] [WARN] [StorageConfigParser.cpp:241] [“The configuration "path" is deprecated. Check [storage] section for new style.”] [thread_id=1]

Based on the logs you posted, the key error is: [2023/07/22 20:51:35.070 +08:00] [ERROR] [WriteBatches.h:69] [“!!!=========================Modifications in meta haven’t persisted

It looks like the metadata was modified, but something went wrong when persisting it.

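Did you perform any other operations while the TiFlash replicas were being built? Is there anything abnormal on the TiKV and TiFlash monitoring dashboards?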

I only ran the replica creation statement: alter table xxx set tiflash replica 2; nothing abnormal shows up in monitoring.

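In principle, adding a replica should not cause this kind of problem, so a bug in this version cannot be ruled out. What you posted above is the TiFlash log, right? Check whether TiKV logged any errors around that time; grep them out and post them here.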
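As an alternative to a plain grep, the ERROR lines could be collapsed into counts per source location and message, which makes a long log easier to scan. This is a minimal sketch, assuming the bracketed log header layout seen in the excerpts in this thread; `summarize_errors` and `LOG_LINE` are hypothetical names, not part of any TiDB tooling.

```python
import re
from collections import Counter

# Matches the bracketed header seen in the pasted logs:
#   [time] [LEVEL] [file:line] ["message"] ...
# Straight and curly quotes are both accepted, since forum pastes
# often convert one into the other.
LOG_LINE = re.compile(
    r'\[(?P<time>\d{4}/\d{2}/\d{2} [\d:.]+ [+-]\d{2}:\d{2})\] '
    r'\[(?P<level>[A-Z]+)\] '
    r'\[(?P<source>[^\]]*)\] '
    r'\[["“](?P<msg>.*?)["”]\]'
)


def summarize_errors(lines, level="ERROR"):
    """Count log lines of the given level, grouped by (source, message)."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.search(line)
        if m and m.group("level") == level:
            counts[(m.group("source"), m.group("msg"))] += 1
    return counts
```

Feeding it the lines of a TiKV log file (for example `summarize_errors(open(path, encoding="utf-8"))`, where `path` is whichever log file is being inspected) and printing `counts.most_common()` lists the most frequent errors first.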

error: region 529048463 is missing”] [source=pingcap.tikv] [thread_id=11]

Is there a problem with the original TiKV data?

TiKV doesn't look abnormal.

tikv-2023-07-22T21-25-44.873.log:[2023/07/22 20:26:18.118 +08:00] [ERROR] [peer.rs:614] [“handle raft message err”] [err_code=KV:Raft:StepPeerNotFound] [err=“Raft raft: cannot step as peer not found”] [peer_id=579887572] [region_id=579887571]
tikv-2023-07-22T21-25-44.873.log:[2023/07/22 20:26:18.118 +08:00] [ERROR] [peer.rs:614] [“handle raft message err”] [err_code=KV:Raft:StepPeerNotFound] [err=“Raft raft: cannot step as peer not found”] [peer_id=579887572] [region_id=579887571]
tikv-2023-07-22T21-25-44.873.log:[2023/07/22 20:26:18.129 +08:00] [ERROR] [peer.rs:614] [“handle raft message err”] [err_code=KV:Raft:StepPeerNotFound] [err=“Raft raft: cannot step as peer not found”] [peer_id=579887572] [region_id=579887571]
tikv-2023-07-22T21-25-44.873.log:[2023/07/22 20:26:18.129 +08:00] [ERROR] [peer.rs:614] [“handle raft message err”] [err_code=KV:Raft:StepPeerNotFound] [err=“Raft raft: cannot step as peer not found”] [peer_id=579887572] [region_id=579887571]
tikv-2023-07-22T21-25-44.873.log:[2023/07/22 20:26:18.129 +08:00] [ERROR] [peer.rs:614] [“handle raft message err”] [err_code=KV:Raft:StepPeerNotFound] [err=“Raft raft: cannot step as peer not found”] [peer_id=579887572] [region_id=579887571]
tikv-2023-07-22T21-25-44.873.log:[2023/07/22 20:26:28.233 +08:00] [ERROR] [entry_storage.rs:1122] [“entries are fetched unexpectedly during warming up”]
tikv-2023-07-22T21-25-44.873.log:[2023/07/22 20:26:28.238 +08:00] [ERROR] [entry_storage.rs:1122] [“entries are fetched unexpectedly during warming up”]
tikv-2023-07-22T21-25-44.873.log:[2023/07/22 20:26:28.238 +08:00] [ERROR] [entry_storage.rs:1122] [“entries are fetched unexpectedly during warming up”]
tikv-2023-07-22T21-25-44.873.log:[2023/07/22 20:26:29.247 +08:00] [ERROR] [entry_storage.rs:1122] [“entries are fetched unexpectedly during warming up”]
tikv-2023-07-22T21-25-44.873.log:[2023/07/22 20:26:31.814 +08:00] [ERROR] [entry_storage.rs:1122] [“entries are fetched unexpectedly during warming up”]
tikv-2023-07-22T21-25-44.873.log:[2023/07/22 20:26:35.387 +08:00] [ERROR] [peer.rs:614] [“handle raft message err”] [err_code=KV:Raft:StepPeerNotFound] [err=“Raft raft: cannot step as peer not found”] [peer_id=579888412] [region_id=579888411]
tikv-2023-07-22T21-25-44.873.log:[2023/07/22 20:26:38.489 +08:00] [ERROR] [entry_storage.rs:1122] [“entries are fetched unexpectedly during warming up”]
tikv-2023-07-22T21-25-44.873.log:[2023/07/22 20:26:41.374 +08:00] [ERROR] [entry_storage.rs:1122] [“entries are fetched unexpectedly during warming up”]
tikv-2023-07-22T21-25-44.873.log:[2023/07/22 20:26:41.488 +08:00] [ERROR] [entry_storage.rs:1122] [“entries are fetched unexpectedly during warming up”]
tikv-2023-07-22T21-25-44.873.log:[2023/07/22 20:26:41.488 +08:00] [ERROR] [entry_storage.rs:1122] [“entries are fetched unexpectedly during warming up”]
tikv-2023-07-22T21-25-44.873.log:[2023/07/22 20:26:42.720 +08:00] [ERROR] [pd.rs:2323] [“send request failed”] [err=“"Disconnected(…)"”] [cmd_type=TransferLeader] [region_id=579888183]
tikv-2023-07-22T21-25-44.873.log:[2023/07/22 20:26:48.751 +08:00] [ERROR] [entry_storage.rs:1122] [“entries are fetched unexpectedly during warming up”]
tikv-2023-07-22T21-25-44.873.log:[2023/07/22 20:26:49.022 +08:00] [ERROR] [entry_storage.rs:1122] [“entries are fetched unexpectedly during warming up”]
tikv-2023-07-22T21-25-44.873.log:[2023/07/22 20:26:49.022 +08:00] [ERROR] [entry_storage.rs:1122] [“entries are fetched unexpectedly during warming up”]
tikv-2023-07-22T21-25-44.873.log:[2023/07/22 20:26:50.081 +08:00] [ERROR] [apply.rs:1906] [“ingest fail”] [err_code=KV:Raftstore:EpochNotMatch] [err=“EpochNotMatch("conf_ver: 59462 version: 60128 != conf_ver: 59463 version: 60128", [id: 579887495 start_key: 7480000000000080FF6F5F728000000009FF0E38BF0000000000FA end_key: 7480000000000080FF6F5F728000000009FF11D8D80000000000FA region_epoch { conf_ver: 59463 version: 60128 } peers { id: 579887496 store_id: 559680729 } peers { id: 579902524 store_id: 106658737 } peers { id: 579902523 store_id: 559680728 }])”] [region=“id: 579887495 start_key: 7480000000000080FF6F5F728000000009FF0E38BF0000000000FA end_key: 7480000000000080FF6F5F728000000009FF11D8D80000000000FA region_epoch { conf_ver: 59463 version: 60128 } peers { id: 579887496 store_id: 559680729 } peers { id: 579902524 store_id: 106658737 } peers { id: 579902523 store_id: 559680728 }”] [sst=“uuid: B63FD9FD94184568BD3195CAD839545C range { start: 7480000000000080FF6F5F728000000009FF0E38BF0000000000FA end: 7480000000000080FF6F5F728000000009FF11D8D70000000000FA } cf_name: "default" region_id: 579887495 region_epoch { conf_ver: 59462 version: 60128 }”] [peer_id=579887496] [region_id=579887495]
tikv-2023-07-22T21-25-44.873.log:[2023/07/22 20:26:52.495 +08:00] [ERROR] [entry_storage.rs:1122] [“entries are fetched unexpectedly during warming up”]
tikv-2023-07-22T21-25-44.873.log:[2023/07/22 20:26:56.912 +08:00] [ERROR] [entry_storage.rs:1122] [“entries are fetched unexpectedly during warming up”]
tikv-2023-07-22T21-25-44.873.log:[2023/07/22 20:26:58.213 +08:00] [ERROR] [entry_storage.rs:1122] [“entries are fetched unexpectedly during warming up”]
tikv-2023-07-22T21-25-44.873.log:[2023/07/22 20:26:58.931 +08:00] [ERROR] [entry_storage.rs:1122] [“entries are fetched unexpectedly during warming up”]
tikv-2023-07-22T21-25-44.873.log:[2023/07/22 20:26:58.944 +08:00] [ERROR] [entry_storage.rs:1122] [“entries are fetched unexpectedly during warming up”]
tikv-2023-07-22T21-25-44.873.log:[2023/07/22 20:27:01.214 +08:00] [ERROR] [entry_storage.rs:1122] [“entries are fetched unexpectedly during warming up”]
tikv-2023-07-22T21-25-44.873.log:[2023/07/22 20:27:02.038 +08:00] [ERROR] [entry_storage.rs:1122] [“entries are fetched unexpectedly during warming up”]
tikv-2023-07-22T21-25-44.873.log:[2023/07/22 20:27:08.152 +08:00] [ERROR] [apply.rs:1906] [“ingest fail”] [err_code=KV:Raftstore:EpochNotMatch] [err=“EpochNotMatch("conf_ver: 59471 version: 60129 != conf_ver: 59472 version: 60129", [id: 579888423 start_key: 7480000000000080FF6F5F72800000000CFF26AB6A0000000000FA end_key: 7480000000000080FF6F5F72800000000CFF29F6600000000000FA region_epoch { conf_ver: 59472 version: 60129 } peers { id: 579891757 store_id: 106658738 } peers { id: 579891759 store_id: 106658739 } peers { id: 579905997 store_id: 106658737 } peers { id: 579905999 store_id: 559680729 }])”] [region=“id: 579888423 start_key: 7480000000000080FF6F5F72800000000CFF26AB6A0000000000FA end_key: 7480000000000080FF6F5F72800000000CFF29F6600000000000FA region_epoch { conf_ver: 59472 version: 60129 } peers { id: 579891757 store_id: 106658738 } peers { id: 579891759 store_id: 106658739 } peers { id: 579905997 store_id: 106658737 } peers { id: 579905999 store_id: 559680729 }”] [sst=“uuid: E970A79FE43F494E9207F4FF961CB51B range { start: 7480000000000080FF6F5F72800000000CFF26AB6A0000000000FA end: 7480000000000080FF6F5F72800000000CFF29F65F0000000000FA } cf_name: "default" region_id: 579888423 region_epoch { conf_ver: 59471 version: 60129 }”] [peer_id=579905999] [region_id=579888423]
tikv-2023-07-22T21-25-44.873.log:[2023/07/22 20:27:14.386 +08:00] [ERROR] [entry_storage.rs:1122] [“entries are fetched unexpectedly during warming up”]
tikv-2023-07-22T21-25-44.873.log:[2023/07/22 20:27:16.001 +08:00] [ERROR] [entry_storage.rs:1122] [“entries are fetched unexpectedly during warming up”]
tikv-2023-07-22T21-25-44.873.log:[2023/07/22 20:27:18.376 +08:00] [ERROR] [entry_storage.rs:1122] [“entries are fetched unexpectedly during warming up”]
tikv-2023-07-22T21-25-44.873.log:[2023/07/22 20:27:20.082 +08:00] [ERROR] [entry_storage.rs:1122] [“entries are fetched unexpectedly during warming up”]
tikv-2023-07-22T21-25-44.873.log:[2023/07/22 20:27:21.525 +08:00] [ERROR] [entry_storage.rs:1122] [“entries are fetched unexpectedly during warming up”]
tikv-2023-07-22T21-25-44.873.log:[2023/07/22 20:27:22.480 +08:00] [ERROR] [entry_storage.rs:1122] [“entries are fetched unexpectedly during warming up”]
tikv-2023-07-22T21-25-44.873.log:[2023/07/22 20:27:28.071 +08:00] [ERROR] [entry_storage.rs:1122] [“entries are fetched unexpectedly during warming up”]
tikv-2023-07-22T21-25-44.873.log:[2023/07/22 20:27:28.073 +08:00] [ERROR] [entry_storage.rs:1122] [“entries are fetched unexpectedly during warming up”]
tikv-2023-07-22T21-25-44.873.log:[2023/07/22 20:27:30.715 +08:00] [ERROR] [entry_storage.rs:1122] [“entries are fetched unexpectedly during warming up”]
tikv-2023-07-22T21-25-44.873.log:[2023/07/22 20:27:34.939 +08:00] [ERROR] [entry_storage.rs:1122] [“entries are fetched unexpectedly during warming up”]
tikv-2023-07-22T21-25-44.873.log:[2023/07/22 20:27:38.039 +08:00] [ERROR] [entry_storage.rs:1122] [“entries are fetched unexpectedly during warming up”]
tikv-2023-07-22T21-25-44.873.log:[2023/07/22 20:27:38.040 +08:00] [ERROR] [entry_storage.rs:1122] [“entries are fetched unexpectedly during warming up”]
tikv-2023-07-22T21-25-44.873.log:[2023/07/22 20:27:46.435 +08:00] [ERROR] [apply.rs:1906] [“ingest fail”] [err_code=KV:Raftstore:EpochNotMatch] [err=“EpochNotMatch("conf_ver: 59471 version: 60129 != conf_ver: 59477 version: 60129", [id: 579889739 start_key: 7480000000000080FF6F5F728000000010FF87B8FB0000000000FA end_key: 7480000000000080FF6F5F728000000010FF8B1A1E0000000000FA region_epoch { conf_ver: 59477 version: 60129 } peers { id: 579892321 store_id: 559680728 } peers { id: 579910910 store_id: 559680729 } peers { id: 579912627 store_id: 106658737 }])”] [region=“id: 579889739 start_key: 7480000000000080FF6F5F728000000010FF87B8FB0000000000FA end_key: 7480000000000080FF6F5F728000000010FF8B1A1E0000000000FA region_epoch { conf_ver: 59477 version: 60129 } peers { id: 579892321 store_id: 559680728 } peers { id: 579910910 store_id: 559680729 } peers { id: 579912627 store_id: 106658737 }”] [sst=“uuid: 928930B7368F4B51AC4F2166EA33090F range { start: 7480000000000080FF6F5F728000000010FF87B8FB0000000000FA end: 7480000000000080FF6F5F728000000010FF8B1A1D0000000000FA } cf_name: "default" region_id: 579889739 region_epoch { conf_ver: 59471 version: 60129 }”] [peer_id=579910910] [region_id=579889739]

Check the TIKV_REGION_STATUS and TIKV_REGION_PEERS views to see whether the regions named in the errors still exist.
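One way to do this lookup is to collect the region IDs from the log lines and turn them into queries against the two views. This is only a sketch: the helper names `region_ids` and `region_queries` are made up for illustration, the regular expression covers just the two ID formats that appear in the logs in this thread, and the generated statements still need to be run by hand in a SQL client.

```python
import re

# Region IDs as they appear in the logs in this thread:
#   TiKV:    [region_id=579887571]
#   TiFlash: region 529048463 is missing
REGION_ID = re.compile(r'region_id=(\d+)|region (\d+) is missing')


def region_ids(lines):
    """Return the distinct region IDs mentioned in the log lines, sorted."""
    found = set()
    for line in lines:
        for a, b in REGION_ID.findall(line):
            # Exactly one of the two capture groups is non-empty per match.
            found.add(int(a or b))
    return sorted(found)


def region_queries(ids):
    """Build SELECT statements for the two information_schema views."""
    if not ids:
        return []  # avoid emitting an invalid "IN ()" clause
    id_list = ", ".join(str(i) for i in ids)
    return [
        "SELECT * FROM information_schema.TIKV_REGION_STATUS "
        f"WHERE REGION_ID IN ({id_list});",
        "SELECT * FROM information_schema.TIKV_REGION_PEERS "
        f"WHERE REGION_ID IN ({id_list});",
    ]
```

If a region ID from the logs returns no rows in either view, that region no longer exists under that ID, which would be consistent with the "region ... is missing" warnings.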

You can also look at the cluster's Grafana monitoring graphs and the cluster traffic view in the Dashboard; those charts are more intuitive.

https://github.com/pingcap/tiflash/issues/5285

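This should be the issue. I read through to the end and did not find a definite solution. It does mention some possible directions, so compare them against your setup and adjust accordingly.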