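This error showed up again today. I retried many times and every attempt failed, and each failure reports a different file. Below is the error log from the TiFlash Compute node: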
[2024/12/02 16:06:34.704 +08:00] [ERROR] [S3RandomAccessFile.cpp:98] ["Cannot read from istream, size=1048592 gcount=589262 state=0x06 cur_offset=0 content_length=1504148 errmsg=Success cost=5215266ns"] [source=s18195849145/data/t_82592/dmf_420808/7.dat] [thread_id=16]
[2024/12/02 16:06:34.738 +08:00] [WARN] [Task.cpp:140] ["error occurred and cancel the query"] [source="MPP<gather_id:<gather_id:1, query_ts:1733126669131441552, local_query_id:3191, server_id:1447, start_ts:454328757548220421, resource_group: default>,task_id:11> 1"] [thread_id=16]
[2024/12/02 16:06:34.738 +08:00] [WARN] [PipelineExecutorContext.cpp:79] ["error cannot load checksum framed data from tiflash-remote-data/s18195849145/data/t_82592/dmf_420808/7.dat (errno = 0): (while reading from DTFile: s3://s18195849145/data/t_82592/dmf_420808) occured and cancel the query"] [source="MPP<gather_id:<gather_id:1, query_ts:1733126669131441552, local_query_id:3191, server_id:1447, start_ts:454328757548220421, resource_group: default>,task_id:11>"] [thread_id=16]
[2024/12/02 16:06:35.183 +08:00] [ERROR] [MPPTask.cpp:647] ["task running meets error: Code: 0, e.displayText() = DB::Exception: cannot load checksum framed data from tiflash-remote-data/s18195849145/data/t_82592/dmf_420808/7.dat (errno = 0): (while reading from DTFile: s3://s18195849145/data/t_82592/dmf_420808), e.what() = DB::Exception, Stack trace:\n\n\n 0x1ee9431\tDB::TiFlashException::TiFlashException(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, DB::TiFlashError const&) [tiflash+32412721]\n \tdbms/src/Common/TiFlashException.h:263\n 0x1da7e37\tDB::FramedChecksumReadBuffer<DB::Digest::XXH3>::expectRead(char*, unsigned long) [tiflash+31096375]\n \tdbms/src/IO/ChecksumBuffer.h:353\n 0x1da801d\tDB::FramedChecksumReadBuffer<DB::Digest::XXH3>::nextImpl() [tiflash+31096861]\n \tdbms/src/IO/ChecksumBuffer.h:391\n 0x1daae38\tDB::CompressedReadBufferBase<false>::readCompressedData(unsigned long&, unsigned long&) [tiflash+31108664]\n \tdbms/src/IO/CompressedReadBufferBase.cpp:53\n 0x1dd9363\tDB::CompressedReadBufferFromFileProvider<false>::nextImpl() [tiflash+31298403]\n \tdbms/src/Encryption/CompressedReadBufferFromFileProvider.cpp:32\n 0x785f603\tvoid DB::deserializeBinarySSE2<2>(DB::PODArray<unsigned char, 4096ul, Allocator<false>, 15ul, 16ul>&, DB::PODArray<unsigned long, 4096ul, Allocator<false>, 15ul, 16ul>&, DB::ReadBuffer&, unsigned long) [tiflash+126219779]\n \tdbms/src/DataTypes/DataTypeString.cpp:128\n 0x76aac30\tDB::DM::DMFileReader::readColumn(DB::DM::ColumnDefine const&, COWPtr<DB::IColumn>::immutable_ptr<DB::IColumn>&, unsigned long, unsigned long, unsigned long, unsigned long) [tiflash+124431408]\n \tdbms/src/Storages/DeltaMerge/File/DMFileReader.cpp:838\n 0x76a83d1\tDB::DM::DMFileReader::read() [tiflash+124421073]\n \tdbms/src/Storages/DeltaMerge/File/DMFileReader.cpp:742\n 0x769bb25\tDB::DM::DMFileBlockInputStream::read() [tiflash+124369701]\n 
\tdbms/src/Storages/DeltaMerge/File/DMFileBlockInputStream.h:62\n 0x760f6bd\tDB::DM::ConcatSkippableBlockInputStream<false>::read() [tiflash+123795133]\n \tdbms/src/Storages/DeltaMerge/SkippableBlockInputStream.h:185\n 0x7635921\tDB::DM::readBlock(std::__1::shared_ptr<DB::DM::SkippableBlockInputStream>&, std::__1::shared_ptr<DB::DM::SkippableBlockInputStream>&) [tiflash+123951393]\n \tdbms/src/Storages/DeltaMerge/ReadUtil.cpp:33\n 0x7657fd8\tDB::DM::BitmapFilterBlockInputStream::readImpl(DB::PODArray<unsigned char, 4096ul, Allocator<false>, 15ul, 16ul>*&, bool) [tiflash+124092376]\n \tdbms/src/Storages/DeltaMerge/BitmapFilter/BitmapFilterBlockInputStream.cpp:40\n 0x7658554\tDB::DM::BitmapFilterBlockInputStream::readImpl() [tiflash+124093780]\n \tdbms/src/Storages/DeltaMerge/BitmapFilter/BitmapFilterBlockInputStream.h:46\n 0x77a9c15\tDB::IProfilingBlockInputStream::read(DB::PODArray<unsigned char, 4096ul, Allocator<false>, 15ul, 16ul>*&, bool) [tiflash+125475861]\n \tdbms/src/DataStreams/IProfilingBlockInputStream.cpp:82\n 0x77dd763\tDB::DM::Remote::RNSegmentSourceOp::executeIOImpl() [tiflash+125687651]\n \tdbms/src/Storages/DeltaMerge/Remote/RNSegmentSourceOp.cpp:132\n 0x891fe04\tDB::Operator::executeIO() [tiflash+143785476]\n \tdbms/src/Operators/Operator.cpp:81\n 0x8852b7a\tDB::PipelineTaskBase::runExecuteIO() [tiflash+142945146]\n \tdbms/src/Flash/Pipeline/Schedule/Tasks/PipelineTaskBase.h:88\n 0x89412ca\tDB::Task::executeIO() [tiflash+143921866]\n \tdbms/src/Flash/Pipeline/Schedule/Tasks/Task.cpp:140\n 0x1e9cf05\tDB::TaskThreadPool<DB::IOImpl>::loop(unsigned long) [tiflash+32100101]\n \tdbms/src/Flash/Pipeline/Schedule/ThreadPool/TaskThreadPool.cpp:59\n 0x1e9d636\tvoid* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (DB::TaskThreadPool<DB::IOImpl>::*)(unsigned long), DB::TaskThreadPool<DB::IOImpl>*, unsigned long> >(void*) [tiflash+32101942]\n 
\t/usr/local/bin/../include/c++/v1/thread:291\n 0x7f78506c8ac3\t<unknown symbol> [libc.so.6+608963]\n 0x7f785075a850\t<unknown symbol> [libc.so.6+1206352]"] [source="MPP<gather_id:<gather_id:1, query_ts:1733126669131441552, local_query_id:3191, server_id:1447, start_ts:454328757548220421, resource_group: default>,task_id:11>"] [thread_id=78]
[2024/12/02 16:06:35.183 +08:00] [WARN] [MPPTask.cpp:745] ["Begin abort task: MPP<gather_id:<gather_id:1, query_ts:1733126669131441552, local_query_id:3191, server_id:1447, start_ts:454328757548220421, resource_group: default>,task_id:11>, abort type: ONERROR"] [source="MPP<gather_id:<gather_id:1, query_ts:1733126669131441552, local_query_id:3191, server_id:1447, start_ts:454328757548220421, resource_group: default>,task_id:11>"] [thread_id=78]
[2024/12/02 16:06:35.183 +08:00] [WARN] [ExchangeReceiver.cpp:982] ["connection end. meet error: true, err msg: Code: 0, e.displayText() = DB::Exception: cannot load checksum framed data from tiflash-remote-data/s18195849145/data/t_82592/dmf_420808/7.dat (errno = 0): (while reading from DTFile: s3://s18195849145/data/t_82592/dmf_420808), e.what() = DB::Exception,, current alive connections: 3"] [source="MPP<gather_id:<gather_id:1, query_ts:1733126669131441552, local_query_id:3191, server_id:1447, start_ts:454328757548220421, resource_group: default>,task_id:15> ExchangeReceiver_340 local tunnel11+15"] [thread_id=78]
[2024/12/02 16:06:35.183 +08:00] [WARN] [ExchangeReceiver.cpp:1003] ["Finish receiver channels, meet error: true, error message: Code: 0, e.displayText() = DB::Exception: cannot load checksum framed data from tiflash-remote-data/s18195849145/data/t_82592/dmf_420808/7.dat (errno = 0): (while reading from DTFile: s3://s18195849145/data/t_82592/dmf_420808), e.what() = DB::Exception,"] [source="MPP<gather_id:<gather_id:1, query_ts:1733126669131441552, local_query_id:3191, server_id:1447, start_ts:454328757548220421, resource_group: default>,task_id:15> ExchangeReceiver_340"] [thread_id=78]
[2024/12/02 16:06:35.183 +08:00] [WARN] [MPPTask.cpp:774] ["Finish abort task from running"] [source="MPP<gather_id:<gather_id:1, query_ts:1733126669131441552, local_query_id:3191, server_id:1447, start_ts:454328757548220421, resource_group: default>,task_id:11>"] [thread_id=78]
[2024/12/02 16:06:35.186 +08:00] [WARN] [MPPTaskManager.cpp:277] ["Begin to abort gather: <gather_id:1, query_ts:1733126669131441552, local_query_id:3191, server_id:1447, start_ts:454328757548220421, resource_group: default>, abort type: ONCANCELLATION, reason: Receive cancel request from TiDB"] [thread_id=837]
[2024/12/02 16:06:35.186 +08:00] [WARN] [MPPTaskManager.cpp:321] ["Remaining task in gather <gather_id:1, query_ts:1733126669131441552, local_query_id:3191, server_id:1447, start_ts:454328757548220421, resource_group: default> are: MPP<gather_id:<gather_id:1, query_ts:1733126669131441552, local_query_id:3191, server_id:1447, start_ts:454328757548220421, resource_group: default>,task_id:15> "] [thread_id=837]
[2024/12/02 16:06:35.186 +08:00] [WARN] [MPPTask.cpp:745] ["Begin abort task: MPP<gather_id:<gather_id:1, query_ts:1733126669131441552, local_query_id:3191, server_id:1447, start_ts:454328757548220421, resource_group: default>,task_id:15>, abort type: ONCANCELLATION"] [source="MPP<gather_id:<gather_id:1, query_ts:1733126669131441552, local_query_id:3191, server_id:1447, start_ts:454328757548220421, resource_group: default>,task_id:15>"] [thread_id=837]
[2024/12/02 16:06:35.186 +08:00] [WARN] [ExchangeReceiver.cpp:982] ["connection end. meet error: true, err msg: Exchange receiver meet error : Receive cancel request from TiDB, current alive connections: 2"] [source="MPP<gather_id:<gather_id:1, query_ts:1733126669131441552, local_query_id:3191, server_id:1447, start_ts:454328757548220421, resource_group: default>,task_id:15> ExchangeReceiver_340 async tunnel12+15"] [thread_id=43]
[2024/12/02 16:06:35.187 +08:00] [WARN] [ExchangeReceiver.cpp:1003] ["Finish receiver channels, meet error: true, error message: Code: 0, e.displayText() = DB::Exception: cannot load checksum framed data from tiflash-remote-data/s18195849145/data/t_82592/dmf_420808/7.dat (errno = 0): (while reading from DTFile: s3://s18195849145/data/t_82592/dmf_420808), e.what() = DB::Exception,"] [source="MPP<gather_id:<gather_id:1, query_ts:1733126669131441552, local_query_id:3191, server_id:1447, start_ts:454328757548220421, resource_group: default>,task_id:15> ExchangeReceiver_340"] [thread_id=43]
[2024/12/02 16:06:35.187 +08:00] [WARN] [ExchangeReceiver.cpp:982] ["connection end. meet error: true, err msg: Exchange receiver meet error : Receive cancel request from TiDB, current alive connections: 0"] [source="MPP<gather_id:<gather_id:1, query_ts:1733126669131441552, local_query_id:3191, server_id:1447, start_ts:454328757548220421, resource_group: default>,task_id:15> ExchangeReceiver_340 async tunnel10+15"] [thread_id=223]
[2024/12/02 16:06:35.187 +08:00] [WARN] [ExchangeReceiver.cpp:1003] ["Finish receiver channels, meet error: true, error message: Code: 0, e.displayText() = DB::Exception: cannot load checksum framed data from tiflash-remote-data/s18195849145/data/t_82592/dmf_420808/7.dat (errno = 0): (while reading from DTFile: s3://s18195849145/data/t_82592/dmf_420808), e.what() = DB::Exception,"] [source="MPP<gather_id:<gather_id:1, query_ts:1733126669131441552, local_query_id:3191, server_id:1447, start_ts:454328757548220421, resource_group: default>,task_id:15> ExchangeReceiver_340"] [thread_id=223]
[2024/12/02 16:06:35.187 +08:00] [WARN] [ExchangeReceiver.cpp:982] ["connection end. meet error: true, err msg: Exchange receiver meet error : Receive cancel request from TiDB, current alive connections: 1"] [source="MPP<gather_id:<gather_id:1, query_ts:1733126669131441552, local_query_id:3191, server_id:1447, start_ts:454328757548220421, resource_group: default>,task_id:15> ExchangeReceiver_340 async tunnel13+15"] [thread_id=29]
[2024/12/02 16:06:35.187 +08:00] [WARN] [ExchangeReceiver.cpp:1003] ["Finish receiver channels, meet error: true, error message: Code: 0, e.displayText() = DB::Exception: cannot load checksum framed data from tiflash-remote-data/s18195849145/data/t_82592/dmf_420808/7.dat (errno = 0): (while reading from DTFile: s3://s18195849145/data/t_82592/dmf_420808), e.what() = DB::Exception,"] [source="MPP<gather_id:<gather_id:1, query_ts:1733126669131441552, local_query_id:3191, server_id:1447, start_ts:454328757548220421, resource_group: default>,task_id:15> ExchangeReceiver_340"] [thread_id=29]
[2024/12/02 16:06:35.190 +08:00] [WARN] [MPPTask.cpp:774] ["Finish abort task from running"] [source="MPP<gather_id:<gather_id:1, query_ts:1733126669131441552, local_query_id:3191, server_id:1447, start_ts:454328757548220421, resource_group: default>,task_id:15>"] [thread_id=837]
[2024/12/02 16:06:35.190 +08:00] [WARN] [MPPTaskManager.cpp:339] ["Finish abort gather: <gather_id:1, query_ts:1733126669131441552, local_query_id:3191, server_id:1447, start_ts:454328757548220421, resource_group: default>"] [thread_id=837]
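Since every retry reports a different file, it may help to pull the short-read details out of each `S3RandomAccessFile.cpp` error line and compare them across failures. The following is only a minimal sketch, assuming the log format shown above; the function name `parse_short_read` and the output keys are hypothetical and not part of TiFlash.

```python
import re

# Matches key=value pairs such as size=1048592 or gcount=589262.
FIELD_RE = re.compile(r"(\w+)=(\S+)")

# Sample line copied from the log above.
SAMPLE = (
    '[2024/12/02 16:06:34.704 +08:00] [ERROR] [S3RandomAccessFile.cpp:98] '
    '["Cannot read from istream, size=1048592 gcount=589262 state=0x06 '
    'cur_offset=0 content_length=1504148 errmsg=Success cost=5215266ns"] '
    '[source=s18195849145/data/t_82592/dmf_420808/7.dat] [thread_id=16]'
)


def parse_short_read(line):
    """Return details of a 'Cannot read from istream' line, or None."""
    msg = re.search(r'\["(Cannot read from istream[^"]*)"\]', line)
    if msg is None:
        return None
    src = re.search(r"\[source=([^\]]+)\]", line)
    fields = dict(FIELD_RE.findall(msg.group(1)))
    requested = int(fields["size"])
    received = int(fields["gcount"])
    return {
        "source": src.group(1) if src else None,
        "requested": requested,            # bytes the reader asked for
        "received": received,              # bytes actually returned
        "missing": requested - received,   # size of the short read
        "content_length": int(fields["content_length"]),
    }


if __name__ == "__main__":
    print(parse_short_read(SAMPLE))
```

Running this over the logs from several failed attempts would show whether the reads always stop short of the requested size and which objects are affected; it does not by itself identify the cause.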