TiFlash service abnormal after upgrading to 6.1

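【TiDB environment】Production
【TiDB version】5.7.25-TiDB-v6.1.0
【Problem】We upgraded to 6.1.0 yesterday, and early this morning the TiFlash service failed: TiFlash ran out of memory, which caused the service to restart. After we restarted the cluster this morning and it had run for a while, queries on TiFlash became very slow. We suspected partition pruning on partitioned tables, so we changed tidb_partition_prune_mode from dynamic to static. The cluster is running normally for now and we are still observing it.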
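
For readers who want to reproduce the workaround, the variable change described above can be written as SQL text. The following is a minimal sketch, not the poster's actual procedure: the helper name `prune_mode_statements` is hypothetical, and it only builds the statements (a `SET` followed by a `SHOW ... VARIABLES` read-back) so they can be pasted into any MySQL-compatible client connected to TiDB.

```python
from __future__ import annotations

VALID_MODES = {"static", "dynamic"}
VALID_SCOPES = {"GLOBAL", "SESSION"}


def prune_mode_statements(mode: str, scope: str = "GLOBAL") -> list[str]:
    """Build SQL text that sets tidb_partition_prune_mode and reads it back.

    Hypothetical helper for illustration; it does not connect to a database.
    """
    mode = mode.lower()
    scope = scope.upper()
    if mode not in VALID_MODES:
        raise ValueError(f"unsupported prune mode: {mode!r}")
    if scope not in VALID_SCOPES:
        raise ValueError(f"unsupported scope: {scope!r}")
    return [
        f"SET {scope} tidb_partition_prune_mode = '{mode}';",
        f"SHOW {scope} VARIABLES LIKE 'tidb_partition_prune_mode';",
    ]


if __name__ == "__main__":
    # Print the statements for the change reported in this thread.
    for statement in prune_mode_statements("static"):
        print(statement)
```

A global change is generally picked up by new sessions, so already-open connections may keep their previous value until they reconnect.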

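Additional information: the TiFlash nodes are currently co-located with TiKV.

We cannot determine the root cause at this point; please help us analyze it.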

【Attachments】Relevant logs and monitoring (https://metricstool.pingcap.com/)
Partial logs from when the service went down in the early morning:

[2022/06/23 03:05:35.467 +08:00] [ERROR] [DAGDriver.cpp:197] ["DAGDriver:DB Exception: Allocator: Cannot malloc 1.00 MiB., errno: 12, strerror: Cannot allocate memory

   0x1d272d3	StackTrace::StackTrace() [tiflash+30569171]
            	dbms/src/Common/StackTrace.cpp:23
   0x1d248d6	DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) [tiflash+30558422]
            	dbms/src/Common/Exception.h:41
   0x1defa2e	DB::SharedQueryBlockInputStream::waitThread() [tiflash+31390254]
            	dbms/src/DataStreams/SharedQueryBlockInputStream.h:169
   0x1deed0f	DB::SharedQueryBlockInputStream::readImpl() [tiflash+31386895]
            	dbms/src/DataStreams/SharedQueryBlockInputStream.h:124
   0x6688b8a	DB::IProfilingBlockInputStream::read(DB::PODArray<unsigned char, 4096ul, Allocator<false>, 15ul, 16ul>*&, bool) [tiflash+107514762]
            	dbms/src/DataStreams/IProfilingBlockInputStream.cpp:75
   0x66888c4	DB::IProfilingBlockInputStream::read() [tiflash+107514052]
            	dbms/src/DataStreams/IProfilingBlockInputStream.cpp:43
   0x738e696	DB::ParallelInputsProcessor<DB::UnionBlockInputStream<(DB::StreamUnionMode)0, false>::Handler, (DB::StreamUnionMode)0>::loop(unsigned long) [tiflash+121169558]
            	dbms/src/DataStreams/ParallelInputsProcessor.h:298
   0x738e1f7	DB::ParallelInputsProcessor<DB::UnionBlockInputStream<(DB::StreamUnionMode)0, false>::Handler, (DB::StreamUnionMode)0>::thread(unsigned long) [tiflash+121168375]
            	dbms/src/DataStreams/ParallelInputsProcessor.h:236
   0x1db172c	auto DB::wrapInvocable<std::__1::function<void ()> >(bool, std::__1::function<void ()>&&)::'lambda'()::operator()() [tiflash+31135532]
            	dbms/src/Common/wrapInvocable.h:36
   0x1db1895	std::__1::packaged_task<void ()>::operator()() [tiflash+31135893]
            	/usr/local/bin/../include/c++/v1/future:2089
   0x1da0e68	DB::DynamicThreadPool::executeTask(std::__1::unique_ptr<DB::IExecutableTask, std::__1::default_delete<DB::IExecutableTask> >&) [tiflash+31067752]
            	dbms/src/Common/DynamicThreadPool.cpp:101
   0x1da0d46	DB::DynamicThreadPool::dynamicWork(std::__1::unique_ptr<DB::IExecutableTask, std::__1::default_delete<DB::IExecutableTask> >) [tiflash+31067462]
            	dbms/src/Common/DynamicThreadPool.cpp:142
   0x1da29bd	auto std::__1::thread DB::ThreadFactory::newThread<void (DB::DynamicThreadPool::*)(std::__1::unique_ptr<DB::IExecutableTask, std::__1::default_delete<DB::IExecutableTask> >), DB::DynamicThreadPool*, std::__1::unique_ptr<DB::IExecutableTask, std::__1::default_delete<DB::IExecutableTask> > >(bool, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, void (DB::DynamicThreadPool::*&&)(std::__1::unique_ptr<DB::IExecutableTask, std::__1::default_delete<DB::IExecutableTask> >), DB::DynamicThreadPool*&&, std::__1::unique_ptr<DB::IExecutableTask, std::__1::default_delete<DB::IExecutableTask> >&&)::'lambda'(auto&&...)::operator()<DB::DynamicThreadPool*, std::__1::unique_ptr<DB::IExecutableTask, std::__1::default_delete<DB::IExecutableTask> > >(auto&&...) const [tiflash+31074749]
            	dbms/src/Common/ThreadFactory.h:47
   0x1da2799	void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, std::__1::thread DB::ThreadFactory::newThread<void (DB::DynamicThreadPool::*)(std::__1::unique_ptr<DB::IExecutableTask, std::__1::default_delete<DB::IExecutableTask> >), DB::DynamicThreadPool*, std::__1::unique_ptr<DB::IExecutableTask, std::__1::default_delete<DB::IExecutableTask> > >(bool, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, void (DB::DynamicThreadPool::*&&)(std::__1::unique_ptr<DB::IExecutableTask, std::__1::default_delete<DB::IExecutableTask> >), DB::DynamicThreadPool*&&, std::__1::unique_ptr<DB::IExecutableTask, std::__1::default_delete<DB::IExecutableTask> >&&)::'lambda'(auto&&...), DB::DynamicThreadPool*, std::__1::unique_ptr<DB::IExecutableTask, std::__1::default_delete<DB::IExecutableTask> > > >(void*) [tiflash+31074201]
            	/usr/local/bin/../include/c++/v1/thread:291
  0x7f62c8ca4ea5	start_thread [libpthread.so.0+32421]
  0x7f62c87b79fd	__clone [libc.so.6+1042941]"] [thread_id=882]
[2022/06/23 03:05:35.474 +08:00] [ERROR] [Server.cpp:310] ["grpc:/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tics/contrib/grpc/src/cpp/thread_manager/thread_manager.cc, line number: 38, log msg : Could not create grpc_sync_server worker-thread"] [thread_id=59020]
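
The first log entry above reports `errno: 12` together with `strerror: Cannot allocate memory`. As a quick cross-check, the numeric code can be mapped to its symbolic name with Python's standard `errno` module. This is only a small sketch; `describe_errno` is a hypothetical helper, and the mapping shown is the one used on Linux, where TiFlash runs in this cluster.

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return the symbolic name and OS message for a numeric errno value."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"errno {code} = {name}: {os.strerror(code)}"


if __name__ == "__main__":
    # 12 is the value reported by the allocator exception in the log above.
    print(describe_errno(12))
```

On Linux, 12 corresponds to `ENOMEM`, which is consistent with the memory exhaustion described in the problem statement.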

Screenshots of some of the abnormal monitoring panels from this morning, when queries were slow:




1. Could you post the output of tiup cluster display?
2. Could you upload the monitoring and log data with Clinic?
https://docs.pingcap.com/zh/tidb/stable/quick-start-with-clinic#pingcap-clinic-快速上手指南

The cluster information is as follows:

tiup is checking updates for component cluster ...
Starting component `cluster`: /home/tidb/.tiup/components/cluster/v1.10.2/tiup-cluster display SQ-cluster
Cluster type:       tidb
Cluster name:       SQ-cluster
Cluster version:    v6.1.0
Deploy user:        tidb
SSH type:           builtin
Dashboard URL:      http://192.168.14.4:2379/dashboard
Grafana URL:        http://192.168.14.2:3000
ID                   Role          Host           Ports                            OS/Arch       Status  Data Dir                                   Deploy Dir
--                   ----          ----           -----                            -------       ------  --------                                   ----------
192.168.14.2:9093    alertmanager  192.168.14.2   9093/9094                        linux/x86_64  Up      /data/deploy/alertmanager/data             /data/deploy/alertmanager
192.168.14.2:3000    grafana       192.168.14.2   3000                             linux/x86_64  Up      -                                          /data/deploy
192.168.14.21:2379   pd            192.168.14.21  2379/2380                        linux/x86_64  Up      /data/deploy/pd/data                       /data/deploy/pd
192.168.14.22:2379   pd            192.168.14.22  2379/2380                        linux/x86_64  Up      /data/deploy/pd/data                       /data/deploy/pd
192.168.14.3:2379    pd            192.168.14.3   2379/2380                        linux/x86_64  Up|L    /data/deploy/data.pd                       /data/deploy
192.168.14.4:2379    pd            192.168.14.4   2379/2380                        linux/x86_64  Up|UI   /data/deploy/data.pd                       /data/deploy
192.168.14.5:2379    pd            192.168.14.5   2379/2380                        linux/x86_64  Up      /data/deploy/data.pd                       /data/deploy
192.168.14.2:9090    prometheus    192.168.14.2   9090/12020                       linux/x86_64  Up      /data/deploy/prometheus2.0.0.data.metrics  /data/deploy
192.168.14.21:4000   tidb          192.168.14.21  4000/10080                       linux/x86_64  Up      -                                          /data/deploy/tidb
192.168.14.22:4000   tidb          192.168.14.22  4000/10080                       linux/x86_64  Up      -                                          /data/deploy/tidb
192.168.14.3:4000    tidb          192.168.14.3   4000/10080                       linux/x86_64  Up      -                                          /data/deploy
192.168.14.4:4000    tidb          192.168.14.4   4000/10080                       linux/x86_64  Up      -                                          /data/deploy
192.168.14.5:4000    tidb          192.168.14.5   4000/10080                       linux/x86_64  Up      -                                          /data/deploy
192.168.14.23:9000   tiflash       192.168.14.23  9000/8123/3930/20170/20292/8234  linux/x86_64  Up      /data01/deploy/data                        /data01/deploy
192.168.14.24:9000   tiflash       192.168.14.24  9000/8123/3930/20170/20292/8234  linux/x86_64  Up      /data01/deploy/data                        /data01/deploy
192.168.14.25:9000   tiflash       192.168.14.25  9000/8123/3930/20170/20292/8234  linux/x86_64  Up      /data01/deploy/data                        /data01/deploy
192.168.14.26:9000   tiflash       192.168.14.26  9000/8123/3930/20170/20292/8234  linux/x86_64  Up      /data01/deploy/data                        /data01/deploy
192.168.14.27:9000   tiflash       192.168.14.27  9000/8123/3930/20170/20292/8234  linux/x86_64  Up      /data01/deploy/data                        /data01/deploy
192.168.14.28:9000   tiflash       192.168.14.28  9000/8123/3930/20170/20292/8234  linux/x86_64  Up      /data01/deploy/data                        /data01/deploy
192.168.14.10:20171  tikv          192.168.14.10  20171/20180                      linux/x86_64  Up      /data/deploy/data                          /data/deploy
192.168.14.11:20171  tikv          192.168.14.11  20171/20180                      linux/x86_64  Up      /data/deploy/data                          /data/deploy
192.168.14.23:20171  tikv          192.168.14.23  20171/20180                      linux/x86_64  Up      /data/deploy/data                          /data/deploy
192.168.14.24:20171  tikv          192.168.14.24  20171/20180                      linux/x86_64  Up      /data/deploy/data                          /data/deploy
192.168.14.25:20171  tikv          192.168.14.25  20171/20180                      linux/x86_64  Up      /data/deploy/data                          /data/deploy
192.168.14.26:20171  tikv          192.168.14.26  20171/20180                      linux/x86_64  Up      /data/deploy/data                          /data/deploy
192.168.14.27:20171  tikv          192.168.14.27  20171/20180                      linux/x86_64  Up      /data/deploy/data                          /data/deploy
192.168.14.28:20171  tikv          192.168.14.28  20171/20180                      linux/x86_64  Up      /data/deploy/data                          /data/deploy
192.168.14.6:20171   tikv          192.168.14.6   20171/20180                      linux/x86_64  Up      /data/deploy/data                          /data/deploy
192.168.14.7:20171   tikv          192.168.14.7   20171/20180                      linux/x86_64  Up      /data/deploy/data                          /data/deploy
192.168.14.8:20171   tikv          192.168.14.8   20171/20180                      linux/x86_64  Up      /data/deploy/data                          /data/deploy
192.168.14.9:20171   tikv          192.168.14.9   20171/20180                      linux/x86_64  Up      /data/deploy/data                          /data/deploy
Total nodes: 31
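
Since the post mentions that TiFlash and TiKV share machines, it can help to list exactly which hosts are affected. The sketch below parses text in the shape of the `tiup cluster display` output above and reports hosts that carry every role in a given set. The function name `colocated_hosts` and the column positions it relies on (ID, Role, Host as the first three fields) are assumptions taken from this listing, not a documented output contract.

```python
from __future__ import annotations

from collections import defaultdict


def colocated_hosts(display_output: str, roles=("tiflash", "tikv")) -> list[str]:
    """Return hosts that run all of the given roles, based on display rows."""
    roles_by_host = defaultdict(set)
    for line in display_output.splitlines():
        fields = line.split()
        # Component rows start with an "ip:port" ID and have at least six
        # columns; banner, header, and separator lines are skipped.
        if len(fields) >= 6 and ":" in fields[0]:
            role, host = fields[1], fields[2]
            roles_by_host[host].add(role)
    wanted = set(roles)
    return sorted(h for h, seen in roles_by_host.items() if wanted.issubset(seen))


if __name__ == "__main__":
    # Three rows copied from the listing above.
    sample = "\n".join([
        "192.168.14.23:9000   tiflash  192.168.14.23  9000/8123/3930/20170/20292/8234  linux/x86_64  Up  /data01/deploy/data  /data01/deploy",
        "192.168.14.10:20171  tikv     192.168.14.10  20171/20180  linux/x86_64  Up  /data/deploy/data  /data/deploy",
        "192.168.14.23:20171  tikv     192.168.14.23  20171/20180  linux/x86_64  Up  /data/deploy/data  /data/deploy",
    ])
    print(colocated_hosts(sample))
```

In the full listing above, the hosts that appear under both roles are 192.168.14.23 through 192.168.14.28, so each of the six TiFlash nodes shares its machine with a TiKV instance.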

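It is not convenient for us to upload the complete logs, but if there is anything specific you need checked, I can help.

Some other error logs from the period when the service was abnormal: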

[2022/06/23 09:40:53.708 +08:00] [ERROR] [DAGDriver.cpp:197] ["DAGDriver:DB Exception: Failed to write resp

       0x1d272d3\tStackTrace::StackTrace() [tiflash+30569171]
                \tdbms/src/Common/StackTrace.cpp:23
       0x1d248d6\tDB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) [tiflash+30558422]
                \tdbms/src/Common/Exception.h:41
       0x7b9a3f4\tDB::StreamWriter::write(tipb::SelectResponse&, unsigned short) [tiflash+129606644]
                \tdbms/src/Flash/Coprocessor/StreamWriter.h:60
       0x7b97d67\tvoid DB::StreamingDAGResponseWriter<std::__1::shared_ptr<DB::StreamWriter> >::batchWrite<true>() [tiflash+129596775]
                \tdbms/src/Flash/Coprocessor/StreamingDAGResponseWriter.cpp:314
       0x7370176\tDB::copyData(DB::IBlockInputStream&, DB::IBlockOutputStream&, std::__1::atomic<bool>*) [tiflash+121045366]
                \tdbms/src/DataStreams/copyData.cpp:78
       0x7af74d1\tDB::DAGDriver<true>::execute() [tiflash+128939217]
                \tdbms/src/Flash/Coprocessor/DAGDriver.cpp:142
       0x7ad6522\tDB::BatchCoprocessorHandler::execute() [tiflash+128804130]
                \tdbms/src/Flash/BatchCoprocessorHandler.cpp:78
       0x7ad2d3f\tstd::__1::__function::__func<DB::FlashService::BatchCoprocessor(grpc_impl::ServerContext*, coprocessor::BatchRequest const*, grpc_impl::ServerWriter<coprocessor::BatchResponse>*)::$_23, std::__1::allocator<DB::FlashService::BatchCoprocessor(grpc_impl::ServerContext*, coprocessor::BatchRequest const*, grpc_impl::ServerWriter<coprocessor::BatchResponse>*)::$_23>, grpc::Status ()>::operator()() [tiflash+128789823]
                \t/usr/local/bin/../include/c++/v1/__functional/function.h:345
       0x7ad5968\tstd::__1::__packaged_task_func<std::__1::function<grpc::Status ()>, std::__1::allocator<std::__1::function<grpc::Status ()> >, grpc::Status ()>::operator()() [tiflash+128801128]
                \t/usr/local/bin/../include/c++/v1/future:1687
       0x7ad5a65\tstd::__1::packaged_task<grpc::Status ()>::operator()() [tiflash+128801381]
                \t/usr/local/bin/../include/c++/v1/future:1960
       0x7c75e59\tThreadPool::worker() [tiflash+130506329]
                \tlibs/libcommon/src/ThreadPool.cpp:129
       0x7c76183\tvoid* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, ThreadPool::ThreadPool(unsigned long, std::__1::function<void ()>)::$_1> >(void*) [tiflash+130507139]
                \t/usr/local/bin/../include/c++/v1/thread:291
  0x7f5295810ea5\tstart_thread [libpthread.so.0+32421]
  0x7f52953239fd\t__clone [libc.so.6+1042941]"] [thread_id=18032]


[2022/06/23 09:40:51.204 +08:00] [ERROR] [<unknown>] ["pingcap.tikv: Failed4: Deadline Exceeded"] [thread_id=27897]
[2022/06/23 10:50:38.551 +08:00] [ERROR] [Server.cpp:310] ["grpc:/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tics/contrib/grpc/src/cpp/thread_manager/thread_manager.cc, line number: 38, log msg : Could not create grpc_sync_server worker-thread"] [thread_id=72014]

[2022/06/23 10:50:40.012 +08:00] [ERROR] [BaseDaemon.cpp:570] ["BaseDaemon:
       0x1ed2661\tfaultSignalHandler(int, siginfo_t*, void*) [tiflash+32319073]
                \tlibs/libdaemon/src/BaseDaemon.cpp:221
  0x7f5295818630\t<unknown symbol> [libpthread.so.0+63024]
  0x7f529525b387\tgsignal [libc.so.6+222087]
  0x7f529525ca78\t__GI_abort [libc.so.6+227960]
       0x88a843e\tgpr_malloc [tiflash+143295550]
                \tcontrib/grpc/src/core/lib/gpr/alloc.cc:34
       0x85b2f56\tgrpc_resource_user_alloc_slices(grpc_resource_user_slice_allocator*, unsigned long, unsigned long, grpc_slice_buffer*) [tiflash+140193622]
                \tcontrib/grpc/src/core/lib/iomgr/resource_quota.cc:1013
       0x85b5fba\ttcp_handle_read(void*, grpc_error*) [tiflash+140206010]
                \tcontrib/grpc/src/core/lib/iomgr/tcp_posix.cc:619
       0x86dc0f2\tread_action_locked(void*, grpc_error*) [tiflash+141410546]
                \tcontrib/grpc/src/core/ext/transport/chttp2/transport/chttp2_transport.cc:2590
       0x85a39dc\tgrpc_combiner_continue_exec_ctx() [tiflash+140130780]
                \tcontrib/grpc/src/core/lib/iomgr/combiner.cc:236
       0x85a623d\tgrpc_core::ExecCtx::Flush() [tiflash+140141117]
                \tcontrib/grpc/src/core/lib/iomgr/exec_ctx.cc:156
       0x85ac06b\tpollset_work(grpc_pollset*, grpc_pollset_worker**, long) [tiflash+140165227]
                \tcontrib/grpc/src/core/lib/iomgr/ev_epollex_linux.cc:1136
       0x85cca7e\tcq_pluck(grpc_completion_queue*, void*, gpr_timespec, void*) [tiflash+140298878]
                \tcontrib/grpc/src/core/lib/surface/completion_queue.cc:1281
       0x856ab11\tgrpc_impl::internal::ErrorMethodHandler<(grpc::StatusCode)8>::RunHandler(grpc::internal::MethodHandler::HandlerParameter const&) [tiflash+139897617]
                \tcontrib/grpc/include/grpcpp/impl/codegen/method_handler_impl.h:367
       0x8565260\tgrpc_impl::Server::SyncRequest::CallData::ContinueRunAfterInterception() [tiflash+139874912]
                \tcontrib/grpc/src/cpp/server/server_cc.cc:492
       0x8565080\tgrpc_impl::Server::SyncRequest::CallData::Run(std::__1::shared_ptr<grpc_impl::Server::GlobalCallbacks> const&, bool) [tiflash+139874432]
                \tcontrib/grpc/src/cpp/server/server_cc.cc:479
       0x8575bb7\tgrpc::ThreadManager::MainWorkLoop() [tiflash+139942839]
                \tcontrib/grpc/src/cpp/thread_manager/thread_manager.cc:211
       0x8576651\tgrpc::ThreadManager::WorkerThread::WorkerThread(grpc::ThreadManager*)::$_0::__invoke(void*) [tiflash+139945553]
                \tcontrib/grpc/src/cpp/thread_manager/thread_manager.cc:35
       0x88aca6c\tgrpc_core::(anonymous namespace)::ThreadInternalsPosix::ThreadInternalsPosix(char const*, void (*)(void*), void*, bool*, grpc_core::Thread::Options const&)::'lambda'(void*)::__invoke(void*) [tiflash+143313516]
                \tcontrib/grpc/src/core/lib/gprpp/thd_posix.cc:110
  0x7f5295810ea5\tstart_thread [libpthread.so.0+32421]"] [thread_id=72017]
[2022/06/23 10:50:40.012 +08:00] [ERROR] [Server.cpp:310] ["grpc:/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tics/contrib/grpc/src/cpp/thread_manager/thread_manager.cc, line number: 38, log msg : Could not create grpc_sync_server worker-thread"] [thread_id=52616] 

Among these, the "Could not create grpc_sync_server worker-thread" and "Deadline Exceeded" errors appear in large numbers.
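
To put a number on how often these errors occur without sharing the full logs, the messages can be tallied locally. The sketch below assumes each log entry starts with the bracketed layout seen in the excerpts above (timestamp, level, source, then a quoted message); the function name `tally_errors` and the list of signatures are illustrative choices, not part of any TiFlash tooling.

```python
import re
from collections import Counter

# Matches the start of an entry: [timestamp] [LEVEL] [source] ["message...
ENTRY = re.compile(r'^\[([^\]]+)\] \[([A-Z]+)\] \[([^\]]*)\] \["(.*)')

# Message fragments taken from the log excerpts in this thread.
SIGNATURES = (
    "Could not create grpc_sync_server worker-thread",
    "Deadline Exceeded",
    "Cannot allocate memory",
)


def tally_errors(lines):
    """Count ERROR entries per known signature; stack-trace lines are ignored."""
    counts = Counter()
    for line in lines:
        match = ENTRY.match(line)
        if match is None or match.group(2) != "ERROR":
            continue
        message = match.group(4)
        for signature in SIGNATURES:
            if signature in message:
                counts[signature] += 1
    return counts


if __name__ == "__main__":
    sample = [
        '[2022/06/23 10:50:38.551 +08:00] [ERROR] [Server.cpp:310] ["grpc: log msg : Could not create grpc_sync_server worker-thread"] [thread_id=72014]',
        "       0x1d272d3 StackTrace::StackTrace() [tiflash+30569171]",
    ]
    for signature, count in tally_errors(sample).items():
        print(count, signature)
```

Feeding the function an open log file (for example `tally_errors(open(path, encoding="utf-8", errors="replace"))`, where `path` is the local TiFlash error log) gives per-signature counts that are small enough to paste into a reply.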

Could you download and send these six monitoring dashboards? https://metricstool.pingcap.com/#backup-with-dev-tools
Overview,
TiDB,
TiKV-Details,
TiFlash-Summary,
TiFlash-Proxy-Summary,
TiFlash-Proxy-Details

SQ-cluster-Overview_2022-06-24T06_42_17.765Z.json (83.4 KB) SQ-cluster-TiDB_2022-06-24T06_48_34.644Z.json (11.6 MB) SQ-cluster-TiFlash-Proxy-Details_2022-06-24T06_55_25.929Z.json (6.3 MB) SQ-cluster-TiFlash-Proxy-Summary_2022-06-24T06_54_25.351Z.json (1.1 MB) SQ-cluster-TiFlash-Summary_2022-06-24T06_52_52.718Z.json (9.0 MB) SQ-cluster-TiKV-Details_2022-06-24T06_51_56.521Z.json (47.0 MB)

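The monitoring files are attached above.

Some additional timing information: our TiFlash nodes started having problems at around 3:30, which caused the servers to restart. After the restart, a firewall issue left all of the nodes unable to serve requests. At around 8:00 in the morning we turned off the firewall and restarted the whole cluster. At around 10:00 a large number of slow queries started to appear, and no query routed to TiFlash could be processed. At around 12:00 we turned off dynamic partition pruning and killed the backlog of queries, after which TiFlash queries returned to normal.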

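Early this morning another TiFlash node went down, and it has been trying and failing to start ever since. Here is the log with the newly appearing errors: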

[2022/06/24 02:32:28.445 +08:00] [ERROR] [DAGDriver.cpp:197] ["DAGDriver:DB Exception: Cannot read from file /data01/deploy/data/data/t_32925/log/page_1_0/page., errno: 29, strerror: Illegal seek

       0x1d272d3 StackTrace::StackTrace() [tiflash+30569171]
                 dbms/src/Common/StackTrace.cpp:23
       0x1d248d6 DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) [tiflash+30558422]
                 dbms/src/Common/Exception.h:41
       0x1d302ba DB::throwFromErrno(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, int) [tiflash+30606010]
                 dbms/src/Common/Exception.cpp:59
       0x786632b void DB::PageUtil::readFile<std::__1::shared_ptr<DB::RandomAccessFile> >(std::__1::shared_ptr<DB::RandomAccessFile>&, long, char const*, unsigned long, std::__1::shared_ptr<DB::ReadLimiter> const&, bool) [tiflash+126247723]
                 dbms/src/Storages/Page/PageUtil.h:270
       0x79f7e8c DB::PS::V2::PageFile::Reader::read(std::__1::vector<DB::PS::V2::PageFile::Reader::FieldReadInfo, std::__1::allocator<DB::PS::V2::PageFile::Reader::FieldReadInfo> >&, std::__1::shared_ptr<DB::ReadLimiter> const&) [tiflash+127893132]
                 dbms/src/Storages/Page/V2/PageFile.cpp:1039
       0x7a0c089 DB::PS::V2::PageStorage::readImpl(unsigned long, std::__1::vector<std::__1::pair<unsigned long, std::__1::vector<unsigned long, std::__1::allocator<unsigned long> > >, std::__1::allocator<std::__1::pair<unsigned long, std::__1::vector<unsigned long, std::__1::allocator<unsigned long> > > > > const&, std::__1::shared_ptr<DB::ReadLimiter> const&, std::__1::shared_ptr<DB::PageStorageSnapshot>, bool) [tiflash+127975561]
                 dbms/src/Storages/Page/V2/PageStorage.cpp:783
       0x7a8fbf0 DB::PageReaderImplNormal::read(std::__1::vector<std::__1::pair<unsigned long, std::__1::vector<unsigned long, std::__1::allocator<unsigned long> > >, std::__1::allocator<std::__1::pair<unsigned long, std::__1::vector<unsigned long, std::__1::allocator<unsigned long> > > > > const&) const [tiflash+128515056]
                 dbms/src/Storages/Page/PageStorage.cpp:113
       0x7a8d3f2 DB::PageReader::read(std::__1::vector<std::__1::pair<unsigned long, std::__1::vector<unsigned long, std::__1::allocator<unsigned long> > >, std::__1::allocator<std::__1::pair<unsigned long, std::__1::vector<unsigned long, std::__1::allocator<unsigned long> > > > > const&) const [tiflash+128504818]
                 dbms/src/Storages/Page/PageStorage.cpp:415
       0x7898948 DB::DM::ColumnFileTiny::readFromDisk(DB::PageReader const&, std::__1::vector<DB::DM::ColumnDefine, std::__1::allocator<DB::DM::ColumnDefine> > const&, unsigned long, unsigned long) const [tiflash+126454088]
                 dbms/src/Storages/DeltaMerge/ColumnFile/ColumnFileTiny.cpp:79
       0x7899124 DB::DM::ColumnFileTiny::fillColumns(DB::PageReader const&, std::__1::vector<DB::DM::ColumnDefine, std::__1::allocator<DB::DM::ColumnDefine> > const&, unsigned long, std::__1::vector<COWPtr<DB::IColumn>::immutable_ptr<DB::IColumn>, std::__1::allocator<COWPtr<DB::IColumn>::immutable_ptr<DB::IColumn> > >&) const [tiflash+126456100]
                 dbms/src/Storages/DeltaMerge/ColumnFile/ColumnFileTiny.cpp:115
       0x789a3f6 DB::DM::ColumnFileTinyReader::readRows(std::__1::vector<COWPtr<DB::IColumn>::mutable_ptr<DB::IColumn>, std::__1::allocator<COWPtr<DB::IColumn>::mutable_ptr<DB::IColumn> > >&, unsigned long, unsigned long, DB::DM::RowKeyRange const*) [tiflash+126460918]
                 dbms/src/Storages/DeltaMerge/ColumnFile/ColumnFileTiny.cpp:237
       0x788fa13 DB::DM::ColumnFileSetReader::readRows(std::__1::vector<COWPtr<DB::IColumn>::mutable_ptr<DB::IColumn>, std::__1::allocator<COWPtr<DB::IColumn>::mutable_ptr<DB::IColumn> > >&, unsigned long, unsigned long, DB::DM::RowKeyRange const*) [tiflash+126417427]
                 dbms/src/Storages/DeltaMerge/ColumnFile/ColumnFileSetReader.cpp:160
       0x78c54ab DB::DM::DeltaValueReader::readRows(std::__1::vector<COWPtr<DB::IColumn>::mutable_ptr<DB::IColumn>, std::__1::allocator<COWPtr<DB::IColumn>::mutable_ptr<DB::IColumn> > >&, unsigned long, unsigned long, DB::DM::RowKeyRange const*) [tiflash+126637227]
                 dbms/src/Storages/DeltaMerge/Delta/Snapshot.cpp:117
       0x77ebc05 void DB::DM::DeltaMergeBlockInputStream<DB::DM::DeltaValueReader, DB::DM::DTCompactedEntries<55ul, 20ul, 3ul>::Iterator, false>::next<false, false>(std::__1::vector<COWPtr<DB::IColumn>::mutable_ptr<DB::IColumn>, std::__1::allocator<COWPtr<DB::IColumn>::mutable_ptr<DB::IColumn> > >&, unsigned long&) [tiflash+125746181]
                 dbms/src/Storages/DeltaMerge/DeltaMerge.h:0
       0x77ea997 DB::DM::DeltaMergeBlockInputStream<DB::DM::DeltaValueReader, DB::DM::DTCompactedEntries<55ul, 20ul, 3ul>::Iterator, false>::doRead() [tiflash+125741463]
                 dbms/src/Storages/DeltaMerge/DeltaMerge.h:242
       0x77ea55c DB::DM::DeltaMergeBlockInputStream<DB::DM::DeltaValueReader, DB::DM::DTCompactedEntries<55ul, 20ul, 3ul>::Iterator, false>::read() [tiflash+125740380]
                 dbms/src/Storages/DeltaMerge/DeltaMerge.h:174
       0x77ec9b9 DB::DM::DMRowKeyFilterBlockInputStream<true>::read() [tiflash+125749689]
                 dbms/src/Storages/DeltaMerge/RowKeyFilter.h:178
       0x77e17d5 DB::DM::readNextBlock(std::__1::shared_ptr<DB::IBlockInputStream> const&) [tiflash+125704149]
                 dbms/src/Storages/DeltaMerge/DeltaMergeHelpers.h:253
       0x7804405 DB::DM::DMVersionFilterBlockInputStream<0>::initNextBlock() [tiflash+125846533]
                 dbms/src/Storages/DeltaMerge/DMVersionFilterBlockInputStream.h:133
       0x7803505 DB::DM::DMVersionFilterBlockInputStream<0>::read(DB::PODArray<unsigned char, 4096ul, Allocator<false>, 15ul, 16ul>*&, bool) [tiflash+125842693]
                 dbms/src/Storages/DeltaMerge/DMVersionFilterBlockInputStream.cpp:53
       0x77b06d3 DB::DM::DMSegmentThreadInputStream::readImpl(DB::PODArray<unsigned char, 4096ul, Allocator<false>, 15ul, 16ul>*&, bool) [tiflash+125503187]
                 dbms/src/Storages/DeltaMerge/DMSegmentThreadInputStream.h:124
       0x77b05c8 DB::DM::DMSegmentThreadInputStream::readImpl() [tiflash+125502920]
                 dbms/src/Storages/DeltaMerge/DMSegmentThreadInputStream.h:85
       0x6688b8a DB::IProfilingBlockInputStream::read(DB::PODArray<unsigned char, 4096ul, Allocator<false>, 15ul, 16ul>*&, bool) [tiflash+107514762]
                 dbms/src/DataStreams/IProfilingBlockInputStream.cpp:75
       0x66888c4 DB::IProfilingBlockInputStream::read() [tiflash+107514052]
                 dbms/src/DataStreams/IProfilingBlockInputStream.cpp:43
       0x739e49b DB::ExpressionBlockInputStream::readImpl() [tiflash+121234587]
                 dbms/src/DataStreams/ExpressionBlockInputStream.cpp:50
       0x1dc8c32 DB::IProfilingBlockInputStream::readImpl(DB::PODArray<unsigned char, 4096ul, Allocator<false>, 15ul, 16ul>*&, bool) [tiflash+31231026]
                 dbms/src/DataStreams/IProfilingBlockInputStream.h:232
       0x6688a0e DB::IProfilingBlockInputStream::read(DB::PODArray<unsigned char, 4096ul, Allocator<false>, 15ul, 16ul>*&, bool) [tiflash+107514382]
                 dbms/src/DataStreams/IProfilingBlockInputStream.cpp:73
       0x73a0ef0 DB::FilterBlockInputStream::readImpl() [tiflash+121245424]
                 dbms/src/DataStreams/FilterBlockInputStream.cpp:91
       0x6688b8a DB::IProfilingBlockInputStream::read(DB::PODArray<unsigned char, 4096ul, Allocator<false>, 15ul, 16ul>*&, bool) [tiflash+107514762]
                 dbms/src/DataStreams/IProfilingBlockInputStream.cpp:75
       0x66888c4 DB::IProfilingBlockInputStream::read() [tiflash+107514052]
                 dbms/src/DataStreams/IProfilingBlockInputStream.cpp:43
       0x739e49b DB::ExpressionBlockInputStream::readImpl() [tiflash+121234587]
                 dbms/src/DataStreams/ExpressionBlockInputStream.cpp:50
       0x6688b8a DB::IProfilingBlockInputStream::read(DB::PODArray<unsigned char, 4096ul, Allocator<false>, 15ul, 16ul>*&, bool) [tiflash+107514762]
                 dbms/src/DataStreams/IProfilingBlockInputStream.cpp:75"] [thread_id=47105]

[2022/06/24 05:46:25.680 +08:00] [ERROR] [Exception.cpp:85] ["DB::DM::DeltaMergeStore::DeltaMergeStore(DB::Context &, bool, const DB::String &, const DB::String &, DB::TableID, const DB::DM::ColumnDefines &, const DB::DM::ColumnDefine &, bool, size_t, const DB::DM::DeltaMergeStore::Settings &):Code: 49, e.displayText() = DB::Exception: PageFile binary version not match {MergingReader of PageFile_7_0, type: Formal, sequence no: 0, meta offset: 25098, data offset: 9991564} [unknown_version=0] [file=/data01/deploy/data/data/t_15303/meta/page_7_0/meta], e.what() = DB::Exception, Stack trace:


       0x1d272d3 StackTrace::StackTrace() [tiflash+30569171]
                 dbms/src/Common/StackTrace.cpp:23
       0x1d248d6 DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) [tiflash+30558422]
                 dbms/src/Common/Exception.h:41
       0x79f53a3 DB::PS::V2::PageFile::MetaMergingReader::moveNext(unsigned int*) [tiflash+127882147]
                 dbms/src/Storages/Page/V2/PageFile.cpp:585
       0x7a06501 DB::PS::V2::PageStorage::restore() [tiflash+127952129]
                 dbms/src/Storages/Page/V2/PageStorage.cpp:289
       0x781ec0f DB::DM::StoragePool::restore() [tiflash+125955087]
                 dbms/src/Storages/DeltaMerge/StoragePool.cpp:379
       0x7788ca1 DB::DM::DeltaMergeStore::DeltaMergeStore(DB::Context&, bool, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, long, std::__1::vector<DB::DM::ColumnDefine, std::__1::allocator<DB::DM::ColumnDefine> > const&, DB::DM::ColumnDefine const&, bool, unsigned long, DB::DM::DeltaMergeStore::Settings const&) [tiflash+125340833]
                 dbms/src/Storages/DeltaMerge/DeltaMergeStore.cpp:243
       0x771283d DB::StorageDeltaMerge::getAndMaybeInitStore() [tiflash+124856381]
                 dbms/src/Storages/StorageDeltaMerge.cpp:1545
       0x77187fe DB::StorageDeltaMerge::getSchemaSnapshotAndBlockForDecoding(DB::TableDoubleLockHolder<false> const&, bool) [tiflash+124880894]
                 dbms/src/Storages/StorageDeltaMerge.cpp:910
       0x7916b6d DB::writeRegionDataToStorage(DB::Context&, DB::RegionPtrWithBlock const&, std::__1::vector<std::__1::tuple<DB::RawTiDBPK, unsigned char, unsigned long, std::__1::shared_ptr<DB::StringObject<false> const> >, std::__1::allocator<std::__1::tuple<DB::RawTiDBPK, unsigned char, unsigned long, std::__1::shared_ptr<DB::StringObject<false> const> > > >&, Poco::Logger*)::$_0::operator()(bool) const [tiflash+126970733]
                 dbms/src/Storages/Transaction/PartitionStreams.cpp:123
       0x7912f32 DB::writeRegionDataToStorage(DB::Context&, DB::RegionPtrWithBlock const&, std::__1::vector<std::__1::tuple<DB::RawTiDBPK, unsigned char, unsigned long, std::__1::shared_ptr<DB::StringObject<false> const> >, std::__1::allocator<std::__1::tuple<DB::RawTiDBPK, unsigned char, unsigned long, std::__1::shared_ptr<DB::StringObject<false> const> > > >&, Poco::Logger*) [tiflash+126955314]
                 dbms/src/Storages/Transaction/PartitionStreams.cpp:169
       0x7912c9c DB::RegionTable::writeBlockByRegion(DB::Context&, DB::RegionPtrWithBlock const&, std::__1::vector<std::__1::tuple<DB::RawTiDBPK, unsigned char, unsigned long, std::__1::shared_ptr<DB::StringObject<false> const> >, std::__1::allocator<std::__1::tuple<DB::RawTiDBPK, unsigned char, unsigned long, std::__1::shared_ptr<DB::StringObject<false> const> > > >&, Poco::Logger*, bool) [tiflash+126954652]
                 dbms/src/Storages/Transaction/PartitionStreams.cpp:348
       0x79351ff DB::Region::handleWriteRaftCmd(DB::WriteCmdsView const&, unsigned long, unsigned long, DB::TMTContext&) [tiflash+127095295]
                 dbms/src/Storages/Transaction/Region.cpp:712
       0x7901910 DB::KVStore::handleWriteRaftCmd(DB::WriteCmdsView const&, unsigned long, unsigned long, unsigned long, DB::TMTContext&) [tiflash+126884112]
                 dbms/src/Storages/Transaction/KVStore.cpp:285
       0x791b822 HandleWriteRaftCmd [tiflash+126990370]
                 dbms/src/Storages/Transaction/ProxyFFI.cpp:92
  0x7f82f6f0a6e0 raftstore::store::fsm::apply::ApplyDelegate$LT$EK$GT$::process_raft_cmd::h4c8bba241d5ee4d4 [libtiflash_proxy.so+28567264]
  0x7f82f6f07d6c raftstore::store::fsm::apply::ApplyDelegate$LT$EK$GT$::handle_raft_committed_entries::h478c4bad834ddec7 [libtiflash_proxy.so+28556652]
  0x7f82f6f27f91 raftstore::store::fsm::apply::ApplyFsm$LT$EK$GT$::handle_apply::he4c0b1137ae7cad9 [libtiflash_proxy.so+28688273]
  0x7f82f6f2abfe raftstore::store::fsm::apply::ApplyFsm$LT$EK$GT$::handle_tasks::h5b138e5889c69e66 [libtiflash_proxy.so+28699646]
  0x7f82f6f2d510 _$LT$raftstore..store..fsm..apply..ApplyPoller$LT$EK$GT$$u20$as$u20$batch_system..batch..PollHandler$LT$raftstore..store..fsm..apply..ApplyFsm$LT$EK$GT$$C$raftstore..store..fsm..apply..ControlFsm$GT$$GT$::handle_normal::he20c24833a37424f [libtiflash_proxy.so+28710160]
  0x7f82f6ba0609 batch_system::batch::Poller$LT$N$C$C$C$Handler$GT$::poll::hec57674a22fdcc30 [libtiflash_proxy.so+24987145]
  0x7f82f7249cd8 std::sys_common::backtrace::__rust_begin_short_backtrace::h189c0b2868fa537c [libtiflash_proxy.so+31972568]
  0x7f82f6a81ec1 core::ops::function::FnOnce::call_once$u7b$$u7b$vtable.shim$u7d$$u7d$::h7c53021c905641b6 [libtiflash_proxy.so+23813825]
  0x7f82f745ffca std::sys::unix::thread::Thread::new::thread_start::hd39c5f08bdcda277 [libtiflash_proxy.so+34160586]
  0x7f82f4aa9ea5 start_thread [libpthread.so.0+32421]
  0x7f82f45bc9fd __clone [libc.so.6+1042941]"] [thread_id=92]
[2022/06/24 05:46:59.759 +08:00] [ERROR] [Exception.cpp:85] ["DB::DM::DeltaMergeStore::DeltaMergeStore(DB::Context &, bool, const DB::String &, const DB::String &, DB::TableID, const DB::DM::ColumnDefines &, const DB::DM::ColumnDefine &, bool, size_t, const DB::DM::DeltaMergeStore::Settings &):Code: 40, e.displayText() = DB::Exception: Page [9] checksum not match, broken file: /data01/deploy/data/data/t_17615/meta/page_26_0/page, expected: 83cb09c59276f157, but: c38df517068eea44, e.what() = DB::Exception, Stack trace:


       0x1d272d3 StackTrace::StackTrace() [tiflash+30569171]
                 dbms/src/Common/StackTrace.cpp:23
       0x1d248d6 DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) [tiflash+30558422]
                 dbms/src/Common/Exception.h:41
       0x79f725a DB::PS::V2::PageFile::Reader::read(std::__1::vector<std::__1::pair<unsigned long, DB::PageEntry>, std::__1::allocator<std::__1::pair<unsigned long, DB::PageEntry> > >&, std::__1::shared_ptr<DB::ReadLimiter> const&, bool) [tiflash+127890010]
                 dbms/src/Storages/Page/V2/PageFile.cpp:916
       0x7a0a78b DB::PS::V2::PageStorage::readImpl(unsigned long, unsigned long, std::__1::shared_ptr<DB::ReadLimiter> const&, std::__1::shared_ptr<DB::PageStorageSnapshot>, bool) [tiflash+127969163]
                 dbms/src/Storages/Page/V2/PageStorage.cpp:644
       0x7a8f8fd DB::PageReaderImplNormal::read(unsigned long) const [tiflash+128514301]
                 dbms/src/Storages/Page/PageStorage.cpp:97
       0x7a8d392 DB::PageReader::read(unsigned long) const [tiflash+128504722]
                 dbms/src/Storages/Page/PageStorage.cpp:400
       0x78b05f9 DB::DM::ColumnFilePersistedSet::restore(DB::DM::DMContext&, DB::DM::RowKeyRange const&, unsigned long) [tiflash+126551545]
                 dbms/src/Storages/DeltaMerge/Delta/ColumnFilePersistedSet.cpp:110
       0x78aa43a DB::DM::DeltaValueSpace::restore(DB::DM::DMContext&, DB::DM::RowKeyRange const&, unsigned long) [tiflash+126526522]
                 dbms/src/Storages/DeltaMerge/Delta/DeltaValueSpace.cpp:59
       0x77c34c0 DB::DM::Segment::restoreSegment(DB::DM::DMContext&, unsigned long) [tiflash+125580480]
                 dbms/src/Storages/DeltaMerge/Segment.cpp:274
       0x7788d0f DB::DM::DeltaMergeStore::DeltaMergeStore(DB::Context&, bool, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, long, std::__1::vector<DB::DM::ColumnDefine, std::__1::allocator<DB::DM::ColumnDefine> > const&, DB::DM::ColumnDefine const&, bool, unsigned long, DB::DM::DeltaMergeStore::Settings const&) [tiflash+125340943]
                 dbms/src/Storages/DeltaMerge/DeltaMergeStore.cpp:270
       0x771283d DB::StorageDeltaMerge::getAndMaybeInitStore() [tiflash+124856381]
                 dbms/src/Storages/StorageDeltaMerge.cpp:1545
       0x77187fe DB::StorageDeltaMerge::getSchemaSnapshotAndBlockForDecoding(DB::TableDoubleLockHolder<false> const&, bool) [tiflash+124880894]
                 dbms/src/Storages/StorageDeltaMerge.cpp:910
       0x7916b6d DB::writeRegionDataToStorage(DB::Context&, DB::RegionPtrWithBlock const&, std::__1::vector<std::__1::tuple<DB::RawTiDBPK, unsigned char, unsigned long, std::__1::shared_ptr<DB::StringObject<false> const> >, std::__1::allocator<std::__1::tuple<DB::RawTiDBPK, unsigned char, unsigned long, std::__1::shared_ptr<DB::StringObject<false> const> > > >&, Poco::Logger*)::$_0::operator()(bool) const [tiflash+126970733]
                 dbms/src/Storages/Transaction/PartitionStreams.cpp:123
       0x7912f32 DB::writeRegionDataToStorage(DB::Context&, DB::RegionPtrWithBlock const&, std::__1::vector<std::__1::tuple<DB::RawTiDBPK, unsigned char, unsigned long, std::__1::shared_ptr<DB::StringObject<false> const> >, std::__1::allocator<std::__1::tuple<DB::RawTiDBPK, unsigned char, unsigned long, std::__1::shared_ptr<DB::StringObject<false> const> > > >&, Poco::Logger*) [tiflash+126955314]
                 dbms/src/Storages/Transaction/PartitionStreams.cpp:169
       0x7912c9c DB::RegionTable::writeBlockByRegion(DB::Context&, DB::RegionPtrWithBlock const&, std::__1::vector<std::__1::tuple<DB::RawTiDBPK, unsigned char, unsigned long, std::__1::shared_ptr<DB::StringObject<false> const> >, std::__1::allocator<std::__1::tuple<DB::RawTiDBPK, unsigned char, unsigned long, std::__1::shared_ptr<DB::StringObject<false> const> > > >&, Poco::Logger*, bool) [tiflash+126954652]
                 dbms/src/Storages/Transaction/PartitionStreams.cpp:348
       0x79351ff DB::Region::handleWriteRaftCmd(DB::WriteCmdsView const&, unsigned long, unsigned long, DB::TMTContext&) [tiflash+127095295]
                 dbms/src/Storages/Transaction/Region.cpp:712
       0x7901910 DB::KVStore::handleWriteRaftCmd(DB::WriteCmdsView const&, unsigned long, unsigned long, unsigned long, DB::TMTContext&) [tiflash+126884112]
                 dbms/src/Storages/Transaction/KVStore.cpp:285
       0x791b822 HandleWriteRaftCmd [tiflash+126990370]
                 dbms/src/Storages/Transaction/ProxyFFI.cpp:92
  0x7f4277f366e0 raftstore::store::fsm::apply::ApplyDelegate$LT$EK$GT$::process_raft_cmd::h4c8bba241d5ee4d4 [libtiflash_proxy.so+28567264]
  0x7f4277f33d6c raftstore::store::fsm::apply::ApplyDelegate$LT$EK$GT$::handle_raft_committed_entries::h478c4bad834ddec7 [libtiflash_proxy.so+28556652]
  0x7f4277f53f91 raftstore::store::fsm::apply::ApplyFsm$LT$EK$GT$::handle_apply::he4c0b1137ae7cad9 [libtiflash_proxy.so+28688273]
  0x7f4277f56bfe raftstore::store::fsm::apply::ApplyFsm$LT$EK$GT$::handle_tasks::h5b138e5889c69e66 [libtiflash_proxy.so+28699646]
  0x7f4277f59510 _$LT$raftstore..store..fsm..apply..ApplyPoller$LT$EK$GT$$u20$as$u20$batch_system..batch..PollHandler$LT$raftstore..store..fsm..apply..ApplyFsm$LT$EK$GT$$C$raftstore..store..fsm..apply..ControlFsm$GT$$GT$::handle_normal::he20c24833a37424f [libtiflash_proxy.so+28710160]
  0x7f4277bcc609 batch_system::batch::Poller$LT$N$C$C$C$Handler$GT$::poll::hec57674a22fdcc30 [libtiflash_proxy.so+24987145]
  0x7f4278275cd8 std::sys_common::backtrace::__rust_begin_short_backtrace::h189c0b2868fa537c [libtiflash_proxy.so+31972568]
  0x7f4277aadec1 core::ops::function::FnOnce::call_once$u7b$$u7b$vtable.shim$u7d$$u7d$::h7c53021c905641b6 [libtiflash_proxy.so+23813825]
  0x7f427848bfca std::sys::unix::thread::Thread::new::thread_start::hd39c5f08bdcda277 [libtiflash_proxy.so+34160586]
  0x7f4275ad5ea5 start_thread [libpthread.so.0+32421]
  0x7f42755e89fd __clone [libc.so.6+1042941]"] [thread_id=187]
[2022/06/24 05:51:41.033 +08:00] [ERROR] [Exception.cpp:85] ["Application:Storage inited fail, [table_id=15259]: Code: 49, e.displayText() = DB::Exception: PageFile binary version not match {MergingReader of PageFile_29_0, type: Formal, sequence no: 0, meta offset: 2095127, data offset: 35667226} [unknown_version=0] [file=/data01/deploy/data/data/t_15259/log/page_29_0/meta], e.what() = DB::Exception, Stack trace:


       0x1d272d3 StackTrace::StackTrace() [tiflash+30569171]
                 dbms/src/Common/StackTrace.cpp:23
       0x1d248d6 DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) [tiflash+30558422]
                 dbms/src/Common/Exception.h:41
       0x79f53a3 DB::PS::V2::PageFile::MetaMergingReader::moveNext(unsigned int*) [tiflash+127882147]
                 dbms/src/Storages/Page/V2/PageFile.cpp:585
       0x7a06501 DB::PS::V2::PageStorage::restore() [tiflash+127952129]
                 dbms/src/Storages/Page/V2/PageStorage.cpp:289
       0x781ebf9 DB::DM::StoragePool::restore() [tiflash+125955065]
                 dbms/src/Storages/DeltaMerge/StoragePool.cpp:377
       0x7788ca1 DB::DM::DeltaMergeStore::DeltaMergeStore(DB::Context&, bool, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, long, std::__1::vector<DB::DM::ColumnDefine, std::__1::allocator<DB::DM::ColumnDefine> > const&, DB::DM::ColumnDefine const&, bool, unsigned long, DB::DM::DeltaMergeStore::Settings const&) [tiflash+125340833]
                 dbms/src/Storages/DeltaMerge/DeltaMergeStore.cpp:243
       0x771283d DB::StorageDeltaMerge::getAndMaybeInitStore() [tiflash+124856381]
                 dbms/src/Storages/StorageDeltaMerge.cpp:1545
       0x771fa2a DB::StorageDeltaMerge::initStoreIfDataDirExist() [tiflash+124910122]
                 dbms/src/Storages/StorageDeltaMerge.cpp:1577
       0x1d564cb DB::initStores(DB::Context&, Poco::Logger*, bool)::$_14::operator()() const [tiflash+30762187]
                 dbms/src/Server/Server.cpp:487
       0x1d5e5f1 void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, DB::initStores(DB::Context&, Poco::Logger*, bool)::$_14> >(void*) [tiflash+30795249]
                 /usr/local/bin/../include/c++/v1/thread:291
  0x7fd6b3d07ea5 start_thread [libpthread.so.0+32421]
  0x7fd6b381a9fd __clone [libc.so.6+1042941]"] [thread_id=34]
[2022/06/24 05:51:41.033 +08:00] [ERROR] [Exception.cpp:85] ["DB::EngineStoreApplyRes DB::HandleWriteRaftCmd(const DB::EngineStoreServerWrap *, DB::WriteCmdsView, DB::RaftCmdHeader):Code: 49, e.displayText() = DB::Exception: PageFile binary version not match {MergingReader of PageFile_7_0, type: Formal, sequence no: 0, meta offset: 25098, data offset: 9991564} [unknown_version=0] [file=/data01/deploy/data/data/t_15303/meta/page_7_0/meta], e.what() = DB::Exception, Stack trace:


       0x1d272d3 StackTrace::StackTrace() [tiflash+30569171]
                 dbms/src/Common/StackTrace.cpp:23
       0x1d248d6 DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) [tiflash+30558422]
                 dbms/src/Common/Exception.h:41
       0x79f53a3 DB::PS::V2::PageFile::MetaMergingReader::moveNext(unsigned int*) [tiflash+127882147]
                 dbms/src/Storages/Page/V2/PageFile.cpp:585
       0x7a06501 DB::PS::V2::PageStorage::restore() [tiflash+127952129]
                 dbms/src/Storages/Page/V2/PageStorage.cpp:289
       0x781ec0f DB::DM::StoragePool::restore() [tiflash+125955087]
                 dbms/src/Storages/DeltaMerge/StoragePool.cpp:379
       0x7788ca1 DB::DM::DeltaMergeStore::DeltaMergeStore(DB::Context&, bool, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, long, std::__1::vector<DB::DM::ColumnDefine, std::__1::allocator<DB::DM::ColumnDefine> > const&, DB::DM::ColumnDefine const&, bool, unsigned long, DB::DM::DeltaMergeStore::Settings const&) [tiflash+125340833]
                 dbms/src/Storages/DeltaMerge/DeltaMergeStore.cpp:243
       0x771283d DB::StorageDeltaMerge::getAndMaybeInitStore() [tiflash+124856381]
                 dbms/src/Storages/StorageDeltaMerge.cpp:1545
       0x77187fe DB::StorageDeltaMerge::getSchemaSnapshotAndBlockForDecoding(DB::TableDoubleLockHolder<false> const&, bool) [tiflash+124880894]
                 dbms/src/Storages/StorageDeltaMerge.cpp:910
       0x7916b6d DB::writeRegionDataToStorage(DB::Context&, DB::RegionPtrWithBlock const&, std::__1::vector<std::__1::tuple<DB::RawTiDBPK, unsigned char, unsigned long, std::__1::shared_ptr<DB::StringObject<false> const> >, std::__1::allocator<std::__1::tuple<DB::RawTiDBPK, unsigned char, unsigned long, std::__1::shared_ptr<DB::StringObject<false> const> > > >&, Poco::Logger*)::$_0::operator()(bool) const [tiflash+126970733]
                 dbms/src/Storages/Transaction/PartitionStreams.cpp:123
       0x7912f32 DB::writeRegionDataToStorage(DB::Context&, DB::RegionPtrWithBlock const&, std::__1::vector<std::__1::tuple<DB::RawTiDBPK, unsigned char, unsigned long, std::__1::shared_ptr<DB::StringObject<false> const> >, std::__1::allocator<std::__1::tuple<DB::RawTiDBPK, unsigned char, unsigned long, std::__1::shared_ptr<DB::StringObject<false> const> > > >&, Poco::Logger*) [tiflash+126955314]
                 dbms/src/Storages/Transaction/PartitionStreams.cpp:169
       0x7912c9c DB::RegionTable::writeBlockByRegion(DB::Context&, DB::RegionPtrWithBlock const&, std::__1::vector<std::__1::tuple<DB::RawTiDBPK, unsigned char, unsigned long, std::__1::shared_ptr<DB::StringObject<false> const> >, std::__1::allocator<std::__1::tuple<DB::RawTiDBPK, unsigned char, unsigned long, std::__1::shared_ptr<DB::StringObject<false> const> > > >&, Poco::Logger*, bool) [tiflash+126954652]
                 dbms/src/Storages/Transaction/PartitionStreams.cpp:348
       0x79351ff DB::Region::handleWriteRaftCmd(DB::WriteCmdsView const&, unsigned long, unsigned long, DB::TMTContext&) [tiflash+127095295]
                 dbms/src/Storages/Transaction/Region.cpp:712
       0x7901910 DB::KVStore::handleWriteRaftCmd(DB::WriteCmdsView const&, unsigned long, unsigned long, unsigned long, DB::TMTContext&) [tiflash+126884112]
                 dbms/src/Storages/Transaction/KVStore.cpp:285
       0x791b822 HandleWriteRaftCmd [tiflash+126990370]
                 dbms/src/Storages/Transaction/ProxyFFI.cpp:92
  0x7fd6b61686e0 raftstore::store::fsm::apply::ApplyDelegate$LT$EK$GT$::process_raft_cmd::h4c8bba241d5ee4d4 [libtiflash_proxy.so+28567264]
  0x7fd6b6165d6c raftstore::store::fsm::apply::ApplyDelegate$LT$EK$GT$::handle_raft_committed_entries::h478c4bad834ddec7 [libtiflash_proxy.so+28556652]
  0x7fd6b6185f91 raftstore::store::fsm::apply::ApplyFsm$LT$EK$GT$::handle_apply::he4c0b1137ae7cad9 [libtiflash_proxy.so+28688273]
  0x7fd6b6188bfe raftstore::store::fsm::apply::ApplyFsm$LT$EK$GT$::handle_tasks::h5b138e5889c69e66 [libtiflash_proxy.so+28699646]
  0x7fd6b618b510 _$LT$raftstore..store..fsm..apply..ApplyPoller$LT$EK$GT$$u20$as$u20$batch_system..batch..PollHandler$LT$raftstore..store..fsm..apply..ApplyFsm$LT$EK$GT$$C$raftstore..store..fsm..apply..ControlFsm$GT$$GT$::handle_normal::he20c24833a37424f [libtiflash_proxy.so+28710160]
  0x7fd6b5dfe609 batch_system::batch::Poller$LT$N$C$C$C$Handler$GT$::poll::hec57674a22fdcc30 [libtiflash_proxy.so+24987145]
  0x7fd6b64a7cd8 std::sys_common::backtrace::__rust_begin_short_backtrace::h189c0b2868fa537c [libtiflash_proxy.so+31972568]
  0x7fd6b5cdfec1 core::ops::function::FnOnce::call_once$u7b$$u7b$vtable.shim$u7d$$u7d$::h7c53021c905641b6 [libtiflash_proxy.so+23813825]
  0x7fd6b66bdfca std::sys::unix::thread::Thread::new::thread_start::hd39c5f08bdcda277 [libtiflash_proxy.so+34160586]
  0x7fd6b3d07ea5 start_thread [libpthread.so.0+32421]
  0x7fd6b381a9fd __clone [libc.so.6+1042941]"] [thread_id=158]

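Please re-upload the Overview monitoring export; the previous one contained no data.
Is the problem you want solved right now the TiFlash abnormal crashes and restarts?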

SQ-cluster-Overview_2022-06-24T11_26_37.037Z.json (9.0 MB)

Yes. We want to find out why TiFlash keeps failing; at the moment we are worried that it could go down at any time.

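A side question about TiFlash configuration: 1. The values of these TiFlash settings look fairly conservative, and we never set them ourselves. Are they calculated and configured automatically by the system? Is there anything we can tune? 2. Why do so many TiFlash configuration names contain "proxy"? What does this proxy refer to? Are these proxy-related settings core TiFlash settings?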

tiflash	192.168.14.25:3930	raftstore-proxy.readpool.unified.max-thread-count 4
tiflash	192.168.14.28:3930	raftstore-proxy.memory-usage-limit 29471976106

Is MPP enabled?

tidb_allow_mpp = on
tidb_enforce_mpp = off

@qihuajun I would like to confirm the following:

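  1. Which version exactly did you upgrade from, and at what time did the upgrade to 6.1.0 take place?
  2. What is the hardware configuration of the 6 machines 192.168.14.23~28 on which TiKV and TiFlash are co-deployed?
  3. "After turning off dynamic partition pruning and killing the backlog of queries, TiFlash queries returned to normal." Based on this description, my guess is that the dynamic partition pruning of 6.1.0 was in use but statistics were not updated manually, so the execution plans sent to TiFlash changed, which in turn led to the OOM.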

Did you enable dynamic partition pruning only after upgrading to 6.1.0? After enabling it, did you update the statistics manually?
https://docs.pingcap.com/zh/tidb/stable/partitioned-table#动态裁剪模式


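In addition, the monitoring shows that TiKV's resident memory usage is around 110 GB, and that the TiFlash process restarts once its memory reaches roughly 30 GB. If the machines have about 150 GB of memory, then with TiKV and TiFlash co-deployed, 110 GB goes to TiKV and too little is left for TiFlash, which is not a reasonable way to divide the machine's resources. When AP queries handle complex workloads, they compete with the co-deployed TiKV for CPU, memory, and IO, which affects TP traffic. We generally recommend deploying TiKV and TiFlash on separate instances.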

Before this June 24 error occurred, had TiFlash ever restarted because of OOM?
Is this node still stuck in a loop of restarting and reporting errors?

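1. We upgraded from 5.4.1 to 6.1.0.
2. Each machine has a 64-core CPU, 320 GB of memory, and NVMe SSD disks. Using NUMA, half of the resources are allocated to TiKV and half to TiFlash.
3. We enabled dynamic pruning only after upgrading to 6.1.0. We manually updated the statistics of two tables, but later, once queries across the whole TiFlash cluster had slowed down, queries on those two tables did not return either.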

Total memory is 320 GB, with half allocated to TiKV and half to TiFlash, but TiFlash has peaked at only a little over 30 GB.

It had not happened before. We forced this node offline, wiped its data, and brought it back online; it has been running normally so far.

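Judging from the June 24 log you provided, the data on this instance's disk was already corrupted; TiFlash detected this after startup, reported the error, and exited.
Without the original environment or the logs from before and after, it is hard to determine what actually corrupted the on-disk data. If it happens again, please keep the data and logs so that we can try to analyze it further.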
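
If the corruption recurs, it may help to list which tables and files are affected before wiping the node. Below is a minimal Python sketch that scans TiFlash error log lines for the two messages quoted in this thread ("checksum not match, broken file: ..." and "PageFile binary version not match ... [file=...]"). The function name `find_damaged_page_files` is hypothetical, and the regular expressions are derived only from the log lines above, so other TiFlash versions may word these errors differently.

```python
import re
from collections import defaultdict

# Patterns derived from the two error messages quoted in this thread (assumption:
# other TiFlash versions may format these messages differently).
CHECKSUM_RE = re.compile(r"checksum not match, broken file: ([^,\s]+)")
VERSION_RE = re.compile(r"PageFile binary version not match .*?\[file=([^\]]+)\]")
# The table directory in the data path looks like ".../t_17615/...".
TABLE_RE = re.compile(r"/t_(\d+)/")


def find_damaged_page_files(lines):
    """Return {table_id: {(kind, path), ...}} for every damaged file mentioned.

    kind is "checksum" or "version"; table_id is None when the path has no
    t_<id> directory component.
    """
    damaged = defaultdict(set)
    for line in lines:
        for kind, pattern in (("checksum", CHECKSUM_RE), ("version", VERSION_RE)):
            match = pattern.search(line)
            if match is None:
                continue
            path = match.group(1)
            table = TABLE_RE.search(path)
            table_id = int(table.group(1)) if table else None
            damaged[table_id].add((kind, path))
    return dict(damaged)


if __name__ == "__main__":
    import sys

    # Usage: python scan_log.py < tiflash_error.log
    for table_id, entries in sorted(find_damaged_page_files(sys.stdin).items(),
                                    key=lambda item: str(item[0])):
        for kind, path in sorted(entries):
            print(table_id, kind, path)
```

Saving the output together with the untouched data directory and the logs from before the first error would preserve the information requested above.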

A total of 320 GB of memory with TiFlash peaking at only 30-odd GB looks odd. Could you confirm how you configured the NUMA resource allocation?

(two images attached)

We simply specified numa_node when configuring the instances.

Please see my previous reply.

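Could you check how the query plans of the SQL statements that involve partitioned tables change between dynamic pruning enabled and disabled? For example, whether the join order becomes unreasonable, or what warnings appear after running explain.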

set @@session.tidb_partition_prune_mode = 'static';
explain <your sql>;
set @@session.tidb_partition_prune_mode = 'dynamic';
explain <your sql>;
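
Once both plans are captured as text, a line-by-line diff makes the differences easier to spot. The following is a small Python sketch using the standard-library `difflib`; the function name `diff_plans` and the idea of pasting the two `explain` outputs into strings are assumptions for illustration, not part of any TiDB tooling.

```python
import difflib


def diff_plans(static_plan, dynamic_plan):
    """Return unified-diff lines between two EXPLAIN outputs given as text.

    An empty list means the two plans are textually identical.
    """
    return list(
        difflib.unified_diff(
            static_plan.splitlines(),
            dynamic_plan.splitlines(),
            fromfile="static",
            tofile="dynamic",
            lineterm="",
        )
    )


if __name__ == "__main__":
    # Paste the real EXPLAIN outputs here; these placeholders are hypothetical.
    static_plan = "<explain output under static mode>"
    dynamic_plan = "<explain output under dynamic mode>"
    for line in diff_plans(static_plan, dynamic_plan):
        print(line)
```

Lines prefixed with `-` appear only in the static plan and lines prefixed with `+` only in the dynamic plan, which helps locate a changed join order or operator.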