Error: failed to start tiflash: failed to start: 182.92.101.109 tiflash-9000.service, please check the instance's log(/tidb-deploy/tiflash-9000/log) for more detail.: timed out waiting for port 9000 to be started after 2m0s

【TiDB environment】Test environment
【TiDB version】v6.1.0
【Problem encountered】Deployed with tiup; TiFlash fails to start
【Steps to reproduce】tiup cluster start
【Symptoms and impact】

Please post tiflash_error.log so we can take a look.

Check the port and the firewall.
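
A minimal sketch of the port check suggested above. It assumes `ss` (from iproute2) is installed on the TiFlash host; `has_listener` is a hypothetical helper name, not part of tiup or TiFlash. It reads `ss -lnt` output on stdin and succeeds only if some socket is listening on the given port.

```shell
# has_listener PORT
# Reads `ss -lnt` output from stdin and returns 0 if a LISTEN socket
# is bound to PORT, 1 otherwise. Column 4 of `ss -lnt` is "address:port".
has_listener() {
    awk -v p=":$1" '
        $1 == "LISTEN" && substr($4, length($4) - length(p) + 1) == p { found = 1 }
        END { exit !found }
    '
}

# Example (run on the TiFlash host):
#   ss -lnt | has_listener 9000 && echo "port 9000 is listening"
# If the host uses firewalld (an assumption, common on CentOS 7), the opened
# ports can be listed with:
#   firewall-cmd --list-ports
```

If nothing is listening on port 9000, the process itself is probably not staying up, which points to the TiFlash log rather than the firewall.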

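The files generated under /tidb-deploy/tiflash-9000 kept growing rapidly and filled up my 100 GB disk, so I deleted everything in that folder. Now, after starting, nothing gets written there any more. What are the files in this folder, and how do I configure things so that logs are written here again? I can no longer find tiflash_error.log.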

Creating an empty tiflash_error.log in the original directory should be enough.

d6634ee9e75d26", “func”: “github.com/pingcap/tiup/pkg/cluster/executor.(*CheckPointExecutor).Execute”, “hit”: false}
2022-08-22T09:58:19.649+0800 DEBUG retry error {“error”: “operation timed out after 2m0s”}
2022-08-22T09:58:19.649+0800 DEBUG TaskFinish {“task”: “StartCluster”, “error”: “failed to start tiflash: failed to start: 182.92.101.109 tiflash-9000.service, please check the instance’s log(/tidb-deploy/tiflash-9000/log) for more detail.: timed out waiting for port 9000 to be started after 2m0s”, “errorVerbose”: “timed out waiting for port 9000 to be started after 2m0s\ngithub.com/pingcap/tiup/pkg/cluster/module.(*WaitFor).Execute\n\tgithub.com/pingcap/tiup/pkg/cluster/module/wait_for.go:91\ngithub.com/pingcap/tiup/pkg/cluster/spec.PortStarted\n\tgithub.com/pingcap/tiup/pkg/cluster/spec/instance.go:116\ngithub.com/pingcap/tiup/pkg/cluster/spec.(*TiFlashInstance).Ready\n\tgithub.com/pingcap/tiup/pkg/cluster/spec/tiflash.go:803\ngithub.com/pingcap/tiup/pkg/cluster/operation.startInstance\n\tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:404\ngithub.com/pingcap/tiup/pkg/cluster/operation.StartComponent.func1\n\tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:533\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\tgolang.org/x/sync@v0.0.0-20220513210516-0976fa681c29/errgroup/errgroup.go:74\nruntime.goexit\n\truntime/asm_amd64.s:1571\nfailed to start: 182.92.101.109 tiflash-9000.service, please check the instance’s log(/tidb-deploy/tiflash-9000/log) for more detail.\nfailed to start tiflash”}
2022-08-22T09:58:19.649+0800 INFO Execute command finished {“code”: 1, “error”: “failed to start tiflash: failed to start: 182.92.101.109 tiflash-9000.service, please check the instance’s log(/tidb-deploy/tiflash-9000/log) for more detail.: timed out waiting for port 9000 to be started after 2m0s”, “errorVerbose”: “timed out waiting for port 9000 to be started after 2m0s\ngithub.com/pingcap/tiup/pkg/cluster/module.(*WaitFor).Execute\n\tgithub.com/pingcap/tiup/pkg/cluster/module/wait_for.go:91\ngithub.com/pingcap/tiup/pkg/cluster/spec.PortStarted\n\tgithub.com/pingcap/tiup/pkg/cluster/spec/instance.go:116\ngithub.com/pingcap/tiup/pkg/cluster/spec.(*TiFlashInstance).Ready\n\tgithub.com/pingcap/tiup/pkg/cluster/spec/tiflash.go:803\ngithub.com/pingcap/tiup/pkg/cluster/operation.startInstance\n\tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:404\ngithub.com/pingcap/tiup/pkg/cluster/operation.StartComponent.func1\n\tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:533\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\tgolang.org/x/sync@v0.0.0-20220513210516-0976fa681c29/errgroup/errgroup.go:74\nruntime.goexit\n\truntime/asm_amd64.s:1571\nfailed to start: 182.92.101.109 tiflash-9000.service, please check the instance’s log(/tidb-deploy/tiflash-9000/log) for more detail.\nfailed to start tiflash”}

I deleted all the files under /tidb-deploy/tiflash-9000, and now the TiFlash node cannot be removed. How do I fix this?

tiflash_error.log (92.2 KB)
This is the error log. Could someone take a look and tell me why TiFlash fails to start?

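The number of files in the TiFlash deploy directory keeps growing and they are never cleaned up, so the disk fills up quickly. Does anyone know what is going on? Is there a parameter I need to set so that it deletes unneeded files automatically?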

From the error message, TiFlash crashed during startup:

[2022/08/22 14:53:58.426 +08:00] [ERROR] [BaseDaemon.cpp:420] ["BaseDaemon:Attempted access has violated the permissions assigned to the memory area."] [thread_id=5]
[2022/08/22 14:53:59.947 +08:00] [ERROR] [BaseDaemon.cpp:570] ["BaseDaemon:\n       0x1ed2661\tfaultSignalHandler(int, siginfo_t*, void*) [tiflash+32319073]\n                \tlibs/libdaemon/src/BaseDaemon.cpp:221\n  0x7f2cbae9f5d0\t<unknown symbol> [libpthread.so.0+62928]\n       0x85d11e0\tgrpc_server_request_registered_call [tiflash+140317152]\n                \tcontrib/grpc/src/core/lib/surface/server.cc:0\n       0x855fbb6\tgrpc::ServerInterface::RegisteredAsyncRequest::IssueRequest(void*, grpc_byte_buffer**, grpc_impl::ServerCompletionQueue*) [tiflash+139852726]\n                \tcontrib/grpc/src/cpp/server/server_cc.cc:209\n       0x7ac6f6f\tgrpc::ServerInterface::PayloadAsyncRequest<mpp::EstablishMPPConnectionRequest>::PayloadAsyncRequest(grpc::internal::RpcServiceMethod*, grpc::ServerInterface*, grpc_impl::ServerContext*, grpc::internal::ServerAsyncStreamingInterface*, grpc_impl::CompletionQueue*, grpc_impl::ServerCompletionQueue*, void*, mpp::EstablishMPPConnectionRequest*) [tiflash+128741231]\n                \tcontrib/grpc/include/grpcpp/impl/codegen/server_interface.h:270\n       0x7ac5a40\tDB::EstablishCallData::EstablishCallData(DB::AsyncFlashService*, grpc_impl::ServerCompletionQueue*, grpc_impl::ServerCompletionQueue*, std::__1::shared_ptr<std::__1::atomic<bool> > const&) [tiflash+128735808]\n                \tdbms/src/Flash/EstablishCall.cpp:34\n       0x7ac5d3b\tDB::EstablishCallData::spawn(DB::AsyncFlashService*, grpc_impl::ServerCompletionQueue*, grpc_impl::ServerCompletionQueue*, std::__1::shared_ptr<std::__1::atomic<bool> > const&) [tiflash+128736571]\n                \tdbms/src/Flash/EstablishCall.cpp:44\n       0x1d638c5\tDB::Server::FlashGrpcServerHolder::FlashGrpcServerHolder(DB::Server&, DB::TiFlashRaftConfig const&, Poco::Logger*) [tiflash+30816453]\n                \tdbms/src/Server/Server.cpp:643\n       0x1d5ab9e\tDB::Server::main(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, 
std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&) [tiflash+30780318]\n                \tdbms/src/Server/Server.cpp:1401\n       0x7fe644a\tPoco::Util::Application::run() [tiflash+134112330]\n                \tcontrib/poco/Util/src/Application.cpp:335\n       0x7ff5c0c\tPoco::Util::ServerApplication::run(int, char**) [tiflash+134175756]\n                \tcontrib/poco/Util/src/ServerApplication.cpp:618\n       0x1d5e4ad\tmainEntryClickHouseServer(int, char**) [tiflash+30794925]\n                \tdbms/src/Server/Server.cpp:1549\n       0x1d1061e\tmain [tiflash+30475806]\n                \tdbms/src/Server/main.cpp:167\n  0x7f2cba8cf495\t__libc_start_main [libc.so.6+140437]"] [thread_id=5]

Could you provide your hardware architecture and operating system information?

Distributor ID: CentOS
Description: CentOS Linux release 7.6.1810 (Core)
Release: 7.6.1810
Codename: Core
Architecture: x86-64
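The machine has 4 CPU cores and 8 GB of memory. Is that too low? What is the minimum configuration for a test environment?
Another question: TiFlash fails to start, yet something keeps writing files into its deploy directory. What is going on? It filled my 100 GB disk in no time.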

As a temporary workaround, you can prevent TiFlash from generating core dump files.

TiFlash crashes on every start, and each crash produces a core dump file. Search for how to disable core dumps globally on CentOS and your disk will stop filling up.
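
To confirm that core dumps are what is filling the disk, one option is to list the largest files under the deploy directory. This is only a sketch: `largest_files` is a hypothetical helper, and the name and location of the core files depend on the host's `kernel.core_pattern` setting, so check `cat /proc/sys/kernel/core_pattern` as well.

```shell
# largest_files DIR [N]
# Prints the N largest regular files under DIR (default 10), biggest first.
# Each output line is "<size in KiB><TAB><path>".
largest_files() {
    dir=$1
    n=${2:-10}
    # -type f skips directories; du -k reports disk usage per file in KiB.
    find "$dir" -type f -exec du -k {} + 2>/dev/null | sort -rn | head -n "$n"
}

# Example (path taken from the error message in this thread):
#   largest_files /tidb-deploy/tiflash-9000 10
```

If the top entries are core files of roughly the same size, one per restart attempt, they can be deleted safely once the crash has been reported.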

Does TiFlash have a parameter similar to abort-on-panic? I don't see one in the official docs.
https://docs.pingcap.com/zh/tidb/stable/tikv-configuration-file#abort-on-panic

A TiFlash crash generates a core dump automatically. Is there currently no parameter on the TiFlash side to turn that off?

Whether a core dump is generated is mainly controlled by the operating system (ulimit). TiFlash always aborts on panic.

So how do I set this parameter?

ulimit -c 0. For details, search for ulimit / core dump.
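
A short sketch of what that command does in a shell session. The limit applies to the current shell and to processes started from it afterwards; it is not a system-wide setting.

```shell
# Show the current soft limit on core file size ("unlimited" or a number).
ulimit -c

# Set the limit to 0 so that crashing processes started from this shell
# write no core file.
ulimit -c 0

# Verify: this now prints 0.
ulimit -c
```

One caveat, stated as an assumption to verify on your host: tiup starts TiFlash through a systemd unit (tiflash-9000.service in the error above), and a limit set in an interactive shell may not reach a service started by systemd. The systemd counterpart is the `LimitCORE=0` directive in the unit's `[Service]` section, followed by `systemctl daemon-reload`.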

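One more question: is there a plan for TiFlash to add a parameter that controls abort on panic? As with TiKV, on top of the OS-level control, TiFlash itself could then decide whether a core file is produced.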

You can open an issue: https://github.com/pingcap/tiflash/issues/new/choose

Raised an issue to track this: https://github.com/pingcap/tiflash/issues/5946