TiFlash port 9000 fails to start when starting a TiDB cluster

[TiDB environment] Production / Test
[TiDB version] 6.5.0
[Reproduction steps]
[Problem: symptoms and impact] When I start the cluster from the control machine, the TiDB, PD, and TiKV components start normally, but starting TiFlash fails with the following error:
Error: failed to start tiflash: failed to start: 10.14.2.19 tiflash-9000.service, please check the instance's log(/data/tidb/tidb_program/tiflash-9000/log) for more detail.: timed out waiting for port 9000 to be started after 2m0s
When I checked the log directory of the TiFlash instance, I found only tiflash_stderr.log, and that file is empty. The system log on the TiFlash host shows the following:
Feb 16 10:03:37 localhost kernel: traps: tiflash[6594] trap invalid opcode ip:70f61f7 sp:7ffdc1740668 error:0 in tiflash[1504000+645b000]
Feb 16 10:03:37 localhost systemd: tiflash-9000.service: main process exited, code=killed, status=4/ILL
Feb 16 10:03:37 localhost systemd: Unit tiflash-9000.service entered failed state.
Feb 16 10:03:37 localhost systemd: tiflash-9000.service failed.
Feb 16 10:03:52 localhost systemd: tiflash-9000.service holdoff time over, scheduling restart.
Feb 16 10:03:52 localhost systemd: Stopped tiflash service.
Feb 16 10:03:52 localhost systemd: Started tiflash service.
Feb 16 10:03:53 localhost bash: sync …
Feb 16 10:03:53 localhost bash: real#0110m0.072s
Feb 16 10:03:53 localhost bash: user#0110m0.000s
Feb 16 10:03:53 localhost bash: sys#0110m0.007s
Feb 16 10:03:53 localhost bash: ok

After the startup error, the log on the control machine (/root/.tiup/logs/tiup-cluster-debug-2023-02-16-11-07-26.log) shows the following:

2023-02-16T11:07:25.291+0800 INFO CheckPoint {"host": "10.14.2.19", "port": 22, "user": "root", "sudo": false, "cmd": "ss -ltn", "stdout": "State Recv-Q Send-Q Local Address:Port Peer Address:Port \nLISTEN 0 128 :22 : \nLISTEN 0 100 127.0.0.1:25 : \nLISTEN 0 128 [::]:22 [::]: \nLISTEN 0 100 [::1]:25 [::]:* \n", "stderr": "", "hash": "7223ed50460785a2adf666d511a257aa03110294", "func": "github.com/pingcap/tiup/pkg/cluster/executor.(*CheckPointExecutor).Execute", "hit": false}
2023-02-16T11:07:26.434+0800 INFO SSHCommand {"host": "10.14.2.19", "port": "22", "cmd": "export LANG=C; PATH=$PATH:/bin:/sbin:/usr/bin:/usr/sbin ss -ltn", "stdout": "State Recv-Q Send-Q Local Address:Port Peer Address:Port \nLISTEN 0 128 :22 : \nLISTEN 0 100 127.0.0.1:25 : \nLISTEN 0 128 [::]:22 [::]: \nLISTEN 0 100 [::1]:25 [::]: \n", "stderr": ""}
2023-02-16T11:07:26.434+0800 INFO CheckPoint {"host": "10.14.2.19", "port": 22, "user": "root", "sudo": false, "cmd": "ss -ltn", "stdout": "State Recv-Q Send-Q Local Address:Port Peer Address:Port \nLISTEN 0 128 :22 : \nLISTEN 0 100 127.0.0.1:25 : \nLISTEN 0 128 [::]:22 [::]: \nLISTEN 0 100 [::1]:25 [::]:* \n", "stderr": "", "hash": "7223ed50460785a2adf666d511a257aa03110294", "func": "github.com/pingcap/tiup/pkg/cluster/executor.(*CheckPointExecutor).Execute", "hit": false}
2023-02-16T11:07:26.434+0800 DEBUG retry error {"error": "operation timed out after 2m0s"}
2023-02-16T11:07:26.435+0800 DEBUG TaskFinish {"task": "StartCluster", "error": "failed to start tiflash: failed to start: 10.14.2.19 tiflash-9000.service, please check the instance's log(/data/tidb/tidb_program/tiflash-9000/log) for more detail.: timed out waiting for port 9000 to be started after 2m0s", "errorVerbose": "timed out waiting for port 9000 to be started after 2m0s\ngithub.com/pingcap/tiup/pkg/cluster/module.(*WaitFor).Execute\n\tgithub.com/pingcap/tiup/pkg/cluster/module/wait_for.go:91\ngithub.com/pingcap/tiup/pkg/cluster/spec.PortStarted\n\tgithub.com/pingcap/tiup/pkg/cluster/spec/instance.go:119\ngithub.com/pingcap/tiup/pkg/cluster/spec.(*TiFlashInstance).Ready\n\tgithub.com/pingcap/tiup/pkg/cluster/spec/tiflash.go:805\ngithub.com/pingcap/tiup/pkg/cluster/operation.startInstance\n\tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:405\ngithub.com/pingcap/tiup/pkg/cluster/operation.StartComponent.func1\n\tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:534\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\tgolang.org/x/sync@v0.0.0-20220819030929-7fc1605a5dde/errgroup/errgroup.go:75\nruntime.goexit\n\truntime/asm_amd64.s:1594\nfailed to start: 10.14.2.19 tiflash-9000.service, please check the instance's log(/data/tidb/tidb_program/tiflash-9000/log) for more detail.\nfailed to start tiflash"}
2023-02-16T11:07:26.435+0800 INFO Execute command finished {"code": 1, "error": "failed to start tiflash: failed to start: 10.14.2.19 tiflash-9000.service, please check the instance's log(/data/tidb/tidb_program/tiflash-9000/log) for more detail.: timed out waiting for port 9000 to be started after 2m0s", "errorVerbose": "timed out waiting for port 9000 to be started after 2m0s\ngithub.com/pingcap/tiup/pkg/cluster/module.(*WaitFor).Execute\n\tgithub.com/pingcap/tiup/pkg/cluster/module/wait_for.go:91\ngithub.com/pingcap/tiup/pkg/cluster/spec.PortStarted\n\tgithub.com/pingcap/tiup/pkg/cluster/spec/instance.go:119\ngithub.com/pingcap/tiup/pkg/cluster/spec.(*TiFlashInstance).Ready\n\tgithub.com/pingcap/tiup/pkg/cluster/spec/tiflash.go:805\ngithub.com/pingcap/tiup/pkg/cluster/operation.startInstance\n\tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:405\ngithub.com/pingcap/tiup/pkg/cluster/operation.StartComponent.func1\n\tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:534\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\tgolang.org/x/sync@v0.0.0-20220819030929-7fc1605a5dde/errgroup/errgroup.go:75\nruntime.goexit\n\truntime/asm_amd64.s:1594\nfailed to start: 10.14.2.19 tiflash-9000.service, please check the instance's log(/data/tidb/tidb_program/tiflash-9000/log) for more detail.\nfailed to start tiflash"}

[Resource configuration]
[Attachments: screenshots / logs / monitoring]

Did the pre-installation check pass? You should also review the yml configuration file.

The check passed...

please check the instance's log(/data/tidb/tidb_program/tiflash-9000/log) for more detail.
Start by looking at the TiFlash log in that location.

I already checked. There is one log file in that directory, but it is empty.

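If there is no error message, the startup is simply taking too long and has exceeded 2 minutes; wait a while and it should come up.
I ran into this earlier this year, although in my case it was a TiDB node that started very slowly, taking about 5 minutes.
I was completely baffled at the time: the log showed no errors, the node did not appear in display, and start reported that the node had failed to start.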

timed out waiting for port 9000 to be started after 2m0s

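My understanding of this error is as follows: check the node log, and if it contains no error, the node is still starting. It has simply taken longer than 2 minutes, so tiup reports a timeout while the node is still coming up.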
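To make the "tiup gave up after 2 minutes, but the node may still be starting" idea concrete, here is a minimal sketch of polling a port with a longer deadline. It is not tiup's implementation: the debug log in the original post shows tiup running `ss -ltn` over SSH, whereas this sketch simply attempts a TCP connection. The function name `wait_for_port` and its parameters are hypothetical.

```python
import socket
import time


def wait_for_port(host, port, timeout_s=120.0, interval_s=1.0):
    """Poll until a TCP connect to host:port succeeds or timeout_s elapses.

    Returns True if the port accepted a connection, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            pass  # refused, unreachable, or timed out: not listening yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Example (host and port taken from this thread; adjust for your topology):
# wait_for_port("10.14.2.19", 9000, timeout_s=300.0)
```

Waiting longer only helps if the process is still running. In the system log from the original post, systemd reports tiflash exiting with status=4/ILL and being restarted, so in that case the port would not open no matter how long the wait.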

Why does TiFlash have so many ports open? Are they different instances?

It looks like TiFlash never managed to start at all :frowning: What are the machine's specs?
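On the machine side, the kernel line in the original post ("trap invalid opcode", followed by status=4/ILL) means the process was killed by SIGILL, which happens when the CPU meets an instruction it does not implement. One thing worth checking is therefore whether the CPU advertises the instruction sets the binary expects. As far as I know, the TiFlash documentation lists AVX2 as a requirement on Linux AMD64 starting with v6.3.0; treat that as an assumption and confirm it against the documentation for your version. The sketch below reads `/proc/cpuinfo`-style text on Linux x86; the function names are hypothetical.

```python
def parse_cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # On x86 Linux the feature list is on the line whose key is "flags".
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_flags(cpuinfo_text, required=("avx2",)):
    """Return the required flags the CPU does not advertise, sorted.

    The default ("avx2",) is an assumption about TiFlash's requirement.
    """
    flags = parse_cpu_flags(cpuinfo_text)
    return sorted(f for f in required if f not in flags)


# Example usage on the TiFlash host (Linux only):
# with open("/proc/cpuinfo") as f:
#     print(missing_flags(f.read()) or "all required flags present")
```

An empty result means every flag in `required` is present. On virtual machines the hypervisor's CPU model can hide flags that the physical CPU supports, so the check should be run inside the guest that hosts TiFlash.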

Has this been solved? I ran into the same problem today while deploying 6.5.

See the thread "tidb 6.5.0部署tiflash失败 排查问题" (TiDB 6.5.0 TiFlash deployment failure: troubleshooting).
