【TiDB Environment】Production / Test
【TiDB Version】6.5.0
【Reproduction Steps】
【Problem: Symptoms and Impact】When starting the cluster from the control machine, the TiDB, PD, and TiKV components start normally, but starting TiFlash fails with the following error:
Error: failed to start tiflash: failed to start: 10.14.2.19 tiflash-9000.service, please check the instance's log(/data/tidb/tidb_program/tiflash-9000/log) for more detail.: timed out waiting for port 9000 to be started after 2m0s
However, when I checked the log directory of the TiFlash instance, I found only the tiflash_stderr.log file, and it is empty. Meanwhile, the system log on the TiFlash host contains the following:
Feb 16 10:03:37 localhost kernel: traps: tiflash[6594] trap invalid opcode ip:70f61f7 sp:7ffdc1740668 error:0 in tiflash[1504000+645b000]
Feb 16 10:03:37 localhost systemd: tiflash-9000.service: main process exited, code=killed, status=4/ILL
Feb 16 10:03:37 localhost systemd: Unit tiflash-9000.service entered failed state.
Feb 16 10:03:37 localhost systemd: tiflash-9000.service failed.
Feb 16 10:03:52 localhost systemd: tiflash-9000.service holdoff time over, scheduling restart.
Feb 16 10:03:52 localhost systemd: Stopped tiflash service.
Feb 16 10:03:52 localhost systemd: Started tiflash service.
Feb 16 10:03:53 localhost bash: sync …
Feb 16 10:03:53 localhost bash: real#0110m0.072s
Feb 16 10:03:53 localhost bash: user#0110m0.000s
Feb 16 10:03:53 localhost bash: sys#0110m0.007s
Feb 16 10:03:53 localhost bash: ok
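The `status=4/ILL` exit together with the kernel's `trap invalid opcode` line means the process was killed by SIGILL, that is, it tried to execute an instruction the CPU does not support. That may also be why tiflash_stderr.log is empty. One possible cause, which nothing in these logs confirms, is a CPU without AVX2: the TiDB deployment documentation appears to list AVX2 as a requirement for TiFlash on Linux AMD64 starting from v6.3.0. Below is a minimal Python sketch for checking the flag on the TiFlash host; the function names `cpu_flags` and `missing_flags` are made up for illustration.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the CPU feature flags listed in /proc/cpuinfo-style text."""
    flags: set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # x86 kernels label the feature line "flags"; only that key is handled.
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def missing_flags(cpuinfo_text: str, required=("avx2",)) -> list[str]:
    """List the required flags that the CPU does not advertise.

    The default ("avx2",) is an assumption about what TiFlash needs.
    """
    present = cpu_flags(cpuinfo_text)
    return [flag for flag in required if flag not in present]


if __name__ == "__main__":
    # Run this on the TiFlash host (10.14.2.19 in this post).
    path = Path("/proc/cpuinfo")
    if path.exists():
        missing = missing_flags(path.read_text())
        print("missing CPU flags:", missing if missing else "none")
```

If `avx2` is reported as missing, the CPU model (or the virtual CPU type, if the host is a VM) would be the next thing to look at.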
After the start failed, the log on the control machine (/root/.tiup/logs/tiup-cluster-debug-2023-02-16-11-07-26.log) shows:
2023-02-16T11:07:25.291+0800 INFO CheckPoint {"host": "10.14.2.19", "port": 22, "user": "root", "sudo": false, "cmd": "ss -ltn", "stdout": "State Recv-Q Send-Q Local Address:Port Peer Address:Port \nLISTEN 0 128 :22 : \nLISTEN 0 100 127.0.0.1:25 : \nLISTEN 0 128 [::]:22 [::]: \nLISTEN 0 100 [::1]:25 [::]:* \n", "stderr": "", "hash": "7223ed50460785a2adf666d511a257aa03110294", "func": "github.com/pingcap/tiup/pkg/cluster/executor.(*CheckPointExecutor).Execute", "hit": false}
2023-02-16T11:07:26.434+0800 INFO SSHCommand {"host": "10.14.2.19", "port": "22", "cmd": "export LANG=C; PATH=$PATH:/bin:/sbin:/usr/bin:/usr/sbin ss -ltn", "stdout": "State Recv-Q Send-Q Local Address:Port Peer Address:Port \nLISTEN 0 128 :22 : \nLISTEN 0 100 127.0.0.1:25 : \nLISTEN 0 128 [::]:22 [::]: \nLISTEN 0 100 [::1]:25 [::]: \n", "stderr": ""}
2023-02-16T11:07:26.434+0800 INFO CheckPoint {"host": "10.14.2.19", "port": 22, "user": "root", "sudo": false, "cmd": "ss -ltn", "stdout": "State Recv-Q Send-Q Local Address:Port Peer Address:Port \nLISTEN 0 128 :22 : \nLISTEN 0 100 127.0.0.1:25 : \nLISTEN 0 128 [::]:22 [::]: \nLISTEN 0 100 [::1]:25 [::]:* \n", "stderr": "", "hash": "7223ed50460785a2adf666d511a257aa03110294", "func": "github.com/pingcap/tiup/pkg/cluster/executor.(*CheckPointExecutor).Execute", "hit": false}
2023-02-16T11:07:26.434+0800 DEBUG retry error {"error": "operation timed out after 2m0s"}
2023-02-16T11:07:26.435+0800 DEBUG TaskFinish {"task": "StartCluster", "error": "failed to start tiflash: failed to start: 10.14.2.19 tiflash-9000.service, please check the instance's log(/data/tidb/tidb_program/tiflash-9000/log) for more detail.: timed out waiting for port 9000 to be started after 2m0s", "errorVerbose": "timed out waiting for port 9000 to be started after 2m0s\ngithub.com/pingcap/tiup/pkg/cluster/module.(*WaitFor).Execute\n\tgithub.com/pingcap/tiup/pkg/cluster/module/wait_for.go:91\ngithub.com/pingcap/tiup/pkg/cluster/spec.PortStarted\n\tgithub.com/pingcap/tiup/pkg/cluster/spec/instance.go:119\ngithub.com/pingcap/tiup/pkg/cluster/spec.(*TiFlashInstance).Ready\n\tgithub.com/pingcap/tiup/pkg/cluster/spec/tiflash.go:805\ngithub.com/pingcap/tiup/pkg/cluster/operation.startInstance\n\tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:405\ngithub.com/pingcap/tiup/pkg/cluster/operation.StartComponent.func1\n\tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:534\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\tgolang.org/x/sync@v0.0.0-20220819030929-7fc1605a5dde/errgroup/errgroup.go:75\nruntime.goexit\n\truntime/asm_amd64.s:1594\nfailed to start: 10.14.2.19 tiflash-9000.service, please check the instance's log(/data/tidb/tidb_program/tiflash-9000/log) for more detail.\nfailed to start tiflash"}
2023-02-16T11:07:26.435+0800 INFO Execute command finished {"code": 1, "error": "failed to start tiflash: failed to start: 10.14.2.19 tiflash-9000.service, please check the instance's log(/data/tidb/tidb_program/tiflash-9000/log) for more detail.: timed out waiting for port 9000 to be started after 2m0s", "errorVerbose": "timed out waiting for port 9000 to be started after 2m0s\ngithub.com/pingcap/tiup/pkg/cluster/module.(*WaitFor).Execute\n\tgithub.com/pingcap/tiup/pkg/cluster/module/wait_for.go:91\ngithub.com/pingcap/tiup/pkg/cluster/spec.PortStarted\n\tgithub.com/pingcap/tiup/pkg/cluster/spec/instance.go:119\ngithub.com/pingcap/tiup/pkg/cluster/spec.(*TiFlashInstance).Ready\n\tgithub.com/pingcap/tiup/pkg/cluster/spec/tiflash.go:805\ngithub.com/pingcap/tiup/pkg/cluster/operation.startInstance\n\tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:405\ngithub.com/pingcap/tiup/pkg/cluster/operation.StartComponent.func1\n\tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:534\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\tgolang.org/x/sync@v0.0.0-20220819030929-7fc1605a5dde/errgroup/errgroup.go:75\nruntime.goexit\n\truntime/asm_amd64.s:1594\nfailed to start: 10.14.2.19 tiflash-9000.service, please check the instance's log(/data/tidb/tidb_program/tiflash-9000/log) for more detail.\nfailed to start tiflash"}
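The debug log suggests that tiup decides whether TiFlash is up by running `ss -ltn` on the host and looking for port 9000 among the listening sockets. The sketch below reads the captured `stdout` the same way; it is only an illustration of that check, not tiup's actual implementation, and the function name `listening_ports` is hypothetical.

```python
def listening_ports(ss_output: str) -> set[int]:
    """Extract local listening ports from the text printed by `ss -ltn`."""
    ports: set[int] = set()
    for line in ss_output.splitlines():
        fields = line.split()
        # Data rows look like: LISTEN <Recv-Q> <Send-Q> <Local Address:Port> <Peer>
        if len(fields) < 4 or fields[0] != "LISTEN":
            continue
        # The port follows the last colon, which also handles "[::]:22".
        _, _, port = fields[3].rpartition(":")
        if port.isdigit():
            ports.add(int(port))
    return ports


# stdout copied from the SSHCommand entry in this post (escaped \n expanded).
stdout = (
    "State Recv-Q Send-Q Local Address:Port Peer Address:Port \n"
    "LISTEN 0 128 :22 : \n"
    "LISTEN 0 100 127.0.0.1:25 : \n"
    "LISTEN 0 128 [::]:22 [::]: \n"
    "LISTEN 0 100 [::1]:25 [::]: \n"
)

ports = listening_ports(stdout)
print(sorted(ports))   # [22, 25]
print(9000 in ports)   # False
```

Only ports 22 and 25 are listening, so the 2m0s timeout is a consequence of the TiFlash process dying at startup rather than a separate network problem.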
【Resource Configuration】
【Attachments: Screenshots / Logs / Monitoring】