Grafana status is Down and it fails to start

【TiDB environment】Production
【TiDB version】5.3.0
【Problem encountered】


【Steps to reproduce】
【Symptoms and impact】
Running tiup cluster start tidb-zj -N ip:3000
fails with this error:
Error: failed to start grafana: failed to start: 192.168.16.203 grafana-3000.service, please check the instance's log(/data/tidb-deploy/grafana-3000/log) for more detail.: timed out waiting for port 3000 to be started after 2m0s

Verbose debug logs has been written to /root/.tiup/logs/tiup-cluster-debug-2022-05-10-10-57-37.log.

【Attachments】

Please provide the version of each component, such as cdc/tikv. You can get it by running cdc version / tikv-server --version.

Logs:

2022-05-10T10:19:12.029+0800 INFO CheckPoint {"host": "192.168.16.203", "port": 22, "user": "tidb", "sudo": false, "cmd": "ss -ltn", "stdout": "State Recv-Q Send-Q Local Address:Port Peer Address:Port \nLISTEN 0 128 *:22 *:* \nLISTEN 0 100 127.0.0.1:25 *:* \nLISTEN 0 100 *:18686 *:* \nLISTEN 0 128 192.168.16.203:9093 *:* \nLISTEN 0 128 192.168.16.203:9094 *:* \nLISTEN 0 128 127.0.0.1:1234 *:* \nLISTEN 0 128 [::]:22 [::]:* \nLISTEN 0 100 [::1]:25 [::]:* \nLISTEN 0 128 [::]:9115 [::]:* \nLISTEN 0 128 [::]:9090 [::]:* \nLISTEN 0 128 [::]:40324 [::]:* \nLISTEN 0 128 [::]:6123 [::]:* \nLISTEN 0 128 [::]:9100 [::]:* \nLISTEN 0 128 [::]:36429 [::]:* \nLISTEN 0 128 [::]:8081 [::]:* \n", "stderr": "", "hash": "d90f3eb81e145621c3880c320b88bf10c6f03a70", "func": "github.com/pingcap/tiup/pkg/cluster/executor.(*CheckPointExecutor).Execute", "hit": false}
2022-05-10T10:19:12.030+0800 DEBUG retry error {"error": "operation timed out after 2m0s"}
2022-05-10T10:19:12.030+0800 DEBUG TaskFinish {"task": "StartCluster", "error": "failed to start grafana: failed to start: 192.168.16.203 grafana-3000.service, please check the instance's log(/data/tidb-deploy/grafana-3000/log) for more detail.: timed out waiting for port 3000 to be started after 2m0s", "errorVerbose": "timed out waiting for port 3000 to be started after 2m0s\ngithub.com/pingcap/tiup/pkg/cluster/module.(*WaitFor).Execute\n\tgithub.com/pingcap/tiup/pkg/cluster/module/wait_for.go:91\ngithub.com/pingcap/tiup/pkg/cluster/spec.PortStarted\n\tgithub.com/pingcap/tiup/pkg/cluster/spec/instance.go:116\ngithub.com/pingcap/tiup/pkg/cluster/spec.(*BaseInstance).Ready\n\tgithub.com/pingcap/tiup/pkg/cluster/spec/instance.go:148\ngithub.com/pingcap/tiup/pkg/cluster/operation.startInstance\n\tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:373\ngithub.com/pingcap/tiup/pkg/cluster/operation.StartComponent.func1\n\tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:502\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\tgolang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57\nruntime.goexit\n\truntime/asm_amd64.s:1581\nfailed to start: 192.168.16.203 grafana-3000.service, please check the instance's log(/data/tidb-deploy/grafana-3000/log) for more detail.\nfailed to start grafana"}
2022-05-10T10:19:12.030+0800 INFO Execute command finished {"code": 1, "error": "failed to start grafana: failed to start: 192.168.16.203 grafana-3000.service, please check the instance's log(/data/tidb-deploy/grafana-3000/log) for more detail.: timed out waiting for port 3000 to be started after 2m0s", "errorVerbose": "timed out waiting for port 3000 to be started after 2m0s\ngithub.com/pingcap/tiup/pkg/cluster/module.(*WaitFor).Execute\n\tgithub.com/pingcap/tiup/pkg/cluster/module/wait_for.go:91\ngithub.com/pingcap/tiup/pkg/cluster/spec.PortStarted\n\tgithub.com/pingcap/tiup/pkg/cluster/spec/instance.go:116\ngithub.com/pingcap/tiup/pkg/cluster/spec.(*BaseInstance).Ready\n\tgithub.com/pingcap/tiup/pkg/cluster/spec/instance.go:148\ngithub.com/pingcap/tiup/pkg/cluster/operation.startInstance\n\tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:373\ngithub.com/pingcap/tiup/pkg/cluster/operation.StartComponent.func1\n\tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:502\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\tgolang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57\nruntime.goexit\n\truntime/asm_amd64.s:1581\nfailed to start: 192.168.16.203 grafana-3000.service, please check the instance's log(/data/tidb-deploy/grafana-3000/log) for more detail.\nfailed to start grafana"}
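The CheckPoint entry above appears to be tiup running `ss -ltn` on 192.168.16.203 during its 2-minute wait, and none of the listed local addresses ends in `:3000`, which is why the wait timed out. A minimal sketch of the same check in bash, assuming the `ss -ltn` column layout shown in the log (local address in the 4th field). The function name `port_listening` is hypothetical and not part of tiup. Unlike a plain `grep 3000`, it compares only the port part of the local address, so a port such as 13000 does not count as a match.

```shell
# Hypothetical helper (not part of tiup): exits 0 if the given port appears as
# a listening local address in `ss -ltn` output read from stdin, 1 otherwise.
# Assumes the layout seen in the log:
#   State  Recv-Q  Send-Q  Local Address:Port  Peer Address:Port
port_listening() {
  local port="$1"
  awk -v p="$port" '
    # Skip the header; only look at LISTEN rows.
    NR > 1 && $1 == "LISTEN" {
      addr = $4
      # Strip everything up to the last ":" so "*:22", "[::]:9090" and
      # "192.168.16.203:9093" all reduce to just the port number.
      sub(/.*:/, "", addr)
      if (addr == p) found = 1
    }
    END { exit (found ? 0 : 1) }
  '
}
```

On the Grafana host this would be used as `ss -ltn | port_listening 3000 && echo listening || echo "nothing on 3000"`.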

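How are the machine's resources? Could insufficient memory be making the service start slowly, or getting it OOM-killed?

Memory and CPU seem sufficient.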

Is there anything in the logs under /data/tidb-deploy/grafana-3000/log?

It's empty; there are no files.

It was Up before. I don't know how it became Down.

Check whether there is a port conflict.

Is there anything useful in /var/log/messages?

netstat -nlp | grep 3000

It returns nothing, so no process is using port 3000.

Is the disk full?

The disk is not full.

May 11 08:51:31 localhost systemd: grafana-3000.service holdoff time over, scheduling restart.
May 11 08:51:31 localhost systemd: Stopped grafana service.
May 11 08:51:31 localhost systemd: Started grafana service.
May 11 08:51:31 localhost systemd: Failed at step EXEC spawning /data/tidb-deploy/grafana-3000/scripts/run_grafana.sh: No such file or directory
May 11 08:51:31 localhost systemd: grafana-3000.service: main process exited, code=exited, status=203/EXEC
May 11 08:51:31 localhost systemd: Unit grafana-3000.service entered failed state.
May 11 08:51:31 localhost systemd: grafana-3000.service failed.
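`status=203/EXEC` means systemd could not execute the unit's `ExecStart` program at all, so Grafana itself never ran; that fits the empty log directory reported above. The "Failed at step EXEC spawning .../run_grafana.sh" line suggests `ExecStart` points directly at that script. A rough sketch for confirming this from the unit definition, assuming the first word after `ExecStart=` is the script path; `check_exec_start` is a hypothetical helper name, not part of tiup or systemd.

```shell
# Hypothetical helper: reads unit-file text on stdin, takes the first word of
# the first ExecStart= line, and reports whether that path exists and is
# executable. Returns 0 if ok, 1 if missing/not executable, 2 if no ExecStart.
check_exec_start() {
  local exec_line script
  exec_line=$(grep -m1 '^ExecStart=' || true)
  if [ -z "$exec_line" ]; then
    echo "no ExecStart= line found"
    return 2
  fi
  # Drop the "ExecStart=" key, then keep only the first word (the program).
  script=${exec_line#ExecStart=}
  script=${script%% *}
  if [ ! -e "$script" ]; then
    echo "missing: $script"
    return 1
  fi
  if [ ! -x "$script" ]; then
    echo "not executable: $script"
    return 1
  fi
  echo "ok: $script"
}
```

On the affected host, `systemctl cat grafana-3000.service | check_exec_start` would be expected to print `missing: /data/tidb-deploy/grafana-3000/scripts/run_grafana.sh` in this situation.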

Is the file /data/tidb-deploy/grafana-3000/scripts/run_grafana.sh missing?

Yes, it's missing. I don't know why it disappeared.

If nothing else works, you can try redeploying Grafana.

grafana_servers:
  - host: 10.0.1.22
    port: 3000
    deploy_dir: /data/tidb-deploy/grafana-3000

Adjust it to match your own configuration.

[tidb@jian tidb-deploy]$ cat topologygrafana.yaml
grafana_servers:
  - host: 192.168.135.148
    port: 3000
    deploy_dir: /tidb-deploy/grafana-3000
[tidb@jian tidb-deploy]$ tiup cluster scale-out tidb-jiantest ./topologygrafana.yaml

That should work, though you may need to remove the current Grafana instance first.
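Applied to the original poster's cluster, the remove-then-redeploy suggestion might look like the sketch below. It assumes the cluster name `tidb-zj`, host `192.168.16.203`, port 3000, and deploy directory `/data/tidb-deploy/grafana-3000` from the error message earlier in the thread. The names `run`, `redeploy_grafana`, `DRY_RUN`, and `scale-out-grafana.yaml` are hypothetical; with `DRY_RUN=1` the sketch only prints the `tiup` commands instead of running them.

```shell
# Hypothetical wrapper: print the command when DRY_RUN=1, otherwise run it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical helper: remove the broken Grafana node, then scale out a new one.
# Arguments: cluster name, host, port, deploy_dir, topology file to write.
redeploy_grafana() {
  local cluster="$1" host="$2" port="$3" deploy_dir="$4" topo="$5"

  # Same layout as the grafana_servers snippet above.
  cat > "$topo" <<EOF
grafana_servers:
  - host: ${host}
    port: ${port}
    deploy_dir: ${deploy_dir}
EOF

  # Remove the existing Grafana instance; its node ID is host:port.
  run tiup cluster scale-in "$cluster" -N "${host}:${port}" || return 1
  # Deploy a fresh instance, which is expected to recreate scripts/run_grafana.sh.
  run tiup cluster scale-out "$cluster" "$topo"
}
```

On the control machine, drop `DRY_RUN=1` and call `redeploy_grafana tidb-zj 192.168.16.203 3000 /data/tidb-deploy/grafana-3000 ./scale-out-grafana.yaml`. If the scale-in step fails because the broken instance cannot be handled cleanly, `tiup cluster scale-in` also accepts `--force`, which forcibly removes the node from the cluster.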

I followed your configuration, but the Grafana scale-out did not succeed.