Grafana status is Down and it fails to start

TiDB_New_People · May 10, 2022, 03:00

[TiDB environment] Production
[TiDB version] 5.3.0
[Problem encountered]

[Steps to reproduce]
[Symptoms and impact]
I ran the command tiup cluster start tidb-zj -N ip:3000
and it reported this error:
Error: failed to start grafana: failed to start: 192.168.16.203 grafana-3000.service, please check the instance’s log(/data/tidb-deploy/grafana-3000/log) for more detail.: timed out waiting for port 3000 to be started after 2m0s

Verbose debug logs has been written to /root/.tiup/logs/tiup-cluster-debug-2022-05-10-10-57-37.log.

[Attachments]

Please provide the version of each component, such as cdc/tikv. You can get it by running cdc version / tikv-server --version.

TiDB_New_People · May 10, 2022, 03:06

Log:

2022-05-10T10:19:12.029+0800	INFO	CheckPoint	{"host": "192.168.16.203", "port": 22, "user": "tidb", "sudo": false, "cmd": "ss -ltn", "stdout": "State Recv-Q Send-Q Local Address:Port Peer Address:Port \ LISTEN 0 128 :22 : \ LISTEN 0 100 127.0.0.1:25 : \ LISTEN 0 100 :18686 : \ LISTEN 0 128 192.168.16.203:9093 : \ LISTEN 0 128 192.168.16.203:9094 : \ LISTEN 0 128 127.0.0.1:1234 : \ LISTEN 0 128 [::]:22 [::]: \ LISTEN 0 100 [::1]:25 [::]: \ LISTEN 0 128 [::]:9115 [::]:* \ LISTEN 0 128 [::]:9090 [::]:* \ LISTEN 0 128 [::]:40324 [::]:* \ LISTEN 0 128 [::]:6123 [::]:* \ LISTEN 0 128 [::]:9100 [::]:* \ LISTEN 0 128 [::]:36429 [::]:* \ LISTEN 0 128 [::]:8081 [::]:* \ ", "stderr": "", "hash": "d90f3eb81e145621c3880c320b88bf10c6f03a70", "func": "github.com/pingcap/tiup/pkg/cluster/executor.(*CheckPointExecutor).Execute", "hit": false}
2022-05-10T10:19:12.030+0800	DEBUG	retry error	{"error": "operation timed out after 2m0s"}
2022-05-10T10:19:12.030+0800	DEBUG	TaskFinish	{"task": "StartCluster", "error": "failed to start grafana: failed to start: 192.168.16.203 grafana-3000.service, please check the instance's log(/data/tidb-deploy/grafana-3000/log) for more detail.: timed out waiting for port 3000 to be started after 2m0s", "errorVerbose": "timed out waiting for port 3000 to be started after 2m0s\ngithub.com/pingcap/tiup/pkg/cluster/module.(WaitFor).Execute\ \tgithub.com/pingcap/tiup/pkg/cluster/module/wait_for.go:91\ github.com/pingcap/tiup/pkg/cluster/spec.PortStarted\ \tgithub.com/pingcap/tiup/pkg/cluster/spec/instance.go:116\ github.com/pingcap/tiup/pkg/cluster/spec.(BaseInstance).Ready\ \tgithub.com/pingcap/tiup/pkg/cluster/spec/instance.go:148\ github.com/pingcap/tiup/pkg/cluster/operation.startInstance\ \tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:373\ github.com/pingcap/tiup/pkg/cluster/operation.StartComponent.func1\ \tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:502\ golang.org/x/sync/errgroup.(*Group).Go.func1\ \tgolang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57\ runtime.goexit\ \truntime/asm_amd64.s:1581\ failed to start: 192.168.16.203 grafana-3000.service, please check the instance's log(/data/tidb-deploy/grafana-3000/log) for more detail.\ failed to start grafana"}
2022-05-10T10:19:12.030+0800	INFO	Execute command finished	{"code": 1, "error": "failed to start grafana: failed to start: 192.168.16.203 grafana-3000.service, please check the instance's log(/data/tidb-deploy/grafana-3000/log) for more detail.: timed out waiting for port 3000 to be started after 2m0s", "errorVerbose": "timed out waiting for port 3000 to be started after 2m0s\ngithub.com/pingcap/tiup/pkg/cluster/module.(WaitFor).Execute\ \tgithub.com/pingcap/tiup/pkg/cluster/module/wait_for.go:91\ github.com/pingcap/tiup/pkg/cluster/spec.PortStarted\ \tgithub.com/pingcap/tiup/pkg/cluster/spec/instance.go:116\ github.com/pingcap/tiup/pkg/cluster/spec.(BaseInstance).Ready\ \tgithub.com/pingcap/tiup/pkg/cluster/spec/instance.go:148\ github.com/pingcap/tiup/pkg/cluster/operation.startInstance\ \tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:373\ github.com/pingcap/tiup/pkg/cluster/operation.StartComponent.func1\ \tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:502\ golang.org/x/sync/errgroup.(*Group).Go.func1\ \tgolang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57\ runtime.goexit\ \truntime/asm_amd64.s:1581\ failed to start: 192.168.16.203 grafana-3000.service, please check the instance's log(/data/tidb-deploy/grafana-3000/log) for more detail.\ failed to start grafana"}

banana_jian · May 10, 2022, 03:18

How is the machine's performance? Could a memory shortage be making the service start slowly, or getting it OOM-killed?

TiDB_New_People · May 10, 2022, 03:31

Memory and CPU seem to be sufficient.

banana_jian · May 10, 2022, 03:54

/data/tidb-deploy/grafana-3000/log

Is there anything in the logs under this directory?

TiDB_New_People · May 10, 2022, 03:56

It's empty, there are no files.

TiDB_New_People · May 10, 2022, 03:56

It was in the Up state before. I don't know how it became Down.

啦啦啦啦啦 · May 10, 2022, 04:04

You could check whether there is a port conflict.

banana_jian · May 10, 2022, 05:00

Is there anything useful in /var/log/messages?

TiDB_New_People · May 10, 2022, 08:56

netstat -nlp | grep 3000

TiDB_New_People · May 10, 2022, 08:56

It returns nothing. No process is using port 3000.
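
A side note on this check: grep 3000 also matches ports such as 13000 or 30000, and an empty result only shows that nothing is listening, which is what you would expect if Grafana never started. Below is a minimal sketch of an exact-port check over ss -ltn output. The function name port_is_listening is made up for illustration, and it assumes the local address sits in the fourth column, as in the ss output captured in the debug log earlier in the thread.

```shell
# Hypothetical helper: read `ss -ltn` output on stdin and succeed only if
# a socket is listening on exactly the given port.
port_is_listening() {
  awk -v p="$1" '
    $1 == "LISTEN" {
      # Column 4 is Local Address:Port; the port is the part after the last colon,
      # which also handles IPv6 forms such as [::]:9090.
      n = split($4, parts, ":")
      if (parts[n] == p) found = 1
    }
    END { exit (found ? 0 : 1) }
  '
}

# Example usage (not run here):
#   ss -ltn | port_is_listening 3000 && echo "port 3000 is in use" || echo "port 3000 is free"
```

If the port is free and the service still times out, the next place to look is the system log for the systemd unit itself.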

songxuecheng · May 10, 2022, 09:04

Is the disk full?

TiDB_New_People · May 10, 2022, 09:05

The disk is not full.

TiDB_New_People · May 11, 2022, 00:54

May 11 08:51:31 localhost systemd: grafana-3000.service holdoff time over, scheduling restart.
May 11 08:51:31 localhost systemd: Stopped grafana service.
May 11 08:51:31 localhost systemd: Started grafana service.
May 11 08:51:31 localhost systemd: Failed at step EXEC spawning /data/tidb-deploy/grafana-3000/scripts/run_grafana.sh: No such file or directory
May 11 08:51:31 localhost systemd: grafana-3000.service: main process exited, code=exited, status=203/EXEC
May 11 08:51:31 localhost systemd: Unit grafana-3000.service entered failed state.
May 11 08:51:31 localhost systemd: grafana-3000.service failed.
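
For readers following along: status=203/EXEC means systemd could not execute the unit's start command at all, so Grafana itself never ran, which would also explain the empty log directory. A small sketch for pulling the missing path out of such syslog lines is shown here. The function name missing_exec_path is hypothetical, and it assumes the message wording shown in the lines above.

```shell
# Hypothetical helper: read syslog lines on stdin and print the path that
# systemd reported it could not execute.
missing_exec_path() {
  sed -n 's/.*Failed at step EXEC spawning \(.*\): No such file or directory.*/\1/p'
}

# Example usage (not run here; reading /var/log/messages usually needs root):
#   missing_exec_path < /var/log/messages | sort -u
#   ls -l "$(missing_exec_path < /var/log/messages | sort -u | head -n 1)"
```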

banana_jian · May 11, 2022, 01:51

May 11 08:51:31 localhost systemd: Failed at step EXEC spawning /data/tidb-deploy/grafana-3000/scripts/run_grafana.sh: No such file or directory

/data/tidb-deploy/grafana-3000/scripts/run_grafana.sh
Does this file not exist?

TiDB_New_People · May 11, 2022, 02:18

Right, it doesn't exist. I don't know why it went missing.

banana_jian · May 11, 2022, 02:22

If nothing else works, you could try redeploying Grafana.

banana_jian · May 11, 2022, 02:24

grafana_servers:
  - host: 10.0.1.22
    port: 3000
    deploy_dir: /data/tidb-deploy/grafana-3000

Adjust this to match your configuration.
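
Since YAML indentation is easy to lose when pasting into a forum, here is a rough sketch that writes the fragment with the list dash and nesting spelled out. The function name write_grafana_topology is made up for illustration. The host and deploy directory in the usage line are taken from the error message at the top of the thread, and the output file name is an assumption.

```shell
# Hypothetical helper: print a grafana_servers topology fragment for
# the given host, port, and deploy directory.
write_grafana_topology() {
  printf 'grafana_servers:\n'
  printf '  - host: %s\n' "$1"
  printf '    port: %s\n' "$2"
  printf '    deploy_dir: %s\n' "$3"
}

# Example usage (not run here):
#   write_grafana_topology 192.168.16.203 3000 /data/tidb-deploy/grafana-3000 > topology-grafana.yaml
```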

banana_jian · May 11, 2022, 02:28

[tidb@jian tidb-deploy]$ cat topologygrafana.yaml
grafana_servers:
  - host: 192.168.135.148
    port: 3000
    deploy_dir: /tidb-deploy/grafana-3000
[tidb@jian tidb-deploy]$ tiup cluster scale-out tidb-jiantest ./topologygrafana.yaml

This should work, though you may need to remove the current Grafana instance first.
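
Put together, the order described here (remove the old instance, then scale out again) might look like the sketch below. It only prints the commands so they can be reviewed before running anything. The function name grafana_redeploy_plan is hypothetical, and the cluster name and address in the usage line come from the original post. Whether scale-in removes a Grafana instance cleanly when its deploy directory is already damaged is not confirmed in this thread.

```shell
# Hypothetical helper: print (do not run) the tiup commands for removing
# and re-adding a Grafana node.
# Arguments: cluster name, host, port, path to the topology file.
grafana_redeploy_plan() {
  echo "tiup cluster scale-in $1 -N $2:$3"
  echo "tiup cluster scale-out $1 $4"
  echo "tiup cluster display $1"
}

# Example usage (not run here):
#   grafana_redeploy_plan tidb-zj 192.168.16.203 3000 ./topology-grafana.yaml
```

The final display step is there to confirm that the Grafana node shows as Up afterwards.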

xiaoxiaozuofang · November 19, 2022, 09:28

Following the configuration you posted, the Grafana scale-out does not succeed.