Prometheus, Grafana, and Alertmanager cannot be started with tiup, but can be started manually

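[TiDB environment] Test
[TiDB version] 6.5.8
[Reproduction path] What operations were performed before the problem appeared
[Problem encountered: symptoms and impact]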
[tidb@tidb-server system]$ tiup cluster start tidb-test -N 192.168.116.110:3000

A new version of cluster is available: v1.15.0 → v1.15.2

To update this component:   tiup update cluster
To update all components:   tiup update --all

Starting cluster tidb-test...

  • [ Serial ] - SSHKeySet: privateKey=/home/tidb/.tiup/storage/cluster/clusters/tidb-test/ssh/id_rsa, publicKey=/home/tidb/.tiup/storage/cluster/clusters/tidb-test/ssh/id_rsa.pub
  • [Parallel] - UserSSH: user=tidb, host=192.168.116.110
  • [Parallel] - UserSSH: user=tidb, host=192.168.116.111
  • [Parallel] - UserSSH: user=tidb, host=192.168.116.110
  • [Parallel] - UserSSH: user=tidb, host=192.168.116.110
  • [Parallel] - UserSSH: user=tidb, host=192.168.116.110
  • [Parallel] - UserSSH: user=tidb, host=192.168.116.112
  • [Parallel] - UserSSH: user=tidb, host=192.168.116.110
  • [Parallel] - UserSSH: user=tidb, host=192.168.116.113
  • [ Serial ] - StartCluster
    Starting component grafana
    Starting instance 192.168.116.110:3000
    Failed to start grafana-3000.service: Unit not found.

Error: failed to start grafana: failed to start: 192.168.116.110 grafana-3000.service, please check the instance's log(/tidb-deploy/grafana-3000/log) for more detail.: executor.ssh.execute_failed: Failed to execute command over SSH for 'tidb@192.168.116.110:22' {ssh_stderr: Failed to start grafana-3000.service: Unit not found.
, ssh_stdout: , ssh_command: export LANG=C; PATH=$PATH:/bin:/sbin:/usr/bin:/usr/sbin; /usr/bin/sudo -H bash -c "systemctl daemon-reload && systemctl start grafana-3000.service"}, cause: Process exited with status 5

Verbose debug logs has been written to /home/tidb/.tiup/logs/tiup-cluster-debug-2024-06-20-11-29-05.log.
[Resource configuration] Go to TiDB Dashboard > Cluster Info > Hosts and take a screenshot of this page
[Attachments: screenshots/logs/monitoring]
2024-06-20T11:29:05.479+0800 INFO Starting instance 192.168.116.110:3000
2024-06-20T11:29:05.590+0800 ERROR SSHCommand {"host": "192.168.116.110", "port": "22", "cmd": "export LANG=C; PATH=$PATH:/bin:/sbin:/usr/bin:/usr/sbin; /usr/bin/sudo -H bash -c \"systemctl daemon-reload && systemctl start grafana-3000.service\"", "error": "Process exited with status 5", "stdout": "", "stderr": "Failed to start grafana-3000.service: Unit not found.\n"}
2024-06-20T11:29:05.590+0800 ERROR CheckPoint {"host": "192.168.116.110", "port": 22, "user": "tidb", "sudo": true, "cmd": "systemctl daemon-reload && systemctl start grafana-3000.service", "stdout": "", "stderr": "Failed to start grafana-3000.service: Unit not found.\n", "error": "executor.ssh.execute_failed: Failed to execute command over SSH for 'tidb@192.168.116.110:22' {ssh_stderr: Failed to start grafana-3000.service: Unit not found.\n, ssh_stdout: , ssh_command: export LANG=C; PATH=$PATH:/bin:/sbin:/usr/bin:/usr/sbin; /usr/bin/sudo -H bash -c \"systemctl daemon-reload && systemctl start grafana-3000.service\"}, cause: Process exited with status 5", "errorVerbose": "executor.ssh.execute_failed: Failed to execute command over SSH for 'tidb@192.168.116.110:22' {ssh_stderr: Failed to start grafana-3000.service: Unit not found.\n, ssh_stdout: , ssh_command: export LANG=C; PATH=$PATH:/bin:/sbin:/usr/bin:/usr/sbin; /usr/bin/sudo -H bash -c \"systemctl daemon-reload && systemctl start grafana-3000.service\"}, cause: Process exited with status 5\n at github.com/pingcap/tiup/pkg/cluster/executor.(*EasySSHExecutor).Execute()\n\tgithub.com/pingcap/tiup/pkg/cluster/executor/ssh.go:174\n at github.com/pingcap/tiup/pkg/cluster/executor.(*CheckPointExecutor).Execute()\n\tgithub.com/pingcap/tiup/pkg/cluster/executor/checkpoint.go:86\n at github.com/pingcap/tiup/pkg/cluster/module.(*SystemdModule).Execute()\n\tgithub.com/pingcap/tiup/pkg/cluster/module/systemd.go:106\n at github.com/pingcap/tiup/pkg/cluster/operation.systemctl()\n\tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:424\n at github.com/pingcap/tiup/pkg/cluster/operation.startInstance()\n\tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:400\n at github.com/pingcap/tiup/pkg/cluster/operation.StartComponent.func1()\n\tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:535\n at golang.org/x/sync/errgroup.(*Group).Go.func1()\n\tgolang.org/x/sync@v0.1.0/errgroup/errgroup.go:75\n at runtime.goexit()\n\truntime/asm_amd64.s:1650", "hash": "48f15f405450faf7d57136e629285724a0713cde", "func": "github.com/pingcap/tiup/pkg/cluster/executor.(*CheckPointExecutor).Execute", "hit": false}
2024-06-20T11:29:05.590+0800 ERROR Failed to start grafana-3000.service: Unit not found.

2024-06-20T11:29:05.590+0800 DEBUG TaskFinish {"task": "StartCluster", "error": "failed to start grafana: failed to start: 192.168.116.110 grafana-3000.service, please check the instance's log(/tidb-deploy/grafana-3000/log) for more detail.: executor.ssh.execute_failed: Failed to execute command over SSH for 'tidb@192.168.116.110:22' {ssh_stderr: Failed to start grafana-3000.service: Unit not found.\n, ssh_stdout: , ssh_command: export LANG=C; PATH=$PATH:/bin:/sbin:/usr/bin:/usr/sbin; /usr/bin/sudo -H bash -c \"systemctl daemon-reload && systemctl start grafana-3000.service\"}, cause: Process exited with status 5", "errorVerbose": "executor.ssh.execute_failed: Failed to execute command over SSH for 'tidb@192.168.116.110:22' {ssh_stderr: Failed to start grafana-3000.service: Unit not found.\n, ssh_stdout: , ssh_command: export LANG=C; PATH=$PATH:/bin:/sbin:/usr/bin:/usr/sbin; /usr/bin/sudo -H bash -c \"systemctl daemon-reload && systemctl start grafana-3000.service\"}, cause: Process exited with status 5\n at github.com/pingcap/tiup/pkg/cluster/executor.(*EasySSHExecutor).Execute()\n\tgithub.com/pingcap/tiup/pkg/cluster/executor/ssh.go:174\n at github.com/pingcap/tiup/pkg/cluster/executor.(*CheckPointExecutor).Execute()\n\tgithub.com/pingcap/tiup/pkg/cluster/executor/checkpoint.go:86\n at github.com/pingcap/tiup/pkg/cluster/module.(*SystemdModule).Execute()\n\tgithub.com/pingcap/tiup/pkg/cluster/module/systemd.go:106\n at github.com/pingcap/tiup/pkg/cluster/operation.systemctl()\n\tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:424\n at github.com/pingcap/tiup/pkg/cluster/operation.startInstance()\n\tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:400\n at github.com/pingcap/tiup/pkg/cluster/operation.StartComponent.func1()\n\tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:535\n at golang.org/x/sync/errgroup.(*Group).Go.func1()\n\tgolang.org/x/sync@v0.1.0/errgroup/errgroup.go:75\n at runtime.goexit()\n\truntime/asm_amd64.s:1650\nfailed to start: 192.168.116.110 grafana-3000.service, please check the instance's log(/tidb-deploy/grafana-3000/log) for more detail.\ngithub.com/pingcap/tiup/pkg/cluster/operation.toFailedActionError\n\tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:645\ngithub.com/pingcap/tiup/pkg/cluster/operation.startInstance\n\tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:401\ngithub.com/pingcap/tiup/pkg/cluster/operation.StartComponent.func1\n\tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:535\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\tgolang.org/x/sync@v0.1.0/errgroup/errgroup.go:75\nruntime.goexit\n\truntime/asm_amd64.s:1650\nfailed to start grafana"}
2024-06-20T11:29:05.591+0800 INFO Execute command finished {"code": 1, "error": "failed to start grafana: failed to start: 192.168.116.110 grafana-3000.service, please check the instance's log(/tidb-deploy/grafana-3000/log) for more detail.: executor.ssh.execute_failed: Failed to execute command over SSH for 'tidb@192.168.116.110:22' {ssh_stderr: Failed to start grafana-3000.service: Unit not found.\n, ssh_stdout: , ssh_command: export LANG=C; PATH=$PATH:/bin:/sbin:/usr/bin:/usr/sbin; /usr/bin/sudo -H bash -c \"systemctl daemon-reload && systemctl start grafana-3000.service\"}, cause: Process exited with status 5", "errorVerbose": "executor.ssh.execute_failed: Failed to execute command over SSH for 'tidb@192.168.116.110:22' {ssh_stderr: Failed to start grafana-3000.service: Unit not found.\n, ssh_stdout: , ssh_command: export LANG=C; PATH=$PATH:/bin:/sbin:/usr/bin:/usr/sbin; /usr/bin/sudo -H bash -c \"systemctl daemon-reload && systemctl start grafana-3000.service\"}, cause: Process exited with status 5\n at github.com/pingcap/tiup/pkg/cluster/executor.(*EasySSHExecutor).Execute()\n\tgithub.com/pingcap/tiup/pkg/cluster/executor/ssh.go:174\n at github.com/pingcap/tiup/pkg/cluster/executor.(*CheckPointExecutor).Execute()\n\tgithub.com/pingcap/tiup/pkg/cluster/executor/checkpoint.go:86\n at github.com/pingcap/tiup/pkg/cluster/module.(*SystemdModule).Execute()\n\tgithub.com/pingcap/tiup/pkg/cluster/module/systemd.go:106\n at github.com/pingcap/tiup/pkg/cluster/operation.systemctl()\n\tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:424\n at github.com/pingcap/tiup/pkg/cluster/operation.startInstance()\n\tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:400\n at github.com/pingcap/tiup/pkg/cluster/operation.StartComponent.func1()\n\tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:535\n at golang.org/x/sync/errgroup.(*Group).Go.func1()\n\tgolang.org/x/sync@v0.1.0/errgroup/errgroup.go:75\n at runtime.goexit()\n\truntime/asm_amd64.s:1650\nfailed to start: 192.168.116.110 grafana-3000.service, please check the instance's log(/tidb-deploy/grafana-3000/log) for more detail.\ngithub.com/pingcap/tiup/pkg/cluster/operation.toFailedActionError\n\tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:645\ngithub.com/pingcap/tiup/pkg/cluster/operation.startInstance\n\tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:401\ngithub.com/pingcap/tiup/pkg/cluster/operation.StartComponent.func1\n\tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:535\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\tgolang.org/x/sync@v0.1.0/errgroup/errgroup.go:75\nruntime.goexit\n\truntime/asm_amd64.s:1650\nfailed to start grafana"}

The error message "Failed to start grafana-3000.service: Unit not found." means that systemd cannot find a service unit file named grafana-3000.service.
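To confirm this on the host, you can check whether the unit files exist at all. Below is a minimal sketch. It assumes the units belong in /etc/systemd/system, which is where they were eventually restored in this thread, and it uses the instance names from this thread. UNIT_DIR and find_missing_units are hypothetical names introduced for illustration. Running systemctl cat grafana-3000.service is another way to see whether systemd can locate a unit.

```shell
#!/usr/bin/env bash
# Sketch: report which expected systemd unit files are missing.
# Assumption: units are installed in /etc/systemd/system; override UNIT_DIR
# (a hypothetical variable) to check a different directory.
UNIT_DIR="${UNIT_DIR:-/etc/systemd/system}"

# find_missing_units <unit>...: print each unit that has no file in UNIT_DIR.
# Returns 0 when every unit file exists, 1 when at least one is missing.
find_missing_units() {
  local unit
  local missing=0
  for unit in "$@"; do
    if [ ! -f "${UNIT_DIR}/${unit}" ]; then
      echo "${unit}"
      missing=1
    fi
  done
  return "${missing}"
}
```

For example, find_missing_units grafana-3000.service prometheus-9090.service alertmanager-9093.service prints the name of every missing unit. It returns a non-zero status if at least one unit is missing.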

systemctl daemon-reload && systemctl start grafana-3000.service
Does the service start if you run this directly on the corresponding server?

No, it doesn't:
[root@tidb-server ~]# systemctl daemon-reload && systemctl start grafana-3000.service
Failed to start grafana-3000.service: Unit not found.

Was the cluster just deployed?

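Judging from the error, something is wrong with your systemd files. If nothing else works, back up the data files, scale in the existing node and then scale it out again. After that, copy the backed-up data files over the directory of the newly scaled-out node.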
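The workaround suggested above can be written down as an ordered plan. This is a minimal dry-run sketch that only prints the commands, so you can review them before running them by hand. The data directory, backup directory and scale-out topology file are placeholders that you must supply. The topology file is assumed to describe only the instance being re-added. The stop step before restoring the data is an added precaution and is not part of the suggestion above. plan_rebuild_node is a hypothetical helper name. The tiup cluster scale-in, scale-out, stop and start subcommands are standard TiUP commands.

```shell
#!/usr/bin/env bash
# Dry-run sketch of the suggested workaround: print, do not execute, the
# backup -> scale-in -> scale-out -> restore sequence for one instance.
# Every path and the topology file name passed in are placeholders.

# plan_rebuild_node <cluster> <host:port> <data_dir> <backup_dir> <topology.yaml>
plan_rebuild_node() {
  if [ "$#" -ne 5 ]; then
    echo "usage: plan_rebuild_node <cluster> <host:port> <data_dir> <backup_dir> <topology.yaml>" >&2
    return 2
  fi
  local cluster="$1" node="$2" data_dir="$3" backup_dir="$4" topo="$5"
  # 1. Back up the instance's files first: scale-in removes its directories.
  echo "mkdir -p ${backup_dir}"
  echo "cp -a ${data_dir}/. ${backup_dir}/"
  # 2. Remove the broken instance from the cluster.
  echo "tiup cluster scale-in ${cluster} -N ${node}"
  # 3. Re-add it; deploying it again should also write a fresh systemd unit.
  echo "tiup cluster scale-out ${cluster} ${topo}"
  # 4. Added precaution: stop the new instance before overwriting its files.
  echo "tiup cluster stop ${cluster} -N ${node}"
  # 5. Copy the backup over the new directory and start the instance again.
  echo "cp -a ${backup_dir}/. ${data_dir}/"
  echo "tiup cluster start ${cluster} -N ${node}"
}
```

Here is an example with hypothetical paths: plan_rebuild_node tidb-test 192.168.116.110:9090 /tidb-data/prometheus-9090 /backup/prometheus-9090 scale-out.yaml. It prints seven commands in order. The script only prints the plan so that a wrong placeholder cannot delete or overwrite anything by accident.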

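Solved. For some unknown reason, the systemd unit files for prometheus, grafana, and alertmanager had all gone missing.
After adding the files back under /etc/systemd/system and granting permissions, the services could be started: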
grafana-3000.service
[Unit]
Description=grafana service
After=syslog.target network.target remote-fs.target nss-lookup.target

[Service]
LimitNOFILE=1000000
LimitSTACK=10485760
User=tidb
ExecStart=/bin/bash -c '/tidb-deploy/grafana-3000/scripts/run_grafana.sh'
Restart=always

RestartSec=15s

[Install]
WantedBy=multi-user.target

prometheus-9090.service

[Unit]
Description=prometheus service
After=syslog.target network.target remote-fs.target nss-lookup.target

[Service]
LimitNOFILE=1000000
LimitSTACK=10485760
User=tidb
ExecStart=/bin/bash -c '/tidb-deploy/prometheus-9090/scripts/run_prometheus.sh'
ExecReload=/bin/bash -c 'kill -HUP $MAINPID $(pidof /tidb-deploy/prometheus-9090/bin/ng-monitoring-server)'

Restart=always

RestartSec=15s

[Install]
WantedBy=multi-user.target

alertmanager-9093.service
[Unit]
Description=alertmanager service
After=syslog.target network.target remote-fs.target nss-lookup.target

[Service]
LimitNOFILE=1000000
LimitSTACK=10485760
User=tidb
ExecStart=/bin/bash -c '/tidb-deploy/alertmanager-9093/scripts/run_alertmanager.sh'
Restart=always

RestartSec=15s

[Install]
WantedBy=multi-user.target
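The three units differ only in the component name, the port and the Prometheus reload hook. Because of that, they can be regenerated from one template instead of being typed by hand. Below is a minimal sketch. It assumes the /tidb-deploy/<component>-<port> layout and the tidb user shown above. It also assumes that "granting permissions" means mode 0644, which is the conventional mode for files in /etc/systemd/system. This is not TiUP's own generator. write_unit, restore_monitoring_units, UNIT_DIR, DEPLOY_ROOT, SERVICE_USER and SKIP_SYSTEMCTL are all hypothetical names. The generated files match the ones above apart from blank lines.

```shell
#!/usr/bin/env bash
# Sketch: regenerate the grafana/prometheus/alertmanager units posted above.
# Assumptions (hypothetical variables, defaults taken from this thread):
#   UNIT_DIR      where unit files go (/etc/systemd/system)
#   DEPLOY_ROOT   parent of the <component>-<port> deploy dirs (/tidb-deploy)
#   SERVICE_USER  account the services run as (tidb)
# "Granting permissions" is assumed to mean mode 0644, the usual mode there.
UNIT_DIR="${UNIT_DIR:-/etc/systemd/system}"
DEPLOY_ROOT="${DEPLOY_ROOT:-/tidb-deploy}"
SERVICE_USER="${SERVICE_USER:-tidb}"

# write_unit <component> <port>: write <component>-<port>.service, mode 0644.
write_unit() {
  local comp="$1" port="$2"
  local dir="${DEPLOY_ROOT}/${comp}-${port}"
  local file="${UNIT_DIR}/${comp}-${port}.service"
  {
    echo "[Unit]"
    echo "Description=${comp} service"
    echo "After=syslog.target network.target remote-fs.target nss-lookup.target"
    echo
    echo "[Service]"
    echo "LimitNOFILE=1000000"
    echo "LimitSTACK=10485760"
    echo "User=${SERVICE_USER}"
    echo "ExecStart=/bin/bash -c '${dir}/scripts/run_${comp}.sh'"
    if [ "${comp}" = "prometheus" ]; then
      # Only the Prometheus unit above has a reload hook; \$ keeps $MAINPID
      # and $(pidof ...) literal so systemd and bash expand them at reload time.
      echo "ExecReload=/bin/bash -c 'kill -HUP \$MAINPID \$(pidof ${dir}/bin/ng-monitoring-server)'"
    fi
    echo "Restart=always"
    echo "RestartSec=15s"
    echo
    echo "[Install]"
    echo "WantedBy=multi-user.target"
  } > "${file}" || return 1
  chmod 0644 "${file}"
}

# restore_monitoring_units: write all three units, then (unless
# SKIP_SYSTEMCTL=1) reload systemd and start them. Needs root for real use.
restore_monitoring_units() {
  write_unit grafana 3000 || return 1
  write_unit prometheus 9090 || return 1
  write_unit alertmanager 9093 || return 1
  if [ "${SKIP_SYSTEMCTL:-0}" != "1" ]; then
    systemctl daemon-reload &&
      systemctl start grafana-3000.service prometheus-9090.service alertmanager-9093.service
  fi
}
```

When run as root, restore_monitoring_units writes the files and then runs the same systemctl daemon-reload and systemctl start that TiUP attempted. With SKIP_SYSTEMCTL=1 and a scratch UNIT_DIR, it only writes the files. This is a safe way to inspect the output first.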

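Glad it's solved. Try to recall what operations were performed and why these files went missing. I have seen cases where a deployment broke because the operating system was not compatible.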
