Upgrading the TiDB Grafana component fails with a timeout

While upgrading the Grafana component, the upgrade command timed out and the upgrade failed; trying to start Grafana afterwards also timed out. What could be causing this?

[tidb@1 ~]$ tiup cluster patch my-tidb /home/tidb/tidb-grafana.tar.gz -R grafana --overwrite
Checking updates for component cluster... Timedout (after 2s)
Will patch the cluster my-tidb with package path is /home/tidb/tidb-grafana.tar.gz, nodes: , roles: grafana.
Do you want to continue? [y/N]:(default=N) y

  • [ Serial ] - SSHKeySet: privateKey=/home/tidb/.tiup/storage/cluster/clusters/my-tidb/ssh/id_rsa, publicKey=/home/tidb/.tiup/storage/cluster/clusters/my-tidb/ssh/id_rsa.pub
  • [Parallel] - UserSSH: user=tidb, host=192.168.6.34
  • [Parallel] - UserSSH: user=tidb, host=192.168.6.31
  • [Parallel] - UserSSH: user=tidb, host=192.168.6.35
  • [Parallel] - UserSSH: user=tidb, host=192.168.6.34
  • [Parallel] - UserSSH: user=tidb, host=192.168.6.35
  • [Parallel] - UserSSH: user=tidb, host=192.168.6.31
  • [Parallel] - UserSSH: user=tidb, host=192.168.6.35
  • [Parallel] - UserSSH: user=tidb, host=192.168.6.33
  • [Parallel] - UserSSH: user=tidb, host=192.168.6.31
  • [Parallel] - UserSSH: user=tidb, host=192.168.6.32
  • [Parallel] - UserSSH: user=tidb, host=192.168.6.32
  • [Parallel] - UserSSH: user=tidb, host=192.168.6.34
  • [Parallel] - UserSSH: user=tidb, host=192.168.6.32
  • [ Serial ] - BackupComponent: component=grafana, currentVersion=v7.5.1, remote=192.168.6.32:/data/tidb-deploy/grafana-3000
  • [ Serial ] - InstallPackage: srcPath=/home/tidb/tidb-grafana.tar.gz, remote=192.168.6.32:/data/tidb-deploy/grafana-3000
  • [ Serial ] - UpgradeCluster
    Upgrading component grafana
    Restarting instance 192.168.6.32:3000

Error: failed to restart: 192.168.6.32 grafana-3000.service, please check the instance's log(/data/tidb-deploy/grafana-3000/log) for more detail.: timed out waiting for port 3000 to be started after 2m0s

Verbose debug logs has been written to /home/tidb/.tiup/logs/tiup-cluster-debug-2025-06-10-12-55-03.log.
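The tiup error itself only says that port 3000 never came up within the 2-minute wait, so the service state and the port on the target node (192.168.6.32) are worth checking directly; for example (generic diagnostic commands, not output from this cluster):

[tidb@2 ~]$ systemctl status grafana-3000.service
[tidb@2 ~]$ ss -tlnp | grep 3000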

/data/tidb-deploy/grafana-3000/log:
The log entries all look like this:
t=2025-06-10T11:51:15+0800 lvl=info msg="Alert Rule returned no data" logger=alerting.evalContext ruleId=47 name="server report failures alert" changing state to=ok
t=2025-06-10T11:51:15+0800 lvl=info msg="Alert Rule returned no data" logger=alerting.evalContext ruleId=48 name="TiKV channel full alert" changing state to=ok
t=2025-06-10T11:51:16+0800 lvl=info msg="Alert Rule returned no data" logger=alerting.evalContext ruleId=66 name="TiKV Coprocessor CPU alert" changing state to=ok
t=2025-06-10T11:51:17+0800 lvl=info msg="Alert Rule returned no data" logger=alerting.evalContext ruleId=25 name="approximate region size alert" changing state to=no_data
t=2025-06-10T11:51:17+0800 lvl=info msg="Alert Rule returned no data" logger=alerting.evalContext ruleId=26 name="TiKV raft store CPU alert" changing state to=ok
t=2025-06-10T11:51:18+0800 lvl=info msg="Alert Rule returned no data" logger=alerting.evalContext ruleId=77 name="Transaction Retry Num alert" changing state to=no_data
t=2025-06-10T11:51:18+0800 lvl=info msg="Alert Rule returned no data" logger=alerting.evalContext ruleId=27 name="TiKV async apply CPU alert" changing state to=ok
t=2025-06-10T11:51:18+0800 lvl=info msg="Alert Rule returned no data" logger=alerting.evalContext ruleId=78 name="Lock Resolve OPS alert" changing state to=no_data
t=2025-06-10T11:51:18+0800 lvl=info msg="Alert Rule returned no data" logger=alerting.evalContext ruleId=39 name="TiKV gRPC poll CPU alert" changing state to=ok
t=2025-06-10T11:51:18+0800 lvl=info msg="Alert Rule returned no data" logger=alerting.evalContext ruleId=40 name="TiKV raft store CPU alert" changing state to=ok
t=2025-06-10T11:51:19+0800 lvl=info msg="Shutdown started" logger=server reason="System signal: terminated"
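Note that this log ends with the old Grafana instance shutting down on the terminate signal; the patched binary never writes anything here, so the real startup error is not in this file. One way to surface it (assuming the standard bin/grafana-server layout under the deploy directory shown above) is to run the new binary by hand; a wrong-architecture package typically fails immediately with an "Exec format error" from the shell:

[tidb@2 ~]$ cd /data/tidb-deploy/grafana-3000
[tidb@2 grafana-3000]$ ./bin/grafana-server -v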

/home/tidb/.tiup/logs/tiup-cluster-debug-2025-06-10-12-55-03.log:
The debug log:

2025-06-10T12:55:03.001+0800 DEBUG retry error {error: operation timed out after 2m0s}
2025-06-10T12:55:03.001+0800 ERROR CheckPoint {instance: 192.168.6.32:3000, error: failed to restart: 192.168.6.32 grafana-3000.service, please check the instance’s log(/data/tidb-deploy/grafana-3000/log) for more detail.: timed out waiting for port 3000 to be started after 2m0s, errorVerbose: timed out waiting for port 3000 to be started after 2m0s\ngithub.com/pingcap/tiup/pkg/cluster/module.(*WaitFor).Execute\n\tgithub.com/pingcap/tiup/pkg/cluster/module/wait_for.go:92\ngithub.com/pingcap/tiup/pkg/cluster/spec.PortStarted\n\tgithub.com/pingcap/tiup/pkg/cluster/spec/instance.go:129\ngithub.com/pingcap/tiup/pkg/cluster/spec.(*BaseInstance).Ready\n\tgithub.com/pingcap/tiup/pkg/cluster/spec/instance.go:167\ngithub.com/pingcap/tiup/pkg/cluster/operation.restartInstance\n\tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:366\ngithub.com/pingcap/tiup/pkg/cluster/operation.upgradeInstance\n\tgithub.com/pingcap/tiup/pkg/cluster/operation/upgrade.go:256\ngithub.com/pingcap/tiup/pkg/cluster/operation.Upgrade\n\tgithub.com/pingcap/tiup/pkg/cluster/operation/upgrade.go:186\ngithub.com/pingcap/tiup/pkg/cluster/manager.(*Manager).Patch.func1\n\tgithub.com/pingcap/tiup/pkg/cluster/manager/patch.go:108\ngithub.com/pingcap/tiup/pkg/cluster/task.(*Func).Execute\n\tgithub.com/pingcap/tiup/pkg/cluster/task/func.go:34\ngithub.com/pingcap/tiup/pkg/cluster/task.(*Serial).Execute\n\tgithub.com/pingcap/tiup/pkg/cluster/task/task.go:86\ngithub.com/pingcap/tiup/pkg/cluster/manager.(*Manager).Patch\n\tgithub.com/pingcap/tiup/pkg/cluster/manager/patch.go:117\ngithub.com/pingcap/tiup/components/cluster/command.newPatchCmd.func1\n\tgithub.com/pingcap/tiup/components/cluster/command/patch.go:45\ngithub.com/spf13/cobra.(*Command).execute\n\tgithub.com/spf13/cobra@v1.6.1/command.go:916\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\tgithub.com/spf13/cobra@v1.6.1/command.go:1044\ngithub.com/spf13/cobra.(*Command).Execute\n\tgithub.com/spf13/cobra@v1.6.1/command.go:968\ngithub.com/pingcap/tiup/components/cluster/command.Execute\n\tgithub.com/pingcap/tiup/components/cluster/command/root.go:297\nmain.main\n\tgithub.com/pingcap/tiup/components/cluster/main.go:23\nruntime.main\n\truntime/proc.go:267\nruntime.goexit\n\truntime/asm_arm64.s:1197\nfailed to restart: 192.168.6.32 grafana-3000.service, please check the instance’s log(/data/tidb-deploy/grafana-3000/log) for more detail., hash: 23f7b0f0f3a0ad29e90a7b12b1097e8d4074cb3e, func: github.com/pingcap/tiup/pkg/cluster/operation.upgradeInstance, hit: false}
2025-06-10T12:55:03.001+0800 DEBUG TaskFinish {task: UpgradeCluster, error: failed to restart: 192.168.6.32 grafana-3000.service, please check the instance’s log(/data/tidb-deploy/grafana-3000/log) for more detail.: timed out waiting for port 3000 to be started after 2m0s, errorVerbose: timed out waiting for port 3000 to be started after 2m0s\ngithub.com/pingcap/tiup/pkg/cluster/module.(*WaitFor).Execute\n\tgithub.com/pingcap/tiup/pkg/cluster/module/wait_for.go:92\ngithub.com/pingcap/tiup/pkg/cluster/spec.PortStarted\n\tgithub.com/pingcap/tiup/pkg/cluster/spec/instance.go:129\ngithub.com/pingcap/tiup/pkg/cluster/spec.(*BaseInstance).Ready\n\tgithub.com/pingcap/tiup/pkg/cluster/spec/instance.go:167\ngithub.com/pingcap/tiup/pkg/cluster/operation.restartInstance\n\tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:366\ngithub.com/pingcap/tiup/pkg/cluster/operation.upgradeInstance\n\tgithub.com/pingcap/tiup/pkg/cluster/operation/upgrade.go:256\ngithub.com/pingcap/tiup/pkg/cluster/operation.Upgrade\n\tgithub.com/pingcap/tiup/pkg/cluster/operation/upgrade.go:186\ngithub.com/pingcap/tiup/pkg/cluster/manager.(*Manager).Patch.func1\n\tgithub.com/pingcap/tiup/pkg/cluster/manager/patch.go:108\ngithub.com/pingcap/tiup/pkg/cluster/task.(*Func).Execute\n\tgithub.com/pingcap/tiup/pkg/cluster/task/func.go:34\ngithub.com/pingcap/tiup/pkg/cluster/task.(*Serial).Execute\n\tgithub.com/pingcap/tiup/pkg/cluster/task/task.go:86\ngithub.com/pingcap/tiup/pkg/cluster/manager.(*Manager).Patch\n\tgithub.com/pingcap/tiup/pkg/cluster/manager/patch.go:117\ngithub.com/pingcap/tiup/components/cluster/command.newPatchCmd.func1\n\tgithub.com/pingcap/tiup/components/cluster/command/patch.go:45\ngithub.com/spf13/cobra.(*Command).execute\n\tgithub.com/spf13/cobra@v1.6.1/command.go:916\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\tgithub.com/spf13/cobra@v1.6.1/command.go:1044\ngithub.com/spf13/cobra.(*Command).Execute\n\tgithub.com/spf13/cobra@v1.6.1/command.go:968\ngithub.com/pingcap/tiup/components/cluster/command.Execute\n\tgithub.com/pingcap/tiup/components/cluster/command/root.go:297\nmain.main\n\tgithub.com/pingcap/tiup/components/cluster/main.go:23\nruntime.main\n\truntime/proc.go:267\nruntime.goexit\n\truntime/asm_arm64.s:1197\nfailed to restart: 192.168.6.32 grafana-3000.service, please check the instance’s log(/data/tidb-deploy/grafana-3000/log) for more detail.}
2025-06-10T12:55:03.001+0800 INFO Execute command finished {code: 1, error: failed to restart: 192.168.6.32 grafana-3000.service, please check the instance’s log(/data/tidb-deploy/grafana-3000/log) for more detail.: timed out waiting for port 3000 to be started after 2m0s, errorVerbose: timed out waiting for port 3000 to be started after 2m0s\ngithub.com/pingcap/tiup/pkg/cluster/module.(*WaitFor).Execute\n\tgithub.com/pingcap/tiup/pkg/cluster/module/wait_for.go:92\ngithub.com/pingcap/tiup/pkg/cluster/spec.PortStarted\n\tgithub.com/pingcap/tiup/pkg/cluster/spec/instance.go:129\ngithub.com/pingcap/tiup/pkg/cluster/spec.(*BaseInstance).Ready\n\tgithub.com/pingcap/tiup/pkg/cluster/spec/instance.go:167\ngithub.com/pingcap/tiup/pkg/cluster/operation.restartInstance\n\tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:366\ngithub.com/pingcap/tiup/pkg/cluster/operation.upgradeInstance\n\tgithub.com/pingcap/tiup/pkg/cluster/operation/upgrade.go:256\ngithub.com/pingcap/tiup/pkg/cluster/operation.Upgrade\n\tgithub.com/pingcap/tiup/pkg/cluster/operation/upgrade.go:186\ngithub.com/pingcap/tiup/pkg/cluster/manager.(*Manager).Patch.func1\n\tgithub.com/pingcap/tiup/pkg/cluster/manager/patch.go:108\ngithub.com/pingcap/tiup/pkg/cluster/task.(*Func).Execute\n\tgithub.com/pingcap/tiup/pkg/cluster/task/func.go:34\ngithub.com/pingcap/tiup/pkg/cluster/task.(*Serial).Execute\n\tgithub.com/pingcap/tiup/pkg/cluster/task/task.go:86\ngithub.com/pingcap/tiup/pkg/cluster/manager.(*Manager).Patch\n\tgithub.com/pingcap/tiup/pkg/cluster/manager/patch.go:117\ngithub.com/pingcap/tiup/components/cluster/command.newPatchCmd.func1\n\tgithub.com/pingcap/tiup/components/cluster/command/patch.go:45\ngithub.com/spf13/cobra.(*Command).execute\n\tgithub.com/spf13/cobra@v1.6.1/command.go:916\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\tgithub.com/spf13/cobra@v1.6.1/command.go:1044\ngithub.com/spf13/cobra.(*Command).Execute\n\tgithub.com/spf13/cobra@v1.6.1/command.go:968\ngithub.com/pingcap/tiup/components/cluster/command.Execute\n\tgithub.com/pingcap/tiup/components/cluster/command/root.go:297\nmain.main\n\tgithub.com/pingcap/tiup/components/cluster/main.go:23\nruntime.main\n\truntime/proc.go:267\nruntime.goexit\n\truntime/asm_arm64.s:1197\nfailed to restart: 192.168.6.32 grafana-3000.service, please check the instance’s log(/data/tidb-deploy/grafana-3000/log) for more detail.}

I could cry. For security reasons the package was shuttled back and forth over FTP, and in the end it turned out the file wasn't a usable binary at all...

[tidb@2 bin]$ cat /proc/version
Linux version 4.19.90-23.8.v2101.ky10.aarch64 (KYLINSOFT@localhost.localdomain) (gcc version 7.3.0 (GCC)) #1 SMP Mon May 17 17:07:38 CST 2021
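The kernel string above already says aarch64; comparing that against the downloaded binary makes any mismatch explicit (path assumes the standard deploy layout):

[tidb@2 bin]$ uname -m
aarch64
[tidb@2 bin]$ file ./grafana-server

file reports the ELF target architecture, so an x86-64 build stands out immediately on an ARM host.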

For this so-called Kylin system, isn't the x86_64 version the one to download?


aarch64 is the ARM build; you probably grabbed the amd64 (x86_64) version.

Thanks, I did pick the wrong version: I downloaded x86_64. With every platform doing its own thing, it really is a case of "the Eight Immortals crossing the sea, each showing their own powers"; Qin Shi Huang had the right idea about standardizing things.
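For completeness, once the aarch64 package is in place the same patch command can simply be re-run (the filename below is illustrative; use whatever the arm64 tarball is actually called):

[tidb@1 ~]$ tiup cluster patch my-tidb /home/tidb/tidb-grafana-arm64.tar.gz -R grafana --overwrite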