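DM restart fails

To speed things up, please provide the following information. A clear problem description gets resolved faster:

[DM version] v2.0.2

[Problem description] tiup dm restart dm-test fails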

[tidb@localhost prometheus-9090]$ tiup dm restart dm-test
Starting component dm: /home/tidb/.tiup/components/dm/v1.4.2/tiup-dm restart dm-test

  • [ Serial ] - SSHKeySet: privateKey=/home/tidb/.tiup/storage/dm/clusters/dm-test/ssh/id_rsa, publicKey=/home/tidb/.tiup/storage/dm/clusters/dm-test/ssh/id_rsa.pub
  • [Parallel] - UserSSH: user=tidb, host=10.200.25.254
  • [Parallel] - UserSSH: user=tidb, host=10.200.25.254
  • [Parallel] - UserSSH: user=tidb, host=10.200.25.254
  • [Parallel] - UserSSH: user=tidb, host=10.200.25.254
  • [ Serial ] - RestartCluster
    Stopping component alertmanager
    Stopping instance 10.200.25.254
    Stop alertmanager 10.200.25.254:9093 success
    Stopping component grafana
    Stopping instance 10.200.25.254
    Stop grafana 10.200.25.254:3000 success
    Stopping component prometheus
    Stopping instance 10.200.25.254
    Stop prometheus 10.200.25.254:9090 success
    Stopping component dm-master
    Stopping instance 10.200.25.254
    Stop dm-master 10.200.25.254:8261 success
    Starting component dm-master
    Starting instance dm-master 10.200.25.254:8261
    Start dm-master 10.200.25.254:8261 success
    Starting component prometheus
    Starting instance prometheus 10.200.25.254:9090

Error: failed to start: failed to start prometheus: failed to start: prometheus 10.200.25.254:9090, please check the instance's log(/home/tidb/dm/deploy/prometheus-9090/log) for more detail.: timed out waiting for port 9090 to be started after 1m0s

Verbose debug logs has been written to /home/tidb/.tiup/logs/tiup-dm-debug-2021-04-26-17-04-52.log.
Error: run /home/tidb/.tiup/components/dm/v1.4.2/tiup-dm (wd:/home/tidb/.tiup/data/SVg28aO) failed: exit status 1
[tidb@localhost prometheus-9090]$
[tidb@localhost prometheus-9090]$
[tidb@localhost prometheus-9090]$ tiup dm display dm-test
Starting component dm: /home/tidb/.tiup/components/dm/v1.4.2/tiup-dm display dm-test
Cluster type: dm
Cluster name: dm-test
Cluster version: v2.0.2
SSH type: builtin
ID                  Role          Host           Ports      OS/Arch       Status      Data Dir                              Deploy Dir
10.200.25.254:9093  alertmanager  10.200.25.254  9093/9094  linux/x86_64  inactive    /home/tidb/dm/data/alertmanager-9093  /home/tidb/dm/deploy/alertmanager-9093
10.200.25.254:8261  dm-master     10.200.25.254  8261/8291  linux/x86_64  Healthy|L   /home/tidb/dm/data/dm-master-8261     /home/tidb/dm/deploy/dm-master-8261
10.200.25.254:3000  grafana       10.200.25.254  3000       linux/x86_64  inactive    -                                     /home/tidb/dm/deploy/grafana-3000
10.200.25.254:9090  prometheus    10.200.25.254  9090       linux/x86_64  activating  /home/tidb/dm/data/prometheus-9090    /home/tidb/dm/deploy/prometheus-9090
Total nodes: 4

": “”, “hash”: “68edc581eb12860f150abf89b72c26497d056578”, “func”: “github.com/pingcap/tiup/pkg/cluster/executor.(*CheckPointExecutor).Execute”, “hit”: false}
2021-04-26T17:04:52.627+0800 DEBUG retry error: operation timed out after 1m0s
2021-04-26T17:04:52.627+0800 DEBUG TaskFinish {"task": "RestartCluster", "error": "failed to start: failed to start prometheus: failed to start: prometheus 10.200.25.254:9090, please check the instance's log(/home/tidb/dm/deploy/prometheus-9090/log) for more detail.: timed out waiting for port 9090 to be started after 1m0s", "errorVerbose": "timed out waiting for port 9090 to be started after 1m0s\ngithub.com/pingcap/tiup/pkg/cluster/module.(*WaitFor).Execute\n\tgithub.com/pingcap/tiup/pkg/cluster/module/wait_for.go:91\ngithub.com/pingcap/tiup/pkg/cluster/spec.PortStarted\n\tgithub.com/pingcap/tiup/pkg/cluster/spec/instance.go:114\ngithub.com/pingcap/tiup/pkg/cluster/spec.(*BaseInstance).Ready\n\tgithub.com/pingcap/tiup/pkg/cluster/spec/instance.go:145\ngithub.com/pingcap/tiup/pkg/cluster/operation.startInstance\n\tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:363\ngithub.com/pingcap/tiup/pkg/cluster/operation.StartComponent.func1\n\tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:484\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\tgolang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57\nruntime.goexit\n\truntime/asm_amd64.s:1371\nfailed to start: prometheus 10.200.25.254:9090, please check the instance's log(/home/tidb/dm/deploy/prometheus-9090/log) for more detail.\nfailed to start prometheus\nfailed to start"}
2021-04-26T17:04:52.627+0800 INFO Execute command finished {"code": 1, "error": "failed to start: failed to start prometheus: failed to start: prometheus 10.200.25.254:9090, please check the instance's log(/home/tidb/dm/deploy/prometheus-9090/log) for more detail.: timed out waiting for port 9090 to be started after 1m0s", "errorVerbose": "timed out waiting for port 9090 to be started after 1m0s\ngithub.com/pingcap/tiup/pkg/cluster/module.(*WaitFor).Execute\n\tgithub.com/pingcap/tiup/pkg/cluster/module/wait_for.go:91\ngithub.com/pingcap/tiup/pkg/cluster/spec.PortStarted\n\tgithub.com/pingcap/tiup/pkg/cluster/spec/instance.go:114\ngithub.com/pingcap/tiup/pkg/cluster/spec.(*BaseInstance).Ready\n\tgithub.com/pingcap/tiup/pkg/cluster/spec/instance.go:145\ngithub.com/pingcap/tiup/pkg/cluster/operation.startInstance\n\tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:363\ngithub.com/pingcap/tiup/pkg/cluster/operation.StartComponent.func1\n\tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:484\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\tgolang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57\nruntime.goexit\n\truntime/asm_amd64.s:1371\nfailed to start: prometheus 10.200.25.254:9090, please check the instance's log(/home/tidb/dm/deploy/prometheus-9090/log) for more detail.\nfailed to start prometheus\nfailed to start"}
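The debug log shows that the restart gave up after polling port 9090 for one minute. As a rough illustration of that kind of readiness check (not tiup's actual implementation, which is written in Go), here is a minimal Python sketch. The function name `wait_for_port` and its parameters are made up for this example.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Poll until a TCP connection to host:port succeeds or `timeout` seconds pass.

    Hypothetical helper for illustration only; it is not part of tiup.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused, unreachable, or timed out: not ready yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling `wait_for_port("10.200.25.254", 9090)` from another shell while the restart is running would show whether the port ever opens. A check like this only reports that the service did not come up in time; the reason still has to come from the instance's own log.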

level=error ts=2021-04-26T09:07:24.939325274Z caller=main.go:717 err="opening storage failed: block dir: \"/home/tidb/dm/data/prometheus-9090/01F357C7ASCA8XAJH6AGG44TTX\": unexpected end of JSON input"
level=warn ts=2021-04-26T09:07:40.187744129Z caller=main.go:274 deprecation_notice="'storage.tsdb.retention' flag is deprecated use 'storage.tsdb.retention.time' instead."
level=info ts=2021-04-26T09:07:40.187849099Z caller=main.go:321 msg="Starting Prometheus" version="(version=2.8.1, branch=HEAD, revision=4d60eb36dcbed725fcac5b27018574118f12fffb)"
level=info ts=2021-04-26T09:07:40.187878426Z caller=main.go:322 build_context="(go=go1.11.6, user=root@bfdd6a22a683, date=20190328-18:04:08)"
level=info ts=2021-04-26T09:07:40.187902662Z caller=main.go:323 host_details="(Linux 3.10.0-1062.el7.x86_64 #1 SMP Wed Aug 7 18:08:02 UTC 2019 x86_64 localhost.localdomain (none))"
level=info ts=2021-04-26T09:07:40.187930675Z caller=main.go:324 fd_limits="(soft=1000000, hard=1000000)"
level=info ts=2021-04-26T09:07:40.18795116Z caller=main.go:325 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2021-04-26T09:07:40.189193738Z caller=main.go:640 msg="Starting TSDB ..."
level=info ts=2021-04-26T09:07:40.189549335Z caller=repair.go:48 component=tsdb msg="found healthy block" mint=1617328800000 maxt=1617408000000 ulid=01F2B1MB53Y10SZHE7WJGJ90Z5
level=info ts=2021-04-26T09:07:40.189644513Z caller=repair.go:48 component=tsdb msg="found healthy block" mint=1617408000000 maxt=1617602400000 ulid=01F2GM57RHE31F3N1T0BZ20ZFB
level=info ts=2021-04-26T09:07:40.189714758Z caller=repair.go:48 component=tsdb msg="found healthy block" mint=1617602400000 maxt=1617796800000 ulid=01F2PDHVHRKMDTZCTTXBE9HSH9
level=info ts=2021-04-26T09:07:40.189782743Z caller=repair.go:48 component=tsdb msg="found healthy block" mint=1617796800000 maxt=1617991200000 ulid=01F2W6YEQE4GDJSZ6JZNZB2BF5
level=info ts=2021-04-26T09:07:40.189848441Z caller=repair.go:48 component=tsdb msg="found healthy block" mint=1617991200000 maxt=1618185600000 ulid=01F320B2MEBZNBN8JN16TN4WG1
level=info ts=2021-04-26T09:07:40.189903099Z caller=repair.go:48 component=tsdb msg="found healthy block" mint=1618185600000 maxt=1618250400000 ulid=01F33Y4KGT8262QGE06YMEMXPY
level=info ts=2021-04-26T09:07:40.189954596Z caller=repair.go:48 component=tsdb msg="found healthy block" mint=1618250400000 maxt=1618272000000 ulid=01F34JQS6B8JRB6TMQD1CYS1C1
level=info ts=2021-04-26T09:07:40.18999751Z caller=main.go:509 msg="Stopping scrape discovery manager..."
level=info ts=2021-04-26T09:07:40.190011974Z caller=main.go:523 msg="Stopping notify discovery manager..."
level=info ts=2021-04-26T09:07:40.19003814Z caller=main.go:545 msg="Stopping scrape manager..."
level=info ts=2021-04-26T09:07:40.190045706Z caller=main.go:519 msg="Notify discovery manager stopped"
level=info ts=2021-04-26T09:07:40.190078768Z caller=web.go:418 component=web msg="Start listening for connections" address=:9090
level=info ts=2021-04-26T09:07:40.190638611Z caller=main.go:505 msg="Scrape discovery manager stopped"
level=info ts=2021-04-26T09:07:40.190881135Z caller=manager.go:736 component="rule manager" msg="Stopping rule manager..."
level=info ts=2021-04-26T09:07:40.190899342Z caller=manager.go:742 component="rule manager" msg="Rule manager stopped"
level=info ts=2021-04-26T09:07:40.190914869Z caller=notifier.go:521 component=notifier msg="Stopping notification manager..."
level=info ts=2021-04-26T09:07:40.190927649Z caller=main.go:708 msg="Notifier manager stopped"
level=info ts=2021-04-26T09:07:40.190944455Z caller=main.go:539 msg="Scrape manager stopped"
level=error ts=2021-04-26T09:07:40.191078892Z caller=main.go:717 err="opening storage failed: block dir: \"/home/tidb/dm/data/prometheus-9090/01F357C7ASCA8XAJH6AGG44TTX\": unexpected end of JSON input"
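The Prometheus log points at one block directory, 01F357C7ASCA8XAJH6AGG44TTX, failing with "unexpected end of JSON input". Each TSDB block directory holds a meta.json file, so the likely cause is that this block's meta.json is empty or truncated, though the thread does not confirm it. A minimal Python sketch for listing such blocks follows. The helper name `find_broken_blocks` is made up for this example, and treating every 26-character directory name as a block is a simplifying assumption.

```python
import json
from pathlib import Path


def find_broken_blocks(data_dir):
    """Return names of block directories whose meta.json is missing or not valid JSON.

    Hypothetical helper for illustration; it only reads files and changes nothing.
    """
    root = Path(data_dir)
    if not root.is_dir():
        return []
    broken = []
    for entry in sorted(root.iterdir()):
        # Assumption: block directories are named by a 26-character ULID,
        # which skips entries such as wal/ and the lock file.
        if not entry.is_dir() or len(entry.name) != 26:
            continue
        try:
            json.loads((entry / "meta.json").read_text())
        except (OSError, ValueError):
            # OSError: file missing or unreadable; ValueError: empty or invalid JSON.
            broken.append(entry.name)
    return broken


if __name__ == "__main__":
    # Data directory taken from the error message in the log above.
    for name in find_broken_blocks("/home/tidb/dm/data/prometheus-9090"):
        print("broken block:", name)
```

If the reported block shows up in the output, a commonly suggested workaround is to stop Prometheus and move that one block directory out of the data directory before starting again. This loses the metrics stored in that block, so it is best suited to a test environment like this one.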

Which tiup version are you on? It looks like prometheus failed to start, so please check its log.

The latest version. This is a test environment, so in the end I reinstalled and set up the sync again.

:ok_hand: