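node_exporter error on restart after upgrading to TiUP

To help us respond efficiently, please provide the following information when asking a question. Clearly described problems are answered first.

  • [TiDB version]: v4.0.0

  • [Problem description]: The cluster was previously migrated successfully from a TiDB Ansible 3.0 deployment to TiUP 1.1.1. After stopping the services, the first attempt to start them again (tiup cluster start test-cluster) failed with a timeout error: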

    Starting component node_exporter
    	Starting instance 10.12.5.113
    
    
    retry error: operation timed out after 2m0s
    	10.12.5.113 failed to start: timed out waiting for port 9100 to be started after 2m0s
    
    Error: 	10.12.5.113 failed to start: timed out waiting for port 9100 to be started after 2m0s: timed out waiting for port 9100 to be started after 2m0s
    

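I checked the node_exporter files on 113 and they look normal:

How can this be resolved?

Hello,

  1. Please provide the output of tiup cluster display cluster-name.
  2. Please confirm whether the node_exporter server is running on 10.12.5.113, and upload a screenshot of the ps output.

If the server is confirmed not to be running, try starting it manually with systemctl start and see whether the service comes up. If it does not, look at the logs under log/ that follow the "Welcome" line and paste them here.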
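
For anyone following along, the checks suggested above could be scripted roughly as below. This is only a sketch: the unit name `node_exporter-9100.service` is an assumption (confirm it with `systemctl list-units | grep exporter`), and `port_open` and `exporter_status` are helper names made up for this example.

```shell
# Sketch only: the unit name below is an assumption, adjust it to your host.
UNIT="node_exporter-9100.service"   # hypothetical unit name
PORT=9100                           # port from the tiup error message

# Succeeds (exit 0) if something accepts TCP connections on host:port.
# Uses bash's /dev/tcp pseudo-device; timeout guards against a hanging connect.
port_open() {
  timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Prints one line saying whether the exporter process and its port are up.
exporter_status() {
  if pgrep -x node_exporter >/dev/null 2>&1; then proc=running; else proc=missing; fi
  if port_open 127.0.0.1 "$1"; then port=open; else port=closed; fi
  echo "process=${proc} port=${port}"
}

# On the affected host one would then run, for example:
#   exporter_status "$PORT"
#   sudo systemctl start "$UNIT"
#   systemctl status "$UNIT" --no-pager
#   journalctl -u "$UNIT" -n 50 --no-pager
```

If the process is missing and `systemctl start` does not bring it up, the journal output usually shows which command the unit tried to execute.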

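The node_exporter server is currently not running on 113.

I tried starting the service manually, but it does not appear to have succeeded:

Here is the PD log. 9/17 is when the services were stopped, and 9/20 is the first attempt to start them again, which failed. The detailed log follows: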

[2020/09/17 10:15:45.224 +00:00] [INFO] [manager.go:74] ["exit dashboard loop"]

[2020/09/17 10:15:45.224 +00:00] [INFO] [server.go:427] ["close server"]

[2020/09/20 16:26:21.166 +00:00] [INFO] [util.go:50] ["Welcome to Placement Driver (PD)"]

[2020/09/20 16:26:21.166 +00:00] [INFO] [util.go:51] [PD] [release-version=v4.0.0]

[2020/09/20 16:26:21.166 +00:00] [INFO] [util.go:52] [PD] [edition=Community]

[2020/09/20 16:26:21.167 +00:00] [INFO] [util.go:53] [PD] [git-hash=56d4c3d2237f5bf6fb11a794731ed1d95c8020c2]

[2020/09/20 16:26:21.167 +00:00] [INFO] [util.go:54] [PD] [git-branch=heads/refs/tags/v4.0.0]

[2020/09/20 16:26:21.167 +00:00] [INFO] [util.go:55] [PD] [utc-build-time="2020-05-28 01:39:35"]

[2020/09/20 16:26:21.167 +00:00] [WARN] [main.go:83] ["Config contains undefined item: namespace-classifier"]

[2020/09/20 16:26:21.167 +00:00] [INFO] [metricutil.go:81] ["disable Prometheus push client"]

[2020/09/20 16:26:21.167 +00:00] [INFO] [server.go:209] ["PD Config"] [config="{\"client-urls\":\"http://0.0.0.0:2379\",\"peer-urls\":\"http://10.12.5.113:2380\",\"advertise-client-urls\":\"http://10.12.5.113:2379\",\"advertise-peer-urls\":\"http://10.12.5.113:2380\",\"name\":\"pd_pd1\",\"data-dir\":\"/home/tidb/deploy/data.pd\",\"force-new-cluster\":false,\"enable-grpc-gateway\":true,\"initial-cluster\":\"pd_pd1=http://10.12.5.113:2380,pd_pd2=http://10.12.5.114:2380,pd_pd3=http://10.12.5.115:2380\",\"initial-cluster-state\":\"new\",\"join\":\"\",\"lease\":3,\"log\":{\"level\":\"info\",\"format\":\"text\",\"disable-timestamp\":false,\"file\":{\"filename\":\"/home/tidb/deploy/log/pd.log\",\"max-size\":300,\"max-days\":0,\"max-backups\":0},\"development\":false,\"disable-caller\":false,\"disable-stacktrace\":false,\"disable-error-verbose\":true,\"sampling\":null},\"tso-save-interval\":\"3s\",\"metric\":{\"job\":\"pd_pd1\",\"address\":\"\",\"interval\":\"15s\"},\"schedule\":{\"max-snapshot-count\":3,\"max-pending-peer-count\":16,\"max-merge-region-size\":20,\"max-merge-region-keys\":200000,\"split-merge-interval\":\"1h0m0s\",\"enable-one-way-merge\":\"false\",\"enable-cross-table-merge\":\"false\",\"patrol-region-interval\":\"100ms\",\"max-store-down-time\":\"30m0s\",\"leader-schedule-limit\":4,\"leader-schedule-policy\":\"count\",\"region-schedule-limit\":2048,\"replica-schedule-limit\":64,\"merge-schedule-limit\":8,\"hot-region-schedule-limit\":4,\"hot-region-cache-hits-threshold\":3,\"store-balance-rate\":15,\"tolerant-size-ratio\":5,\"low-space-ratio\":0.8,\"high-space-ratio\":0.7,\"scheduler-max-waiting-operator\":5,\"enable-remove-down-replica\":\"true\",\"enable-replace-offline-replica\":\"true\",\"enable-make-up-replica\":\"true\",\"enable-remove-extra-replica\":\"true\",\"enable-location-replacement\":\"true\",\"enable-debug-metrics\":\"false\",\"schedulers-v2\":[{\"type\":\"balance-region\",\"args\":null,\"disable\":false,\"args-payload\":\"\"},{\"type\":\"ba
lance-leader\",\"args\":null,\"disable\":false,\"args-payload\":\"\"},{\"type\":\"hot-region\",\"args\":null,\"disable\":false,\"args-payload\":\"\"},{\"type\":\"label\",\"args\":null,\"disable\":false,\"args-payload\":\"\"}],\"schedulers-payload\":null,\"store-limit-mode\":\"manual\"},\"replication\":{\"max-replicas\":3,\"location-labels\":\"host\",\"strictly-match-label\":\"false\",\"enable-placement-rules\":\"false\"},\"pd-server\":{\"use-region-storage\":\"true\",\"max-gap-reset-ts\":\"24h0m0s\",\"key-type\":\"table\",\"runtime-services\":\"\",\"metric-storage\":\"http://10.12.5.232:9090\",\"dashboard-address\":\"auto\"},\"cluster-version\":\"0.0.0\",\"quota-backend-bytes\":\"8GiB\",\"auto-compaction-mode\":\"periodic\",\"auto-compaction-retention-v2\":\"1h\",\"TickInterval\":\"500ms\",\"ElectionInterval\":\"3s\",\"PreVote\":true,\"security\":{\"cacert-path\":\"\",\"cert-path\":\"\",\"key-path\":\"\",\"cert-allowed-cn\":null},\"label-property\":null,\"WarningMsgs\":[\"Config contains undefined item: namespace-classifier\"],\"DisableStrictReconfigCheck\":false,\"HeartbeatStreamBindInterval\":\"1m0s\",\"LeaderPriorityCheckInterval\":\"1m0s\",\"dashboard\":{\"tidb_cacert_path\":\"\",\"tidb_cert_path\":\"\",\"tidb_key_path\":\"\",\"public_path_prefix\":\"/dashboard\"},\"replication-mode\":{\"replication-mode\":\"majority\",\"dr-auto-sync\":{\"label-key\":\"\",\"primary\":\"\",\"dr\":\"\",\"primary-replicas\":0,\"dr-replicas\":0,\"wait-store-timeout\":\"1m0s\",\"wait-sync-timeout\":\"1m0s\"}}}"]

[2020/09/20 16:26:21.169 +00:00] [INFO] [server.go:182] ["register REST path"] [path=/pd/api/v1]

[2020/09/20 16:26:21.169 +00:00] [INFO] [server.go:182] ["register REST path"] [path=/swagger/]

[2020/09/20 16:26:21.170 +00:00] [INFO] [server.go:182] ["register REST path"] [path=/dashboard/api/]

[2020/09/20 16:26:21.170 +00:00] [INFO] [server.go:182] ["register REST path"] [path=/dashboard/]

[2020/09/20 16:26:21.170 +00:00] [INFO] [etcd.go:117] ["configuring peer listeners"] [listen-peer-urls="[http://10.12.5.113:2380]"]

[2020/09/20 16:26:21.170 +00:00] [INFO] [systime_mon.go:26] ["start system time monitor"]

[2020/09/20 16:26:21.170 +00:00] [INFO] [etcd.go:127] ["configuring client listeners"] [listen-client-urls="[http://0.0.0.0:2379]"]

[2020/09/20 16:26:21.170 +00:00] [INFO] [etcd.go:602] ["pprof is enabled"] [path=/debug/pprof]

[2020/09/20 16:26:21.170 +00:00] [INFO] [etcd.go:299] ["starting an etcd server"] [etcd-version=3.4.3] [git-sha="Not provided (use ./build instead of go build)"] [go-version=go1.13] [go-os=linux] [go-arch=amd64] [max-cpu-set=8] [max-cpu-available=8] [member-initialized=true] [name=pd_pd1] [data-dir=/home/tidb/deploy/data.pd] [wal-dir=] [wal-dir-dedicated=] [member-dir=/home/tidb/deploy/data.pd/member] [force-new-cluster=false] [heartbeat-interval=500ms] [election-timeout=3s] [initial-election-tick-advance=true] [snapshot-count=100000] [snapshot-catchup-entries=5000] [initial-advertise-peer-urls="[http://10.12.5.113:2380]"] [listen-peer-urls="[http://10.12.5.113:2380]"] [advertise-client-urls="[http://10.12.5.113:2379]"] [listen-client-urls="[http://0.0.0.0:2379]"] [listen-metrics-urls="[]"] [cors="[*]"] [host-whitelist="[*]"] [initial-cluster=] [initial-cluster-state=new] [initial-cluster-token=] [quota-size-bytes=8589934592] [pre-vote=true] [initial-corrupt-check=false] [corrupt-check-time-interval=0s] [auto-compaction-mode=periodic] [auto-compaction-retention=1h0m0s] [auto-compaction-interval=1h0m0s] [discovery-url=] [discovery-proxy=]

[2020/09/20 16:26:21.357 +00:00] [INFO] [backend.go:79] ["opened backend db"] [path=/home/tidb/deploy/data.pd/member/snap/db] [took=174.262383ms]

[2020/09/20 16:26:21.369 +00:00] [INFO] [server.go:443] ["recovered v2 store from snapshot"] [snapshot-index=6415564] [snapshot-size="65 kB"]

[2020/09/20 16:26:21.370 +00:00] [INFO] [kvstore.go:378] ["restored last compact revision"] [meta-bucket-name=meta] [meta-bucket-name-key=finishedCompactRev] [restored-compact-revision=6441824]

[2020/09/20 16:26:21.377 +00:00] [INFO] [server.go:461] ["recovered v3 backend from snapshot"] [backend-size-bytes=2428928] [backend-size="2.4 MB"] [backend-size-in-use-bytes=671744] [backend-size-in-use="672 kB"]

[2020/09/20 16:26:21.659 +00:00] [INFO] [raft.go:506] ["restarting local member"] [cluster-id=ef78fb08584c0e28] [local-member-id=411b7e01f694ab25] [commit-index=6482876]

[2020/09/20 16:26:21.667 +00:00] [INFO] [raft.go:1530] ["411b7e01f694ab25 switched to configuration voters=(2579653654541892389 3717199249823848643 4691481983733508901)"]

[2020/09/20 16:26:21.667 +00:00] [INFO] [raft.go:700] ["411b7e01f694ab25 became follower at term 604"]

[2020/09/20 16:26:21.668 +00:00] [INFO] [raft.go:383] ["newRaft 411b7e01f694ab25 [peers: [23ccc554ca81fb25,33962505ed1958c3,411b7e01f694ab25], term: 604, commit: 6482876, applied: 6415564, lastindex: 6482876, lastterm: 604]"]

[2020/09/20 16:26:21.668 +00:00] [INFO] [capability.go:76] ["enabled capabilities for version"] [cluster-version=3.4]

[2020/09/20 16:26:21.668 +00:00] [INFO] [cluster.go:256] ["recovered/added member from store"] [cluster-id=ef78fb08584c0e28] [local-member-id=411b7e01f694ab25] [recovered-remote-peer-id=23ccc554ca81fb25] [recovered-remote-peer-urls="[http://10.12.5.115:2380]"]

[2020/09/20 16:26:21.668 +00:00] [INFO] [cluster.go:256] ["recovered/added member from store"] [cluster-id=ef78fb08584c0e28] [local-member-id=411b7e01f694ab25] [recovered-remote-peer-id=33962505ed1958c3] [recovered-remote-peer-urls="[http://10.12.5.114:2380]"]

[2020/09/20 16:26:21.668 +00:00] [INFO] [cluster.go:256] ["recovered/added member from store"] [cluster-id=ef78fb08584c0e28] [local-member-id=411b7e01f694ab25] [recovered-remote-peer-id=411b7e01f694ab25] [recovered-remote-peer-urls="[http://10.12.5.113:2380]"]

[2020/09/20 16:26:21.668 +00:00] [INFO] [cluster.go:269] ["set cluster version from store"] [cluster-version=3.4]

[2020/09/20 16:26:21.668 +00:00] [INFO] [kvstore.go:378] ["restored last compact revision"] [meta-bucket-name=meta] [meta-bucket-name-key=finishedCompactRev] [restored-compact-revision=6441824]

[2020/09/20 16:26:21.672 +00:00] [WARN] [store.go:1317] ["simple token is not cryptographically signed"]

[2020/09/20 16:26:21.672 +00:00] [INFO] [quota.go:126] ["enabled backend quota"] [quota-name=v3-applier] [quota-size-bytes=8589934592] [quota-size="8.6 GB"]

[2020/09/20 16:26:21.672 +00:00] [INFO] [peer.go:128] ["starting remote peer"] [remote-peer-id=23ccc554ca81fb25]

[2020/09/20 16:26:21.672 +00:00] [INFO] [pipeline.go:71] ["started HTTP pipelining with remote peer"] [local-member-id=411b7e01f694ab25] [remote-peer-id=23ccc554ca81fb25]

[2020/09/20 16:26:21.672 +00:00] [INFO] [stream.go:166] ["started stream writer with remote peer"] [local-member-id=411b7e01f694ab25] [remote-peer-id=23ccc554ca81fb25]

[2020/09/20 16:26:21.673 +00:00] [INFO] [stream.go:166] ["started stream writer with remote peer"] [local-member-id=411b7e01f694ab25] [remote-peer-id=23ccc554ca81fb25]

[2020/09/20 16:26:21.673 +00:00] [INFO] [peer.go:134] ["started remote peer"] [remote-peer-id=23ccc554ca81fb25]

[2020/09/20 16:26:21.673 +00:00] [INFO] [transport.go:327] ["added remote peer"] [local-member-id=411b7e01f694ab25] [remote-peer-id=23ccc554ca81fb25] [remote-peer-urls="[http://10.12.5.115:2380]"]

[2020/09/20 16:26:21.673 +00:00] [INFO] [stream.go:406] ["started stream reader with remote peer"] [stream-reader-type="stream MsgApp v2"] [local-member-id=411b7e01f694ab25] [remote-peer-id=23ccc554ca81fb25]

[2020/09/20 16:26:21.673 +00:00] [INFO] [stream.go:406] ["started stream reader with remote peer"] [stream-reader-type="stream Message"] [local-member-id=411b7e01f694ab25] [remote-peer-id=23ccc554ca81fb25]

[2020/09/20 16:26:21.673 +00:00] [INFO] [peer.go:128] ["starting remote peer"] [remote-peer-id=33962505ed1958c3]

[2020/09/20 16:26:21.673 +00:00] [INFO] [pipeline.go:71] ["started HTTP pipelining with remote peer"] [local-member-id=411b7e01f694ab25] [remote-peer-id=33962505ed1958c3]

[2020/09/20 16:26:21.674 +00:00] [INFO] [stream.go:166] ["started stream writer with remote peer"] [local-member-id=411b7e01f694ab25] [remote-peer-id=33962505ed1958c3]

[2020/09/20 16:26:21.674 +00:00] [INFO] [stream.go:166] ["started stream writer with remote peer"] [local-member-id=411b7e01f694ab25] [remote-peer-id=33962505ed1958c3]

[2020/09/20 16:26:21.674 +00:00] [INFO] [peer.go:134] ["started remote peer"] [remote-peer-id=33962505ed1958c3]

[2020/09/20 16:26:21.674 +00:00] [INFO] [stream.go:406] ["started stream reader with remote peer"] [stream-reader-type="stream MsgApp v2"] [local-member-id=411b7e01f694ab25] [remote-peer-id=33962505ed1958c3]

[2020/09/20 16:26:21.674 +00:00] [INFO] [transport.go:327] ["added remote peer"] [local-member-id=411b7e01f694ab25] [remote-peer-id=33962505ed1958c3] [remote-peer-urls="[http://10.12.5.114:2380]"]

[2020/09/20 16:26:21.675 +00:00] [INFO] [server.go:779] ["starting etcd server"] [local-member-id=411b7e01f694ab25] [local-server-version=3.4.3] [cluster-id=ef78fb08584c0e28] [cluster-version=3.4]

[2020/09/20 16:26:21.675 +00:00] [INFO] [server.go:680] ["starting initial election tick advance"] [election-ticks=6]

The log you posted only contains entries from 09/20?
Please check the output of dmesg -T | grep node_exporter.
Then run ./scripts/run_node_exporter.sh and see whether the log contains entries from today.

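Inspecting ./scripts/run_node_exporter.sh showed that the path it uses for node_exporter does not match the binary's actual location under bin, so I corrected the script. After the change, running ./scripts/run_node_exporter.sh works and node_exporter shows up in ps.

Starting the services again with tiup got past this error, so I batch-edited the paths in the node_exporter and blackbox_exporter scripts on the PD nodes. The cluster has now started successfully.

Summary:
The cause was most likely the ansible --> tiup migration: the locations of node_exporter and blackbox_exporter under bin did not match the paths in deploy/scripts/run_xxx.sh. Editing the scripts by hand fixes it.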
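
To spot this mismatch before starting the cluster, something like the following could be run on each host. It is a rough sketch, not an official tool: `check_run_script` is a made-up helper, and it assumes the run script launches the binary through a path beginning with `bin/<name>` relative to the deploy directory.

```shell
# Report whether the binary path referenced by scripts/run_<name>.sh exists.
# Usage: check_run_script <deploy_dir> <exporter_name>
check_run_script() {
  deploy_dir=$1
  name=$2
  script="${deploy_dir}/scripts/run_${name}.sh"
  # First token in the script that starts with bin/<name> (assumed layout).
  ref=$(grep -o "bin/${name}[^[:space:]]*" "$script" 2>/dev/null | head -n 1)
  if [ -n "$ref" ] && [ -f "${deploy_dir}/${ref}" ] && [ -x "${deploy_dir}/${ref}" ]; then
    echo "OK ${name}: ${ref}"
  else
    echo "MISMATCH ${name}: script references '${ref}'"
    return 1
  fi
}

# Example, using the deploy directory that appears in the PD log:
#   for n in node_exporter blackbox_exporter; do
#     check_run_script /home/tidb/deploy "$n"
#   done
```

A MISMATCH line means the script points at a file that is missing or not executable, which is the situation described in the summary.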

Could we see the sh scripts from before and after the change?

While starting the cluster I also ran into the following problem:

    retry error: operation timed out after 2m0s
    10.12.5.215 failed to start: timed out waiting for port 9100 to be started after 2m0s

    Error: 10.12.5.215 failed to start: timed out waiting for port 9100 to be started after 2m0s: timed out waiting for port 9100 to be started after 2m0s

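Before the change:

After the change:

I later found that 215 is a TiKV node that was scaled out through tiup, so on that host bin/node_exporter/node_exporter is the correct executable path. I suggest the documentation mention that the executable paths may need to be adjusted between ansible-deployed and tiup-deployed hosts.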
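
Because hosts in the same cluster can end up with different layouts, it may help to detect the layout per host rather than apply one edit everywhere. The sketch below assumes two candidate layouts: the nested `bin/<name>/<name>` seen on the tiup-scaled host, and a flat `bin/<name>` file, which is my guess at the ansible-era layout. `find_exporter` is a hypothetical helper name.

```shell
# Print the relative path of the exporter binary, trying the nested layout
# first and then the flat one. Returns 1 if neither is an executable file.
# Usage: find_exporter <deploy_dir> <exporter_name>
find_exporter() {
  deploy_dir=$1
  name=$2
  for candidate in "bin/${name}/${name}" "bin/${name}"; do
    if [ -f "${deploy_dir}/${candidate}" ] && [ -x "${deploy_dir}/${candidate}" ]; then
      printf '%s\n' "${candidate}"
      return 0
    fi
  done
  return 1
}

# Example: show which path each run script should reference on this host.
#   for n in node_exporter blackbox_exporter; do
#     find_exporter /home/tidb/deploy "$n" || echo "$n: binary not found"
#   done
```

The printed path is the one the matching run_xxx.sh script on that host should use.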

We are very sorry for the inconvenience. We will update the Troubleshooting section of the migration document.
