- First look at the node_exporter.service on the other nodes, under /etc/systemd/system/. Does the sh file it records (similar to the line below) exist on this machine? If it does, copy a service file over from another node and change the paths to this machine's (a rough sketch follows this list).
ExecStart=/home/tidb/lqh-clusters/root_test/deploy02/monitor-19100/scripts/run_node_exporter.sh
- If it does not exist and there is no directory like monitor-19100, follow the scale-out documentation and scale out a node exporter.
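For example, roughly like this (10.204.54.78 as the healthy source node is only an assumption, and the script path is just the example above; substitute your own hosts and paths):
# on a healthy node: see which script the unit points at
grep ExecStart /etc/systemd/system/node_exporter-9100.service
# on this machine: does the corresponding script exist? (path is the example from above)
ls /home/tidb/lqh-clusters/root_test/deploy02/monitor-19100/scripts/run_node_exporter.sh
# if it does, copy the unit over from the healthy node and fix the path
scp tidb@10.204.54.78:/etc/systemd/system/node_exporter-9100.service /tmp/
vi /tmp/node_exporter-9100.service          # change ExecStart to this machine's run_node_exporter.sh path
sudo cp /tmp/node_exporter-9100.service /etc/systemd/system/
sudo systemctl daemon-reload && sudo systemctl start node_exporter-9100.service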
There is no monitoring package on it at all; this node probably never had monitoring installed before the upgrade either.
I tried extending scale-out.yaml with:
monitored:
node_exporter_port: 9100
blackbox_exporter_port: 9115
but it still cannot be deployed during the upgrade; I just ran it again and it still reports the same error.
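(For reference, the scale-out.yaml in question would look roughly like the sketch below; the tidb_servers entry is reconstructed from the topology printed in the log further down and is only an assumption.)
monitored:
  node_exporter_port: 9100
  blackbox_exporter_port: 9115
tidb_servers:
  - host: 10.204.54.48
    port: 4000
    status_port: 10080
    deploy_dir: /data/TIDB1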
- Please post the output of tiup cluster display cluster-name.
- Did the scale-out succeed? Is there a service file under /etc/systemd/system now? Did the scale-out report any errors?
- What is currently running on this node: PD, TiDB, TiKV?
- The TiKV on .48 was imported from an ansible deployment, right?
- Please scale out a TiDB on .48 and then scale it in afterwards; that will put the monitoring components in place (a sketch follows this list).
- Sorry, there is currently no feature to add just the monitoring components; we will discuss this with the product team. Thanks.
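Roughly like this (the file name scale-out-tidb.yaml is just an example, and port/deploy_dir are placeholders; adjust them to your environment):
cat > scale-out-tidb.yaml <<EOF
tidb_servers:
  - host: 10.204.54.48
    port: 4000
    status_port: 10080
    deploy_dir: /data/TIDB1
EOF
tiup cluster scale-out test-cluster scale-out-tidb.yaml
# once node_exporter / blackbox_exporter are up on .48, remove the temporary TiDB again
tiup cluster scale-in test-cluster --node 10.204.54.48:4000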
Yes, it was imported. I just scaled out a TiDB on its own and it still failed with the error about monitoring not being deployable; I also tried scaling out a TiKV and it got stuck at the same place:
[tidb@sit2-tidb-mysql-v-l-01:/home/tidb]$tiup cluster scale-out test-cluster scale-out.yaml
Starting component `cluster`: /home/tidb/.tiup/components/cluster/v0.6.0/cluster scale-out test-cluster scale-out.yaml
Please confirm your topology:
TiDB Cluster: test-cluster
TiDB Version: v4.0.0-rc.1
Type Host Ports Directories
---- ---- ----- -----------
tidb 10.204.54.48 4000/10080 /data/TIDB1
Attention:
1. If the topology is not what you expected, check your yaml file.
2. Please confirm there is no port/directory conflicts in same host.
Do you want to continue? [y/N]: y
+ [ Serial ] - SSHKeySet: privateKey=/home/tidb/.tiup/storage/cluster/clusters/test-cluster/ssh/id_rsa, publicKey=/home/tidb/.tiup/storage/cluster/clusters/test-cluster/ssh/id_rsa.pub
- Download blackbox_exporter:v0.12.0 ... Done
+ [ Serial ] - UserSSH: user=tidb, host=10.204.54.48
+ [ Serial ] - Mkdir: host=10.204.54.48, directories='/data/TIDB1','','/data/TIDB1/log','/data/TIDB1/bin','/data/TIDB1/conf','/data/TIDB1/scripts'
+ [ Serial ] - CopyComponent: component=tidb, version=v4.0.0-rc.1, remote=10.204.54.48:/data/TIDB1
+ [ Serial ] - ScaleConfig: cluster=test-cluster, user=tidb, host=10.204.54.48, service=tidb-4000.service, deploy_dir=/data/TIDB1, data_dir=, log_dir=/data/TIDB1/log, cache_dir=
+ [Parallel] - UserSSH: user=tidb, host=10.204.54.78
+ [Parallel] - UserSSH: user=tidb, host=10.204.54.86
+ [Parallel] - UserSSH: user=tidb, host=10.204.54.85
+ [Parallel] - UserSSH: user=tidb, host=10.204.54.48
+ [Parallel] - UserSSH: user=tidb, host=10.204.54.78
+ [Parallel] - UserSSH: user=tidb, host=10.204.54.78
+ [Parallel] - UserSSH: user=tidb, host=10.204.54.86
+ [Parallel] - UserSSH: user=tidb, host=10.204.54.78
+ [Parallel] - UserSSH: user=tidb, host=10.204.54.86
+ [Parallel] - UserSSH: user=tidb, host=10.204.54.86
+ [Parallel] - UserSSH: user=tidb, host=10.204.54.78
+ [Parallel] - UserSSH: user=tidb, host=10.204.54.85
+ [ Serial ] - ClusterOperate: operation=StartOperation, options={Roles:[] Nodes:[] Force:false Timeout:0}
Starting component pd
Starting instance pd 10.204.54.78:2379
Starting instance pd 10.204.54.86:2379
Starting instance pd 10.204.54.85:2379
Start pd 10.204.54.78:2379 success
Start pd 10.204.54.85:2379 success
Start pd 10.204.54.86:2379 success
Starting component node_exporter
Starting instance 10.204.54.86
Start 10.204.54.86 success
Starting component blackbox_exporter
Starting instance 10.204.54.86
Start 10.204.54.86 success
Starting component node_exporter
Starting instance 10.204.54.85
Start 10.204.54.85 success
Starting component blackbox_exporter
Starting instance 10.204.54.85
Start 10.204.54.85 success
Starting component node_exporter
Starting instance 10.204.54.78
Start 10.204.54.78 success
Starting component blackbox_exporter
Starting instance 10.204.54.78
Start 10.204.54.78 success
Starting component tikv
Starting instance tikv 10.204.54.86:20160
Starting instance tikv 10.204.54.48:20160
Starting instance tikv 10.204.54.85:20160
Start tikv 10.204.54.85:20160 success
Start tikv 10.204.54.48:20160 success
Start tikv 10.204.54.86:20160 success
Starting component node_exporter
Starting instance 10.204.54.48
Failed to start node_exporter-9100.service: Unit not found.
Error: failed to start: failed to start: 10.204.54.48: executor.ssh.execute_failed: Failed to execute command over SSH for 'tidb@10.204.54.48:22' {ssh_stderr: Failed to start node_exporter-9100.service: Unit not found.
, ssh_stdout: , ssh_command: PATH=$PATH:/usr/bin:/usr/sbin sudo -H -u root bash -c "systemctl daemon-reload && systemctl start node_exporter-9100.service"}, cause: Process exited with status 5
Verbose debug logs has been written to /home/tidb/logs/tiup-cluster-debug-2020-05-14-16-59-36.log.
Error: run `/home/tidb/.tiup/components/cluster/v0.6.0/cluster` (wd:/home/tidb/.tiup/data/Ryx2fWa) failed: exit status 1
Hold on, we'll look into it a bit more.
OK, no problem, this is a test environment. I can just rebuild it later.
Is it urgent? If possible, please don't rebuild it yet; if we can't solve it, then sorry for the trouble of rebuilding. Thanks.
OK, I'll wait for your reply.
May I ask, do your other nodes have this monitoring deployed?
Yes, they do. This one node without it was probably not put into the monitor group when it was deployed with ansible.
- For now, you could consider copying the monitoring setup over from another node (see the sketch after this list):
  under /etc/systemd/system/, the blackbox_exporter xx.service and node_exporter-9100.service units; and from /home/tidb/deploy/ (your deploy directory), copy all the related directories over, then compare and adjust their contents.
- If this is a test environment, it is easier and probably better to just tear it down and reinstall.
- This problem is most likely because ansible did not install monitoring at the time; we will also think about how to avoid it. Thanks.
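A rough sketch of that copy, run on 10.204.54.48 (10.204.54.78 as the source node, the blackbox unit name, and the monitor-9100 directory are assumptions; mirror whatever actually exists on a healthy node):
scp tidb@10.204.54.78:/etc/systemd/system/node_exporter-9100.service /tmp/
scp "tidb@10.204.54.78:/etc/systemd/system/blackbox_exporter-*.service" /tmp/
# copy the monitoring-related directories that the ExecStart lines point at (directory name is hypothetical)
scp -r tidb@10.204.54.78:/home/tidb/deploy/monitor-9100 /home/tidb/deploy/
# compare the copied files, fix any host-specific paths (ExecStart etc.), then install and start
sudo cp /tmp/node_exporter-9100.service /tmp/blackbox_exporter-*.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl start node_exporter-9100.service   # and the blackbox unit, by its actual name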
On my side there is also the case where different clusters are co-deployed on one server, which means cluster A can monitor that machine but cluster B cannot.
You can try deploying each cluster's monitoring components on different ports (see the sketch below).
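For example, cluster B's topology could give its exporters their own ports (9101/9116 here are arbitrary examples):
monitored:
  node_exporter_port: 9101
  blackbox_exporter_port: 9116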