TiUP scale-out failed with an error that node_exporter-9100.service does not exist.


I checked the node and the service unit really is not there, so the command cannot run. This cluster was upgraded from 3.0 to 4.0. How can I get it deployed again?

  1. First check node_exporter.service on the other nodes, under /etc/systemd/system/. Does the sh file it records (similar to the line below) exist on this machine? If it does, copy a service file over from another node and change the path to match this machine (a sketch is given after this list).
    ExecStart=/home/tidb/lqh-clusters/root_test/deploy02/monitor-19100/scripts/run_node_exporter.sh
  2. If it does not exist and there is no directory like monitor-19100, follow the scale-out documentation and scale out a node exporter:

https://pingcap.com/docs-cn/stable/scale-tidb-using-tiup/
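
A minimal sketch of option 1, assuming the healthy node is 10.204.54.78 and that run_node_exporter.sh already exists on this machine under /home/tidb/deploy/monitor-9100 (both the host and the paths are placeholders; use the values from your own topology):

# Copy the unit file from a node where node_exporter is working
scp tidb@10.204.54.78:/etc/systemd/system/node_exporter-9100.service /tmp/node_exporter-9100.service

# Point ExecStart at this machine's deploy directory (placeholder paths)
sed -i 's|/path/on/other/node/monitor-9100|/home/tidb/deploy/monitor-9100|' /tmp/node_exporter-9100.service

# Install the unit, reload systemd, and start it
sudo mv /tmp/node_exporter-9100.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl start node_exporter-9100.service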

There is no monitoring package on that node at all; it seems monitoring was never installed there before the upgrade.
I tried extending scale-out.yaml with:

monitored:
  node_exporter_port: 9100
  blackbox_exporter_port: 9115

but it still cannot be deployed that way. I just ran it again and it reports the same error.

  1. Please send the output of tiup cluster display cluster-name (an example invocation follows this list).
  2. Did the scale-out succeed? Is there a service file under /etc/systemd/system now? Did the scale-out report any errors?
  3. What is currently running on this node: pd, tidb, tikv?
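
For example, using the cluster name that appears in the scale-out log further down (substitute your own cluster name if it differs):

tiup cluster display test-cluster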

The scale-out failed with the same error: the monitor component cannot be deployed.

  1. The TiKV on 48 was imported from tidb-ansible, right?
  2. Please scale out a TiDB on 48 and then scale it in again afterwards; that should bring the monitoring components along (a sketch is given after this list).
  3. Sorry, there is currently no feature to add monitoring on its own; we will discuss this with the product team. Thanks.
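
A minimal sketch of step 2, reusing the host, ports, and deploy directory that appear in the log below (adjust them to your environment); the scale-in command removes the temporary TiDB again once the exporters are in place:

# scale-out.yaml
tidb_servers:
  - host: 10.204.54.48
    port: 4000
    status_port: 10080
    deploy_dir: /data/TIDB1

tiup cluster scale-out test-cluster scale-out.yaml
tiup cluster scale-in test-cluster --node 10.204.54.48:4000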

Yes, it was imported. I just tried scaling out a single TiDB and it still reports the same error about the monitoring not being deployable. I also tried scaling out a TiKV, and it got stuck at the same place.

[tidb@sit2-tidb-mysql-v-l-01:/home/tidb]$tiup cluster scale-out test-cluster scale-out.yaml
Starting component `cluster`: /home/tidb/.tiup/components/cluster/v0.6.0/cluster scale-out test-cluster scale-out.yaml
Please confirm your topology:
TiDB Cluster: test-cluster
TiDB Version: v4.0.0-rc.1
Type  Host          Ports       Directories
----  ----          -----       -----------
tidb  10.204.54.48  4000/10080  /data/TIDB1
Attention:
    1. If the topology is not what you expected, check your yaml file.
    2. Please confirm there is no port/directory conflicts in same host.
Do you want to continue? [y/N]:  y
+ [ Serial ] - SSHKeySet: privateKey=/home/tidb/.tiup/storage/cluster/clusters/test-cluster/ssh/id_rsa, publicKey=/home/tidb/.tiup/storage/cluster/clusters/test-cluster/ssh/id_rsa.pub


  - Download blackbox_exporter:v0.12.0 ... Done
+ [ Serial ] - UserSSH: user=tidb, host=10.204.54.48
+ [ Serial ] - Mkdir: host=10.204.54.48, directories='/data/TIDB1','','/data/TIDB1/log','/data/TIDB1/bin','/data/TIDB1/conf','/data/TIDB1/scripts'
+ [ Serial ] - CopyComponent: component=tidb, version=v4.0.0-rc.1, remote=10.204.54.48:/data/TIDB1
+ [ Serial ] - ScaleConfig: cluster=test-cluster, user=tidb, host=10.204.54.48, service=tidb-4000.service, deploy_dir=/data/TIDB1, data_dir=, log_dir=/data/TIDB1/log, cache_dir=
+ [Parallel] - UserSSH: user=tidb, host=10.204.54.78
+ [Parallel] - UserSSH: user=tidb, host=10.204.54.86
+ [Parallel] - UserSSH: user=tidb, host=10.204.54.85
+ [Parallel] - UserSSH: user=tidb, host=10.204.54.48
+ [Parallel] - UserSSH: user=tidb, host=10.204.54.78
+ [Parallel] - UserSSH: user=tidb, host=10.204.54.78
+ [Parallel] - UserSSH: user=tidb, host=10.204.54.86
+ [Parallel] - UserSSH: user=tidb, host=10.204.54.78
+ [Parallel] - UserSSH: user=tidb, host=10.204.54.86
+ [Parallel] - UserSSH: user=tidb, host=10.204.54.86
+ [Parallel] - UserSSH: user=tidb, host=10.204.54.78
+ [Parallel] - UserSSH: user=tidb, host=10.204.54.85
+ [ Serial ] - ClusterOperate: operation=StartOperation, options={Roles:[] Nodes:[] Force:false Timeout:0}
Starting component pd
	Starting instance pd 10.204.54.78:2379
	Starting instance pd 10.204.54.86:2379
	Starting instance pd 10.204.54.85:2379
	Start pd 10.204.54.78:2379 success
	Start pd 10.204.54.85:2379 success
	Start pd 10.204.54.86:2379 success
Starting component node_exporter
	Starting instance 10.204.54.86
	Start 10.204.54.86 success
Starting component blackbox_exporter
	Starting instance 10.204.54.86
	Start 10.204.54.86 success
Starting component node_exporter
	Starting instance 10.204.54.85
	Start 10.204.54.85 success
Starting component blackbox_exporter
	Starting instance 10.204.54.85
	Start 10.204.54.85 success
Starting component node_exporter
	Starting instance 10.204.54.78
	Start 10.204.54.78 success
Starting component blackbox_exporter
	Starting instance 10.204.54.78
	Start 10.204.54.78 success
Starting component tikv
	Starting instance tikv 10.204.54.86:20160
	Starting instance tikv 10.204.54.48:20160
	Starting instance tikv 10.204.54.85:20160
	Start tikv 10.204.54.85:20160 success
	Start tikv 10.204.54.48:20160 success
	Start tikv 10.204.54.86:20160 success
Starting component node_exporter
	Starting instance 10.204.54.48
Failed to start node_exporter-9100.service: Unit not found.


Error: failed to start: failed to start: 10.204.54.48: executor.ssh.execute_failed: Failed to execute command over SSH for 'tidb@10.204.54.48:22' {ssh_stderr: Failed to start node_exporter-9100.service: Unit not found.
, ssh_stdout: , ssh_command: PATH=$PATH:/usr/bin:/usr/sbin sudo -H -u root bash -c "systemctl daemon-reload && systemctl start node_exporter-9100.service"}, cause: Process exited with status 5

Verbose debug logs has been written to /home/tidb/logs/tiup-cluster-debug-2020-05-14-16-59-36.log.
Error: run `/home/tidb/.tiup/components/cluster/v0.6.0/cluster` (wd:/home/tidb/.tiup/data/Ryx2fWa) failed: exit status 1

Hold on, let us look into it a bit more.

OK, no problem, this is a test environment. I can just rebuild it later.

Is it urgent? If possible, please hold off on rebuilding for now; if we cannot solve it, then you can go ahead and rebuild. Thanks.

OK, I'll wait for your reply.

One question: do your other nodes have this monitoring set up?

Yes, they do. This one node without it is probably because it was not included in the monitor group back when the cluster was deployed with ansible.

  1. For now you could copy the monitoring setup over from another node (a sketch is given after this list): under /etc/systemd/system/, the blackbox_exporter xx.service and node_exporter-9100.service unit files; under /home/tidb/deploy/ (your deploy directory), copy all of the monitoring directories over, then compare and adjust their contents for this host.

  2. If this is a test environment, the easier option is still to tear it down and reinstall.

  3. This problem is most likely because ansible did not install monitoring on this node at the time; we will also think about how to avoid it. Thanks.
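
A minimal sketch of option 1, assuming the healthy node is 10.204.54.78, the monitoring deploy directory is /home/tidb/deploy/monitor-9100, and the blackbox unit is named blackbox_exporter-9115.service (all placeholders; check the real directory and unit names on the healthy node first):

# Copy the monitoring deploy directory from a node where the exporters run
scp -r tidb@10.204.54.78:/home/tidb/deploy/monitor-9100 /home/tidb/deploy/

# Copy both exporter unit files (names are assumptions; list /etc/systemd/system on the healthy node)
scp tidb@10.204.54.78:/etc/systemd/system/node_exporter-9100.service /tmp/
scp tidb@10.204.54.78:/etc/systemd/system/blackbox_exporter-9115.service /tmp/

# Fix any host-specific paths in the copied files, then install and start the units
sudo mv /tmp/node_exporter-9100.service /tmp/blackbox_exporter-9115.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl start node_exporter-9100.service blackbox_exporter-9115.service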

On my side there is also the case where different clusters share the same server, so cluster A can monitor that machine but cluster B cannot.

You can try deploying the monitoring on different ports for each cluster; see the sketch below.
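
A minimal sketch of that, assuming the second cluster's topology file sets non-default exporter ports (9101 and 9116 are arbitrary choices) so they do not collide with the first cluster's exporters on the shared host:

monitored:
  node_exporter_port: 9101
  blackbox_exporter_port: 9116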
