tiup commands fail after upgrading TiDB to 5.0.1 because of abnormal default config paths

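To help us work efficiently, please provide the following information. A clearly described problem gets resolved faster:
【TiDB environment】Test environment
【Summary】After upgrading TiDB to 5.0.1, abnormal default config paths make tiup commands unusable
【Background】After upgrading from 4.0.11 to 5.0.1 with tiup, the default paths of some monitoring-related settings are abnormal. This test cluster was originally deployed at 3.0.9 with ansible, then upgraded to 4.0 and imported into tiup.
【Symptom】tiup commands cannot be used
【Business impact】Test environment only. The main problem is that tiup commands fail, so operations and maintenance can no longer be done through tiup.
【TiDB version】5.0.1
【Attachments】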

  1. TiUP Cluster Display output
    [tidb@test1 ~]$ tiup cluster display test-cluster
    Found cluster newer version:

    The latest version: v1.5.5
    Local installed version: v1.4.1
    Update current component: tiup update cluster
    Update all components: tiup update --all

    Starting component cluster: /home/tidb/.tiup/components/cluster/v1.4.1/tiup-cluster display test-cluster
    Cluster type: tidb
    Cluster name: test-cluster
    Cluster version: v5.0.1
    SSH type: builtin
    Dashboard URL: http://xxx.xxx.xx.x89:2379/dashboard
    ID                    Role          Host            Ports                            OS/Arch       Status  Data Dir                                        Deploy Dir
    xxx.xxx.xx.x90:9093   alertmanager  xxx.xxx.xx.x90  9093/9094                        linux/x86_64  Up      /data/tidb/deploy/data.alertmanager             /data/tidb/deploy
    xxx.xxx.xx.x89:8300   cdc           xxx.xxx.xx.x89  8300                             linux/x86_64  Up      -                                               /data/tidb/deploy/cdc-8300
    xxx.xxx.xx.x90:3000   grafana       xxx.xxx.xx.x90  3000                             linux/x86_64  Up      -                                               /data/tidb/deploy
    xxx.xxx.xx.x88:2379   pd            xxx.xxx.xx.x88  2379/2380                        linux/x86_64  Up      /data/tidb/deploy/data.pd                       /data/tidb/deploy
    xxx.xxx.xx.x89:2379   pd            xxx.xxx.xx.x89  2379/2380                        linux/x86_64  Up|UI   /data/tidb/deploy/data.pd                       /data/tidb/deploy
    xxx.xxx.xx.x90:2379   pd            xxx.xxx.xx.x90  2379/2380                        linux/x86_64  Up|L    /data/tidb/deploy/data.pd                       /data/tidb/deploy
    xxx.xxx.xx.x90:9090   prometheus    xxx.xxx.xx.x90  9090                             linux/x86_64  Up      /data/tidb/deploy/prometheus2.0.0.data.metrics  /data/tidb/deploy
    xxx.xxx.xx.x88:4000   tidb          xxx.xxx.xx.x88  4000/10080                       linux/x86_64  Up      -                                               /data/tidb/deploy
    xxx.xxx.xx.x89:4000   tidb          xxx.xxx.xx.x89  4000/10080                       linux/x86_64  Up      -                                               /data/tidb/deploy
    xxx.xxx.xx.x90:4000   tidb          xxx.xxx.xx.x90  4000/10080                       linux/x86_64  Up      -                                               /data/tidb/deploy
    xxx.xxx.xx.x80:9000   tiflash       xxx.xxx.xx.x80  9000/8123/3930/20170/20292/8234  linux/x86_64  Up      /data/tidb/data/tiflash-9000                    /data/tidb/deploy/tiflash-9000
    xxx.xxx.xx.x88:20160  tikv          xxx.xxx.xx.x88  20160/20180                      linux/x86_64  Up      /data/tidb/deploy/data                          /data/tidb/deploy
    xxx.xxx.xx.x89:20160  tikv          xxx.xxx.xx.x89  20160/20180                      linux/x86_64  Up      /data/tidb/deploy/data                          /data/tidb/deploy
    xxx.xxx.xx.x90:20160  tikv          xxx.xxx.xx.x90  20160/20180                      linux/x86_64  Up      /data/tidb/deploy/data                          /data/tidb/deploy

  2. TiUP Cluster Edit Config output
    The abnormal paths are mainly in the monitored section: the paths changed from absolute to relative, so monitor-9100 was not deployed correctly.
    monitored:
      node_exporter_port: 9100
      blackbox_exporter_port: 9115
      deploy_dir: deploy/monitor-9100
      data_dir: data/monitor-9100
      log_dir: deploy/monitor-9100/log

  3. Error when running a tiup operations command
    [tidb@test1 ~]$ tiup cluster reload test-cluster -R tidb
    Found cluster newer version:

    The latest version: v1.5.5
    Local installed version: v1.4.1
    Update current component: tiup update cluster
    Update all components: tiup update --all

Starting component cluster: /home/tidb/.tiup/components/cluster/v1.4.1/tiup-cluster reload test-cluster -R tidb

  • [ Serial ] - SSHKeySet: privateKey=/home/tidb/.tiup/storage/cluster/clusters/test-cluster/ssh/id_rsa, publicKey=/home/tidb/.tiup/storage/cluster/clusters/test-cluster/ssh/id_rsa.pub
  • [Parallel] - UserSSH: user=tidb, host=xxx.xxx.xx.x90
  • [Parallel] - UserSSH: user=tidb, host=xxx.xxx.xx.x88
  • [Parallel] - UserSSH: user=tidb, host=xxx.xxx.xx.x80
  • [Parallel] - UserSSH: user=tidb, host=xxx.xxx.xx.x89
  • [Parallel] - UserSSH: user=tidb, host=xxx.xxx.xx.x89
  • [Parallel] - UserSSH: user=tidb, host=xxx.xxx.xx.x90
  • [Parallel] - UserSSH: user=tidb, host=xxx.xxx.xx.x88
  • [Parallel] - UserSSH: user=tidb, host=xxx.xxx.xx.x90
  • [Parallel] - UserSSH: user=tidb, host=xxx.xxx.xx.x90
  • [Parallel] - UserSSH: user=tidb, host=xxx.xxx.xx.x89
  • [Parallel] - UserSSH: user=tidb, host=xxx.xxx.xx.x90
  • [Parallel] - UserSSH: user=tidb, host=xxx.xxx.xx.x89
  • [Parallel] - UserSSH: user=tidb, host=xxx.xxx.xx.x88
  • [Parallel] - UserSSH: user=tidb, host=xxx.xxx.xx.x90
  • [ Serial ] - UpdateTopology: cluster=test-cluster
  • Refresh instance configs
    • Regenerate config pd -> xxx.xxx.xx.x88:2379 … Done
    • Regenerate config pd -> xxx.xxx.xx.x89:2379 … Done
    • Regenerate config pd -> xxx.xxx.xx.x90:2379 … Done
    • Regenerate config tikv -> xxx.xxx.xx.x89:20160 … Done
    • Regenerate config tikv -> xxx.xxx.xx.x90:20160 … Done
    • Regenerate config tikv -> xxx.xxx.xx.x88:20160 … Done
    • Regenerate config tidb -> xxx.xxx.xx.x88:4000 … Done
    • Regenerate config tidb -> xxx.xxx.xx.x89:4000 … Done
    • Regenerate config tidb -> xxx.xxx.xx.x90:4000 … Done
    • Regenerate config tiflash -> xxx.xxx.xx.x80:9000 … Done
    • Regenerate config cdc -> xxx.xxx.xx.x89:8300 … Done
    • Regenerate config prometheus -> xxx.xxx.xx.x90:9090 … Done
    • Regenerate config grafana -> xxx.xxx.xx.x90:3000 … Done
    • Regenerate config alertmanager -> xxx.xxx.xx.x90:9093 … Done
  • Refresh monitor configs
    • Refresh config node_exporter -> xxx.xxx.xx.x80 … Done
    • Refresh config node_exporter -> xxx.xxx.xx.x88 … Error
    • Refresh config node_exporter -> xxx.xxx.xx.x89 … Error
    • Refresh config node_exporter -> xxx.xxx.xx.x90 … Error
    • Refresh config blackbox_exporter -> xxx.xxx.xx.x90 … Error
    • Refresh config blackbox_exporter -> xxx.xxx.xx.x80 … Done
    • Refresh config blackbox_exporter -> xxx.xxx.xx.x88 … Error
    • Refresh config blackbox_exporter -> xxx.xxx.xx.x89 … Error

Error: failed to scp /home/tidb/.tiup/storage/cluster/clusters/test-cluster/config-cache/run_blackbox_exporter_xxx.xxx.xx.x90.sh to tidb@xxx.xxx.xx.x90:/home/tidb/deploy/monitor-9100/scripts/run_blackbox_exporter.sh: Process exited with status 1

Verbose debug logs has been written to /home/tidb/.tiup/logs/tiup-cluster-debug-2021-08-29-22-31-11.log.
Error: run /home/tidb/.tiup/components/cluster/v1.4.1/tiup-cluster (wd:/home/tidb/.tiup/data/ShUG7cg) failed: exit status 1

These monitoring-related files used to be spread across the bin, conf, and scripts folders under /data/tidb/deploy. How should this situation be fixed?

  • Logs of the relevant module (covering one hour before and after the issue)

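Did this part change after the upgrade to v5.0.1? Also, the tiup cluster version you are using is too old. I suggest upgrading it to the latest version and trying again.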

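Yes. In 3.0.9, the monitoring components were installed, like every other component, under the specified directory /data/tidb/deploy. From 4.0 onward, the default directory became the relative path deploy/monitor-9100.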
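The scp target in the error output, /home/tidb/deploy/monitor-9100/scripts/run_blackbox_exporter.sh, is consistent with the relative deploy_dir being resolved against the home directory of the tidb user. The following is a minimal sketch of that kind of resolution. The function name resolve_dir is hypothetical, and this is an illustration of the observed behavior, not tiup's actual implementation.

```shell
# Hypothetical helper, not part of tiup: resolve a directory setting the way
# the error message suggests (relative paths end up under the user's home).
resolve_dir() {
  dir="$1"    # value from the topology, e.g. deploy/monitor-9100
  home="$2"   # home directory of the deploy user, e.g. /home/tidb
  case "$dir" in
    /*) printf '%s\n' "$dir" ;;         # absolute path: used as-is
    *)  printf '%s\n' "$home/$dir" ;;   # relative path: joined to the home directory
  esac
}

resolve_dir deploy/monitor-9100 /home/tidb   # prints /home/tidb/deploy/monitor-9100
resolve_dir /data/tidb/deploy /home/tidb     # prints /data/tidb/deploy
```

Under this reading, a cluster imported from ansible keeps its files in /data/tidb/deploy while tiup looks for the monitoring scripts under /home/tidb/deploy/monitor-9100, which matches the failed scp.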

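After upgrading tiup to the latest version, I found that tiup has partially fixed this compatibility issue: running tiup cluster reload test-cluster -R tidb now automatically copies the config files to the new directory. Once I manually copied the corresponding files from the bin directory as well, everything returned to normal.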
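The manual copy step might look roughly like the sketch below. The old location /data/tidb/deploy/bin and the new location /home/tidb/deploy/monitor-9100 are taken from this thread. The helper name copy_monitor_bins and the file names in the example call (node_exporter, blackbox_exporter) are assumptions, since the post does not list which files were copied.

```shell
# Hypothetical helper: copy the named entries from an old bin directory into
# <target>/bin, skipping any name that does not exist in the source.
copy_monitor_bins() {
  src_bin="$1"   # old location, e.g. /data/tidb/deploy/bin
  dst_dir="$2"   # new location, e.g. /home/tidb/deploy/monitor-9100
  shift 2        # remaining arguments are the entry names to copy
  mkdir -p "$dst_dir/bin" || return 1
  for name in "$@"; do
    if [ -e "$src_bin/$name" ]; then
      # -p keeps mode and timestamps, -R also handles directories
      cp -pR "$src_bin/$name" "$dst_dir/bin/" || return 1
      echo "copied $name"
    else
      echo "skipped $name (not found in $src_bin)"
    fi
  done
}

# Example call on an affected host (file names are assumed, adjust as needed):
# copy_monitor_bins /data/tidb/deploy/bin /home/tidb/deploy/monitor-9100 \
#   node_exporter blackbox_exporter
```

After the copy, rerunning tiup cluster reload test-cluster -R tidb is a reasonable way to check that the monitor configs now refresh without errors.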
