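To improve efficiency, please provide the following information; a clearly described problem gets resolved faster:
【TiDB environment】Test environment
【Overview】After upgrading TiDB to 5.0.1, abnormal default configuration paths make tiup commands unusable
【Background】After upgrading from 4.0.11 to 5.0.1 with tiup, the default paths of some monitoring-related configuration became abnormal. This test cluster was originally deployed with ansible at 3.0.9 and was imported into tiup mode after being upgraded to 4.0.
【Symptom】tiup commands cannot be used
【Business impact】Test environment. The main problem is that tiup commands cannot be used, so operations and maintenance tasks can no longer be performed through tiup.
【TiDB version】5.0.1
【Attachments】
- Relevant logs and monitoring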
-
TiUP Cluster Display information
[tidb@test1 ~]$ tiup cluster display test-cluster
Found cluster newer version:The latest version: v1.5.5
Local installed version: v1.4.1
Update current component: tiup update cluster
Update all components: tiup update --all
Starting component `cluster`: /home/tidb/.tiup/components/cluster/v1.4.1/tiup-cluster display test-cluster
Cluster type: tidb
Cluster name: test-cluster
Cluster version: v5.0.1
SSH type: builtin
Dashboard URL: http://xxx.xxx.xx.x89:2379/dashboard
ID Role Host Ports OS/Arch Status Data Dir Deploy Dir
xxx.xxx.xx.x90:9093 alertmanager xxx.xxx.xx.x90 9093/9094 linux/x86_64 Up /data/tidb/deploy/data.alertmanager /data/tidb/deploy
xxx.xxx.xx.x89:8300 cdc xxx.xxx.xx.x89 8300 linux/x86_64 Up - /data/tidb/deploy/cdc-8300
xxx.xxx.xx.x90:3000 grafana xxx.xxx.xx.x90 3000 linux/x86_64 Up - /data/tidb/deploy
xxx.xxx.xx.x88:2379 pd xxx.xxx.xx.x88 2379/2380 linux/x86_64 Up /data/tidb/deploy/data.pd /data/tidb/deploy
xxx.xxx.xx.x89:2379 pd xxx.xxx.xx.x89 2379/2380 linux/x86_64 Up|UI /data/tidb/deploy/data.pd /data/tidb/deploy
xxx.xxx.xx.x90:2379 pd xxx.xxx.xx.x90 2379/2380 linux/x86_64 Up|L /data/tidb/deploy/data.pd /data/tidb/deploy
xxx.xxx.xx.x90:9090 prometheus xxx.xxx.xx.x90 9090 linux/x86_64 Up /data/tidb/deploy/prometheus2.0.0.data.metrics /data/tidb/deploy
xxx.xxx.xx.x88:4000 tidb xxx.xxx.xx.x88 4000/10080 linux/x86_64 Up - /data/tidb/deploy
xxx.xxx.xx.x89:4000 tidb xxx.xxx.xx.x89 4000/10080 linux/x86_64 Up - /data/tidb/deploy
xxx.xxx.xx.x90:4000 tidb xxx.xxx.xx.x90 4000/10080 linux/x86_64 Up - /data/tidb/deploy
xxx.xxx.xx.x80:9000 tiflash xxx.xxx.xx.x80 9000/8123/3930/20170/20292/8234 linux/x86_64 Up /data/tidb/data/tiflash-9000 /data/tidb/deploy/tiflash-9000
xxx.xxx.xx.x88:20160 tikv xxx.xxx.xx.x88 20160/20180 linux/x86_64 Up /data/tidb/deploy/data /data/tidb/deploy
xxx.xxx.xx.x89:20160 tikv xxx.xxx.xx.x89 20160/20180 linux/x86_64 Up /data/tidb/deploy/data /data/tidb/deploy
xxx.xxx.xx.x90:20160 tikv xxx.xxx.xx.x90 20160/20180 linux/x86_64 Up /data/tidb/deploy/data /data/tidb/deploy
-
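TiUP Cluster Edit Config information
The abnormal paths are mainly in the monitored section of the configuration: they changed from absolute paths to relative paths, so monitor-9100 could not be deployed properly.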
monitored:
  node_exporter_port: 9100
  blackbox_exporter_port: 9115
  deploy_dir: deploy/monitor-9100
  data_dir: data/monitor-9100
  log_dir: deploy/monitor-9100/log
-
Error reported when running a tiup operations command
[tidb@test1 ~]$ tiup cluster reload test-cluster -R tidb
Found cluster newer version:The latest version: v1.5.5
Local installed version: v1.4.1
Update current component: tiup update cluster
Update all components: tiup update --all
Starting component `cluster`: /home/tidb/.tiup/components/cluster/v1.4.1/tiup-cluster reload test-cluster -R tidb
- [ Serial ] - SSHKeySet: privateKey=/home/tidb/.tiup/storage/cluster/clusters/test-cluster/ssh/id_rsa, publicKey=/home/tidb/.tiup/storage/cluster/clusters/test-cluster/ssh/id_rsa.pub
- [Parallel] - UserSSH: user=tidb, host=xxx.xxx.xx.x90
- [Parallel] - UserSSH: user=tidb, host=xxx.xxx.xx.x88
- [Parallel] - UserSSH: user=tidb, host=xxx.xxx.xx.x80
- [Parallel] - UserSSH: user=tidb, host=xxx.xxx.xx.x89
- [Parallel] - UserSSH: user=tidb, host=xxx.xxx.xx.x89
- [Parallel] - UserSSH: user=tidb, host=xxx.xxx.xx.x90
- [Parallel] - UserSSH: user=tidb, host=xxx.xxx.xx.x88
- [Parallel] - UserSSH: user=tidb, host=xxx.xxx.xx.x90
- [Parallel] - UserSSH: user=tidb, host=xxx.xxx.xx.x90
- [Parallel] - UserSSH: user=tidb, host=xxx.xxx.xx.x89
- [Parallel] - UserSSH: user=tidb, host=xxx.xxx.xx.x90
- [Parallel] - UserSSH: user=tidb, host=xxx.xxx.xx.x89
- [Parallel] - UserSSH: user=tidb, host=xxx.xxx.xx.x88
- [Parallel] - UserSSH: user=tidb, host=xxx.xxx.xx.x90
- [ Serial ] - UpdateTopology: cluster=test-cluster
- Refresh instance configs
- Regenerate config pd -> xxx.xxx.xx.x88:2379 … Done
- Regenerate config pd -> xxx.xxx.xx.x89:2379 … Done
- Regenerate config pd -> xxx.xxx.xx.x90:2379 … Done
- Regenerate config tikv -> xxx.xxx.xx.x89:20160 … Done
- Regenerate config tikv -> xxx.xxx.xx.x90:20160 … Done
- Regenerate config tikv -> xxx.xxx.xx.x88:20160 … Done
- Regenerate config tidb -> xxx.xxx.xx.x88:4000 … Done
- Regenerate config tidb -> xxx.xxx.xx.x89:4000 … Done
- Regenerate config tidb -> xxx.xxx.xx.x90:4000 … Done
- Regenerate config tiflash -> xxx.xxx.xx.x80:9000 … Done
- Regenerate config cdc -> xxx.xxx.xx.x89:8300 … Done
- Regenerate config prometheus -> xxx.xxx.xx.x90:9090 … Done
- Regenerate config grafana -> xxx.xxx.xx.x90:3000 … Done
- Regenerate config alertmanager -> xxx.xxx.xx.x90:9093 … Done
- Refresh monitor configs
- Refresh config node_exporter -> xxx.xxx.xx.x80 … Done
- Refresh config node_exporter -> xxx.xxx.xx.x88 … Error
- Refresh config node_exporter -> xxx.xxx.xx.x89 … Error
- Refresh config node_exporter -> xxx.xxx.xx.x90 … Error
- Refresh config blackbox_exporter -> xxx.xxx.xx.x90 … Error
- Refresh config blackbox_exporter -> xxx.xxx.xx.x80 … Done
- Refresh config blackbox_exporter -> xxx.xxx.xx.x88 … Error
- Refresh config blackbox_exporter -> xxx.xxx.xx.x89 … Error
Error: failed to scp /home/tidb/.tiup/storage/cluster/clusters/test-cluster/config-cache/run_blackbox_exporter_xxx.xxx.xx.x90.sh to tidb@xxx.xxx.xx.x90:/home/tidb/deploy/monitor-9100/scripts/run_blackbox_exporter.sh: Process exited with status 1
Verbose debug logs has been written to /home/tidb/.tiup/logs/tiup-cluster-debug-2021-08-29-22-31-11.log.
Error: run `/home/tidb/.tiup/components/cluster/v1.4.1/tiup-cluster` (wd:/home/tidb/.tiup/data/ShUG7cg) failed: exit status 1
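The scp target in the error above (/home/tidb/deploy/monitor-9100/scripts/...) is consistent with the relative deploy_dir being resolved under the home directory of the tidb user instead of under /data/tidb/deploy. Below is a minimal sketch of that kind of resolution. It assumes a relative directory is simply joined to the user's home directory; `resolve_dir` is a hypothetical helper written for illustration and is not a tiup command.

```shell
#!/bin/sh
# resolve_dir is a hypothetical helper for illustration; it is not part of tiup.
# Assumption: a relative directory is joined to the deploy user's home
# directory, while an absolute directory is used unchanged.
resolve_dir() {
    home_dir=$1
    dir=$2
    case "$dir" in
        /*) printf '%s\n' "$dir" ;;            # absolute: keep as-is
        *)  printf '%s\n' "$home_dir/$dir" ;;  # relative: place under home
    esac
}

# Value taken from the monitored section of this cluster.
resolve_dir /home/tidb deploy/monitor-9100
# An absolute path, such as the deploy dir of the other components.
resolve_dir /home/tidb /data/tidb/deploy
```

Under that assumption the first call prints /home/tidb/deploy/monitor-9100, which matches the directory in the scp error, and the second prints /data/tidb/deploy unchanged.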
These monitoring-related configurations were originally spread across the bin, conf, and scripts folders under /data/tidb/deploy. How should this situation be fixed?
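To check whether other entries were also turned into relative paths, the topology text can be scanned for `*_dir` values that do not start with `/`. This is only a rough sketch: `list_relative_dirs` is a hypothetical helper, not a tiup subcommand, and it assumes plain unquoted `key: value` lines like the ones in the monitored section shown earlier.

```shell
#!/bin/sh
# list_relative_dirs is a hypothetical helper, not a tiup subcommand.
# It reads topology YAML on stdin and prints every *_dir entry whose value
# does not start with "/". It only handles simple unquoted "key: value" lines.
list_relative_dirs() {
    awk '$1 ~ /_dir:$/ && $2 != "" && $2 !~ /^\// { print $1, $2 }'
}

# Sample input copied from the monitored section of this cluster.
list_relative_dirs <<'EOF'
monitored:
  node_exporter_port: 9100
  blackbox_exporter_port: 9115
  deploy_dir: deploy/monitor-9100
  data_dir: data/monitor-9100
  log_dir: deploy/monitor-9100/log
EOF
```

For the sample input this lists the deploy_dir, data_dir, and log_dir lines. In practice the input would be a saved copy of the full topology rather than the inline sample.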
- Logs of the corresponding module (including logs from 1 hour before and after the problem)