Error when scaling out TiKV nodes: directory conflict reported

Version information
TiDB cluster version: v4.0.14
tiup version: 1.5.6

Topology file for the TiKV scale-out
[tidb@bigdata-prod-tidb-ansible01 ~]$ cat scale-out-tikv.yaml
tikv_servers:
  - host: 172.30.0.60
    ssh_port: 22
    port: 20160
    status_port: 20180
    deploy_dir: /data1/yxt
    data_dir: /data1/yxt/data
    log_dir: /data1/yxt/log
    config:
      server.labels: { host: "tikv10" }

  - host: 172.30.0.60
    ssh_port: 22
    port: 20161
    status_port: 20181
    deploy_dir: /data2/yxt
    data_dir: /data2/yxt/data
    log_dir: /data2/yxt/log
    config:
      server.labels: { host: "tikv10" }

Starting the TiKV scale-out
[tidb@bigdata-prod-tidb-ansible01 ~]$ tiup cluster scale-out ali-yxt-rpt-center scale-out-tikv.yaml -uroot -p
Found cluster newer version:

The latest version:         v1.6.0
Local installed version:    v1.5.6
Update current component:   tiup update cluster
Update all components:      tiup update --all

Starting component cluster: /home/tidb/.tiup/components/cluster/v1.5.6/tiup-cluster scale-out ali-yxt-rpt-center scale-out-tikv.yaml -uroot -p

Error: Deploy directory overlaps to another instance (spec.deploy.dir_overlap)

The directory you specified in the topology file is:
  Directory: data directory /data/yxt/cdc-8300
  Component: cdc 172.29.1.49

It overlaps to another instance:
  Other Directory: log directory /data/yxt/cdc-8300/log
  Other Component: cdc 172.29.1.49

Please modify the topology file and try again.

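This is a bit strange. I am scaling out onto the new machine 172.30.0.60, but the error points at 172.29.1.49. The cdc on 172.29.1.49 was added in an earlier scale-out, whereas what I am adding now is TiKV on 172.30.0.60. The two should be completely unrelated, shouldn't they?

Is tiup confused, or is something cached somewhere?

Try tiup clean --all first.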

[tidb@bigdata-prod-tidb-ansible01 ~]$ tiup clean --all
[tidb@bigdata-prod-tidb-ansible01 ~]$ tiup cluster scale-out ali-yxt-rpt-center scale-out-tikv.yaml -uroot -p
Found cluster newer version:
The latest version: v1.6.0
Local installed version: v1.5.6
Update current component: tiup update cluster
Update all components: tiup update --all

Starting component cluster: /home/tidb/.tiup/components/cluster/v1.5.6/tiup-cluster scale-out ali-yxt-rpt-center scale-out-tikv.yaml -uroot -p

Error: Deploy directory overlaps to another instance (spec.deploy.dir_overlap)

The directory you specified in the topology file is:
Directory: data directory /data/yxt/cdc-8300
Component: cdc 172.29.1.49

It overlaps to another instance:
Other Directory: log directory /data/yxt/cdc-8300/log
Other Component: cdc 172.29.1.49

The problem is still there after tiup clean --all.

Take a look at the output of tiup cluster display.

[tidb@bigdata-prod-tidb-ansible01 ~]$ tiup cluster display ali-yxt-rpt-center
Found cluster newer version:

    The latest version:         v1.6.0
    Local installed version:    v1.5.6
    Update current component:   tiup update cluster
    Update all components:      tiup update --all

Starting component `cluster`: /home/tidb/.tiup/components/cluster/v1.5.6/tiup-cluster display ali-yxt-rpt-center
Cluster type:       tidb
Cluster name:       ali-yxt-rpt-center
Cluster version:    v4.0.14
Deploy user:        tidb
SSH type:           builtin
Dashboard URL:      http://172.29.1.24:2379/dashboard
ID                  Role          Host           Ports        OS/Arch       Status   Data Dir                 Deploy Dir
--                  ----          ----           -----        -------       ------   --------                 ----------
172.30.16.177:9093  alertmanager  172.30.16.177  9093/9094    linux/x86_64  Up       /data/alertmanager/data  /data/alertmanager
172.29.1.49:8300    cdc           172.29.1.49    8300         linux/x86_64  Up       /data/yxt/cdc-8300       /home/tidb/deploy/cdc-8300
172.30.16.177:8300  cdc           172.30.16.177  8300         linux/x86_64  Up       /data/yxt/cdc-8300       /home/tidb/deploy/cdc-8300
172.30.16.177:3000  grafana       172.30.16.177  3000         linux/x86_64  Up       -                        /data/grafana
172.29.1.23:2379    pd            172.29.1.23    2379/2380    linux/x86_64  Up       /data/yxt/pd-2379        /home/tidb/deploy/pd-2379
172.29.1.24:2379    pd            172.29.1.24    2379/2380    linux/x86_64  Up|L|UI  /data/yxt/pd-2379        /home/tidb/deploy/pd-2379
172.29.1.25:2379    pd            172.29.1.25    2379/2380    linux/x86_64  Up       /data/yxt/pd-2379        /home/tidb/deploy/pd-2379
172.30.16.177:9090  prometheus    172.30.16.177  9090         linux/x86_64  Up       /data/prometheus/data    /data/prometheus
172.31.0.245:4000   tidb          172.31.0.245   4000/10080   linux/x86_64  Up       -                        /home/tidb/deploy
172.31.0.246:4000   tidb          172.31.0.246   4000/10080   linux/x86_64  Up       -                        /home/tidb/deploy
172.29.1.18:20160   tikv          172.29.1.18    20160/20180  linux/x86_64  Up       /data1/yxt/data          /data1/yxt
172.29.1.18:20161   tikv          172.29.1.18    20161/20181  linux/x86_64  Up       /data2/yxt/data          /data2/yxt
172.29.1.19:20160   tikv          172.29.1.19    20160/20180  linux/x86_64  Up       /data1/yxt/data          /data1/yxt
172.29.1.19:20161   tikv          172.29.1.19    20161/20181  linux/x86_64  Up       /data2/yxt/data          /data2/yxt
172.29.1.20:20160   tikv          172.29.1.20    20160/20180  linux/x86_64  Up       /data1/yxt/data          /data1/yxt
172.29.1.20:20161   tikv          172.29.1.20    20161/20181  linux/x86_64  Up       /data2/yxt/data          /data2/yxt
172.29.1.21:20160   tikv          172.29.1.21    20160/20180  linux/x86_64  Up       /data1/yxt/data          /data1/yxt
172.29.1.21:20161   tikv          172.29.1.21    20161/20181  linux/x86_64  Up       /data2/yxt/data          /data2/yxt
172.29.1.22:20160   tikv          172.29.1.22    20160/20180  linux/x86_64  Up       /data1/yxt/data          /data1/yxt
172.29.1.22:20161   tikv          172.29.1.22    20161/20181  linux/x86_64  Up       /data2/yxt/data          /data2/yxt
172.29.1.26:20160   tikv          172.29.1.26    20160/20180  linux/x86_64  Up       /data1/yxt/data          /data1/yxt
172.29.1.26:20161   tikv          172.29.1.26    20161/20181  linux/x86_64  Up       /data2/yxt/data          /data2/yxt
172.29.1.27:20160   tikv          172.29.1.27    20160/20180  linux/x86_64  Up       /data1/yxt/data          /data1/yxt
172.29.1.27:20161   tikv          172.29.1.27    20161/20181  linux/x86_64  Up       /data2/yxt/data          /data2/yxt
172.29.1.29:20160   tikv          172.29.1.29    20160/20180  linux/x86_64  Up       /data1/yxt/data          /data1/yxt
172.29.1.29:20161   tikv          172.29.1.29    20161/20181  linux/x86_64  Up       /data2/yxt/data          /data2/yxt
172.29.1.33:20160   tikv          172.29.1.33    20160/20180  linux/x86_64  Up       /data1/yxt/data          /data1/yxt
172.29.1.33:20161   tikv          172.29.1.33    20161/20181  linux/x86_64  Up       /data2/yxt/data          /data2/yxt
Total nodes: 28

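The 172.29.1.49:8300 named in the error is actually a cdc instance that was deployed earlier; the newly added 172.30.0.30 is a new server.

Did the earlier CDC scale-out report any errors? Check the contents of /home/tidb/.tiup/storage/cluster/clusters/<cluster-name>/meta.yaml.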
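For anyone following along, a minimal sketch of how the directory settings in meta.yaml could be listed for a quick look. It assumes meta.yaml uses the same key names as the scale-out topology file (host, deploy_dir, data_dir, log_dir); the function name show_dirs is made up for this illustration and is not part of tiup.

```shell
# Hypothetical helper: print every host and directory setting in a
# topology-style YAML file, with line numbers, so overlapping paths
# are easier to spot by eye.
show_dirs() {
    # Match "host:", "deploy_dir:", "data_dir:" and "log_dir:" keys,
    # with optional indentation and an optional leading "- ".
    grep -nE '^[[:space:]]*(- )?(host|deploy_dir|data_dir|log_dir):' "$1"
}
```

It would be called with the path of the file to inspect, for example show_dirs /home/tidb/.tiup/storage/cluster/clusters/ali-yxt-rpt-center/meta.yaml.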

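The CDC scale-out went through without errors, and that cdc node is currently replicating data normally. Attached: meta.yaml (7.8 KB)

Rename the cache directory /home/tidb/.tiup/storage/cluster/clusters/<cluster-name>/config-cache, then run the scale-out again.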

# cd /home/tidb/.tiup/storage/cluster/clusters/ali-yxt-rpt-center
# mv config-cache config-cache_bak
# ls
backup  config-cache_bak  meta.yaml  ssh
# tiup cluster scale-out ali-yxt-rpt-center scale-out-tikv.yaml -uroot -p
Found cluster newer version:

    The latest version:         v1.6.0
    Local installed version:    v1.5.6
    Update current component:   tiup update cluster
    Update all components:      tiup update --all

Starting component `cluster`: /home/tidb/.tiup/components/cluster/v1.5.6/tiup-cluster scale-out ali-yxt-rpt-center scale-out-tikv.yaml -uroot -p

Error: Deploy directory overlaps to another instance (spec.deploy.dir_overlap)

The directory you specified in the topology file is:
  Directory: data directory /data/yxt/cdc-8300
  Component: cdc 172.29.1.49

It overlaps to another instance:
  Other Directory: log directory /data/yxt/cdc-8300/log
  Other Component: cdc 172.29.1.49

Still the same problem.



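Have a look at the two items above for reference.

This looks like it is caused by a validation check, added in some version of tiup cluster, that does not allow a component's log directory to sit under its data directory. You can try scaling out the TiKV nodes with an older version of tiup cluster, for example: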

tiup cluster:v1.3.2 scale-out ali-yxt-rpt-center scale-out-tikv.yaml -uroot -p
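To make the suspected rule concrete, here is a minimal sketch of a prefix-based overlap test between two directories. This is not tiup's actual implementation, only an illustration of the kind of check described above; the function name is_under is hypothetical, and the paths in the comments are the ones from the error message.

```shell
# Hypothetical check: succeed (exit status 0) when the first path is the
# same as, or nested inside, the second path.
is_under() {
    iu_child=${1%/}     # drop one trailing slash, if any
    iu_parent=${2%/}
    # Appending "/" to both sides makes the comparison component-wise,
    # so /data/yxt/cdc-83000 is not treated as being inside
    # /data/yxt/cdc-8300.
    case "$iu_child/" in
        "$iu_parent"/*) return 0 ;;
        *) return 1 ;;
    esac
}
```

With the values from the error, is_under /data/yxt/cdc-8300/log /data/yxt/cdc-8300 succeeds, while is_under /data1/yxt/log /data1/yxt/data fails, since those two directories are siblings.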

That was indeed it. With the older version of tiup cluster, the validation passes.

# tiup cluster:v1.3.2 scale-out ali-yxt-rpt-center scale-out-tikv.yaml -uroot -p
Type  Host         Ports        OS/Arch       Directories
----  ----         -----        -------       -----------
tikv  172.30.0.31  20160/20180  linux/x86_64  /data1/yxt,/data1/yxt/data
tikv  172.30.0.31  20161/20181  linux/x86_64  /data2/yxt,/data2/yxt/data
Attention:
    1. If the topology is not what you expected, check your yaml file.
    2. Please confirm there is no port/directory conflicts in same host.
Do you want to continue? [y/N]:  N

Trying a different location for the log directory:

tikv_servers:
  - host: 172.30.0.31
    ssh_port: 22
    port: 20160
    status_port: 20180
    deploy_dir: /data1/yxt
    data_dir: /data1/yxt/data
    log_dir: /data1/yxt/log
    config:
      server.labels: { host: "tikv10" }

  - host: 172.30.0.31
    ssh_port: 22
    port: 20161
    status_port: 20181
    deploy_dir: /data2/yxt
    data_dir: /data2/yxt/data
    log_dir: /data2/yxt/log
    config:
      server.labels: { host: "tikv10" }

changed to

tikv_servers:
  - host: 172.30.0.31
    ssh_port: 22
    port: 20160
    status_port: 20180
    deploy_dir: /data1/yxt
    data_dir: /data1/yxt/data
    log_dir: /home/tidb/deploy/tikv-20160/log
    config:
      server.labels: { host: "tikv10" }

  - host: 172.30.0.31
    ssh_port: 22
    port: 20161
    status_port: 20181
    deploy_dir: /data2/yxt
    data_dir: /data2/yxt/data
    log_dir: /home/tidb/deploy/tikv-20161/log
    config:
      server.labels: { host: "tikv10" }

I tried pointing the log directory at the home directory, but the new version of tiup cluster still reports the error:
tiup cluster scale-out ali-yxt-rpt-center scale-out-tikv.yaml -uroot -p

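Changing the paths of the newly added nodes does not help, because the new version of tiup cluster still validates the data and log directories of the already-deployed nodes during scale-out.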
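A rough sketch of why that is, assuming the check runs over the existing instances and the new ones together. The input format (one "ID data_dir log_dir" triple per line) and the name find_overlaps are invented for this illustration; tiup itself reads these values from its own metadata.

```shell
# Hypothetical scan: read "ID data_dir log_dir" lines from stdin and
# print the ID of every instance whose log directory is the same as,
# or nested inside, its data directory.
find_overlaps() {
    while read -r fo_id fo_data fo_log; do
        case "${fo_log%/}/" in
            "${fo_data%/}"/*) printf '%s\n' "$fo_id" ;;
        esac
    done
}
```

Fed the already-deployed cdc line "172.29.1.49:8300 /data/yxt/cdc-8300 /data/yxt/cdc-8300/log" together with the new TiKV line "172.30.0.60:20160 /data1/yxt/data /data1/yxt/log", it reports only the cdc instance, whatever paths the new TiKV nodes use.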

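OK, thanks a lot @这道题我不会 @h5n1
Everything had been fine before; the problem only appeared after upgrading tiup cluster to 4.0.14. Has this been fixed in version 5?

This has nothing to do with the version of the cluster itself. It is caused by the tiup cluster version, and it has not been fixed yet.

OK, understood. Thank you.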