zhangji
(张吉吉)
October 18, 2021, 13:49
1
Version information
TiDB cluster version: v4.0.14
TiUP version: 1.5.6
Topology file for scaling out TiKV
[tidb@bigdata-prod-tidb-ansible01 ~]$ cat scale-out-tikv.yaml
tikv_servers:
  - host: 172.30.0.60
    ssh_port: 22
    port: 20160
    status_port: 20180
    deploy_dir: /data1/yxt
    data_dir: /data1/yxt/data
    log_dir: /data1/yxt/log
    config:
      server.labels: { host: "tikv10" }
  - host: 172.30.0.60
    ssh_port: 22
    port: 20161
    status_port: 20181
    deploy_dir: /data2/yxt
    data_dir: /data2/yxt/data
    log_dir: /data2/yxt/log
    config:
      server.labels: { host: "tikv10" }
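Both entries in this topology target the same host, so each instance needs its own ports and directories. A minimal sketch of such a sanity check follows; the dict layout and the `find_conflicts` helper are hypothetical, the values are copied from the topology file, and the check only looks for exact duplicates. It is not how tiup validates a topology.

```python
from collections import Counter

# Values copied from the topology file; the dict layout is an assumption.
new_tikv = [
    {"host": "172.30.0.60", "port": 20160, "status_port": 20180,
     "deploy_dir": "/data1/yxt", "data_dir": "/data1/yxt/data",
     "log_dir": "/data1/yxt/log"},
    {"host": "172.30.0.60", "port": 20161, "status_port": 20181,
     "deploy_dir": "/data2/yxt", "data_dir": "/data2/yxt/data",
     "log_dir": "/data2/yxt/log"},
]

def find_conflicts(instances):
    """Return sorted (host, value) pairs that are claimed more than once."""
    claims = Counter()
    for inst in instances:
        for key in ("port", "status_port", "deploy_dir", "data_dir", "log_dir"):
            claims[(inst["host"], str(inst[key]))] += 1
    return sorted(pair for pair, count in claims.items() if count > 1)

# An empty list means no port or directory is reused on the host.
print(find_conflicts(new_tikv))
```

By this check the two new TiKV entries do not collide with each other.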
Start scaling out the TiKV nodes
[tidb@bigdata-prod-tidb-ansible01 ~]$ tiup cluster scale-out ali-yxt-rpt-center scale-out-tikv.yaml -uroot -p
Found cluster newer version:
The latest version: v1.6.0
Local installed version: v1.5.6
Update current component: tiup update cluster
Update all components: tiup update --all
Starting component `cluster`: /home/tidb/.tiup/components/cluster/v1.5.6/tiup-cluster scale-out ali-yxt-rpt-center scale-out-tikv.yaml -uroot -p
Error: Deploy directory overlaps to another instance (spec.deploy.dir_overlap)
The directory you specified in the topology file is:
Directory: data directory /data/yxt/cdc-8300
Component: cdc 172.29.1.49
It overlaps to another instance:
Other Directory: log directory /data/yxt/cdc-8300/log
Other Component: cdc 172.29.1.49
Please modify the topology file and try again.
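The odd part is that I am scaling out the new machine 172.30.0.60, yet the error points at 172.29.1.49. The CDC instance on 172.29.1.49 was added in an earlier scale-out, while what I am adding now is TiKV on 172.30.0.60, so the two should be completely unrelated.
Is tiup confused, or is something cached somewhere?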
zhangji
(张吉吉)
October 19, 2021, 01:33
4
[tidb@bigdata-prod-tidb-ansible01 ~]$ tiup clean --all
[tidb@bigdata-prod-tidb-ansible01 ~]$ tiup cluster scale-out ali-yxt-rpt-center scale-out-tikv.yaml -uroot -p
Found cluster newer version:
The latest version: v1.6.0
Local installed version: v1.5.6
Update current component: tiup update cluster
Update all components: tiup update --all
Starting component `cluster`: /home/tidb/.tiup/components/cluster/v1.5.6/tiup-cluster scale-out ali-yxt-rpt-center scale-out-tikv.yaml -uroot -p
Error: Deploy directory overlaps to another instance (spec.deploy.dir_overlap)
The directory you specified in the topology file is:
Directory: data directory /data/yxt/cdc-8300
Component: cdc 172.29.1.49
It overlaps to another instance:
Other Directory: log directory /data/yxt/cdc-8300/log
Other Component: cdc 172.29.1.49
The problem persists even after running tiup clean --all.
zhangji
(张吉吉)
October 19, 2021, 02:34
6
[tidb@bigdata-prod-tidb-ansible01 ~]$ tiup cluster display ali-yxt-rpt-center
Found cluster newer version:
The latest version: v1.6.0
Local installed version: v1.5.6
Update current component: tiup update cluster
Update all components: tiup update --all
Starting component `cluster`: /home/tidb/.tiup/components/cluster/v1.5.6/tiup-cluster display ali-yxt-rpt-center
Cluster type: tidb
Cluster name: ali-yxt-rpt-center
Cluster version: v4.0.14
Deploy user: tidb
SSH type: builtin
Dashboard URL: http://172.29.1.24:2379/dashboard
ID                  Role          Host           Ports        OS/Arch       Status   Data Dir                 Deploy Dir
--                  ----          ----           -----        -------       ------   --------                 ----------
172.30.16.177:9093  alertmanager  172.30.16.177  9093/9094    linux/x86_64  Up       /data/alertmanager/data  /data/alertmanager
172.29.1.49:8300    cdc           172.29.1.49    8300         linux/x86_64  Up       /data/yxt/cdc-8300       /home/tidb/deploy/cdc-8300
172.30.16.177:8300  cdc           172.30.16.177  8300         linux/x86_64  Up       /data/yxt/cdc-8300       /home/tidb/deploy/cdc-8300
172.30.16.177:3000  grafana       172.30.16.177  3000         linux/x86_64  Up       -                        /data/grafana
172.29.1.23:2379    pd            172.29.1.23    2379/2380    linux/x86_64  Up       /data/yxt/pd-2379        /home/tidb/deploy/pd-2379
172.29.1.24:2379    pd            172.29.1.24    2379/2380    linux/x86_64  Up|L|UI  /data/yxt/pd-2379        /home/tidb/deploy/pd-2379
172.29.1.25:2379    pd            172.29.1.25    2379/2380    linux/x86_64  Up       /data/yxt/pd-2379        /home/tidb/deploy/pd-2379
172.30.16.177:9090  prometheus    172.30.16.177  9090         linux/x86_64  Up       /data/prometheus/data    /data/prometheus
172.31.0.245:4000   tidb          172.31.0.245   4000/10080   linux/x86_64  Up       -                        /home/tidb/deploy
172.31.0.246:4000   tidb          172.31.0.246   4000/10080   linux/x86_64  Up       -                        /home/tidb/deploy
172.29.1.18:20160   tikv          172.29.1.18    20160/20180  linux/x86_64  Up       /data1/yxt/data          /data1/yxt
172.29.1.18:20161   tikv          172.29.1.18    20161/20181  linux/x86_64  Up       /data2/yxt/data          /data2/yxt
172.29.1.19:20160   tikv          172.29.1.19    20160/20180  linux/x86_64  Up       /data1/yxt/data          /data1/yxt
172.29.1.19:20161   tikv          172.29.1.19    20161/20181  linux/x86_64  Up       /data2/yxt/data          /data2/yxt
172.29.1.20:20160   tikv          172.29.1.20    20160/20180  linux/x86_64  Up       /data1/yxt/data          /data1/yxt
172.29.1.20:20161   tikv          172.29.1.20    20161/20181  linux/x86_64  Up       /data2/yxt/data          /data2/yxt
172.29.1.21:20160   tikv          172.29.1.21    20160/20180  linux/x86_64  Up       /data1/yxt/data          /data1/yxt
172.29.1.21:20161   tikv          172.29.1.21    20161/20181  linux/x86_64  Up       /data2/yxt/data          /data2/yxt
172.29.1.22:20160   tikv          172.29.1.22    20160/20180  linux/x86_64  Up       /data1/yxt/data          /data1/yxt
172.29.1.22:20161   tikv          172.29.1.22    20161/20181  linux/x86_64  Up       /data2/yxt/data          /data2/yxt
172.29.1.26:20160   tikv          172.29.1.26    20160/20180  linux/x86_64  Up       /data1/yxt/data          /data1/yxt
172.29.1.26:20161   tikv          172.29.1.26    20161/20181  linux/x86_64  Up       /data2/yxt/data          /data2/yxt
172.29.1.27:20160   tikv          172.29.1.27    20160/20180  linux/x86_64  Up       /data1/yxt/data          /data1/yxt
172.29.1.27:20161   tikv          172.29.1.27    20161/20181  linux/x86_64  Up       /data2/yxt/data          /data2/yxt
172.29.1.29:20160   tikv          172.29.1.29    20160/20180  linux/x86_64  Up       /data1/yxt/data          /data1/yxt
172.29.1.29:20161   tikv          172.29.1.29    20161/20181  linux/x86_64  Up       /data2/yxt/data          /data2/yxt
172.29.1.33:20160   tikv          172.29.1.33    20160/20180  linux/x86_64  Up       /data1/yxt/data          /data1/yxt
172.29.1.33:20161   tikv          172.29.1.33    20161/20181  linux/x86_64  Up       /data2/yxt/data          /data2/yxt
Total nodes: 28
The 172.29.1.49:8300 named in the error is actually a CDC node that was deployed earlier; the newly added 172.30.0.30 is a new server.
h5n1
(H5n1)
October 19, 2021, 03:17
7
Did the earlier CDC scale-out report any errors? Check the contents of /home/tidb/.tiup/storage/cluster/clusters/<cluster-name>/meta.yaml.
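Since the error names a data directory and a log directory of the same instance, one thing worth looking for in meta.yaml is any entry whose log_dir sits inside its data_dir. A rough sketch of such a scan follows. The `SAMPLE` text is a hypothetical fragment assembled from paths quoted in this thread, not the real file, and `nested_log_dirs` is a naive line-based reader written for illustration, not a YAML parser.

```python
import re
from pathlib import PurePosixPath

# Hypothetical fragment; the paths are the ones quoted in this thread.
SAMPLE = """\
cdc_servers:
- host: 172.29.1.49
  port: 8300
  deploy_dir: /home/tidb/deploy/cdc-8300
  data_dir: /data/yxt/cdc-8300
  log_dir: /data/yxt/cdc-8300/log
tikv_servers:
- host: 172.29.1.18
  port: 20160
  deploy_dir: /data1/yxt
  data_dir: /data1/yxt/data
  log_dir: /data1/yxt/log
"""

def nested_log_dirs(text):
    """Return (host, data_dir, log_dir) for entries whose log_dir is inside data_dir."""
    entries, current = [], None
    for line in text.splitlines():
        m = re.match(r"^\s*(-\s+)?(\w+):\s*(\S.*)?$", line)
        if not m:
            continue
        dash, key, value = m.groups()
        if dash:  # a leading "- " starts a new list entry
            current = {}
            entries.append(current)
        if current is not None and value:
            current[key] = value.strip()
    hits = []
    for e in entries:
        data, log = e.get("data_dir"), e.get("log_dir")
        if data and log and PurePosixPath(data) in PurePosixPath(log).parents:
            hits.append((e.get("host"), data, log))
    return hits

for hit in nested_log_dirs(SAMPLE):
    print(hit)
```

For real use, reading the file with a proper YAML library would be more robust than this line scan.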
zhangji
(张吉吉)
October 19, 2021, 05:35
8
The CDC scale-out completed normally, and that CDC node is currently replicating data as expected.
Attachment: meta.yaml (7.8 KB)
h5n1
(H5n1)
October 19, 2021, 06:21
9
Rename the cache directory /home/tidb/.tiup/storage/cluster/clusters/<cluster-name>/config-cache, then run the scale-out again.
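The point of renaming rather than deleting is that the cache can be restored if it turns out not to be the cause. A small sketch of that idea follows; `set_aside` and `demo` are hypothetical helpers, and the demo runs against a temporary directory rather than a real tiup storage path.

```python
import tempfile
from pathlib import Path

def set_aside(directory: Path, suffix: str = "_bak") -> Path:
    """Rename `directory` to `<name><suffix>` and return the new path."""
    target = directory.with_name(directory.name + suffix)
    if target.exists():
        raise FileExistsError(f"{target} already exists; choose another suffix")
    directory.rename(target)
    return target

def demo() -> list:
    """Build a throwaway cluster directory and set its config-cache aside."""
    with tempfile.TemporaryDirectory() as tmp:
        cluster_dir = Path(tmp) / "ali-yxt-rpt-center"
        (cluster_dir / "config-cache").mkdir(parents=True)
        set_aside(cluster_dir / "config-cache")
        return sorted(p.name for p in cluster_dir.iterdir())

print(demo())
```

Refusing to overwrite an existing backup keeps an earlier copy from being lost when the step is repeated.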
zhangji
(张吉吉)
October 19, 2021, 07:45
10
# cd /home/tidb/.tiup/storage/cluster/clusters/ali-yxt-rpt-center
# mv config-cache config-cache_bak
# ls
backup config-cache_bak meta.yaml ssh
# tiup cluster scale-out ali-yxt-rpt-center scale-out-tikv.yaml -uroot -p
Found cluster newer version:
The latest version: v1.6.0
Local installed version: v1.5.6
Update current component: tiup update cluster
Update all components: tiup update --all
Starting component `cluster`: /home/tidb/.tiup/components/cluster/v1.5.6/tiup-cluster scale-out ali-yxt-rpt-center scale-out-tikv.yaml -uroot -p
Error: Deploy directory overlaps to another instance (spec.deploy.dir_overlap)
The directory you specified in the topology file is:
Directory: data directory /data/yxt/cdc-8300
Component: cdc 172.29.1.49
It overlaps to another instance:
Other Directory: log directory /data/yxt/cdc-8300/log
Other Component: cdc 172.29.1.49
Still the same problem.
这道题我不会
(Lizhengyang@PingCAP)
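October 19, 2021, 09:21
13
This looks like the result of a validation added in some version of tiup cluster that does not allow a component's log directory to be placed under its data directory. You can try scaling out the TiKV nodes with an older version of tiup cluster, for example: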
tiup cluster:v1.3.2 scale-out ali-yxt-rpt-center scale-out-tikv.yaml -uroot -p
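To make the described rule concrete, here is a minimal sketch of a nesting test between two directories. It only models the "log directory under data directory" condition mentioned above; the actual rules tiup cluster applies are not shown in this thread, and `dirs_overlap` is a hypothetical helper.

```python
from pathlib import PurePosixPath

def dirs_overlap(a: str, b: str) -> bool:
    """Return True if the two paths are equal or one is nested inside the other."""
    pa, pb = PurePosixPath(a), PurePosixPath(b)
    return pa == pb or pa in pb.parents or pb in pa.parents

# Paths from the error message: the CDC log directory sits inside its data directory.
print(dirs_overlap("/data/yxt/cdc-8300", "/data/yxt/cdc-8300/log"))

# Paths from the new TiKV entries: data and log are siblings.
print(dirs_overlap("/data1/yxt/data", "/data1/yxt/log"))
```

Comparing path components rather than raw string prefixes avoids treating /data/a and /data/ab as nested.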
zhangji
(张吉吉)
October 19, 2021, 09:39
14
That was it: with the older version of tiup cluster the validation passes.
# tiup cluster:v1.3.2 scale-out ali-yxt-rpt-center scale-out-tikv.yaml -uroot -p
Type  Host         Ports        OS/Arch       Directories
----  ----         -----        -------       -----------
tikv  172.30.0.31  20160/20180  linux/x86_64  /data1/yxt,/data1/yxt/data
tikv  172.30.0.31  20161/20181  linux/x86_64  /data2/yxt,/data2/yxt/data
Attention:
1. If the topology is not what you expected, check your yaml file.
2. Please confirm there is no port/directory conflicts in same host.
Do you want to continue? [y/N]: N
I also tried changing the location of the log directory, from:
tikv_servers:
  - host: 172.30.0.31
    ssh_port: 22
    port: 20160
    status_port: 20180
    deploy_dir: /data1/yxt
    data_dir: /data1/yxt/data
    log_dir: /data1/yxt/log
    config:
      server.labels: { host: "tikv10" }
  - host: 172.30.0.31
    ssh_port: 22
    port: 20161
    status_port: 20181
    deploy_dir: /data2/yxt
    data_dir: /data2/yxt/data
    log_dir: /data2/yxt/log
    config:
      server.labels: { host: "tikv10" }
to:
tikv_servers:
  - host: 172.30.0.31
    ssh_port: 22
    port: 20160
    status_port: 20180
    deploy_dir: /data1/yxt
    data_dir: /data1/yxt/data
    log_dir: /home/tidb/deploy/tikv-20160/log
    config:
      server.labels: { host: "tikv10" }
  - host: 172.30.0.31
    ssh_port: 22
    port: 20161
    status_port: 20181
    deploy_dir: /data2/yxt
    data_dir: /data2/yxt/data
    log_dir: /home/tidb/deploy/tikv-20161/log
    config:
      server.labels: { host: "tikv10" }
Even with log_dir pointed at the home directory, the new version of tiup cluster still reports the error:
tiup cluster scale-out ali-yxt-rpt-center scale-out-tikv.yaml -uroot -p
这道题我不会
(Lizhengyang@PingCAP)
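October 19, 2021, 09:46
15
Adjusting the paths of the newly added nodes does not help, because during scale-out the new tiup cluster still validates the data and log directories of the nodes that are already deployed.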
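The effect described here can be illustrated with a toy check that runs over the already-deployed instances together with the new ones. This is only a sketch under assumptions: the dict layout and `first_nested_log_dir` are hypothetical, the paths come from this thread, and the real validation in tiup cluster may differ.

```python
from pathlib import PurePosixPath

def first_nested_log_dir(instances):
    """Return the first instance whose log_dir lies inside its data_dir, else None."""
    for inst in instances:
        if PurePosixPath(inst["data_dir"]) in PurePosixPath(inst["log_dir"]).parents:
            return inst
    return None

# Already-deployed instance, with the paths reported in the error message.
existing = [{"name": "cdc 172.29.1.49",
             "data_dir": "/data/yxt/cdc-8300",
             "log_dir": "/data/yxt/cdc-8300/log"}]

# The new TiKV instance before and after moving log_dir.
new_original = {"name": "tikv 172.30.0.31:20160",
                "data_dir": "/data1/yxt/data",
                "log_dir": "/data1/yxt/log"}
new_moved = dict(new_original, log_dir="/home/tidb/deploy/tikv-20160/log")

for new in (new_original, new_moved):
    offender = first_nested_log_dir(existing + [new])
    print(offender["name"])
```

In both runs the instance reported is the existing CDC node, so editing only the new entries leaves the result unchanged.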
zhangji
(张吉吉)
October 19, 2021, 09:52
16
Got it, thank you very much @这道题我不会 @h5n1
Everything worked fine before; the problem only appeared after upgrading tiup cluster to 4.0.14. Has this been fixed in version 5?
这道题我不会
(Lizhengyang@PingCAP)
October 19, 2021, 10:03
17
This has nothing to do with the version of the cluster itself. It is caused by the tiup cluster version, and it has not been fixed yet.
system
(system)
Closed
October 31, 2022, 19:09
19
This topic was automatically closed 1 minute after the last reply. New replies are no longer allowed.