【Problem Clarification】
- In the tidb-ansible era, the PD name was an internal parameter that always took its default value and could not be customized. Now that TiUP is becoming the mainstream operations tool, it exposes the PD name as a configurable option.
- tiup version: v0.6.0
【Problem Reproduction】
1. Create the scale-out topology file:
global:
  user: tidb
  ssh_port: 22
  deploy_dir: /home/tidb/lqh-clusters/root_test/deploy02
  data_dir: /home/tidb/lqh-clusters/root_test/data02

pd_servers:
  - host: 172.16.5.169
    ssh_port: 22
    client_port: 52379
    peer_port: 52380

server_configs:
  pd:
    replication.enable-placement-rules: true
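Before running the scale-out it can help to confirm that the new ports and directories do not conflict with the existing topology (the output below warns about exactly that). A minimal check, using the standard tiup cluster display command and the cluster name from this case:

# list the current topology of root_test: hosts, ports, deploy/data directories
tiup cluster display root_test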
2. Full execution of tiup cluster scale-out root_test conf/pd-scale-out.yaml:
[tidb@node5169 qihang.li]$ tiup cluster scale-out root_test conf/pd-scale-out.yaml
Starting component `cluster`: /home/tidb/.tiup/components/cluster/v0.6.0/cluster scale-out root_test conf/pd-scale-out.yaml
Please confirm your topology:
TiDB Cluster: root_test
TiDB Version: v4.0.0-rc.1
Type Host Ports Directories
---- ---- ----- -----------
pd 172.16.5.169 52379/52380 /home/tidb/lqh-clusters/root_test/deploy02/pd-52379,/home/tidb/lqh-clusters/root_test/data02/pd-52379
Attention:
1. If the topology is not what you expected, check your yaml file.
2. Please confirm there is no port/directory conflicts in same host.
Do you want to continue? [y/N]: y
+ [ Serial ] - SSHKeySet: privateKey=/home/tidb/.tiup/storage/cluster/clusters/root_test/ssh/id_rsa, publicKey=/home/tidb/.tiup/storage/cluster/clusters/root_test/ssh/id_rsa.pub
- Download node_exporter:v0.17.0 ... Done
+ [ Serial ] - UserSSH: user=tidb, host=172.16.5.169
+ [ Serial ] - Mkdir: host=172.16.5.169, directories='/home/tidb/lqh-clusters/root_test/deploy02/pd-52379','/home/tidb/lqh-clusters/root_test/data02/pd-52379','/home/tidb/lqh-clusters/root_test/deploy02/pd-52379/log','/home/tidb/lqh-clusters/root_test/deploy02/pd-52379/bin','/home/tidb/lqh-clusters/root_test/deploy02/pd-52379/conf','/home/tidb/lqh-clusters/root_test/deploy02/pd-52379/scripts'
+ [ Serial ] - CopyComponent: component=pd, version=v4.0.0-rc.1, remote=172.16.5.169:/home/tidb/lqh-clusters/root_test/deploy02/pd-52379
+ [ Serial ] - ScaleConfig: cluster=root_test, user=tidb, host=172.16.5.169, service=pd-52379.service, deploy_dir=/home/tidb/lqh-clusters/root_test/deploy02/pd-52379, data_dir=/home/tidb/lqh-clusters/root_test/data02/pd-52379, log_dir=/home/tidb/lqh-clusters/root_test/deploy02/pd-52379/log, cache_dir=
script path: /home/tidb/.tiup/storage/cluster/clusters/root_test/config/run_pd_172.16.5.169_52379.sh
script path: /home/tidb/.tiup/components/cluster/v0.6.0/templates/scripts/run_pd_scale.sh.tpl
+ [Parallel] - UserSSH: user=tidb, host=172.16.5.169
+ [Parallel] - UserSSH: user=tidb, host=172.16.5.169
+ [Parallel] - UserSSH: user=tidb, host=172.16.5.169
+ [Parallel] - UserSSH: user=tidb, host=172.16.5.169
+ [Parallel] - UserSSH: user=tidb, host=172.16.5.171
+ [Parallel] - UserSSH: user=tidb, host=172.16.5.169
+ [Parallel] - UserSSH: user=tidb, host=172.16.5.142
+ [Parallel] - UserSSH: user=tidb, host=172.16.5.169
+ [ Serial ] - ClusterOperate: operation=StartOperation, options={Roles:[] Nodes:[] Force:false Timeout:0}
Starting component pd
Starting instance pd 172.16.5.169:12379
Start pd 172.16.5.169:12379 success
Starting component node_exporter
Starting instance 172.16.5.169
Start 172.16.5.169 success
Starting component blackbox_exporter
Starting instance 172.16.5.169
Start 172.16.5.169 success
Starting component tikv
Starting instance tikv 172.16.5.171:30163
Starting instance tikv 172.16.5.169:30161
Starting instance tikv 172.16.5.169:30162
Start tikv 172.16.5.171:30163 success
Start tikv 172.16.5.169:30162 success
Start tikv 172.16.5.169:30161 success
Starting component node_exporter
Starting instance 172.16.5.171
Start 172.16.5.171 success
Starting component blackbox_exporter
Starting instance 172.16.5.171
Start 172.16.5.171 success
Starting component tidb
Starting instance tidb 172.16.5.169:34000
Start tidb 172.16.5.169:34000 success
Starting component tiflash
Starting instance tiflash 172.16.5.142:29000
Start tiflash 172.16.5.142:29000 success
Starting component node_exporter
Starting instance 172.16.5.142
Start 172.16.5.142 success
Starting component blackbox_exporter
Starting instance 172.16.5.142
Start 172.16.5.142 success
Starting component prometheus
Starting instance prometheus 172.16.5.169:19090
Start prometheus 172.16.5.169:19090 success
Starting component grafana
Starting instance grafana 172.16.5.169:13000
Start grafana 172.16.5.169:13000 success
Checking service state of pd
172.16.5.169 Active: active (running) since 四 2020-05-07 22:38:47 CST; 21h ago
Checking service state of tikv
172.16.5.171 Active: active (running) since 三 2020-05-06 10:36:52 CST; 2 days ago
172.16.5.169 Active: active (running) since 四 2020-05-07 22:36:05 CST; 21h ago
172.16.5.169 Active: active (running) since 四 2020-05-07 22:35:31 CST; 21h ago
Checking service state of tidb
172.16.5.169 Active: active (running) since 三 2020-05-06 10:37:05 CST; 2 days ago
Checking service state of tiflash
172.16.5.142 Active: active (running) since 三 2020-05-06 10:37:15 CST; 2 days ago
Checking service state of prometheus
172.16.5.169 Active: active (running) since 三 2020-05-06 10:37:28 CST; 2 days ago
Checking service state of grafana
172.16.5.169 Active: active (running) since 三 2020-05-06 10:37:40 CST; 2 days ago
+ [Parallel] - UserSSH: user=tidb, host=172.16.5.169
+ [ Serial ] - save meta
+ [ Serial ] - ClusterOperate: operation=StartOperation, options={Roles:[] Nodes:[] Force:false Timeout:0}
Starting component pd
Starting instance pd 172.16.5.169:52379
pd 172.16.5.169:52379 failed to start: timed out waiting for port 52379 to be started after 1m0s, please check the log of the instance
Error: failed to start: failed to start pd: pd 172.16.5.169:52379 failed to start: timed out waiting for port 52379 to be started after 1m0s, please check the log of the instance: timed out waiting for port 52379 to be started after 1m0s
Verbose debug logs has been written to /home/tidb/qihang.li/logs/tiup-cluster-debug-2020-05-08-19-44-12.log.
Error: run `/home/tidb/.tiup/components/cluster/v0.6.0/cluster` (wd:/home/tidb/.tiup/data/RyOcHwQ) failed: exit status 1
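At this point the new PD instance never opens port 52379. One way to see exactly which arguments TiUP rendered for it is to read the start script deployed on the target host. The path below follows the deploy layout shown in the Mkdir/ScaleConfig steps above; the file name run_pd.sh is an assumption based on the usual TiUP scripts directory:

# on 172.16.5.169: inspect the PD start script generated by the scale-out
cat /home/tidb/lqh-clusters/root_test/deploy02/pd-52379/scripts/run_pd.sh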
3. Errors reported in pd.log:
[tidb@node5169 qihang.li]$ less /home/tidb/lqh-clusters/root_test/deploy02/pd-52379/log/pd.log
[2020/05/08 19:43:09.692 +08:00] [INFO] [util.go:49] ["Welcome to Placement Driver (PD)"]
[2020/05/08 19:43:09.692 +08:00] [INFO] [util.go:50] [PD] [release-version=v4.0.0-rc.1]
[2020/05/08 19:43:09.692 +08:00] [INFO] [util.go:51] [PD] [git-hash=31dae220db6294f2dc2ec0df330892fe76e59edc]
[2020/05/08 19:43:09.692 +08:00] [INFO] [util.go:52] [PD] [git-branch=heads/refs/tags/v4.0.0-rc.1]
[2020/05/08 19:43:09.692 +08:00] [INFO] [util.go:53] [PD] [utc-build-time="2020-04-28 11:56:11"]
[2020/05/08 19:43:09.693 +08:00] [INFO] [metricutil.go:81] [“disable Prometheus push client”]
[2020/05/08 19:43:09.693 +08:00] [ERROR] [join.go:213] ["failed to open directory"] [error="open /home/tidb/lqh-clusters/root_test/data02/pd-52379/member: no such file or directory"]
[2020/05/08 19:43:09.696 +08:00] [FATAL] [main.go:93] ["join meet error"] [error="missing data or join a duplicated pd"] [stack="github.com/pingcap/log.Fatal
\t/home/jenkins/agent/workspace/uild_pd_multi_branch_v4.0.0-rc.1/go/pkg/mod/github.com/pingcap/log@v0.0.0-20200117041106-d28c14d3b1cd/global.go:59
main.main
\t/home/jenkins/agent/workspace/uild_pd_multi_branch_v4.0.0-rc.1/go/src/github.com/pingcap/pd/cmd/pd-server/main.go:93
runtime.main
\t/usr/local/go/src/runtime/proc.go:203”]
[2020/05/08 19:43:24.839 +08:00] [INFO] [util.go:49] ["Welcome to Placement Driver (PD)"]
[2020/05/08 19:43:24.839 +08:00] [INFO] [util.go:50] [PD] [release-version=v4.0.0-rc.1]
[2020/05/08 19:43:24.839 +08:00] [INFO] [util.go:51] [PD] [git-hash=31dae220db6294f2dc2ec0df330892fe76e59edc]
[2020/05/08 19:43:24.839 +08:00] [INFO] [util.go:52] [PD] [git-branch=heads/refs/tags/v4.0.0-rc.1]
[2020/05/08 19:43:24.839 +08:00] [INFO] [util.go:53] [PD] [utc-build-time="2020-04-28 11:56:11"]
[2020/05/08 19:43:24.839 +08:00] [INFO] [metricutil.go:81] [“disable Prometheus push client”]
[2020/05/08 19:43:24.839 +08:00] [ERROR] [join.go:213] ["failed to open directory"] [error="open /home/tidb/lqh-clusters/root_test/data02/pd-52379/member: no such file or directory"]
2020/05/08 19:43:24.839 grpclog.go:45: [info] parsed scheme: "endpoint"
2020/05/08 19:43:24.840 grpclog.go:45: [info] ccResolverWrapper: sending new addresses to cc: [{http://172.16.5.169:12379 0 }]
[2020/05/08 19:43:24.842 +08:00] [FATAL] [main.go:93] ["join meet error"] [error="missing data or join a duplicated pd"] [stack="github.com/pingcap/log.Fatal
\t/home/jenkins/agent/workspace/uild_pd_multi_branch_v4.0.0-rc.1/go/p…skipping…
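The fatal error "missing data or join a duplicated pd" means the joining PD either has no usable local data directory or is already registered under the same name in the existing cluster. A quick way to check the latter is to query the member list of the running PD at 172.16.5.169:12379 (seen in the log above) through the standard PD HTTP API:

# list the current PD members and their names
curl http://172.16.5.169:12379/pd/api/v1/members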
【Solution】
- tiup v0.6.2 fixes this issue: https://github.com/pingcap-incubator/tiup-cluster/issues/383.
- Until then, the workaround is to explicitly set a name for the new PD node in the scale-out file, as in the example below; then clean up the failed instance and retry (see the sketch after the example).
pd_servers:
  - host: 10.0.1.4
    # ssh_port: 22
    name: "pd-1"
    # client_port: 2379
    # peer_port: 2380
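If the failed attempt already registered the instance in the cluster metadata, it may need to be removed before retrying. The following is a sketch based on the standard tiup cluster scale-in / scale-out syntax, using the node address from this case; --force may be needed if the process never started:

# remove the failed PD instance from the cluster metadata
tiup cluster scale-in root_test --node 172.16.5.169:52379
# retry the scale-out after adding the name field to conf/pd-scale-out.yaml
tiup cluster scale-out root_test conf/pd-scale-out.yaml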
【Related Cases】