Background
Version information
TiDB Version: v4.0.4
TiUP Version: v1.0.8
Environment
- TiDB nodes
  10.129.128.125
  10.129.128.126
- PD nodes
  10.129.128.127
  10.129.128.128
  10.129.128.129
- TiKV nodes
  10.129.128.130
  10.129.128.131
  10.129.128.132
Scale-out and scale-in operations
1. Add 2 new PD nodes
- Configure scale-out-pd.yaml
pd_servers:
  - host: 10.129.128.125
    ssh_port: 22
    name: pd-10.129.128.125-2379
    client_port: 2379
    peer_port: 2380
    deploy_dir: /data/tidb/tidb-deploy/pd-2379
    data_dir: /data/tidb/tidb-data/pd-2379
    log_dir: /data/tidb/tidb-log/pd-2379
  - host: 10.129.128.126
    ssh_port: 22
    name: pd-10.129.128.126-2379
    client_port: 2379
    peer_port: 2380
    deploy_dir: /data/tidb/tidb-deploy/pd-2379
    data_dir: /data/tidb/tidb-data/pd-2379
    log_dir: /data/tidb/tidb-log/pd-2379
- Run the PD scale-out
tiup cluster scale-out test-tidb-prod scale-out-pd.yaml --user root -i /home/tidb/.ssh/id_rsa
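PD member names generally need to be unique within a cluster, so a quick look at the topology file before running scale-out can catch copy-paste slips. The following is a minimal sketch under stated assumptions: `dup_pd_names` is a hypothetical helper (not part of TiUP), and it assumes each PD entry carries a single `name:` key as in scale-out-pd.yaml above.

```shell
#!/bin/sh
# Hypothetical helper (not a TiUP command): print every "name:" value that
# appears more than once in the given topology file.
dup_pd_names() {
  awk '
    $1 == "name:"               { print $2 }   # "name: pd-x" on its own line
    $1 == "-" && $2 == "name:"  { print $3 }   # "- name: pd-x" starting a list item
  ' "$1" | sort | uniq -d
}

# Usage: prints nothing when all names are distinct.
#   dup_pd_names scale-out-pd.yaml
```

An empty result means no name is repeated; any printed value should be made unique before running the scale-out command.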
2. Scale in the 2 old PD nodes
tiup cluster scale-in test-tidb-prod --node 10.129.128.128:2379
tiup cluster scale-in test-tidb-prod --node 10.129.128.129:2379
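After the scale-in, it can help to confirm which client URLs PD itself reports. The sketch below is a rough illustration, not a definitive procedure: it assumes PD exposes its member list at `/pd/api/v1/members` and that each member object contains a `client_urls` array; `pd_client_urls` is a hypothetical helper, and the inline sample JSON is made up for illustration using addresses from this post.

```shell
#!/bin/sh
# Hypothetical helper: read a PD members JSON document on stdin and print
# the unique client URLs it contains, one per line.
pd_client_urls() {
  tr -d ' \n\t' |
    grep -o '"client_urls":\["[^]]*]' |
    grep -o 'https*://[^"]*' |
    sort -u
}

# Made-up sample in the assumed response shape (not real cluster output).
sample='{"members":[{"name":"pd-a","client_urls":["http://10.129.128.125:2379"]},
{"name":"pd-b","client_urls":["http://10.129.128.127:2379"]}],
"leader":{"name":"pd-b","client_urls":["http://10.129.128.127:2379"]}}'

printf '%s' "$sample" | pd_client_urls

# Against a live cluster (not run here), something like:
#   curl -s http://10.129.128.127:2379/pd/api/v1/members | pd_client_urls
```

If a removed node such as 10.129.128.128:2379 still shows up in the list, the scale-in has not fully taken effect on the PD side.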
3. Scale out TiKV nodes
- Configure scale-out-tikv.yaml
tikv_servers:
  - host: 10.129.128.128
    ssh_port: 22
    port: 20160
    status_port: 20180
    deploy_dir: /data/tidb/tidb-deploy/tikv-20160
    data_dir: /data/tidb/tidb-data/tikv-20160
    log_dir: /data/tidb/tidb-log/tikv-20160
  - host: 10.129.128.129
    ssh_port: 22
    port: 20160
    status_port: 20180
    deploy_dir: /data/tidb/tidb-deploy/tikv-20160
    data_dir: /data/tidb/tidb-data/tikv-20160
    log_dir: /data/tidb/tidb-log/tikv-20160
- Run the TiKV scale-out
tiup cluster scale-out test-tidb-prod scale-out-tikv.yaml --user root -i /home/tidb/.ssh/id_rsa
4. Cluster information after scaling
- TiDB nodes
  10.129.128.125
  10.129.128.126
- PD nodes
  10.129.128.125
  10.129.128.126
  10.129.128.127
- TiKV nodes
  10.129.128.128
  10.129.128.129
  10.129.128.130
  10.129.128.131
  10.129.128.132
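Accessing TiDB Dashboard
After the scaling operations above, the TiDB cluster is healthy and serves traffic normally.
However, since these scale-out and scale-in operations were performed on the previously healthy cluster, the Hosts page under Cluster Info in TiDB Dashboard fails to load. The error log is: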
{"error":true,"message":"error.api.other: Error 1105: Get http://10.129.128.128:2379/pd/api/v1/config/cluster-version: dial tcp 10.129.128.128:2379: connect: connection refused","code":"error.api.other","full_text":"error.api.other: Error 1105: Get http://10.129.128.128:2379/pd/api/v1/config/cluster-version: dial tcp 10.129.128.128:2379: connect: connection refused\
at github.com/pingcap-incubator/tidb-dashboard/pkg/apiserver/utils.MWHandleErrors.func1()\
...
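According to the error, the address being contacted is still a PD node from before the scaling operations (that node has already been scaled in and taken offline). The current PD leader is 10.129.128.127. How can this be fixed?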
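To make the mismatch concrete, the sketch below pulls the address out of the `dial tcp` part of the error message and checks it against the PD nodes listed after scaling. It only identifies which address is stale and does not change any configuration; `dial_target` and `is_current_pd` are hypothetical helper names introduced for illustration.

```shell
#!/bin/sh
# PD client endpoints after scaling, taken from the cluster information in this post.
CURRENT_PD="10.129.128.125:2379 10.129.128.126:2379 10.129.128.127:2379"

# Hypothetical helper: extract the host:port that follows "dial tcp" in an error message.
dial_target() {
  printf '%s\n' "$1" | sed -n 's/.*dial tcp \([0-9.]*:[0-9]*\):.*/\1/p'
}

# Hypothetical helper: succeed if the endpoint is one of the current PD nodes.
is_current_pd() {
  case " $CURRENT_PD " in
    *" $1 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Relevant fragment of the error message from the Dashboard.
err='dial tcp 10.129.128.128:2379: connect: connection refused'

target=$(dial_target "$err")
if is_current_pd "$target"; then
  echo "$target is a current PD node"
else
  echo "$target is not in the current PD list (stale address)"
fi
```

With the message above, the script reports 10.129.128.128:2379 as stale, which matches the observation that some component is still using a PD address from before the scale-in.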