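To improve efficiency, please provide the following information. A clearly described problem gets resolved faster:
[TiDB Environment]
v5.0.3
[Overview] Scenario + problem summary
We deployed a new cluster in production. Both the check and check --apply commands passed, the network between the nodes is reachable, and SSH mutual trust is working.
Topology: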
# TiDB Config
global:
  user: "tidb"
  ssh_port: 22
  deploy_dir: "/data/tidb-deploy"
  data_dir: "/data/tidb-data"
monitored:
  node_exporter_port: 9100
  blackbox_exporter_port: 9115
server_configs:
  tidb:
    performance.txn-total-size-limit: 1073741824
  tikv:
    readpool.storage.use-unified-pool: false
    readpool.coprocessor.use-unified-pool: true
  pd:
    schedule.leader-schedule-limit: 4
    schedule.region-schedule-limit: 2048
    schedule.replica-schedule-limit: 64
    replication.enable-placement-rules: true
pd_servers:
  - host: 10.22.xx.36
  - host: 10.22.xx.37
  - host: 10.22.xx.38
tidb_servers:
  - host: 10.22.xx.36
  - host: 10.22.xx.37
  - host: 10.22.xx.38
tikv_servers:
  - host: 10.22.xx.30
  - host: 10.22.xx.31
  - host: 10.22.xx.32
  - host: 10.22.xx.33
  - host: 10.22.xx.34
  - host: 10.22.xx.35
monitoring_servers:
  - host: 10.22.xx.39
grafana_servers:
  - host: 10.22.xx.39
alertmanager_servers:
  - host: 10.22.xx.39
However, starting the cluster failed with the following error:
- [ Serial ] - UpdateTopology: cluster=tidb-prod004
{"level":"warn","ts":"2021-11-01T15:09:04.763+0800","logger":"etcd-client","caller":"v3@v3.5.0/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc00024ee00/#initially=[10.22.128.36:2379;10.22.xx.37:2379;10.22.xx.38:2379]","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing dial tcp 10.22.xx.38:2379: connect: no route to host\""}
Error: context deadline exceeded
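Judging from the error, this looks like a network communication problem with the PD nodes, but I checked and the PD service started normally on all three PD nodes. What exactly is causing this?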
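Since the log shows the dial to port 2379 failing with "no route to host", one way to narrow this down is to test plain TCP reachability of each PD client port from the machine where tiup runs (assumed here to be the host that made the failing connection). Below is a minimal sketch, not an official TiUP tool: the function names `probe` and `probe_all` are made up for illustration, and the host list is a placeholder to fill in with the real PD addresses. On Linux, an `EHOSTUNREACH` result when the service is up and SSH works often points to a firewall rule on the target host rejecting that port, but that is only a common cause, not a confirmed diagnosis for this cluster.

```python
import errno
import socket


def probe(host, port, timeout=3.0):
    """Try one TCP connection and return (ok, detail)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except socket.timeout:
        return False, "timed out"
    except OSError as exc:
        # Map the numeric errno to its symbolic name, e.g. EHOSTUNREACH
        # ("no route to host") or ECONNREFUSED.
        name = errno.errorcode.get(exc.errno, "unknown")
        return False, f"{name}: {exc}"


def probe_all(hosts, port=2379, timeout=3.0):
    """Probe the same port on every host; 2379 is the PD client port from the log."""
    return {f"{host}:{port}": probe(host, port, timeout) for host in hosts}


# Example call (hypothetical placeholders, replace with the real PD hosts):
# for endpoint, (ok, detail) in probe_all(["<pd-host-1>", "<pd-host-2>", "<pd-host-3>"]).items():
#     print(endpoint, "OK" if ok else "FAILED", detail)
```

If SSH on port 22 succeeds but this probe fails for one host only, comparing the firewall settings of that host against the other two PD nodes would be a reasonable next step.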
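- Related logs and monitoring
- TiUP Cluster Display information
- TiUP Cluster Edit Config information
- TiDB-Overview monitoring
- Logs of the relevant module (including logs from 1 hour before and after the problem)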