TiKV fails to start (firewalls are confirmed to be disabled, and the three TiKV instances are deployed on three separate virtual machines)

The configuration file is as follows:

# # Global variables are applied to all deployments and used as the default value of
# # the deployments if a specific deployment value is missing.
global:
 user: "tidb"
 ssh_port: 22
 deploy_dir: "/tidb-deploy"
 data_dir: "/tidb-data"

# # Monitored variables are applied to all the machines.
monitored:
 node_exporter_port: 9100
 blackbox_exporter_port: 9115

server_configs:
 tidb:
   instance.tidb_slow_log_threshold: 300
 tikv:
   readpool.storage.use-unified-pool: false
   readpool.coprocessor.use-unified-pool: true
 pd:
   replication.enable-placement-rules: true
   replication.location-labels: ["host"]
 tiflash:
   logger.level: "info"

pd_servers:
 - host: 175.27.241.31

tidb_servers:
 - host: 175.27.169.129

tikv_servers:
 - host: 175.27.241.31
   port: 20160
   status_port: 20180
   config:
     server.labels: { host: "logic-host-1" }

 - host: 175.27.169.129
   port: 20161
   status_port: 20181
   config:
     server.labels: { host: "logic-host-2" }

 - host: 119.45.142.75
   port: 20162
   status_port: 20182
   config:
     server.labels: { host: "logic-host-3" }

tiflash_servers:
 - host: 119.45.142.75

monitoring_servers:
 - host: 175.27.241.31

grafana_servers:
 - host: 175.27.169.129

The deployment process is as follows:

 ~ tiup cluster deploy TiDB-cluster v7.2.0 ./topology.yaml --user root -p
tiup is checking updates for component cluster ...
Starting component `cluster`: /root/.tiup/components/cluster/v1.12.5/tiup-cluster deploy TiDB-cluster v7.2.0 ./topology.yaml --user root -p
Input SSH password:



+ Detect CPU Arch Name
  - Detecting node 175.27.241.31 Arch info ... Done
  - Detecting node 175.27.169.129 Arch info ... Done
  - Detecting node 119.45.142.75 Arch info ... Done



+ Detect CPU OS Name
  - Detecting node 175.27.241.31 OS info ... Done
  - Detecting node 175.27.169.129 OS info ... Done
  - Detecting node 119.45.142.75 OS info ... Done
Please confirm your topology:
Cluster type:    tidb
Cluster name:    TiDB-cluster
Cluster version: v7.2.0
Role        Host            Ports                            OS/Arch       Directories
----        ----            -----                            -------       -----------
pd          175.27.241.31   2379/2380                        linux/x86_64  /tidb-deploy/pd-2379,/tidb-data/pd-2379
tikv        175.27.241.31   20160/20180                      linux/x86_64  /tidb-deploy/tikv-20160,/tidb-data/tikv-20160
tikv        175.27.169.129  20161/20181                      linux/x86_64  /tidb-deploy/tikv-20161,/tidb-data/tikv-20161
tikv        119.45.142.75   20162/20182                      linux/x86_64  /tidb-deploy/tikv-20162,/tidb-data/tikv-20162
tidb        175.27.169.129  4000/10080                       linux/x86_64  /tidb-deploy/tidb-4000
tiflash     119.45.142.75   9000/8123/3930/20170/20292/8234  linux/x86_64  /tidb-deploy/tiflash-9000,/tidb-data/tiflash-9000
prometheus  175.27.241.31   9090/12020                       linux/x86_64  /tidb-deploy/prometheus-9090,/tidb-data/prometheus-9090
grafana     175.27.169.129  3000                             linux/x86_64  /tidb-deploy/grafana-3000
Attention:
    1. If the topology is not what you expected, check your yaml file.
    2. Please confirm there is no port/directory conflicts in same host.
Do you want to continue? [y/N]: (default=N) y
+ Generate SSH keys ... Done
+ Download TiDB components
  - Download pd:v7.2.0 (linux/amd64) ... Done
  - Download tikv:v7.2.0 (linux/amd64) ... Done
  - Download tidb:v7.2.0 (linux/amd64) ... Done
  - Download tiflash:v7.2.0 (linux/amd64) ... Done
  - Download prometheus:v7.2.0 (linux/amd64) ... Done
  - Download grafana:v7.2.0 (linux/amd64) ... Done
  - Download node_exporter: (linux/amd64) ... Done
  - Download blackbox_exporter: (linux/amd64) ... Done
+ Initialize target host environments
  - Prepare 175.27.241.31:22 ... Done
  - Prepare 175.27.169.129:22 ... Done
  - Prepare 119.45.142.75:22 ... Done
+ Deploy TiDB instance
  - Copy pd -> 175.27.241.31 ... Done
  - Copy tikv -> 175.27.241.31 ... Done
  - Copy tikv -> 175.27.169.129 ... Done
  - Copy tikv -> 119.45.142.75 ... Done
  - Copy tidb -> 175.27.169.129 ... Done
  - Copy tiflash -> 119.45.142.75 ... Done
  - Copy prometheus -> 175.27.241.31 ... Done
  - Copy grafana -> 175.27.169.129 ... Done
  - Deploy node_exporter -> 175.27.169.129 ... Done
  - Deploy node_exporter -> 119.45.142.75 ... Done
  - Deploy node_exporter -> 175.27.241.31 ... Done
  - Deploy blackbox_exporter -> 175.27.169.129 ... Done
  - Deploy blackbox_exporter -> 119.45.142.75 ... Done
  - Deploy blackbox_exporter -> 175.27.241.31 ... Done
+ Copy certificate to remote host
+ Init instance configs
  - Generate config pd -> 175.27.241.31:2379 ... Done
  - Generate config tikv -> 175.27.241.31:20160 ... Done
  - Generate config tikv -> 175.27.169.129:20161 ... Done
  - Generate config tikv -> 119.45.142.75:20162 ... Done
  - Generate config tidb -> 175.27.169.129:4000 ... Done
  - Generate config tiflash -> 119.45.142.75:9000 ... Done
  - Generate config prometheus -> 175.27.241.31:9090 ... Done
  - Generate config grafana -> 175.27.169.129:3000 ... Done
+ Init monitor configs
  - Generate config node_exporter -> 175.27.241.31 ... Done
  - Generate config node_exporter -> 175.27.169.129 ... Done
  - Generate config node_exporter -> 119.45.142.75 ... Done
  - Generate config blackbox_exporter -> 175.27.241.31 ... Done
  - Generate config blackbox_exporter -> 175.27.169.129 ... Done
  - Generate config blackbox_exporter -> 119.45.142.75 ... Done
Enabling component pd
        Enabling instance 175.27.241.31:2379
        Enable instance 175.27.241.31:2379 success
Enabling component tikv
        Enabling instance 119.45.142.75:20162
        Enabling instance 175.27.241.31:20160
        Enabling instance 175.27.169.129:20161
        Enable instance 175.27.169.129:20161 success
        Enable instance 175.27.241.31:20160 success
        Enable instance 119.45.142.75:20162 success
Enabling component tidb
        Enabling instance 175.27.169.129:4000
        Enable instance 175.27.169.129:4000 success
Enabling component tiflash
        Enabling instance 119.45.142.75:9000
        Enable instance 119.45.142.75:9000 success
Enabling component prometheus
        Enabling instance 175.27.241.31:9090
        Enable instance 175.27.241.31:9090 success
Enabling component grafana
        Enabling instance 175.27.169.129:3000
        Enable instance 175.27.169.129:3000 success
Enabling component node_exporter
        Enabling instance 119.45.142.75
        Enabling instance 175.27.169.129
        Enabling instance 175.27.241.31
        Enable 175.27.169.129 success
        Enable 119.45.142.75 success
        Enable 175.27.241.31 success
Enabling component blackbox_exporter
        Enabling instance 119.45.142.75
        Enabling instance 175.27.241.31
        Enabling instance 175.27.169.129
        Enable 175.27.169.129 success
        Enable 119.45.142.75 success
        Enable 175.27.241.31 success
Cluster `TiDB-cluster` deployed successfully, you can start it with command: `tiup cluster start TiDB-cluster --init`

The start process is as follows:

 tiup cluster start TiDB-cluster
tiup is checking updates for component cluster ...
Starting component `cluster`: /root/.tiup/components/cluster/v1.12.5/tiup-cluster start TiDB-cluster
Starting cluster TiDB-cluster...
+ [ Serial ] - SSHKeySet: privateKey=/root/.tiup/storage/cluster/clusters/TiDB-cluster/ssh/id_rsa, publicKey=/root/.tiup/storage/cluster/clusters/TiDB-cluster/ssh/id_rsa.pub
+ [Parallel] - UserSSH: user=tidb, host=175.27.169.129
+ [Parallel] - UserSSH: user=tidb, host=119.45.142.75
+ [Parallel] - UserSSH: user=tidb, host=175.27.241.31
+ [Parallel] - UserSSH: user=tidb, host=175.27.169.129
+ [Parallel] - UserSSH: user=tidb, host=175.27.241.31
+ [Parallel] - UserSSH: user=tidb, host=175.27.241.31
+ [Parallel] - UserSSH: user=tidb, host=175.27.169.129
+ [Parallel] - UserSSH: user=tidb, host=119.45.142.75
+ [ Serial ] - StartCluster
Starting component pd
        Starting instance 175.27.241.31:2379
        Start instance 175.27.241.31:2379 success
Starting component tikv
        Starting instance 119.45.142.75:20162
        Starting instance 175.27.241.31:20160
        Starting instance 175.27.169.129:20161

Error: failed to start tikv: failed to start: 175.27.169.129 tikv-20161.service, please check the instance's log(/tidb-deploy/tikv-20161/log) for more detail.: timed out waiting for port 20161 to be started after 2m0s

Verbose debug logs has been written to /root/.tiup/logs/tiup-cluster-debug-2023-07-22-14-07-40.log.

The contents of /tidb-deploy/tikv-20161/log are as follows:

[2023/07/22 13:52:43.856 +08:00] [INFO] [lib.rs:88] ["Welcome to TiKV"]
[2023/07/22 13:52:43.857 +08:00] [INFO] [lib.rs:93] ["Release Version:   7.2.0"]
[2023/07/22 13:52:43.857 +08:00] [INFO] [lib.rs:93] ["Edition:           Community"]
[2023/07/22 13:52:43.857 +08:00] [INFO] [lib.rs:93] ["Git Commit Hash:   12ce5540f9e8f781f14d3b3a58fb9442f03b6b29"]
[2023/07/22 13:52:43.857 +08:00] [INFO] [lib.rs:93] ["Git Commit Branch: heads/refs/tags/v7.2.0"]
[2023/07/22 13:52:43.857 +08:00] [INFO] [lib.rs:93] ["UTC Build Time:    Unknown (env var does not exist when building)"]
[2023/07/22 13:52:43.857 +08:00] [INFO] [lib.rs:93] ["Rust Version:      rustc 1.67.0-nightly (96ddd32c4 2022-11-14)"]
[2023/07/22 13:52:43.857 +08:00] [INFO] [lib.rs:93] ["Enable Features:   pprof-fp jemalloc mem-profiling portable sse test-engine-kv-rocksdb test-engine-raft-raft-engine cloud-aws cloud-gcp cloud-azure"]
[2023/07/22 13:52:43.857 +08:00] [INFO] [lib.rs:93] ["Profile:           dist_release"]
[2023/07/22 13:52:43.857 +08:00] [INFO] [mod.rs:80] ["cgroup quota: memory=Some(9223372036854771712), cpu=None, cores={0, 1}"]
[2023/07/22 13:52:43.857 +08:00] [INFO] [mod.rs:87] ["memory limit in bytes: 16249131008, cpu cores quota: 2"]
[2023/07/22 13:52:43.857 +08:00] [WARN] [lib.rs:544] ["environment variable `TZ` is missing, using `/etc/localtime`"]
[2023/07/22 13:52:43.857 +08:00] [INFO] [config.rs:723] ["kernel parameters"] [value=32768] [param=net.core.somaxconn]
[2023/07/22 13:52:43.857 +08:00] [INFO] [config.rs:723] ["kernel parameters"] [value=0] [param=net.ipv4.tcp_syncookies]
[2023/07/22 13:52:43.857 +08:00] [INFO] [config.rs:723] ["kernel parameters"] [value=0] [param=vm.swappiness]
[2023/07/22 13:52:43.867 +08:00] [INFO] [util.rs:604] ["connecting to PD endpoint"] [endpoints=175.27.241.31:2379]
[2023/07/22 13:52:43.868 +08:00] [INFO] [<unknown>] ["TCP_USER_TIMEOUT is available. TCP_USER_TIMEOUT will be used thereafter"]
[2023/07/22 13:52:45.868 +08:00] [INFO] [util.rs:566] ["PD failed to respond"] [err="Grpc(RpcFailure(RpcStatus { code: 4-DEADLINE_EXCEEDED, message: \"Deadline Exceeded\", details: [] }))"] [endpoints=175.27.241.31:2379]
[2023/07/22 13:52:45.868 +08:00] [WARN] [client.rs:168] ["validate PD endpoints failed"] [err="Other(\"[components/pd_client/src/util.rs:599]: PD cluster failed to respond\")"]
[2023/07/22 13:52:46.170 +08:00] [INFO] [util.rs:604] ["connecting to PD endpoint"] [endpoints=175.27.241.31:2379]
[2023/07/22 13:52:48.171 +08:00] [INFO] [util.rs:566] ["PD failed to respond"] [err="Grpc(RpcFailure(RpcStatus { code: 4-DEADLINE_EXCEEDED, message: \"Deadline Exceeded\", details: [] }))"] [endpoints=175.27.241.31:2379]
[2023/07/22 13:52:48.472 +08:00] [INFO] [util.rs:604] ["connecting to PD endpoint"] [endpoints=175.27.241.31:2379]
[2023/07/22 13:52:50.473 +08:00] [INFO] [util.rs:566] ["PD failed to respond"] [err="Grpc(RpcFailure(RpcStatus { code: 4-DEADLINE_EXCEEDED, message: \"Deadline Exceeded\", details: [] }))"] [endpoints=175.27.241.31:2379]
[2023/07/22 13:52:50.774 +08:00] [INFO] [util.rs:604] ["connecting to PD endpoint"] [endpoints=175.27.241.31:2379]
[2023/07/22 13:52:52.775 +08:00] [INFO] [util.rs:566] ["PD failed to respond"] [err="Grpc(RpcFailure(RpcStatus { code: 4-DEADLINE_EXCEEDED, message: \"Deadline Exceeded\", details: [] }))"] [endpoints=175.27.241.31:2379]
[2023/07/22 13:52:53.076 +08:00] [INFO] [util.rs:604] ["connecting to PD endpoint"] [endpoints=175.27.241.31:2379]
[2023/07/22 13:52:55.077 +08:00] [INFO] [util.rs:566] ["PD failed to respond"] [err="Grpc(RpcFailure(RpcStatus { code: 4-DEADLINE_EXCEEDED, message: \"Deadline Exceeded\", details: [] }))"] [endpoints=175.27.241.31:2379]
[2023/07/22 13:52:55.378 +08:00] [INFO] [util.rs:604] ["connecting to PD endpoint"] [endpoints=175.27.241.31:2379]
[2023/07/22 13:52:57.379 +08:00] [INFO] [util.rs:566] ["PD failed to respond"] [err="Grpc(RpcFailure(RpcStatus { code: 4-DEADLINE_EXCEEDED, message: \"Deadline Exceeded\", details: [] }))"] [endpoints=175.27.241.31:2379]
[2023/07/22 13:52:57.680 +08:00] [INFO] [util.rs:604] ["connecting to PD endpoint"] [endpoints=175.27.241.31:2379]
[2023/07/22 13:52:59.681 +08:00] [INFO] [util.rs:566] ["PD failed to respond"] [err="Grpc(RpcFailure(RpcStatus { code: 4-DEADLINE_EXCEEDED, message: \"Deadline Exceeded\", details: [] }))"] [endpoints=175.27.241.31:2379]
[2023/07/22 13:52:59.982 +08:00] [INFO] [util.rs:604] ["connecting to PD endpoint"] [endpoints=175.27.241.31:2379]
[2023/07/22 13:53:01.983 +08:00] [INFO] [util.rs:566] ["PD failed to respond"] [err="Grpc(RpcFailure(RpcStatus { code: 4-DEADLINE_EXCEEDED, message: \"Deadline Exceeded\", details: [] }))"] [endpoints=175.27.241.31:2379]
[2023/07/22 13:53:02.284 +08:00] [INFO] [util.rs:604] ["connecting to PD endpoint"] [endpoints=175.27.241.31:2379]
[2023/07/22 13:53:04.285 +08:00] [INFO] [util.rs:566] ["PD failed to respond"] [err="Grpc(RpcFailure(RpcStatus { code: 4-DEADLINE_EXCEEDED, message: \"Deadline Exceeded\", details: [] }))"] [endpoints=175.27.241.31:2379]
[2023/07/22 13:53:04.586 +08:00] [INFO] [util.rs:604] ["connecting to PD endpoint"] [endpoints=175.27.241.31:2379]
[2023/07/22 13:53:06.587 +08:00] [INFO] [util.rs:566] ["PD failed to respond"] [err="Grpc(RpcFailure(RpcStatus { code: 4-DEADLINE_EXCEEDED, message: \"Deadline Exceeded\", details: [] }))"] [endpoints=175.27.241.31:2379]
[2023/07/22 13:53:06.888 +08:00] [INFO] [util.rs:604] ["connecting to PD endpoint"] [endpoints=175.27.241.31:2379]
[2023/07/22 13:53:08.889 +08:00] [INFO] [util.rs:566] ["PD failed to respond"] [err="Grpc(RpcFailure(RpcStatus { code: 4-DEADLINE_EXCEEDED, message: \"Deadline Exceeded\", details: [] }))"] [endpoints=175.27.241.31:2379]
[2023/07/22 13:53:08.889 +08:00] [WARN] [client.rs:168] ["validate PD endpoints failed"] [err="Other(\"[components/pd_client/src/util.rs:599]: PD cluster failed to respond\")"]

The log only shows that the PD node failed to respond or timed out. What environment and configuration is this running on?

[2023/07/22 13:52:43.868 +08:00] [INFO] [<unknown>] ["TCP_USER_TIMEOUT is available. TCP_USER_TIMEOUT will be used thereafter"]
[2023/07/22 13:52:45.868 +08:00] [INFO] [util.rs:566] ["PD failed to respond"] [err="Grpc(RpcFailure(RpcStatus { code: 4-DEADLINE_EXCEEDED, message: \"Deadline Exceeded\", details: [] }))"] [endpoints=175.27.241.31:2379]
[2023/07/22 13:52:45.868 +08:00] [WARN] [client.rs:168] ["validate PD endpoints failed"] [err="Other(\"[components/pd_client/src/util.rs:599]: PD cluster failed to respond\")"]

Check the PD status and see whether the port is reachable.
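For example, a quick check from a TiKV host and against PD itself could look like this (a minimal sketch; /pd/api/v1/health is PD's standard health API, and nc may need to be installed first):

# From a TiKV host: is PD's client port reachable at all?
nc -vz 175.27.241.31 2379

# Does PD answer its health API?
curl http://175.27.241.31:2379/pd/api/v1/health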

It is indeed a port connectivity problem.

Ubuntu:
service ufw stop

CentOS:
service firewalld stop
service iptables stop
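It is also worth verifying the firewall state on each host before and after stopping it; roughly (commands differ by distribution):

# Ubuntu
ufw status

# CentOS 7 and later
firewall-cmd --state
systemctl status firewalld

# Either distribution: list the active iptables rules
iptables -nL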

Judging from the log, it looks like a port problem.
1. Did you run tiup cluster check before installing? It pre-checks the cluster for potential risks, including disk, firewall, and port issues.
2. Is SELinux disabled? setenforce 0 disables it temporarily and takes effect immediately; if you only changed the SELinux config file, it takes effect after a reboot.
3. Check whether the ports are already in use: lsof -i:20161. Check the other ports as well. I have seen port 9090 occupied by a system process; if that is the case, consider switching to another port.
A combined sketch of these checks is shown below.
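A minimal sketch of the three checks above, using the topology file and ports from this thread (adjust paths and ports to your environment):

# 1. Pre-deploy environment check against the topology file
tiup cluster check ./topology.yaml --user root -p

# 2. SELinux state on each host (Disabled or Permissive is fine)
getenforce
setenforce 0   # temporary, takes effect immediately

# 3. Is anything already listening on the TiKV port?
lsof -i:20161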

On each of the three TiKV hosts, try telnet 175.27.241.31 2379.
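If neither telnet nor nc is installed on a host, a bash-only probe also works (assuming bash was built with /dev/tcp support, which is the default on most distributions):

timeout 3 bash -c '</dev/tcp/175.27.241.31/2379' && echo "2379 open" || echo "2379 closed"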


Also take a look at the PD logs.
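Based on deploy_dir in the topology above, the PD logs on 175.27.241.31 should be under /tidb-deploy/pd-2379/log (pd.log is the usual file name in a TiUP deployment; adjust if yours differs):

# On the PD host 175.27.241.31
tail -n 100 /tidb-deploy/pd-2379/log/pd.log
# unit name follows the same <component>-<port>.service pattern as tikv-20161.service above
journalctl -u pd-2379.service --since "2023-07-22 13:50" --no-pager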

Run tiup cluster check and see what the result is.
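Since the cluster is already deployed, the check can also be run against the existing cluster with the --cluster flag:

tiup cluster check TiDB-cluster --cluster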

The PD node is the problem. First check whether PD itself is running normally, then check whether it is a network issue.
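A quick way to separate the two, assuming the standard TiUP service naming (pd-2379.service, by analogy with tikv-20161.service in the error message):

# From the control machine: overall component status
tiup cluster display TiDB-cluster

# On the PD host 175.27.241.31: is the PD service actually running?
systemctl status pd-2379.service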