TiDB version v4.0.4: PD startup times out

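During normal use, TiKV and TiFlash suddenly stopped on their own, and restarting them did not help. After rebooting the machine, I scaled in PD and then scaled it out again, but PD still times out on startup. TiKV and TiFlash have since started normally.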

pd 10.10.23.91:2379 failed to start: timed out waiting for port 2379 to be started after 2m0s, please check the log of the instance
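The error above says that nothing was accepting connections on port 2379 within two minutes. As a rough way to reproduce that check by hand, here is a minimal Python sketch that polls a TCP port until it accepts a connection or a deadline passes. The function name `wait_for_port` and its parameters are made up for illustration; this is not how tiup itself performs the check.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 120.0, interval: float = 1.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False if the deadline passes."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means some process is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused, timed out, or host unreachable: retry until the deadline.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For the instance in this thread, the call would be `wait_for_port("10.10.23.91", 2379)`. A `False` result only says that the port never opened; the reason still has to come from the PD logs.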

pd_stderr.log is empty.

Please provide the following information:
TiDB version
Whether pd.log contains anything, and which cluster nodes are currently alive.

tidb Cluster: tidb-cluster-v4
tidb Version: v4.0.4
ID                 Role          Host         Ports                            OS/Arch       Status  Data Dir                           Deploy Dir
10.10.23.91:9093   alertmanager  10.10.23.91  9093/9094                        linux/x86_64  Up      /data/tidb-data/alertmanager-9093  /home/tidb/tidb-deploy/alertmanager-9093
10.10.23.95:8300   cdc           10.10.23.95  8300                             linux/x86_64  Up      -                                  /home/tidb/tidb-deploy/cdc-8300
10.10.23.97:8300   cdc           10.10.23.97  8300                             linux/x86_64  Up      -                                  /home/tidb/tidb-deploy/cdc-8300
10.10.23.91:3000   grafana       10.10.23.91  3000                             linux/x86_64  Up      -                                  /home/tidb/tidb-deploy/grafana-3000
10.10.23.91:2379   pd            10.10.23.91  2379/2380                        linux/x86_64  Down    /data/tidb-data/pd-2379            /data/tidb/tidb-deploy/pd-2379
10.10.23.95:2379   pd            10.10.23.95  2379/2380                        linux/x86_64  Up|UI   /data/tidb-data/pd-2379            /home/tidb/tidb-deploy/pd-2379
10.10.23.97:2379   pd            10.10.23.97  2379/2380                        linux/x86_64  Up|L    /data/tidb-data/pd-2379            /home/tidb/tidb-deploy/pd-2379
10.10.23.91:9090   prometheus    10.10.23.91  9090                             linux/x86_64  Up      /data/tidb-data/prometheus-9090    /home/tidb/tidb-deploy/prometheus-9090
10.10.23.91:4000   tidb          10.10.23.91  4000/10080                       linux/x86_64  Up      -                                  /home/tidb/tidb-deploy/tidb-4000
10.10.23.95:4000   tidb          10.10.23.95  4000/10080                       linux/x86_64  Up      -                                  /home/tidb/tidb-deploy/tidb-4000
10.10.23.97:4000   tidb          10.10.23.97  4000/10080                       linux/x86_64  Up      -                                  /home/tidb/tidb-deploy/tidb-4000
10.10.23.91:9000   tiflash       10.10.23.91  9000/8123/3930/20170/20292/8234  linux/x86_64  Up      /data/tidb-data/tiflash-9000       /home/tidb/tidb-deploy/tiflash-9000
10.10.23.95:9000   tiflash       10.10.23.95  9000/8123/3930/20170/20292/8234  linux/x86_64  Up      /data/tidb-data/tiflash-9000       /home/tidb/tidb-deploy/tiflash-9000
10.10.23.97:9000   tiflash       10.10.23.97  9000/8123/3930/20170/20292/8234  linux/x86_64  Up      /data/tidb-data/tiflash-9000       /home/tidb/tidb-deploy/tiflash-9000
10.10.23.91:20160  tikv          10.10.23.91  20160/20180                      linux/x86_64  Up      /data/tidb-data/tikv-20160         /home/tidb/tidb-deploy/tikv-20160
10.10.23.95:20160  tikv          10.10.23.95  20160/20180                      linux/x86_64  Up      /data/tidb-data/tikv-20160         /home/tidb/tidb-deploy/tikv-20160
10.10.23.97:20160  tikv          10.10.23.97  20160/20180                      linux/x86_64  Up      /data/tidb-data/tikv-20160         /home/tidb/tidb-deploy/tikv-20160
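With this many rows it is easy to miss the one instance that is not up. The following Python sketch picks out such rows from pasted status output. It assumes the column order shown above (ID, Role, Host, Ports, OS/Arch, Status, ...) and that no value contains whitespace; `not_up_instances` is a hypothetical helper name, not part of any TiDB tool.

```python
from typing import List, Tuple


def not_up_instances(display_output: str) -> List[Tuple[str, str, str]]:
    """Return (id, role, status) for each instance whose status does not start with "Up"."""
    found = []
    for line in display_output.splitlines():
        cols = line.split()
        # Instance rows begin with an "ip:port" ID; skip the header and any other text.
        if len(cols) < 6 or ":" not in cols[0] or not cols[0][0].isdigit():
            continue
        instance_id, role, status = cols[0], cols[1], cols[5]
        # "Up", "Up|L" and "Up|UI" all count as running.
        if not status.startswith("Up"):
            found.append((instance_id, role, status))
    return found


sample = """\
10.10.23.91:2379 pd 10.10.23.91 2379/2380 linux/x86_64 Down /data/tidb-data/pd-2379 /data/tidb/tidb-deploy/pd-2379
10.10.23.95:2379 pd 10.10.23.95 2379/2380 linux/x86_64 Up|UI /data/tidb-data/pd-2379 /home/tidb/tidb-deploy/pd-2379
"""
print(not_up_instances(sample))  # [('10.10.23.91:2379', 'pd', 'Down')]
```

Applied to the full listing, only the PD on 10.10.23.91 is reported, which matches the problem described in this thread.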

There is no pd.log file at all. There is only pd_stderr.log, and it is empty.

The configuration file used to scale out PD, scale-out.yaml:

pd_servers:
  - host: 10.10.23.91
    ssh_port: 22
    name: pd
    client_port: 2379
    peer_port: 2380
    deploy_dir: /data/tidb/tidb-deploy/pd-2379
    data_dir: /data/tidb-data/pd-2379
    log_dir: /data/tidb-data/pd-2379/log

Run script/run_pd.sh and check what the logs show.

[2020/08/24 18:23:28.287 +08:00] [WARN] [server.go:2045] ["failed to publish local member to cluster through raft"] [local-member-id=56c195ccc4c0b93] [local-member-attributes="{Name:pd ClientURLs:[http://10.10.23.91:2379]}"] [request-path=/0/members/56c195ccc4c0b93/attributes] [publish-timeout=11s] [error="etcdserver: request timed out"]
[2020/08/24 18:23:39.288 +08:00] [WARN] [server.go:2045] ["failed to publish local member to cluster through raft"] [local-member-id=56c195ccc4c0b93] [local-member-attributes="{Name:pd ClientURLs:[http://10.10.23.91:2379]}"] [request-path=/0/members/56c195ccc4c0b93/attributes] [publish-timeout=11s] [error="etcdserver: request timed out"]
[2020/08/24 18:23:50.288 +08:00] [WARN] [server.go:2045] ["failed to publish local member to cluster through raft"] [local-member-id=56c195ccc4c0b93] [local-member-attributes="{Name:pd ClientURLs:[http://10.10.23.91:2379]}"] [request-path=/0/members/56c195ccc4c0b93/attributes] [publish-timeout=11s] [error="etcdserver: request timed out"]
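These lines use bracketed sections: time, level, source location, message, then key=value fields. When there are many of them, pulling out the fields makes repeated errors easier to spot. Below is a minimal Python sketch of such a parser. It is not an official tool; the names `parse_log_line` and `SECTION` are invented here, escape sequences inside quoted values are not decoded, and typographic quotes that a forum editor may have introduced are normalized to ASCII quotes first.

```python
import re

# One "[...]" section. The body is either an optionally keyed quoted string
# (which may itself contain brackets) or plain text without brackets.
SECTION = re.compile(r'\[((?:[^\[\]"=]+=)?"(?:[^"\\]|\\.)*"|[^\[\]]*)\]')


def _unquote(value: str) -> str:
    """Strip one pair of surrounding double quotes, if present."""
    if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        return value[1:-1]
    return value


def parse_log_line(line: str) -> dict:
    """Split a bracketed log line into time, level, source, message and fields."""
    # Normalize typographic quotes that copy and paste may have introduced.
    line = line.replace("\u201c", '"').replace("\u201d", '"')
    parts = [m.group(1) for m in SECTION.finditer(line)]
    if len(parts) < 4:
        raise ValueError("expected at least time, level, source and message sections")
    time_, level, source, message = parts[:4]
    fields = {}
    for part in parts[4:]:
        key, _, value = part.partition("=")
        fields[key] = _unquote(value)
    return {
        "time": time_,
        "level": level,
        "source": source,
        "message": _unquote(message),
        "fields": fields,
    }
```

For the lines above, `parse_log_line(line)["fields"]["error"]` gives `etcdserver: request timed out`, and `publish-timeout` gives `11s`. The timestamps are 11 seconds apart, so each new line corresponds to one more publish attempt timing out.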

After changing log_dir, PD can start, but the log reports the errors above.
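The thread does not establish why a different log_dir let PD start. One thing that can be ruled out quickly in a case like this is whether each configured directory exists and can be written to by the user that runs PD. This is only a guess at a possible cause, not a diagnosis. A minimal Python sketch of such a check, to be run as that user, follows; `check_writable` is a hypothetical helper.

```python
import os
import tempfile


def check_writable(path: str) -> str:
    """Return "ok", "missing", or "not writable" for a directory path."""
    if not os.path.isdir(path):
        return "missing"
    try:
        # Create and remove a real temporary file instead of trusting permission bits alone.
        with tempfile.TemporaryFile(dir=path):
            pass
    except OSError:
        return "not writable"
    return "ok"
```

For the scale-out entry in this thread, the paths to check would be the deploy_dir, data_dir and log_dir values from scale-out.yaml, for example `check_writable("/data/tidb-data/pd-2379/log")`.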

I rebooted the machine and it is working now.

:+1: