timed out waiting for port 4000 to be started after 2m0s

To help resolve the issue quickly, please provide the following information. A clear problem description gets a faster answer:

[TiDB version]
v5.0.0

[Problem description]
PD and TiKV start normally, but TiDB fails to start on port 4000 and reports a timeout error (debug log below).

System environment

Debian; uname -a output:

Linux debian 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt11-1+deb8u3 (2015-08-04) x86_64 GNU/Linux

Virtual machine

VirtualBox

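Deployment steps

1. Followed the official guide, "Option 2: Simulate production deployment on a single machine using TiUP cluster".
2. In topo.yml, I changed only the host, using the following command in vi. Nothing else was modified: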

:%s/10.0.1.1/10.0.2.15/g
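
The same substitution can also be made non-interactively with sed. This is a minimal sketch and not part of the original steps; the function name replace_host is hypothetical. In the vi pattern above, an unescaped "." matches any character, so the sketch escapes the dots to match only the literal address.

```shell
# replace_host OLD NEW FILE: print FILE with every literal OLD address replaced by NEW.
# Hypothetical helper for illustration; it writes to stdout and leaves FILE untouched.
replace_host() {
  # Escape the dots in OLD so that "." matches a literal dot, not any character.
  old_escaped=$(printf '%s' "$1" | sed 's/\./\\./g')
  sed "s/$old_escaped/$2/g" "$3"
}
```

Usage, assuming topo.yml is in the current directory: replace_host 10.0.1.1 10.0.2.15 topo.yml > topo.new.yml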

The file contents are as follows:

global:
 user: "tidb"
 ssh_port: 22
 deploy_dir: "/tidb-deploy"
 data_dir: "/tidb-data"

# # Monitored variables are applied to all the machines.
monitored:
 node_exporter_port: 9100
 blackbox_exporter_port: 9115

server_configs:
 tidb:
   log.slow-threshold: 300
 tikv:
   readpool.storage.use-unified-pool: false
   readpool.coprocessor.use-unified-pool: true
 pd:
   replication.enable-placement-rules: true
   replication.location-labels: ["host"]
 tiflash:
   logger.level: "info"

pd_servers:
 - host: 10.0.2.15

tidb_servers:
 - host: 10.0.2.15

tikv_servers:
 - host: 10.0.2.15
   port: 20160
   status_port: 20180
   config:
     server.labels: { host: "logic-host-1" }

 - host: 10.0.2.15
   port: 20161
   status_port: 20181
   config:
     server.labels: { host: "logic-host-2" }

 - host: 10.0.2.15
   port: 20162
   status_port: 20182
   config:
     server.labels: { host: "logic-host-3" }

tiflash_servers:
 - host: 10.0.2.15

monitoring_servers:
 - host: 10.0.2.15

grafana_servers:
 - host: 10.0.2.15

Compared with the version on the official site, the two comment lines at the top are missing:

# # Global variables are applied to all deployments and used as the default value of
# # the deployments if a specific deployment value is missing.

I later added these two lines back, but the error persisted.

3. Install SSH
SSH turned out not to be installed, so I ran:

apt-get install ssh

sshd_config is as follows:

# Package generated configuration file
# See the sshd_config(5) manpage for details

# What ports, IPs and protocols we listen for
Port 22
# Use these options to restrict which interfaces/protocols sshd will bind to
#ListenAddress ::
#ListenAddress 0.0.0.0
Protocol 2
# HostKeys for protocol version 2
HostKey /etc/ssh/ssh_host_rsa_key
HostKey /etc/ssh/ssh_host_dsa_key
HostKey /etc/ssh/ssh_host_ecdsa_key
HostKey /etc/ssh/ssh_host_ed25519_key
#Privilege Separation is turned on for security
UsePrivilegeSeparation yes

# Lifetime and size of ephemeral version 1 server key
KeyRegenerationInterval 3600
ServerKeyBits 1024

# Logging
SyslogFacility AUTH
LogLevel INFO

# Authentication:
LoginGraceTime 120
PermitRootLogin yes
StrictModes yes

RSAAuthentication yes
PubkeyAuthentication yes
#AuthorizedKeysFile     %h/.ssh/authorized_keys

# Don't read the user's ~/.rhosts and ~/.shosts files
IgnoreRhosts yes
# For this to work you will also need host keys in /etc/ssh_known_hosts
RhostsRSAAuthentication no
# similar for protocol version 2
HostbasedAuthentication no
# Uncomment if you don't trust ~/.ssh/known_hosts for RhostsRSAAuthentication
#IgnoreUserKnownHosts yes
# To enable empty passwords, change to yes (NOT RECOMMENDED)
PermitEmptyPasswords no

# Change to yes to enable challenge-response passwords (beware issues with
# some PAM modules and threads)
ChallengeResponseAuthentication no

# Change to no to disable tunnelled clear text passwords
PasswordAuthentication yes

# Kerberos options
#KerberosAuthentication no
#KerberosGetAFSToken no
#KerberosOrLocalPasswd yes
#KerberosTicketCleanup yes

# GSSAPI options
#GSSAPIAuthentication no
#GSSAPICleanupCredentials yes

X11Forwarding yes
X11DisplayOffset 10
PrintMotd no
PrintLastLog yes
TCPKeepAlive yes
#UseLogin no

#MaxStartups 10:30:60
#Banner /etc/issue.net

# Allow client to pass locale environment variables
AcceptEnv LANG LC_*

Subsystem sftp /usr/lib/openssh/sftp-server

# Set this to 'yes' to enable PAM authentication, account processing,
# and session processing. If this is enabled, PAM authentication will
# be allowed through the ChallengeResponseAuthentication and
# PasswordAuthentication.  Depending on your PAM configuration,
# PAM authentication via ChallengeResponseAuthentication may bypass
# the setting of "PermitRootLogin without-password".
# If you just want the PAM account and session checks to run without
# PAM authentication, then enable this but set PasswordAuthentication
# and ChallengeResponseAuthentication to 'no'.
UsePAM yes
MaxSessions 20
MaxStartups 20

4. Run the following command

tiup cluster start my-tidb-cluster

The command fails with an error (screenshot in the original post).

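Attempted fixes

1. Check the logs
I went to the directory /tidb-deploy/tidb-4000/log (screenshot "tidb startup error 1" in the original post).
The log files are empty; no log output was produced.
The error entries in the debug file are as follows: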

2021-04-28T10:18:55.416+0800    DEBUG   retry error: operation timed out after 2m0s
2021-04-28T10:18:55.585+0800    DEBUG   TaskFinish      {"task": "StartCluster", "error": "failed to start tidb: failed to start: tidb 10.0.2.15:4000, please check the instance's log(/tidb-deploy/tidb-4000/log) for more detail.: timed out waiting for port 4000 to be started after 2m0s", "errorVerbose": "timed out waiting for port 4000 to be started after 2m0s\ngithub.com/pingcap/tiup/pkg/cluster/module.(*WaitFor).Execute\n\tgithub.com/pingcap/tiup/pkg/cluster/module/wait_for.go:91\ngithub.com/pingcap/tiup/pkg/cluster/spec.PortStarted\n\tgithub.com/pingcap/tiup/pkg/cluster/spec/instance.go:114\ngithub.com/pingcap/tiup/pkg/cluster/spec.(*BaseInstance).Ready\n\tgithub.com/pingcap/tiup/pkg/cluster/spec/instance.go:145\ngithub.com/pingcap/tiup/pkg/cluster/operation.startInstance\n\tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:363\ngithub.com/pingcap/tiup/pkg/cluster/operation.StartComponent.func1\n\tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:484\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\tgolang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57\nruntime.goexit\n\truntime/asm_amd64.s:1371\nfailed to start: tidb 10.0.2.15:4000, please check the instance's log(/tidb-deploy/tidb-4000/log) for more detail.\nfailed to start tidb"}
2021-04-28T10:18:55.647+0800    INFO    Execute command finished        {"code": 1, "error": "failed to start tidb: failed to start: tidb 10.0.2.15:4000, please check the instance's log(/tidb-deploy/tidb-4000/log) for more detail.: timed out waiting for port 4000 to be started after 2m0s", "errorVerbose": "timed out waiting for port 4000 to be started after 2m0s\ngithub.com/pingcap/tiup/pkg/cluster/module.(*WaitFor).Execute\n\tgithub.com/pingcap/tiup/pkg/cluster/module/wait_for.go:91\ngithub.com/pingcap/tiup/pkg/cluster/spec.PortStarted\n\tgithub.com/pingcap/tiup/pkg/cluster/spec/instance.go:114\ngithub.com/pingcap/tiup/pkg/cluster/spec.(*BaseInstance).Ready\n\tgithub.com/pingcap/tiup/pkg/cluster/spec/instance.go:145\ngithub.com/pingcap/tiup/pkg/cluster/operation.startInstance\n\tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:363\ngithub.com/pingcap/tiup/pkg/cluster/operation.StartComponent.func1\n\tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:484\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\tgolang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57\nruntime.goexit\n\truntime/asm_amd64.s:1371\nfailed to start: tidb 10.0.2.15:4000, please check the instance's log(/tidb-deploy/tidb-4000/log) for more detail.\nfailed to start tidb"}
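
Each of the long debug lines above carries the failure in a JSON "error" field, which is easy to miss next to the "errorVerbose" stack trace. A minimal sketch for pulling out just that field follows. The function name extract_errors is hypothetical, and the debug file location is left to the caller rather than assumed.

```shell
# extract_errors: read TiUP debug log lines on stdin and print only the
# value of the "error" field from each line that has one.
# Hypothetical helper; assumes the message itself contains no double quotes.
extract_errors() {
  sed -n 's/.*"error": "\([^"]*\)".*/\1/p'
}
```

Usage: extract_errors < /path/to/debug.log | sort -u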

For /tidb-deploy/tikv-20160/log/tikv_stderr.log, I ran:

tail -n 100 tikv_stderr.log

Result:

<jemalloc>: Malformed conf string
<jemalloc>: Malformed conf string

So I suspected a configuration problem and ran:

tiup cluster edit-config my-tidb-cluster

Output:

global:
  user: tidb
  ssh_port: 22
  ssh_type: builtin
  deploy_dir: /tidb-deploy
  data_dir: /tidb-data
  os: linux
  arch: amd64
monitored:
  node_exporter_port: 9100
  blackbox_exporter_port: 9115
  deploy_dir: /tidb-deploy/monitor-9100
  data_dir: /tidb-data/monitor-9100
  log_dir: /tidb-deploy/monitor-9100/log
server_configs:
  tidb:
    log.slow-threshold: 300
  tikv:
    readpool.coprocessor.use-unified-pool: true
    readpool.storage.use-unified-pool: false
  pd:
    replication.enable-placement-rules: true
    replication.location-labels:
    - host
  tiflash:
    logger.level: info
  tiflash-learner: {}
  pump: {}
  drainer: {}
  cdc: {}
tidb_servers:
- host: 10.0.2.15
  ssh_port: 22
  port: 4000
  status_port: 10080
  deploy_dir: /tidb-deploy/tidb-4000
  arch: amd64
  os: linux
tikv_servers:
- host: 10.0.2.15
  ssh_port: 22
  port: 20160
  status_port: 20180
  deploy_dir: /tidb-deploy/tikv-20160
  data_dir: /tidb-data/tikv-20160
  config:
    server.labels:
      host: logic-host-1
  arch: amd64
  os: linux
- host: 10.0.2.15
  ssh_port: 22
  port: 20161
  status_port: 20181
  deploy_dir: /tidb-deploy/tikv-20161
  data_dir: /tidb-data/tikv-20161
  config:
    server.labels:
      host: logic-host-2
  arch: amd64
  os: linux
- host: 10.0.2.15
  ssh_port: 22
  port: 20162
  status_port: 20182
  deploy_dir: /tidb-deploy/tikv-20162
  data_dir: /tidb-data/tikv-20162
  config:
    server.labels:
      host: logic-host-3
  arch: amd64
  os: linux
tiflash_servers:
- host: 10.0.2.15
  ssh_port: 22
  tcp_port: 9000
  http_port: 8123
  flash_service_port: 3930
  flash_proxy_port: 20170
  flash_proxy_status_port: 20292
  metrics_port: 8234
  deploy_dir: /tidb-deploy/tiflash-9000
  data_dir: /tidb-data/tiflash-9000
  arch: amd64
  os: linux
pd_servers:
- host: 10.0.2.15
  ssh_port: 22
  name: pd-10.0.2.15-2379
  client_port: 2379
  peer_port: 2380
  deploy_dir: /tidb-deploy/pd-2379
  data_dir: /tidb-data/pd-2379
  arch: amd64
  os: linux
monitoring_servers:
- host: 10.0.2.15
  ssh_port: 22
  port: 9090
  deploy_dir: /tidb-deploy/prometheus-9090
  data_dir: /tidb-data/prometheus-9090
  external_alertmanagers: []
  arch: amd64
  os: linux
grafana_servers:
- host: 10.0.2.15
  ssh_port: 22
  port: 3000
  deploy_dir: /tidb-deploy/grafana-3000
  arch: amd64
  os: linux
  username: admin
  password: admin
  anonymous_enable: false
  root_url: ""
  domain: ""

I could not spot anything abnormal.

2. Switch versions
Switched to v4.0.8; same error.

3. Switch operating systems
Switched to CentOS 8; same error.


4. Firewall and SELinux
4.1. Debian
4.1.1. Firewall
Ran:

service iptables status

Output:

● iptables.service
   Loaded: not-found (Reason: No such file or directory)
   Active: inactive (dead)

Ran:

service ufw status

Output:

● ufw.service
   Loaded: not-found (Reason: No such file or directory)
   Active: inactive (dead)
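
A "not-found" unit only shows that no service named iptables or ufw is installed. It does not by itself prove that no packet-filter rules are loaded. A hedged sketch that first probes whether the tool exists and then lists any rules; the function names are hypothetical, and listing rules normally requires root.

```shell
# have_cmd NAME: succeed if NAME is an available command on this system.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# show_firewall_rules: print the active iptables rules if the tool exists.
# Hypothetical helper; run it as root, since iptables needs privileges.
show_firewall_rules() {
  if have_cmd iptables; then
    iptables -S
  else
    echo "iptables not installed"
  fi
}
```

Usage: run show_firewall_rules as root and look for any rule that drops or rejects traffic to port 4000.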

4.1.2. SELinux
Ran:

getenforce

Result:

-bash: getenforce: command not found

Tried to install it:

sudo apt-get install -y selinux-utils setools

Result:

Reading package lists... Done
Building dependency tree       
Reading state information... Done
E: Unable to locate package selinux-utils
E: Unable to locate package setools

As a beginner, I could not find a way to install it.

4.2. CentOS

4.2.1. Firewall

systemctl stop firewalld.service

The firewall stopped normally, but the error persisted.

4.2.2. SELinux

Not tried yet; posting the question first.


5. Q&A community and GitHub issues
Searched for:

timed out waiting for port 4000 to be started after 2m0s

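I tried all of the suggested solutions; none of them worked.

Summary

Any pointers from the experts would be much appreciated.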

This looks like a single-machine installation. Please refer to this post. If the error persists, try changing to a different port.
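
Before switching ports, it can help to confirm whether something else is already listening on 4000. A small sketch that filters the output of ss -lnt; the function name port_listening is hypothetical, and it assumes the local address is the fourth column of that output.

```shell
# port_listening PORT: read `ss -lnt` output on stdin and succeed if any
# socket is listening on PORT (matched at the end of the local address).
port_listening() {
  awk -v port="$1" '
    $1 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
    END { exit(found ? 0 : 1) }
  '
}
```

Usage: ss -lnt | port_listening 4000 && echo "port 4000 is already in use"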

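Running tiup playground produced the following error: cpu.cfs_quota_us: no such file or directory. After searching online (Baidu), it appears to be an operating system issue. After installing Debian 10, it worked. Thanks for the reply.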
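
The file named in that error, cpu.cfs_quota_us, is exposed by the cgroup v1 cpu controller when the kernel is built with CFS bandwidth control. A quick sketch for checking whether it is present on a given system; the function name has_cfs_quota is hypothetical, and the default directory /sys/fs/cgroup/cpu is an assumed mount point that may differ between systems.

```shell
# has_cfs_quota [CGROUP_CPU_DIR]: succeed if cpu.cfs_quota_us exists under
# the given cgroup cpu directory (the default is an assumed cgroup v1 mount point).
has_cfs_quota() {
  dir=${1:-/sys/fs/cgroup/cpu}
  [ -e "$dir/cpu.cfs_quota_us" ]
}
```

Usage: has_cfs_quota && echo "CFS quota file present" || echo "CFS quota file missing"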