Single-machine cluster deployment fails to start: retry error: operation timed out after 2m0s

1. The error output is as follows:
[root@ ~]# tiup cluster start tidb-test
Starting component cluster: /root/.tiup/components/cluster/v1.0.9/tiup-cluster start tidb-test
Starting cluster tidb-test…

  • [ Serial ] - SSHKeySet: privateKey=/root/.tiup/storage/cluster/clusters/tidb-test/ssh/id_rsa, publicKey=/root/.tiup/storage/cluster/clusters/tidb-test/ssh/id_rsa.pub
  • [Parallel] - UserSSH: user=tidb, host=192.168.5.87
  • [Parallel] - UserSSH: user=tidb, host=192.168.5.87
  • [Parallel] - UserSSH: user=tidb, host=192.168.5.87
  • [Parallel] - UserSSH: user=tidb, host=192.168.5.87
  • [Parallel] - UserSSH: user=tidb, host=192.168.5.87
  • [Parallel] - UserSSH: user=tidb, host=192.168.5.87
  • [Parallel] - UserSSH: user=tidb, host=192.168.5.87
  • [Parallel] - UserSSH: user=tidb, host=192.168.5.87
  • [ Serial ] - StartCluster
    Starting component pd
    Starting instance pd 192.168.5.87:2379
    Start pd 192.168.5.87:2379 success
    Starting component node_exporter
    Starting instance 192.168.5.87
    Start 192.168.5.87 success
    Starting component blackbox_exporter
    Starting instance 192.168.5.87
    Start 192.168.5.87 success
    Starting component tikv
    Starting instance tikv 192.168.5.87:20162
    Starting instance tikv 192.168.5.87:20160
    Starting instance tikv 192.168.5.87:20161
    Start tikv 192.168.5.87:20160 success
    Start tikv 192.168.5.87:20161 success
    Start tikv 192.168.5.87:20162 success
    Starting component tidb
    Starting instance tidb 192.168.5.87:4000
    Start tidb 192.168.5.87:4000 success
    Starting component tiflash
    Starting instance tiflash 192.168.5.87:9000
    retry error: operation timed out after 2m0s
    tiflash 192.168.5.87:9000 failed to start: timed out waiting for port 9000 to be started after 2m0s, please check the log of the instance

Error: failed to start tiflash: tiflash 192.168.5.87:9000 failed to start: timed out waiting for port 9000 to be started after 2m0s, please check the log of the instance: timed out waiting for port 9000 to be started after 2m0s

Verbose debug logs has been written to /root/logs/tiup-cluster-debug-2020-08-13-15-39-56.log.
Error: run /root/.tiup/components/cluster/v1.0.9/tiup-cluster (wd:/root/.tiup/data/S7Wnwhn) failed: exit status 1
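
The error itself points at the next step: check the instance's own log. A minimal sketch of where to look, assuming the default tiup layout in which logs sit under the deploy directory (/tidb-deploy/tiflash-9000, as shown by tiup cluster display further down in this thread):

[root@ ~]# tail -n 100 /tidb-deploy/tiflash-9000/log/tiflash.log
[root@ ~]# tail -n 100 /tidb-deploy/tiflash-9000/log/tiflash_tikv.log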

2. Store information
[root@ ~]# tiup ctl pd -u 192.168.5.87:2379 store
Starting component ctl: /root/.tiup/components/ctl/v4.0.4/ctl pd -u 192.168.5.87:2379 store
{
  "count": 3,
  "stores": [
    {
      "store": {
        "id": 4,
        "address": "192.168.5.87:20160",
        "version": "4.0.4",
        "status_address": "192.168.5.87:20180",
        "git_hash": "28e3d44b00700137de4fa933066ab83e5f8306cf",
        "start_timestamp": 1597301113,
        "deploy_path": "/tidb-deploy/tikv-20160/bin",
        "last_heartbeat": 1597307714815077900,
        "state_name": "Up"
      },
      "status": {
        "capacity": "49.98GiB",
        "available": "22.23GiB",
        "used_size": "31.7MiB",
        "leader_count": 4,
        "leader_weight": 1,
        "leader_score": 4,
        "leader_size": 4,
        "region_count": 21,
        "region_weight": 1,
        "region_score": 21,
        "region_size": 21,
        "start_ts": "2020-08-13T14:45:13+08:00",
        "last_heartbeat_ts": "2020-08-13T16:35:14.8150779+08:00",
        "uptime": "1h50m1.8150779s"
      }
    },
    {
      "store": {
        "id": 7,
        "address": "192.168.5.87:20162",
        "version": "4.0.4",
        "status_address": "192.168.5.87:20182",
        "git_hash": "28e3d44b00700137de4fa933066ab83e5f8306cf",
        "start_timestamp": 1597301450,
        "deploy_path": "/tidb-deploy/tikv-20162/bin",
        "last_heartbeat": 1597307721178571119,
        "state_name": "Up"
      },
      "status": {
        "capacity": "49.98GiB",
        "available": "22.23GiB",
        "used_size": "31.69MiB",
        "leader_count": 4,
        "leader_weight": 1,
        "leader_score": 4,
        "leader_size": 4,
        "region_count": 21,
        "region_weight": 1,
        "region_score": 21,
        "region_size": 21,
        "start_ts": "2020-08-13T14:50:50+08:00",
        "last_heartbeat_ts": "2020-08-13T16:35:21.178571119+08:00",
        "uptime": "1h44m31.178571119s"
      }
    },
    {
      "store": {
        "id": 1,
        "address": "192.168.5.87:20161",
        "version": "4.0.4",
        "status_address": "192.168.5.87:20181",
        "git_hash": "28e3d44b00700137de4fa933066ab83e5f8306cf",
        "start_timestamp": 1597301113,
        "deploy_path": "/tidb-deploy/tikv-20161/bin",
        "last_heartbeat": 1597307714860362352,
        "state_name": "Up"
      },
      "status": {
        "capacity": "49.98GiB",
        "available": "22.23GiB",
        "used_size": "32.62MiB",
        "leader_count": 13,
        "leader_weight": 1,
        "leader_score": 13,
        "leader_size": 13,
        "region_count": 21,
        "region_weight": 1,
        "region_score": 21,
        "region_size": 21,
        "start_ts": "2020-08-13T14:45:13+08:00",
        "last_heartbeat_ts": "2020-08-13T16:35:14.860362352+08:00",
        "uptime": "1h50m1.860362352s"
      }
    }
  ]
}

3. Free memory (free -m output):
              total        used        free      shared  buff/cache   available
Mem:          32004       25597        4724          97        1682        5845
Swap:             0           0           0

CPU usage (top output):
%Cpu0 : 3.7 us, 1.0 sy, 1.0 ni, 93.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.7 st
%Cpu1 : 3.1 us, 1.7 sy, 1.0 ni, 93.5 id, 0.0 wa, 0.0 hi, 0.0 si, 0.7 st
%Cpu2 : 3.4 us, 1.4 sy, 0.3 ni, 94.2 id, 0.0 wa, 0.0 hi, 0.0 si, 0.7 st
%Cpu3 : 3.1 us, 2.0 sy, 1.0 ni, 93.2 id, 0.0 wa, 0.0 hi, 0.0 si, 0.7 st
KiB Mem : 32772568 total, 4262264 free, 26649536 used, 1860768 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 5549404 avail Mem

Please check the CPU usage. Also, the store output here shows no TiFlash information; please upload all the files under the TiFlash log directory.

tiflash.log (88.4 KB) tiflash_tikv.log (18.9 KB)

Please confirm whether any of TiFlash's service ports conflict with another process. It looks like connection creation timed out, which caused TiFlash to fail to start.
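
A quick sketch for checking all six TiFlash ports in one pass (the port list is taken from the tiup cluster display output below; ss is assumed to be available, netstat -tulnp works equally well):

[root@ ~]# for p in 9000 8123 3930 20170 20292 8234; do ss -tlnp | grep ":$p "; done
# any line printed means that port is already held by another process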

2020.08.13 14:55:37.456705 [ 1 ] <Information> Application: Flash service registered
2020.08.13 14:55:37.456711 [ 1 ] <Information> Application: Diagnostics service registered
2020.08.13 14:55:37.456737 [ 1 ] <Information> grpc: /root/grpc/src/cpp/server/server_builder.cc, line number : 309, log msg : Synchronous server. Num CQs: 1, Min pollers: 1, Max Pollers: 2, CQ timeout (msec): 10000
2020.08.13 14:55:37.457345 [ 1 ] <Information> Application: Flash grpc server listening on [192.168.5.87:3930]
2020.08.13 14:55:37.464411 [ 1 ] <Information> Application: Begin to shut down flash grpc server
2020.08.13 14:55:37.465256 [ 1 ] <Information> Application: Shut down flash grpc server
2020.08.13 14:55:37.465271 [ 1 ] <Information> Application: Begin to shut down flash service
2020.08.13 14:55:37.465296 [ 1 ] <Information> Application: Shut down flash service
2020.08.13 14:55:37.465305 [ 1 ] <Information> Application: Shutting down storages.

[root@ ~]# tiup cluster display tidb-test
Starting component cluster: /root/.tiup/components/cluster/v1.0.9/tiup-cluster display tidb-test
tidb Cluster: tidb-test
tidb Version: v4.0.4
ID                  Role        Host          Ports                            OS/Arch       Status    Data Dir                    Deploy Dir
--                  ----        ----          -----                            -------       ------    --------                    ----------
192.168.5.87:3000   grafana     192.168.5.87  3000                             linux/x86_64  inactive  -                           /tidb-deploy/grafana-3000
192.168.5.87:2379   pd          192.168.5.87  2379/2380                        linux/x86_64  Up|L|UI   /tidb-data/pd-2379          /tidb-deploy/pd-2379
192.168.5.87:9090   prometheus  192.168.5.87  9090                             linux/x86_64  inactive  /tidb-data/prometheus-9090  /tidb-deploy/prometheus-9090
192.168.5.87:4000   tidb        192.168.5.87  4000/10080                       linux/x86_64  Up        -                           /tidb-deploy/tidb-4000
192.168.5.87:9000   tiflash     192.168.5.87  9000/8123/3930/20170/20292/8234  linux/x86_64  N/A       /tidb-data/tiflash-9000     /tidb-deploy/tiflash-9000
192.168.5.87:20160  tikv        192.168.5.87  20160/20180                      linux/x86_64  Up        /tidb-data/tikv-20160       /tidb-deploy/tikv-20160
192.168.5.87:20161  tikv        192.168.5.87  20161/20181                      linux/x86_64  Up        /tidb-data/tikv-20161       /tidb-deploy/tikv-20161
192.168.5.87:20162  tikv        192.168.5.87  20162/20182                      linux/x86_64  Up        /tidb-data/tikv-20162       /tidb-deploy/tikv-20162
[root@streaming-03 ~]# netstat -tulnp | grep 9000
[root@streaming-03 ~]# netstat -tulnp | grep 3000
[root@streaming-03 ~]# netstat -tulnp | grep 3930
[root@streaming-03 ~]# netstat -tulnp | grep 8123
tcp6 0 0 :::8123 :::* LISTEN 1765/clickhouse-ser
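
To identify what is holding the port, the PID shown by netstat (1765) can be inspected directly; a small sketch:

[root@streaming-03 ~]# ps -fp 1765
# if this turns out to be an unrelated standalone ClickHouse server, it is
# colliding with TiFlash's default http_port (8123)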

Judging from the commands above, port 8123 seems to be already occupied?

Can TiFlash's ports be customized?

You can refer to https://docs.pingcap.com/zh/tidb/stable/tiflash-configuration#tiflash-配置参数 to modify the configuration; 8123 should be http_port.
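
For reference, a minimal sketch of moving TiFlash's HTTP port off the occupied 8123 via tiup. The new value 8124 below is an arbitrary free port chosen for illustration, not something prescribed by the docs:

[root@ ~]# tiup cluster edit-config tidb-test
# in the editor, under tiflash_servers, set http_port to a free port, e.g.:
#   tiflash_servers:
#     - host: 192.168.5.87
#       http_port: 8124    # default 8123 is held by clickhouse-server here
[root@ ~]# tiup cluster reload tidb-test -R tiflash

Alternatively, stopping or reconfiguring the standalone clickhouse-server that occupies 8123 would clear the conflict without touching the topology.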