Kunpeng Cloud - problem installing a TiDB cluster offline with tiup

  • [ Serial ] - ClusterOperate: operation=StartOperation, options={Roles:[] Nodes:[] Force:false SSHTimeout:5 OptTimeout:60 APITimeout:300 IgnoreConfigCheck:false RetainDataRoles:[] RetainDataNodes:[]}
    Starting component pd
    Starting instance pd 172.16.119.97:2379
    Starting instance pd 172.16.3.186:2379
    Starting instance pd 172.16.108.43:2379
    retry error: operation timed out after 1m0s
    pd 172.16.119.97:2379 failed to start: timed out waiting for port 2379 to be started after 1m0s, please check the log of the instance
    retry error: operation timed out after 1m0s
    pd 172.16.3.186:2379 failed to start: timed out waiting for port 2379 to be started after 1m0s, please check the log of the instance
    retry error: operation timed out after 1m0s
    pd 172.16.108.43:2379 failed to start: timed out waiting for port 2379 to be started after 1m0s, please check the log of the instance

Error: failed to start: failed to start pd: pd 172.16.119.97:2379 failed to start: timed out waiting for port 2379 to be started after 1m0s, please check the log of the instance: timed out waiting for port 2379 to be started after 1m0s

Verbose debug logs has been written to /home/tidb/logs/tiup-cluster-debug-2020-07-03-10-08-41.log.
Error: run /home/tidb/.tiup/components/cluster/v1.0.7/tiup-cluster (wd:/home/tidb/.tiup/data/S3djCUp) failed: exit status 1

pd_stderr.log (16.9 KB)

Hello, please provide the PD logs from the failing nodes (pd.log and pd_stderr.log) and the system log /var/log/messages

messages (444.0 KB) pd_stderr.log has already been sent; there is no pd.log

Hello, please run each of the following commands

file /hwdata/tidb-deploy/pd-2379/bin/pd-server
file /hwdata/tidb-deploy/tidb-4000/bin/tidb-server
/hwdata/tidb-deploy/tidb-4000/bin/tidb-server

Then we can look at the results
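The `file` commands above are presumably meant to show which CPU architecture the deployed binaries were built for, so that it can be compared with the host. A minimal sketch of that comparison follows; the helper names `elf_arch` and `check_binary` are hypothetical, and the matching assumes the usual wording of `file` output for ELF binaries ("x86-64" or "aarch64").

```shell
#!/bin/sh
# Hypothetical helper: turn a `file` description of an ELF binary into
# the machine name that `uname -m` would report on a matching host.
elf_arch() {
    case "$1" in
        *x86-64*)  echo x86_64 ;;
        *aarch64*) echo aarch64 ;;
        *)         echo unknown ;;
    esac
}

# Hypothetical helper: compare a binary's architecture with the host's.
# $1 = path to the binary, $2 = host machine name (defaults to `uname -m`).
check_binary() {
    bin=$1
    host_arch=${2:-$(uname -m)}
    bin_arch=$(elf_arch "$(file -b "$bin")")
    if [ "$bin_arch" = "$host_arch" ]; then
        echo "OK: $bin is built for $bin_arch"
    else
        echo "MISMATCH: $bin is built for $bin_arch, host is $host_arch"
        return 1
    fi
}

# Example (paths taken from the commands above):
#   check_binary /hwdata/tidb-deploy/pd-2379/bin/pd-server
#   check_binary /hwdata/tidb-deploy/tidb-4000/bin/tidb-server
```

A mismatch here would explain why the PD port never comes up: the process cannot be executed at all, so no pd.log is ever written.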

Also, if convenient, please provide the topology file, i.e. the yaml file passed to the tiup cluster deploy command

[tidb@PMS-HW-tidb-tiflash-01 ~]$ cat topology.yaml 
## Global variables are applied to all deployments and used as the default value of
## the deployments if a specific deployment value is missing.
global:
  user: "tidb"
  ssh_port: 22
  deploy_dir: "/hwdata/tidb-deploy"
  data_dir: "/hwdata/tidb-data"

server_configs:
  pd:
    replication.enable-placement-rules: true

pd_servers:
  - host: 172.16.108.43
  - host: 172.16.3.186
  - host: 172.16.119.97
tidb_servers:
  - host: 172.16.108.43
  - host: 172.16.3.186
  - host: 172.16.119.97
tikv_servers:
  - host: 172.16.25.24
  - host: 172.16.93.14
  - host: 172.16.83.67
tiflash_servers:
  - host: 172.16.155.85
    data_dir: /hwdata/data1/tiflash/data,/hwdata/data2/tiflash/data
cdc_servers:
  - host: 172.16.108.43
  - host: 172.16.3.186
  - host: 172.16.119.97
monitoring_servers:
  - host: 172.16.155.85
grafana_servers:
  - host: 172.16.155.85
alertmanager_servers:
  - host: 172.16.155.85

Update on this issue:

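  1. The user downloaded the amd64 build, which cannot be installed on Kunpeng Cloud (arm64 hosts)
  2. Download the arm64 package with wget http://download.pingcap.org/tidb-community-server-v4.0.2-linux-arm64.tar.gz and rerun the offline deployment procedure; the key step is sh local_install.sh
    Otherwise the following error appears:
  3. In the topology file, add arch: arm64 under the global section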
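To avoid picking the wrong package again, the architecture name can be derived from the host before downloading. The sketch below is only an illustration: `tiup_arch` and `package_name` are hypothetical helpers, and the file name pattern is inferred from the arm64 URL above, so the amd64 name it produces is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: map a machine name (as printed by `uname -m`)
# to the architecture label used in the package name and in topology.yaml.
tiup_arch() {
    case "$1" in
        x86_64|amd64)  echo amd64 ;;
        aarch64|arm64) echo arm64 ;;
        *) echo "unsupported machine type: $1" >&2; return 1 ;;
    esac
}

# Hypothetical helper: build the offline package file name.
# $1 = TiDB version (e.g. v4.0.2), $2 = machine name.
# The naming pattern is assumed from the arm64 URL in step 2.
package_name() {
    arch=$(tiup_arch "$2") || return 1
    echo "tidb-community-server-$1-linux-$arch.tar.gz"
}

# Example, run on one of the target hosts:
#   package_name v4.0.2 "$(uname -m)"
```

The label printed by `tiup_arch` for the target hosts is also the value to set as arch under global in the topology file.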

Result:

The cluster now installs successfully
