Error when scaling out CDC with tiup

I scaled out CDC following the official documentation:
https://pingcap.com/docs-cn/stable/scale-tidb-using-tiup/
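
For context, scale-out.yaml only adds a single CDC node. A minimal sketch of what it contains (host, port, and deploy directory taken from the topology confirmation below; anything else is an assumption):

cdc_servers:
  - host: 10.186.61.2
    port: 8300
    deploy_dir: deploy/cdc-8300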
The error output is as follows:
[root@tidb ~]# tiup cluster scale-out tidb-test ./scale-out.yaml
Starting component cluster: /root/.tiup/components/cluster/v0.6.2/cluster scale-out tidb-test ./scale-out.yaml
Please confirm your topology:
TiDB Cluster: tidb-test
TiDB Version: v4.0.0-rc.2
Type Host Ports OS/Arch Directories


cdc 10.186.61.2 8300 linux/x86_64 deploy/cdc-8300
Attention:
1. If the topology is not what you expected, check your yaml file.
2. Please confirm there is no port/directory conflicts in same host.
Do you want to continue? [y/N]: y

  • [ Serial ] - SSHKeySet: privateKey=/root/.tiup/storage/cluster/clusters/tidb-test/ssh/id_rsa, publicKey=/root/.tiup/storage/cluster/clusters/tidb-test/ssh/id_rsa.pub

    • Download node_exporter:v0.17.0 (linux/amd64) … Done
  • [ Serial ] - RootSSH: user=root, host=10.186.61.2, port=22, key=/root/.ssh/id_rsa

  • [ Serial ] - EnvInit: user=tidb, host=10.186.61.2

  • [ Serial ] - Mkdir: host=10.186.61.2, directories='/tidb-deploy','/tidb-data'

  • [Parallel] - UserSSH: user=tidb, host=10.186.61.51

  • [Parallel] - UserSSH: user=tidb, host=10.186.61.51

  • [Parallel] - UserSSH: user=tidb, host=10.186.61.51

  • [Parallel] - UserSSH: user=tidb, host=10.186.61.51

  • [Parallel] - UserSSH: user=tidb, host=10.186.61.37

  • [Parallel] - UserSSH: user=tidb, host=10.186.61.39

  • [Parallel] - UserSSH: user=tidb, host=10.186.61.37

  • [Parallel] - UserSSH: user=tidb, host=10.186.61.42

  • [Parallel] - UserSSH: user=tidb, host=10.186.61.51

  • [Parallel] - UserSSH: user=tidb, host=10.186.61.51

  • [Parallel] - UserSSH: user=tidb, host=10.186.61.37

  • [ Serial ] - UserSSH: user=tidb, host=10.186.61.2

  • [ Serial ] - Mkdir: host=10.186.61.2, directories='/home/tidb/deploy/cdc-8300','/home/tidb/deploy/cdc-8300/log','/home/tidb/deploy/cdc-8300/bin','/home/tidb/

    • Copy node_exporter → 10.186.61.2 … Done
  • [ Serial ] - ScaleConfig: cluster=tidb-test, user=tidb, host=10.186.61.2, service=cdc-8300.service, deploy_dir=/home/tidb/deploy/cdc-8300, data_dir=[/home/tidb], log_dir=/home/tidb/deploy/cdc-8300/log, cache_dir=

  • [ Serial ] - ClusterOperate: operation=StartOperation, options={Roles:[] Nodes:[] Force:false SSHTimeout:0 OptTimeout:60 APITimeout:0}
    Starting component pd
    Starting instance pd 10.186.61.51:2379
    Start pd 10.186.61.51:2379 success
    Starting component node_exporter
    Starting instance 10.186.61.51
    Start 10.186.61.51 success
    Starting component blackbox_exporter
    Starting instance 10.186.61.51
    Start 10.186.61.51 success
    Starting component tikv
    Starting instance tikv 10.186.61.42:20160
    Starting instance tikv 10.186.61.37:20160
    Starting instance tikv 10.186.61.39:20160
    Start tikv 10.186.61.42:20160 success
    Start tikv 10.186.61.37:20160 success
    Start tikv 10.186.61.39:20160 success
    Starting component node_exporter
    Starting instance 10.186.61.37
    Start 10.186.61.37 success
    Starting component blackbox_exporter
    Starting instance 10.186.61.37
    Start 10.186.61.37 success
    Starting component node_exporter
    Starting instance 10.186.61.39
    Start 10.186.61.39 success
    Starting component blackbox_exporter
    Starting instance 10.186.61.39
    Start 10.186.61.39 success
    Starting component node_exporter
    Starting instance 10.186.61.42
    Start 10.186.61.42 success
    Starting component blackbox_exporter
    Starting instance 10.186.61.42
    Start 10.186.61.42 success
    Starting component tidb
    Starting instance tidb 10.186.61.51:4000
    Start tidb 10.186.61.51:4000 success
    Starting component tiflash
    Starting instance tiflash 10.186.61.51:9000
    Start tiflash 10.186.61.51:9000 success
    Starting component cdc
    Starting instance cdc 10.186.61.37:11111
    Starting instance cdc 10.186.61.37:8300
    Start cdc 10.186.61.37:11111 success
    Start cdc 10.186.61.37:8300 success
    Starting component prometheus
    Starting instance prometheus 10.186.61.51:9090
    Start prometheus 10.186.61.51:9090 success
    Starting component grafana
    Starting instance grafana 10.186.61.51:3000
    Start grafana 10.186.61.51:3000 success
    Starting component alertmanager
    Starting instance alertmanager 10.186.61.51:9093
    Start alertmanager 10.186.61.51:9093 success
    Checking service state of pd
    10.186.61.51 Active: active (running) since Mon 2020-05-25 02:40:56 UTC; 1 weeks 4 days ago
    Checking service state of tikv
    10.186.61.37 Active: active (running) since Mon 2020-05-25 02:40:48 UTC; 1 weeks 4 days ago
    10.186.61.42 Active: active (running) since Mon 2020-05-25 02:40:54 UTC; 1 weeks 4 days ago
    10.186.61.39 Active: active (running) since Mon 2020-05-25 02:40:50 UTC; 1 weeks 4 days ago
    Checking service state of tidb
    10.186.61.51 Active: active (running) since Mon 2020-05-25 02:40:56 UTC; 1 weeks 4 days ago
    Checking service state of tiflash
    10.186.61.51 Active: active (running) since Tue 2020-05-26 09:32:58 UTC; 1 weeks 2 days ago
    Checking service state of cdc
    10.186.61.37 Active: activating (auto-restart) (Result: exit-code) since Fri 2020-06-05 03:02:07 UTC; 1s ago
    10.186.61.37 Active: activating (auto-restart) (Result: exit-code) since Fri 2020-06-05 03:02:07 UTC; 1s ago
    Checking service state of prometheus
    10.186.61.51 Active: active (running) since Mon 2020-05-25 02:40:56 UTC; 1 weeks 4 days ago
    Checking service state of grafana
    10.186.61.51 Active: active (running) since Mon 2020-05-25 02:40:56 UTC; 1 weeks 4 days ago
    Checking service state of alertmanager
    10.186.61.51 Active: active (running) since Mon 2020-05-25 02:40:56 UTC; 1 weeks 4 days ago

  • [Parallel] - UserSSH: user=tidb, host=10.186.61.2

  • [ Serial ] - save meta

  • [ Serial ] - ClusterOperate: operation=StartOperation, options={Roles:[] Nodes:[] Force:false SSHTimeout:0 OptTimeout:60 APITimeout:0}
    Starting component cdc
    Starting instance cdc 10.186.61.2:8300
    retry error: operation timed out after 1m0s
    cdc 10.186.61.2:8300 failed to start: timed out waiting for port 8300 to be started after 1m0s, please check the log of the instance

Error: failed to start: failed to start cdc: cdc 10.186.61.2:8300 failed to start: timed out waiting for port 8300 to be started after 1m0s, please check the log of the instance: timed out waiting for port 8300 to be started after 1m0s

Verbose debug logs has been written to /root/logs/tiup-cluster-debug-2020-06-05-03-03-10.log.
Error: run /root/.tiup/components/cluster/v0.6.2/cluster (wd:/root/.tiup/data/S10E86d) failed: exit status 1
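
Per the hint in the error, the log of the new instance can be inspected directly on 10.186.61.2; a sketch of the commands (service name and paths taken from the ScaleConfig line above):

ssh tidb@10.186.61.2
sudo systemctl status cdc-8300.service
tail -n 50 /home/tidb/deploy/cdc-8300/log/cdc_stderr.log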

The port usage is as follows:
(screenshot of the port check, not preserved)
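
Since the screenshot is not preserved, the check was presumably something along these lines:

ss -lntp | grep 8300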

The log content is as follows:

Additional info: SELinux is disabled, there is no firewall or iptables rule in place, and passwordless SSH login is set up between the two machines.
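
For reference, these points can be re-verified with something like the following (a sketch; firewalld is an assumption about the distribution):

getenforce                       # expect Disabled or Permissive
systemctl is-active firewalld    # expect inactive (or "unit not found")
iptables -L -n                   # expect empty chains with ACCEPT policies
ssh tidb@10.186.61.2 'echo ok'   # passwordless SSH from the control machine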

Could you share the CDC logs so we can see what error caused the startup failure?

cdc.log (44.9 KB)
cdc_stderr.log (95.7 KB)

Looking at cdc_stderr.log:

Error: can not load timezone, Please specify the time zone through environment variable `TZ` or command line parameters `--tz`: unknown time zone zoneinfo/UTC
Usage:
  cdc server [flags]

Flags:
      --addr string             Set the listening address (default "0.0.0.0:8300")
      --advertise-addr string   Set the advertise listening address for client communication
      --gc-ttl int              CDC GC safepoint TTL duration, specified in seconds (default 86400)
  -h, --help                    help for server
      --pd ,                    Set the PD endpoints to use. Use , to separate multiple PDs (default "http://127.0.0.1:2379")
      --tz string               Specify time zone of TiCDC cluster (default "System")

Global Flags:
      --log-file string    log file path (default "cdc.log")
      --log-level string   log level (etc: debug|info|warn|error) (default "info")

Could you check how the time zone is configured?

Originally the time zone on these two machines was not UTC+8; I changed the time zone, but it still fails.
(screenshots of the time zone settings, not preserved)
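
The screenshots are not preserved; on a systemd-based system the time zone change was presumably something along these lines:

timedatectl set-timezone Asia/Shanghai
timedatectl status    # confirm "Time zone: Asia/Shanghai"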

Update on the error message

Could you run the following commands on the CDC machine and share the results?

echo $TZ

ll /etc/localtime

The error message means that, at the time, /etc/localtime on the CDC machine was set to the UTC time zone.
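
A quick way to confirm what /etc/localtime actually resolves to (a sketch; the output varies per machine):

ls -l /etc/localtime          # shows the symlink target, if it is a symlink
readlink -f /etc/localtime    # e.g. /usr/share/zoneinfo/UTC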

You can try one of the following:

  • Set $TZ=UTC or $TZ=Asia/Shanghai
  • Pass --tz UTC or --tz Asia/Shanghai in the startup script
  • If you want to keep /etc/localtime on the UTC time zone, point it at /usr/share/zoneinfo/Etc/UTC instead of /usr/share/zoneinfo/UTC (see the sketch after this list)
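
A minimal sketch of the third option, run as root on 10.186.61.2 and followed by retrying the start through tiup (the -R role filter is an assumption about the tiup version in use; a plain "tiup cluster start tidb-test" works as well):

ln -sf /usr/share/zoneinfo/Etc/UTC /etc/localtime    # keep UTC, but use a zoneinfo path TiCDC can resolve
readlink -f /etc/localtime                           # should now print /usr/share/zoneinfo/Etc/UTC
tiup cluster start tidb-test -R cdc                  # retry starting the cdc instances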
