TiCDC startup timeout

Hacker_SeEPBc31 · May 21, 2020 11:03

1. Current state:
TiDB Version: v4.0.0-rc
Single-host, multi-instance deployment: 1 PD, 3 TiKV, 1 TiDB

The cluster starts and works normally.

2. We need to add the CDC component, so I scaled out CDC on the existing cluster with the tiup scale-out command, using this topology file:

cdc_servers:
  - host: 192.168.180.231
    port: 8300
  - host: 192.168.180.231
    port: 8301
  - host: 192.168.180.231
    port: 8302

3. After running the scale-out command:
3.执行扩容命令后：
Download cdc:v4.0.0-rc … Done

[ Serial ] - UserSSH: user=tidb, host=192.168.180.231
[ Serial ] - UserSSH: user=tidb, host=192.168.180.231
[ Serial ] - Mkdir: host=192.168.180.231, directories='/home/tidb/deploy/cdc-8300','','/home/tidb/deploy/cdc-8300/log','/home/tidb/deploy/cdc-8300/bin','/home/tidb/deploy/cdc-8300/conf','/home/tidb/deploy/cdc-8300/scripts'
[ Serial ] - Mkdir: host=192.168.180.231, directories='/home/tidb/deploy/cdc-8302','','/home/tidb/deploy/cdc-8302/log','/home/tidb/deploy/cdc-8302/bin','/home/tidb/deploy/cdc-8302/conf','/home/tidb/deploy/cdc-8302/scripts'
[ Serial ] - UserSSH: user=tidb, host=192.168.180.231
[ Serial ] - Mkdir: host=192.168.180.231, directories='/home/tidb/deploy/cdc-8301','','/home/tidb/deploy/cdc-8301/log','/home/tidb/deploy/cdc-8301/bin','/home/tidb/deploy/cdc-8301/conf','/home/tidb/deploy/cdc-8301/scripts'
[ Serial ] - CopyComponent: component=cdc, version=v4.0.0-rc, remote=192.168.180.231:/home/tidb/deploy/cdc-8301
[ Serial ] - CopyComponent: component=cdc, version=v4.0.0-rc, remote=192.168.180.231:/home/tidb/deploy/cdc-8302
[ Serial ] - CopyComponent: component=cdc, version=v4.0.0-rc, remote=192.168.180.231:/home/tidb/deploy/cdc-8300
[ Serial ] - ScaleConfig: cluster=tidb-test, user=tidb, host=192.168.180.231, service=cdc-8301.service, deploy_dir=/home/tidb/deploy/cdc-8301, data_dir=, log_dir=/home/tidb/deploy/cdc-8301/log, cache_dir=
[ Serial ] - ScaleConfig: cluster=tidb-test, user=tidb, host=192.168.180.231, service=cdc-8302.service, deploy_dir=/home/tidb/deploy/cdc-8302, data_dir=, log_dir=/home/tidb/deploy/cdc-8302/log, cache_dir=
[ Serial ] - ScaleConfig: cluster=tidb-test, user=tidb, host=192.168.180.231, service=cdc-8300.service, deploy_dir=/home/tidb/deploy/cdc-8300, data_dir=, log_dir=/home/tidb/deploy/cdc-8300/log, cache_dir=
[Parallel] - UserSSH: user=tidb, host=192.168.180.231
[Parallel] - UserSSH: user=tidb, host=192.168.180.231
[Parallel] - UserSSH: user=tidb, host=192.168.180.231
[Parallel] - UserSSH: user=tidb, host=192.168.180.231
[Parallel] - UserSSH: user=tidb, host=192.168.180.231
[Parallel] - UserSSH: user=tidb, host=192.168.180.231
[Parallel] - UserSSH: user=tidb, host=192.168.180.231
[Parallel] - UserSSH: user=tidb, host=192.168.180.231
[ Serial ] - ClusterOperate: operation=StartOperation, options={Roles:[] Nodes:[] Force:false Timeout:0}
Starting component pd
Starting instance pd 192.168.180.231:2379
Start pd 192.168.180.231:2379 success
Starting component node_exporter
Starting instance 192.168.180.231
Start 192.168.180.231 success
Starting component blackbox_exporter
Starting instance 192.168.180.231
Start 192.168.180.231 success
Starting component tikv
Starting instance tikv 192.168.180.231:20162
Starting instance tikv 192.168.180.231:20161
Starting instance tikv 192.168.180.231:20160
Start tikv 192.168.180.231:20162 success
Start tikv 192.168.180.231:20160 success
Start tikv 192.168.180.231:20161 success
Starting component tidb
Starting instance tidb 192.168.180.231:4000
Start tidb 192.168.180.231:4000 success
Starting component tiflash
Starting instance tiflash 192.168.180.231:9000
Start tiflash 192.168.180.231:9000 success
Starting component prometheus
Starting instance prometheus 192.168.180.231:9090
Start prometheus 192.168.180.231:9090 success
Starting component grafana
Starting instance grafana 192.168.180.231:3000
Start grafana 192.168.180.231:3000 success
Checking service state of pd
192.168.180.231 Active: active (running) since Thu 2020-05-21 17:52:55 CST; 52min ago
Checking service state of tikv
192.168.180.231 Active: active (running) since Thu 2020-05-21 17:52:56 CST; 52min ago
192.168.180.231 Active: active (running) since Thu 2020-05-21 17:52:56 CST; 52min ago
192.168.180.231 Active: active (running) since Thu 2020-05-21 17:52:56 CST; 52min ago
Checking service state of tidb
192.168.180.231 Active: active (running) since Thu 2020-05-21 17:53:01 CST; 52min ago
Checking service state of tiflash
192.168.180.231 Active: active (running) since Thu 2020-05-21 17:53:14 CST; 52min ago
Checking service state of prometheus
192.168.180.231 Active: active (running) since Thu 2020-05-21 18:45:36 CST; 2s ago
Checking service state of grafana
192.168.180.231 Active: active (running) since Thu 2020-05-21 18:45:37 CST; 2s ago
[Parallel] - UserSSH: user=tidb, host=192.168.180.231
[Parallel] - UserSSH: user=tidb, host=192.168.180.231
[Parallel] - UserSSH: user=tidb, host=192.168.180.231
[ Serial ] - save meta
[ Serial ] - ClusterOperate: operation=StartOperation, options={Roles:[] Nodes:[] Force:false Timeout:0}
Starting component cdc
Starting instance cdc 192.168.180.231:8302
Starting instance cdc 192.168.180.231:8300
Starting instance cdc 192.168.180.231:8301
cdc 192.168.180.231:8301 failed to start: timed out waiting for port 8301 to be started after 1m0s, please check the log of the instance
cdc 192.168.180.231:8300 failed to start: timed out waiting for port 8300 to be started after 1m0s, please check the log of the instance
cdc 192.168.180.231:8302 failed to start: timed out waiting for port 8302 to be started after 1m0s, please check the log of the instance

Error: failed to start: failed to start cdc: cdc 192.168.180.231:8301 failed to start: timed out waiting for port 8301 to be started after 1m0s, please check the log of the instance: timed out waiting for port 8301 to be started after 1m0s

Verbose debug logs has been written to /root/logs/tiup-cluster-debug-2020-05-21-18-46-40.log.
Error: run /root/.tiup/components/cluster/v0.6.0/cluster (wd:/root/.tiup/data/RzcOwx2) failed: exit status 1

yilong · May 21, 2020 11:46

Please share the log /root/logs/tiup-cluster-debug-2020-05-21-18-46-40.log. Thanks.

Hacker_SeEPBc31 · May 21, 2020 11:59

tiup-cluster-debug-2020-05-21-18-46-40.log (91 KB)
The log is attached.

yilong · May 21, 2020 13:12

Could you also collect the cdc logs? Thanks.

Hacker_SeEPBc31 · May 22, 2020 03:15

cdc_stderr.log (8.2 KB)
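The cdc log shows:
Error: unknown flag: --status-addr
Usage:
  cdc server [flags]

I installed TiCDC following the section [Add the TiCDC component to an existing TiDB cluster using TiUP], which does not mention status-addr at all.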

yilong · May 22, 2020 04:07

1. Did you specify --status-addr manually when starting the cluster? If so, restart without that parameter.

Do you have the complete cdc log? Thanks.

Hacker_SeEPBc31 · May 22, 2020 04:22

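1. Starting the cluster: after deploying with tiup cluster deploy, the only commands I ran were tiup cluster start tidb-test and the scale-out command tiup cluster scale-out tidb-test /apps/tidb/scale-cdc.yaml. I never specified --status-addr manually.
2. The complete cdc log? The cdc log directory currently contains only [cdc_stderr.log], which I have uploaded in full. Which other logs do you need, for example logs under some relative path?
I looked at the CDC startup script run_cdc.sh and it does pass the status-addr parameter. Could the problem be that tiup's start command invokes run_cdc.sh?
run_cdc.sh (402 bytes)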

Hacker_SeEPBc31 · May 22, 2020 08:01

run_cdc.sh:

#!/bin/bash
set -e

# WARNING: This file was auto-generated. Do not edit!
#          All your edit might be overwritten!
DEPLOY_DIR=/apps/tidb-deploy/cdc-8300

cd "${DEPLOY_DIR}" || exit 1

exec bin/cdc server \
    --status-addr "192.168.180.231:8300" \
    --pd "http://192.168.180.231:2379" \
    --log-file "/apps/tidb-deploy/cdc-8300/log/cdc.log" 2>> "/apps/tidb-deploy/cdc-8300/log/cdc_stderr.log"

yilong · May 22, 2020 13:08

Please back up the script, remove --status-addr, then run the script manually and see whether it starts. Thanks.
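A minimal sketch of that workaround, assuming the generated run_cdc.sh puts each flag on its own line with a trailing backslash and that GNU sed (for in-place editing) is available. The helper name strip_status_addr is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch of the suggested workaround (helper name is hypothetical):
# back up a tiup-generated run_cdc.sh, then delete the --status-addr line.
# Assumes one flag per line with trailing "\" continuations and GNU sed.
set -euo pipefail

strip_status_addr() {
    local script="$1"
    # Keep an untouched copy next to the original, e.g. run_cdc.sh.bak.
    cp -- "$script" "${script}.bak"
    # Drop every line that passes --status-addr. The previous line still
    # ends in "\", so the command continues with the next flag (--pd ...).
    sed -i '/--status-addr/d' "$script"
}
```

For example, call strip_status_addr <deploy_dir>/scripts/run_cdc.sh and then run the script by hand. The generated file itself warns that edits may be overwritten by tiup. Without any listen flag, cdc also falls back to its default listen address, so several instances on one host can collide.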

Hacker_SeEPBc31 · May 25, 2020 00:04

Attachments: cdc_stderr_8300.log (28.5 KB), cdc_stderr-8301.log (34.0 KB), cdc-8300.log (9.0 KB), cdc-8301.log (18.7 KB), scale-cdc.yaml (408 bytes), tiup-cluster-debug-2020-05-25-07-37-50.log (86 KB)

For the single-host, multi-instance CDC deployment I forcibly edited run_cdc.sh (removing the --status-addr parameter). After starting, only one node (8300) came up; 8301 and 8302 failed.

However, the logs of all three nodes contain the same error (logs attached). The main error is: ["run server"] [error="listen tcp 0.0.0.0:8300: bind: address already in use"]
1. Is my configuration wrong? The scale-out topology file is attached as scale-cdc.yaml.
2. The log reports the CDC version as ["Welcome to Change Data Capture (CDC)"] [release-version=v4.0.0-rc.2], but the cluster was deployed with tiup as v4.0.0-rc. Does scaling out CDC pull the latest CDC version, or is the log showing the wrong version?


yilong · May 25, 2020 01:38

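With the default configuration, every instance uses the same port and status port, so the other two cannot start.
Please deploy on multiple servers,
or, using the TiKV configuration as a reference, assign a different port to each cdc instance and see whether that works. Thanks.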
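The conflict can be caught before starting anything. A minimal sketch that prints every listen address more than one instance would try to bind; the helper name find_duplicate_addrs is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check: print every listen address that more than
# one cdc instance on the same host would try to bind.
set -euo pipefail

find_duplicate_addrs() {
    # One "ip:port" per argument; uniq -d prints each repeated value once.
    printf '%s\n' "$@" | sort | uniq -d
}
```

With --status-addr removed, all three instances in this thread fall back to 0.0.0.0:8300, so find_duplicate_addrs 0.0.0.0:8300 0.0.0.0:8300 0.0.0.0:8300 reports that address. On the host itself, ss -ltnp 'sport = :8300' shows which process already holds the port.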

Hacker_SeEPBc31 · May 25, 2020 02:02

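The official CDC documentation only shows a single port setting, and in my single-host, multi-instance setup each instance already has its own port. Is there any documentation for a status_port setting on the CDC component?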

lichunzhu-PingCAP · May 25, 2020 03:10

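You need to upgrade tiup cluster to v0.6.2 with the tiup update cluster command; after that it will work.

The cause: cdc's startup parameters were changed (to support advertise-addr), so starting from v0.6.1 tiup cluster also uses the new cdc parameters.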
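The rule can be summarized as follows. Based on this thread, tiup cluster v0.6.0 writes --status-addr into run_cdc.sh, while v0.6.1 and later write --addr (the flag named in the later "unknown flag: --addr" error). A hedged sketch encoding just that rule; the function names are hypothetical, and the version comparison relies on GNU sort -V.

```bash
#!/usr/bin/env bash
# Sketch of the flag rule described in this thread (function names are
# hypothetical): tiup cluster v0.6.0 writes --status-addr into run_cdc.sh,
# v0.6.1 and later write --addr. Requires GNU sort (-V).
set -euo pipefail

# Succeeds when version $1 >= version $2; a leading "v" is ignored.
version_ge() {
    printf '%s\n%s\n' "${2#v}" "${1#v}" | sort -V -C
}

# Prints the listen flag a given tiup cluster version puts in run_cdc.sh.
cdc_listen_flag() {
    if version_ge "$1" v0.6.1; then
        printf '%s\n' "--addr"
    else
        printf '%s\n' "--status-addr"
    fi
}
```

In this thread, the v4.0.0-rc.2 cdc binary rejected --status-addr and the v4.0.0-rc.1 binary rejected --addr. The tiup cluster version and the cdc binary therefore have to agree on the flag.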

Hacker_SeEPBc31 · May 25, 2020 12:43

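[v4.0.0-rc] Upgrading tiup to v0.6.0 resolved the issue above (Error: unknown flag: --status-addr). Many thanks!

2. [v4.0.0-rc1] Another environment runs a different version, and scaling out CDC there still fails. Details:

[TiDB version]: v4.0.0-rc.1
[tiup version]: upgraded to the latest, v0.6.0
[Problem]: the newly deployed CDC nodes fail to start!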
Starting component cdc
Starting instance cdc 192.168.150.173:8300
Starting instance cdc 192.168.150.172:8300
Starting instance cdc 192.168.150.171:8300
retry error: operation timed out after 1m0s
cdc 192.168.150.173:8300 failed to start: timed out waiting for port 8300 to be started after 1m0s, please check the log of the instance
retry error: operation timed out after 1m0s
cdc 192.168.150.171:8300 failed to start: timed out waiting for port 8300 to be started after 1m0s, please check the log of the instance
retry error: operation timed out after 1m0s
cdc 192.168.150.172:8300 failed to start: timed out waiting for port 8300 to be started after 1m0s, please check the log of the instance
Verbose debug logs has been written to /root/logs/tiup-cluster-debug-2020-05-25-20-21-24.log


[cdc log]:
Error: unknown flag: --addr

cdc_stderr.log (28.6 KB)
tiup-cluster-debug-2020-05-25-20-21-24.log (88 KB) scale-cdc.yaml (363 bytes)

lichunzhu-PingCAP · May 25, 2020 13:08

v4.0.0-rc.1 uses the old cdc parameters. I suggest upgrading cdc with tiup cluster patch.

Suggested steps:

1. Download the cdc package with wget https://tiup-mirrors.pingcap.com/cdc-v4.0.0-rc-linux-amd64.tar.gz
2. Upgrade cdc with tiup cluster patch zj_tidb cdc-v4.0.0-rc-linux-amd64.tar.gz -R cdc
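The two steps as a small script. This is a sketch. The DRY_RUN switch and the run/patch_cdc helpers are hypothetical additions for previewing the commands. The cluster name zj_tidb and the package URL come from this reply.

```bash
#!/usr/bin/env bash
# Sketch of the two suggested steps. DRY_RUN=1 only prints the commands;
# the run/patch_cdc helpers and DRY_RUN switch are hypothetical additions.
set -euo pipefail

CLUSTER="${CLUSTER:-zj_tidb}"   # cluster name used in this reply
PKG="cdc-v4.0.0-rc-linux-amd64.tar.gz"
URL="https://tiup-mirrors.pingcap.com/${PKG}"

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

patch_cdc() {
    # Step 1: download the cdc package into the current directory.
    run wget "$URL"
    # Step 2: replace the cdc binary on every node with the cdc role (-R cdc).
    run tiup cluster patch "$CLUSTER" "$PKG" -R cdc
}
```

Run DRY_RUN=1 patch_cdc first to review the exact commands, then patch_cdc to apply them.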

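NOTE: This does not mean that v4.0.0-rc is newer than v4.0.0-rc.1. The official guide does not recommend using cdc with v4.0.0-rc; cdc is supported starting from v4.0.0-rc.1. The package at https://tiup-mirrors.pingcap.com/cdc-v4.0.0-rc-linux-amd64.tar.gz actually contains a cdc build newer than rc.1. It was published to test whether tiup cluster supports the new cdc parameters; no official cdc v4.0.0-rc was ever released. We recommend upgrading the whole cluster to v4.0.0-rc.2.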

Appendix: version mapping

tiup cluster v0.6.0 matches TiDB cluster v4.0.0-rc.1

tiup cluster v0.6.1 and later match TiDB cluster v4.0.0-rc.2 and later
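The appendix mapping as a lookup helper. This is only a sketch: matching_tidb_version is a hypothetical name, it encodes just the two rows listed above, and it relies on GNU sort -V.

```bash
#!/usr/bin/env bash
# Sketch of the appendix above as a lookup (function names hypothetical).
# Only the two rows in the appendix are encoded. Requires GNU sort (-V).
set -euo pipefail

# Succeeds when version $1 >= version $2; a leading "v" is ignored.
version_ge() {
    printf '%s\n%s\n' "${2#v}" "${1#v}" | sort -V -C
}

# Prints the TiDB cluster version(s) matching a tiup cluster version.
matching_tidb_version() {
    local tiup_ver="$1"
    if version_ge "$tiup_ver" v0.6.1; then
        printf '%s\n' "v4.0.0-rc.2 or later"
    elif [ "${tiup_ver#v}" = "0.6.0" ]; then
        printf '%s\n' "v4.0.0-rc.1"
    else
        printf '%s\n' "unknown (not listed in the appendix)"
    fi
}
```

For example, matching_tidb_version v0.6.0 prints v4.0.0-rc.1. Any version older than v0.6.0 is reported as unknown rather than guessed.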

Hacker_SeEPBc31 · May 26, 2020 12:32

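Our new project is urgent. We have two test environments, v4.0.0-rc and v4.0.0-rc1, and production runs v4.0.0-rc1. Since we need the CDC feature right away, I upgraded CDC in both test environments, and it now starts normally. Many thanks. P.S. tiup cluster has now been upgraded to v0.6.2 everywhere.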

来了老弟 · May 26, 2020 12:56

system · Oct 31, 2022 19:12

This topic was automatically closed 1 minute after the last reply. New replies are no longer allowed.