TiCDC startup timeout

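1. Current state: TiDB version v4.0.0-rc, multiple instances deployed on a single machine: 1 pd, 3 tikv, 1 tidb.

The cluster starts and works normally.

2. Requirement: install the CDC component. I scaled out cdc on the existing cluster with the tiup scale-out command, using the following configuration file:

cdc_servers:
  - host: 192.168.180.231
    port: 8300
  - host: 192.168.180.231
    port: 8301
  - host: 192.168.180.231
    port: 8302

3. After running the scale-out command: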

  • Download cdc:v4.0.0-rc … Done

  • [ Serial ] - UserSSH: user=tidb, host=192.168.180.231
  • [ Serial ] - UserSSH: user=tidb, host=192.168.180.231
  • [ Serial ] - Mkdir: host=192.168.180.231, directories='/home/tidb/deploy/cdc-8300','','/home/tidb/deploy/cdc-8300/log','/home/tidb/deploy/cdc-8300/bin','/home/tidb/deploy/cdc-8300/conf','/home/tidb/deploy/cdc-8300/scripts'
  • [ Serial ] - Mkdir: host=192.168.180.231, directories='/home/tidb/deploy/cdc-8302','','/home/tidb/deploy/cdc-8302/log','/home/tidb/deploy/cdc-8302/bin','/home/tidb/deploy/cdc-8302/conf','/home/tidb/deploy/cdc-8302/scripts'
  • [ Serial ] - UserSSH: user=tidb, host=192.168.180.231
  • [ Serial ] - Mkdir: host=192.168.180.231, directories='/home/tidb/deploy/cdc-8301','','/home/tidb/deploy/cdc-8301/log','/home/tidb/deploy/cdc-8301/bin','/home/tidb/deploy/cdc-8301/conf','/home/tidb/deploy/cdc-8301/scripts'
  • [ Serial ] - CopyComponent: component=cdc, version=v4.0.0-rc, remote=192.168.180.231:/home/tidb/deploy/cdc-8301
  • [ Serial ] - CopyComponent: component=cdc, version=v4.0.0-rc, remote=192.168.180.231:/home/tidb/deploy/cdc-8302
  • [ Serial ] - CopyComponent: component=cdc, version=v4.0.0-rc, remote=192.168.180.231:/home/tidb/deploy/cdc-8300
  • [ Serial ] - ScaleConfig: cluster=tidb-test, user=tidb, host=192.168.180.231, service=cdc-8301.service, deploy_dir=/home/tidb/deploy/cdc-8301, data_dir=, log_dir=/home/tidb/deploy/cdc-8301/log, cache_dir=
  • [ Serial ] - ScaleConfig: cluster=tidb-test, user=tidb, host=192.168.180.231, service=cdc-8302.service, deploy_dir=/home/tidb/deploy/cdc-8302, data_dir=, log_dir=/home/tidb/deploy/cdc-8302/log, cache_dir=
  • [ Serial ] - ScaleConfig: cluster=tidb-test, user=tidb, host=192.168.180.231, service=cdc-8300.service, deploy_dir=/home/tidb/deploy/cdc-8300, data_dir=, log_dir=/home/tidb/deploy/cdc-8300/log, cache_dir=
  • [Parallel] - UserSSH: user=tidb, host=192.168.180.231
  • [Parallel] - UserSSH: user=tidb, host=192.168.180.231
  • [Parallel] - UserSSH: user=tidb, host=192.168.180.231
  • [Parallel] - UserSSH: user=tidb, host=192.168.180.231
  • [Parallel] - UserSSH: user=tidb, host=192.168.180.231
  • [Parallel] - UserSSH: user=tidb, host=192.168.180.231
  • [Parallel] - UserSSH: user=tidb, host=192.168.180.231
  • [Parallel] - UserSSH: user=tidb, host=192.168.180.231
  • [ Serial ] - ClusterOperate: operation=StartOperation, options={Roles:[] Nodes:[] Force:false Timeout:0}
    Starting component pd
        Starting instance pd 192.168.180.231:2379
        Start pd 192.168.180.231:2379 success
    Starting component node_exporter
        Starting instance 192.168.180.231
        Start 192.168.180.231 success
    Starting component blackbox_exporter
        Starting instance 192.168.180.231
        Start 192.168.180.231 success
    Starting component tikv
        Starting instance tikv 192.168.180.231:20162
        Starting instance tikv 192.168.180.231:20161
        Starting instance tikv 192.168.180.231:20160
        Start tikv 192.168.180.231:20162 success
        Start tikv 192.168.180.231:20160 success
        Start tikv 192.168.180.231:20161 success
    Starting component tidb
        Starting instance tidb 192.168.180.231:4000
        Start tidb 192.168.180.231:4000 success
    Starting component tiflash
        Starting instance tiflash 192.168.180.231:9000
        Start tiflash 192.168.180.231:9000 success
    Starting component prometheus
        Starting instance prometheus 192.168.180.231:9090
        Start prometheus 192.168.180.231:9090 success
    Starting component grafana
        Starting instance grafana 192.168.180.231:3000
        Start grafana 192.168.180.231:3000 success
    Checking service state of pd
        192.168.180.231 Active: active (running) since Thu 2020-05-21 17:52:55 CST; 52min ago
    Checking service state of tikv
        192.168.180.231 Active: active (running) since Thu 2020-05-21 17:52:56 CST; 52min ago
        192.168.180.231 Active: active (running) since Thu 2020-05-21 17:52:56 CST; 52min ago
        192.168.180.231 Active: active (running) since Thu 2020-05-21 17:52:56 CST; 52min ago
    Checking service state of tidb
        192.168.180.231 Active: active (running) since Thu 2020-05-21 17:53:01 CST; 52min ago
    Checking service state of tiflash
        192.168.180.231 Active: active (running) since Thu 2020-05-21 17:53:14 CST; 52min ago
    Checking service state of prometheus
        192.168.180.231 Active: active (running) since Thu 2020-05-21 18:45:36 CST; 2s ago
    Checking service state of grafana
        192.168.180.231 Active: active (running) since Thu 2020-05-21 18:45:37 CST; 2s ago
  • [Parallel] - UserSSH: user=tidb, host=192.168.180.231
  • [Parallel] - UserSSH: user=tidb, host=192.168.180.231
  • [Parallel] - UserSSH: user=tidb, host=192.168.180.231
  • [ Serial ] - save meta
  • [ Serial ] - ClusterOperate: operation=StartOperation, options={Roles:[] Nodes:[] Force:false Timeout:0}
    Starting component cdc
        Starting instance cdc 192.168.180.231:8302
        Starting instance cdc 192.168.180.231:8300
        Starting instance cdc 192.168.180.231:8301
        cdc 192.168.180.231:8301 failed to start: timed out waiting for port 8301 to be started after 1m0s, please check the log of the instance
        cdc 192.168.180.231:8300 failed to start: timed out waiting for port 8300 to be started after 1m0s, please check the log of the instance
        cdc 192.168.180.231:8302 failed to start: timed out waiting for port 8302 to be started after 1m0s, please check the log of the instance

Error: failed to start: failed to start cdc: cdc 192.168.180.231:8301 failed to start: timed out waiting for port 8301 to be started after 1m0s, please check the log of the instance: timed out waiting for port 8301 to be started after 1m0s

Verbose debug logs has been written to /root/logs/tiup-cluster-debug-2020-05-21-18-46-40.log.
Error: run /root/.tiup/components/cluster/v0.6.0/cluster (wd:/root/.tiup/data/RzcOwx2) failed: exit status 1

Please share the log /root/logs/tiup-cluster-debug-2020-05-21-18-46-40.log, thanks.

tiup-cluster-debug-2020-05-21-18-46-40.log (91 KB) The log is attached.

Could you also collect the cdc logs? Thanks.

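cdc_stderr.log (8.2 KB)

The cdc log shows:

Error: unknown flag: --status-addr
Usage: cdc server [flags]

I installed TiCDC by following the section [Using TiUP to add the TiCDC component to an existing TiDB cluster], which does not mention status-addr.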
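When several instances each write their own cdc_stderr.log, it can help to pull out just the rejected flag names. The following is a minimal sketch, assuming the error line has the exact form quoted above; the function name `unknown_flags` and the temporary file are illustrative, not part of tiup or cdc.

```shell
# unknown_flags LOGFILE...: print each flag the binary rejected, once.
# Assumes lines of the form "Error: unknown flag: --some-flag" as quoted in this thread.
unknown_flags() {
    sed -n 's/^Error: unknown flag: \(--[A-Za-z0-9-]*\).*$/\1/p' "$@" | sort -u
}

# Demo on a throwaway file holding the message from this thread.
demo_log=$(mktemp)
printf 'Error: unknown flag: --status-addr\nUsage: cdc server [flags]\n' > "$demo_log"
unknown_flags "$demo_log"    # prints --status-addr
rm -f "$demo_log"
```

A rejected flag means the start script and the cdc binary disagree about the command line, which points at a version mismatch rather than a port or network problem.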

  1. Did you specify --status-addr manually when starting the cluster? If so, restart it without that parameter.
  2. Do you have the complete cdc log? Thanks.

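  1. Starting the cluster: after deploying with tiup cluster deploy, I only ran tiup cluster start tidb-test and the scale-out command tiup cluster scale-out tidb-test /apps/tidb/scale-cdc.yaml. I did not specify --status-addr manually.
  2. The complete cdc log: the cdc log directory currently contains only cdc_stderr.log, and I have already uploaded all of it. Which other logs do you need, for example logs under some other path?

I looked at the CDC startup script: run_cdc.sh does contain the status-addr parameter. Could the error come from the tiup start command calling run_cdc.sh?

run_cdc.sh (402 bytes)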

run_cdc.sh:

    #!/bin/bash
    set -e

    # WARNING: This file was auto-generated. Do not edit!
    # All your edit might be overwritten!
    DEPLOY_DIR=/apps/tidb-deploy/cdc-8300
    cd "${DEPLOY_DIR}" || exit 1
    exec bin/cdc server \
        --status-addr "192.168.180.231:8300" \
        --pd "http://192.168.180.231:2379" \
        --log-file "/apps/tidb-deploy/cdc-8300/log/cdc.log" 2>> "/apps/tidb-deploy/cdc-8300/log/cdc_stderr.log"

  1. Please back up the script first.
  2. Delete --status-addr, start the script manually, and see whether it succeeds. Thanks.
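The two steps above could be scripted roughly as follows. This is only a sketch: the helper name `strip_status_addr` is made up for illustration, and it assumes the flag sits on its own continuation line and is not the last argument, as in the run_cdc.sh posted in this thread.

```shell
# strip_status_addr SCRIPT: keep a .bak copy, then drop the --status-addr line.
strip_status_addr() {
    script=$1
    # Step 1: back up the generated script next to the original.
    cp -p "$script" "$script.bak"
    # Step 2: rewrite the script without the line carrying --status-addr.
    # Deleting a whole "flag \" line keeps the remaining line continuations valid.
    sed '/--status-addr/d' "$script.bak" > "$script"
}

# Usage (path is a placeholder):
#   strip_status_addr path/to/run_cdc.sh
```

Since run_cdc.sh is auto-generated and, per its own header, may be overwritten, an edit like this is best treated as a diagnostic step rather than a lasting fix.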
cdc_stderr_8300.log (28.5 KB) cdc_stderr-8301.log (34.0 KB) cdc-8300.log (9.0 KB) cdc-8301.log (18.7 KB) scale-cdc.yaml (408 bytes) tiup-cluster-debug-2020-05-25-07-37-50.log (86 KB)

For the single-machine, multi-instance CDC deployment I forcibly modified run_cdc.sh (removed the --status-addr parameter). After starting, only one node, 8300, came up (8301 and 8302 failed), yet the logs of all three nodes contain the same error. The logs are attached; the main error is ["run server"] [error="listen tcp 0.0.0.0:8300: bind: address already in use"].

  1. Is the configuration wrong? The scale-out configuration file is attached as scale-cdc.yaml.
  2. The log reports the CDC version as ["Welcome to Change Data Capture (CDC)"] [release-version=v4.0.0-rc.2], but the cluster version in tiup is v4.0.0-rc. Does scaling out CDC pull the latest CDC version, or is the log reporting the wrong version?
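To see at a glance which listen address the instances are fighting over, the bind errors can be tallied across the log files. A minimal sketch, assuming the message text matches the error quoted above; the function name `bind_conflicts` is illustrative.

```shell
# bind_conflicts LOGFILE...: print "address count" for every address that
# failed with "bind: address already in use" in the given logs.
bind_conflicts() {
    sed -n 's/.*listen tcp \([^ ]*\): bind: address already in use.*/\1/p' "$@" |
        sort | uniq -c | awk '{print $2, $1}'
}

# Usage (file names as attached in this thread):
#   bind_conflicts cdc-8300.log cdc-8301.log
```

If every instance reports the same address, as here with 0.0.0.0:8300, the instances are not picking up their individual ports and are all falling back to one default.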


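  1. With the default configuration, every instance uses the same port and the same status port, so the other two cannot start.
  2. Please deploy the instances on multiple servers.
  3. Alternatively, following the tikv configuration shown in the screenshot, try assigning a different port to each cdc instance and see whether that works. Thanks.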

image
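Before scaling out several instances on one host, a planned host/port list can be checked for clashes of the kind described above. This is a rough sketch; `duplicate_endpoints` is a hypothetical helper and the input format (one "host port" pair per line) is an assumption, not something tiup reads.

```shell
# duplicate_endpoints: read "host port" pairs on stdin and print every
# host:port combination that appears more than once.
duplicate_endpoints() {
    awk '{print $1 ":" $2}' | sort | uniq -d
}

# The three cdc_servers entries from this thread: distinct ports, so no output.
printf '%s\n' \
    '192.168.180.231 8300' \
    '192.168.180.231 8301' \
    '192.168.180.231 8302' | duplicate_endpoints
```

An empty result for the posted topology suggests the clash on 8300 does not come from the topology file itself.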

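The official documentation shows only a single port setting for CDC, and in this single-machine, multi-instance setup I have already given each instance its own port. Is there documentation for a status_port setting for the CDC component?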

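You need to upgrade tiup cluster to version 0.6.2 with the tiup update cluster command before this will work.

The background is that the cdc startup parameters were changed (to support advertise-addr), so tiup cluster has used the new cdc parameters since version 0.6.1.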
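The flag change described above can be summarized as a small lookup. This sketch is inferred only from this thread: tiup cluster v0.6.0 generated a script with --status-addr, and an upgraded tiup cluster produced an "unknown flag: --addr" error on an older cdc. The function name `cdc_addr_flag` is made up, and `sort -V` assumes GNU coreutils.

```shell
# cdc_addr_flag VERSION: print the listen-address flag that, going by this
# thread, a given tiup cluster version passes to "cdc server".
cdc_addr_flag() {
    # sort -V orders version strings; if 0.6.1 sorts first (or ties),
    # the given version is 0.6.1 or newer.
    lowest=$(printf '%s\n%s\n' "0.6.1" "${1#v}" | sort -V | head -n 1)
    if [ "$lowest" = "0.6.1" ]; then
        echo "--addr"
    else
        echo "--status-addr"
    fi
}

cdc_addr_flag v0.6.0    # prints --status-addr
cdc_addr_flag v0.6.2    # prints --addr
```

Either flag only works when the cdc binary understands it, so the tiup cluster version and the cdc version have to be moved together.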

  1. [v4.0.0-rc] Upgrading tiup cluster to 0.6.2 resolved the problem above (Error: unknown flag: --status-addr). Thank you very much.

  2. [v4.0.0-rc.1] Another environment, on a different version, still fails when scaling out CDC. Details:

  1. [tidb version]: v4.0.0-rc.1

  2. [tiup version]: upgraded to the latest version, 0.6.2

  3. [Problem]: the CDC nodes fail to start after deployment.

    Starting component cdc
        Starting instance cdc 192.168.150.173:8300
        Starting instance cdc 192.168.150.172:8300
        Starting instance cdc 192.168.150.171:8300
        retry error: operation timed out after 1m0s
        cdc 192.168.150.173:8300 failed to start: timed out waiting for port 8300 to be started after 1m0s, please check the log of the instance
        retry error: operation timed out after 1m0s
        cdc 192.168.150.171:8300 failed to start: timed out waiting for port 8300 to be started after 1m0s, please check the log of the instance
        retry error: operation timed out after 1m0s
        cdc 192.168.150.172:8300 failed to start: timed out waiting for port 8300 to be started after 1m0s, please check the log of the instance
    Verbose debug logs has been written to /root/logs/tiup-cluster-debug-2020-05-25-20-21-24.log

    [cdc log]: Error: unknown flag: --addr

    cdc_stderr.log (28.6 KB) tiup-cluster-debug-2020-05-25-20-21-24.log (88 KB) scale-cdc.yaml (363 bytes)

v4.0.0-rc.1 ships a cdc that still uses the old parameters. I suggest upgrading cdc with tiup cluster patch.

Suggestion:

  1. First download the cdc package with wget https://tiup-mirrors.pingcap.com/cdc-v4.0.0-rc-linux-amd64.tar.gz
  2. Then upgrade cdc with tiup cluster patch zj_tidb cdc-v4.0.0-rc-linux-amd64.tar.gz -R cdc
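The two suggested steps can be wrapped so they are previewed before anything is executed. A minimal sketch: the `run` and `patch_cdc` helpers and the DRY_RUN switch are illustrative additions; the URL, the cluster name zj_tidb, and the command lines are taken from the steps above.

```shell
# By default the commands are only printed; set DRY_RUN=0 to execute them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# patch_cdc CLUSTER: download the cdc package, then patch all cdc instances.
patch_cdc() {
    cluster=$1
    pkg=cdc-v4.0.0-rc-linux-amd64.tar.gz
    run wget "https://tiup-mirrors.pingcap.com/$pkg" &&
        run tiup cluster patch "$cluster" "$pkg" -R cdc
}

patch_cdc zj_tidb
```

Chaining with && means the patch step is skipped if the download fails, so the cluster is never patched with a missing or partial package.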

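NOTE: This does not mean that v4.0.0-rc is newer than v4.0.0-rc.1. The official guide does not recommend using cdc with v4.0.0-rc; cdc is supported from v4.0.0-rc.1 onward. The package at https://tiup-mirrors.pingcap.com/cdc-v4.0.0-rc-linux-amd64.tar.gz actually contains a cdc build newer than rc.1. It exists to test whether tiup cluster supports the new cdc parameters, and no official cdc v4.0.0-rc was ever released. I recommend upgrading the whole cluster to v4.0.0-rc.2.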

Appendix: version mapping

tiup cluster v0.6.0 corresponds to TiDB cluster v4.0.0-rc.1

tiup cluster v0.6.1 and later correspond to TiDB cluster v4.0.0-rc.2 and later

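Our new project is on a tight schedule. We have two test environments, v4.0.0-rc and v4.0.0-rc.1, production is on v4.0.0-rc.1, and we need the CDC feature right away. After upgrading CDC in the test environments on both versions, it now starts normally. Thank you very much. PS: tiup cluster has now been upgraded to 0.6.2 everywhere.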

:ok_hand: