TiDB 5.2.1: TiCDC changefeed to Kafka reports no errors after creation, but tso and checkpoint do not advance

【TiDB environment】Production
【TiDB version】v5.2.1
【Steps to reproduce】
-- Create a changefeed that replicates the whole database
tiup ctl:v5.2.1 cdc changefeed create --pd=http://10.30.30.4:2379 --sink-uri="kafka://node3.gcl-kafka.com:9092/ticdc_jnpf_tenant_859852?kafka-version=2.13.3&partition-num=3&max-message-bytes=128108864&replication-factor=1&protocol=canal-json" --changefeed-id="ticdc-jnpf-tenant-859852" --config=/root/ops/cdc-config-jnpf_tenant_859852.conf

-- List all changefeeds
tiup ctl:v5.2.1 cdc changefeed list --pd=http://10.30.30.4:2379
-- Remove the changefeed
tiup ctl:v5.2.1 cdc changefeed remove --pd=http://10.30.30.4:2379 --changefeed-id ticdc-jnpf-tenant-859852
-- Delete the topic
/usr/local/kafka/bin/kafka-topics.sh --delete --bootstrap-server 10.30.254.2:9092 --topic ticdc_jnpf_tenant_859852

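After removing the changefeed, deleting the topic, and recreating them, the same problem occurs.
【Problem: symptoms and impact】
【Resource configuration】
【Attachments: screenshots/logs/monitoring】
-- The cdc log shows no errors at all; it prints up to "add a table" and then stops.

-- Querying the changefeed repeatedly shows no errors, but tso and checkpoint never move.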
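
One way to confirm the stall is to sample the changefeed's tso twice and compare the two values. This is only a minimal sketch: it assumes the simple query output (`-s`) contains a `"tso": <integer>` field, and the helper names `extract_tso` and `checkpoint_advanced` are hypothetical, not part of tiup or cdc.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; not part of tiup or cdc.

# Print the first integer value of a "tso" field found on stdin.
extract_tso() {
  sed -n 's/.*"tso":[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Succeed (exit 0) only if the second TSO is strictly greater than the first.
checkpoint_advanced() {
  [ "$2" -gt "$1" ]
}

# Usage against a live cluster (not executed here):
#   q() { tiup ctl:v5.2.1 cdc changefeed query -s --pd=http://10.30.30.4:2379 \
#           --changefeed-id=ticdc-jnpf-tenant-859852; }
#   a=$(q | extract_tso); sleep 60; b=$(q | extract_tso)
#   checkpoint_advanced "$a" "$b" && echo "advancing" || echo "stalled at $b"
```

The 60-second interval is arbitrary; any gap longer than the normal replication lag should do.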

-- Some informational messages have now appeared:
requestID=220869] [span="[6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006c00000000fb, 6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006d00000000fb)"] [checkpoint=437455110267731969] [error="[CDC:ErrEventFeedEventError]not_leader:<region_id:1217751 > "]
[2022/11/18 16:06:53.290 +08:00] [INFO] [region_range_lock.go:370] ["unlocked range"] [lockID=67727] [regionID=1217751] [startKey=6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006c00000000fb] [endKey=6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006d00000000fb] [checkpointTs=437455110267731969]
[2022/11/18 16:06:53.290 +08:00] [INFO] [region_cache.go:1070] ["switch region peer to next due to NotLeader with NULL leader"] [currIdx=0] [regionID=1217751]
[2022/11/18 16:06:53.290 +08:00] [INFO] [region_range_lock.go:218] ["range locked"] [lockID=67727] [regionID=1217751] [startKey=6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006c00000000fb] [endKey=6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006d00000000fb] [checkpointTs=437455110267731969]
[2022/11/18 16:06:53.290 +08:00] [INFO] [client.go:876] ["cannot get rpcCtx, retry span"] [regionID=1217751] [span="[6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006c00000000fb, 6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006d00000000fb)"]
[2022/11/18 16:06:53.290 +08:00] [INFO] [region_range_lock.go:370] ["unlocked range"] [lockID=67727] [regionID=1217751] [startKey=6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006c00000000fb] [endKey=6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006d00000000fb] [checkpointTs=437455110267731969]
[2022/11/18 16:06:53.291 +08:00] [INFO] [region_range_lock.go:218] ["range locked"] [lockID=67727] [regionID=1217751] [startKey=6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006c00000000fb] [endKey=6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006d00000000fb] [checkpointTs=437455110267731969]
[2022/11/18 16:06:53.291 +08:00] [INFO] [client.go:774] ["start new request"] [request="{"header":{"cluster_id":7018101457625346894,"ticdc_version":"5.2.1"},"region_id":1217751,"region_epoch":{"conf_ver":5,"version":1901},"checkpoint_ts":437455110267731969,"start_key":"bURETEpvYkH/ZP9kSWR4TGn/c3T/AAAAAAD/AAAA9wAAAAD/AAAAbAAAAAD7","end_key":"bURETEpvYkH/ZP9kSWR4TGn/c3T/AAAAAAD/AAAA9wAAAAD/AAAAbQAAAAD7","request_id":220870,"Request":null}"] [addr=10.30.30.11:20160]
[2022/11/18 16:06:53.291 +08:00] [WARN] [client.go:781] ["send request to stream failed"] [addr=10.30.30.11:20160] [storeID=1] [regionID=1217751] [requestID=220870] [error=EOF]
[2022/11/18 16:06:53.326 +08:00] [INFO] [client_v2.go:163] ["stream to store closed"] [addr=10.30.30.11:20160] [storeID=1]
[2022/11/18 16:06:53.336 +08:00] [INFO] [region_range_lock.go:370] ["unlocked range"] [lockID=67727] [regionID=1217751] [startKey=6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006c00000000fb] [endKey=6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006d00000000fb] [checkpointTs=437455110267731969]
[2022/11/18 16:06:53.336 +08:00] [INFO] [region_range_lock.go:218] ["range locked"] [lockID=67727] [regionID=1217751] [startKey=6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006c00000000fb] [endKey=6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006d00000000fb] [checkpointTs=437455110267731969]
[2022/11/18 16:06:53.336 +08:00] [INFO] [client.go:721] ["creating new stream to store to send request"] [regionID=1217751] [requestID=220871] [storeID=1] [addr=10.30.30.11:20160]
[2022/11/18 16:06:53.336 +08:00] [INFO] [client.go:774] ["start new request"] [request="{"header":{"cluster_id":7018101457625346894,"ticdc_version":"5.2.1"},"region_id":1217751,"region_epoch":{"conf_ver":5,"version":1901},"checkpoint_ts":437455110267731969,"start_key":"bURETEpvYkH/ZP9kSWR4TGn/c3T/AAAAAAD/AAAA9wAAAAD/AAAAbAAAAAD7","end_key":"bURETEpvYkH/ZP9kSWR4TGn/c3T/AAAAAAD/AAAA9wAAAAD/AAAAbQAAAAD7","request_id":220871,"Request":null}"] [addr=10.30.30.11:20160]
[2022/11/18 16:06:53.337 +08:00] [INFO] [region_worker.go:249] ["single region event feed disconnected"] [regionID=1217751] [requestID=220871] [span="[6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006c00000000fb, 6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006d00000000fb)"] [checkpoint=437455110267731969] [error="[CDC:ErrEventFeedEventError]not_leader:<region_id:1217751 leader:<id:1217754 store_id:5 > > "]
[2022/11/18 16:06:53.337 +08:00] [INFO] [region_range_lock.go:370] ["unlocked range"] [lockID=67727] [regionID=1217751] [startKey=6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006c00000000fb] [endKey=6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006d00000000fb] [checkpointTs=437455110267731969]
[2022/11/18 16:06:53.337 +08:00] [INFO] [region_cache.go:1083] ["switch region leader to specific leader due to kv return NotLeader"] [regionID=1217751] [currIdx=0] [leaderStoreID=5]
[2022/11/18 16:06:53.337 +08:00] [INFO] [region_range_lock.go:218] ["range locked"] [lockID=67727] [regionID=1217751] [startKey=6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006c00000000fb] [endKey=6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006d00000000fb] [checkpointTs=437455110267731969]
[2022/11/18 16:06:53.337 +08:00] [INFO] [client.go:721] ["creating new stream to store to send request"] [regionID=1217751] [requestID=220872] [storeID=5] [addr=10.30.30.10:20160]
[2022/11/18 16:06:53.338 +08:00] [INFO] [client.go:774] ["start new request"] [request="{"header":{"cluster_id":7018101457625346894,"ticdc_version":"5.2.1"},"region_id":1217751,"region_epoch":{"conf_ver":5,"version":1901},"checkpoint_ts":437455110267731969,"start_key":"bURETEpvYkH/ZP9kSWR4TGn/c3T/AAAAAAD/AAAA9wAAAAD/AAAAbAAAAAD7","end_key":"bURETEpvYkH/ZP9kSWR4TGn/c3T/AAAAAAD/AAAA9wAAAAD/AAAAbQAAAAD7","request_id":220872,"Request":null}"] [addr=10.30.30.10:20160]
[2022/11/18 16:06:53.373 +08:00] [INFO] [client_v2.go:163] ["stream to store closed"] [addr=10.30.30.11:20160] [storeID=1]
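
The checkpointTs in these log lines is a TSO, which can be decoded into wall-clock time to see how far behind it is. A minimal sketch, assuming the usual TiDB TSO layout (low 18 bits are a logical counter, the remaining high bits are milliseconds since the Unix epoch); the function names are hypothetical:

```shell
#!/usr/bin/env bash
# Split a TiDB TSO into its physical (ms since Unix epoch) and logical parts.
# The low 18 bits are the logical counter; the rest is the physical timestamp.
tso_physical_ms() { echo $(( $1 >> 18 )); }
tso_logical()     { echo $(( $1 & 262143 )); }   # 262143 = 2^18 - 1

ts=437455110267731969            # checkpointTs taken from the log lines
ms=$(tso_physical_ms "$ts")
echo "physical_ms=$ms logical=$(tso_logical "$ts")"
# With GNU date, the physical part can be rendered as wall-clock time:
#   date -u -d "@$(( ms / 1000 ))" '+%Y-%m-%d %H:%M:%S UTC'
```

For this value it prints `physical_ms=1668758812972 logical=1`, which is 2022-11-18 16:06:52.972 +08:00, about the same moment as the surrounding log lines.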

Please post the contents of your changefeed configuration file.

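# Whether the database and table names used in this configuration file are case-sensitive
# This setting affects both the filter and sink configuration; the default is true
case-sensitive = true

# Whether to output old value; supported since v4.0.5, and true by default since v5.0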
enable-old-value = true

[filter]
# Ignore transactions with the specified start_ts
#ignore-txn-start-ts = [1, 2]

# Filter rules
# Filter rule syntax: https://docs.pingcap.com/zh/tidb/stable/table-filter#表库过滤语法
rules = ['jnpf_tenant_859852.*']

[mounter]
# Number of mounter threads, used to decode the data output by TiKV
worker-num = 4

[sink]
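# For MQ sinks, event dispatchers can be configured through dispatchers
# Four dispatchers are supported: default, ts, rowid, and table. The dispatching rules are:
# - default: with multiple unique indexes (including the primary key), dispatch in table mode; with only one unique index (or primary key), dispatch in rowid mode; if the old value feature is enabled, dispatch in table mode
# - ts: hash the commitTs of the row change to dispatch events
# - rowid: hash the column names and values of the table's primary key or unique index to dispatch events
# - table: hash the table's schema name and table name to dispatch events
# The matcher syntax is the same as the filter rule syntax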
dispatchers = []
#    {matcher = ['test1.*', 'test2.*'], dispatcher = "ts"},
#    {matcher = ['test3.*', 'test4.*'], dispatcher = "rowid"},
#]
# For MQ sinks, the message protocol format can be specified
# Four protocols are currently supported: default, canal, avro, and maxwell. default is TiCDC Open Protocol
protocol = "canal"

[cyclic-replication]
# Whether to enable cyclic replication
enable = false
# Replica ID of the current TiCDC
#replica-id = 1
# Replica IDs to filter out
#filter-replica-ids = [2,3]
# Whether to replicate DDL
#sync-ddl = true

Try inserting a test row into that database and watch the TiCDC log to see whether any abnormal messages appear.

How many tables are in the database you need to replicate? For v5.2.1, the recommended total number of tables replicated by TiCDC is no more than 1.5k.

Check whether the region with regionID=1217751 is abnormal.
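
One way to inspect that region is pd-ctl's `region <id>` subcommand, whose JSON output lists the region's peers and current leader. A minimal sketch, assuming the same PD address used earlier in this thread; the wrapper name `region_info` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around pd-ctl's "region <id>" subcommand.
# $1 = PD address, $2 = region ID.
region_info() {
  tiup ctl:v5.2.1 pd -u "$1" region "$2"
}

# Usage against a live cluster (not executed here):
#   region_info http://10.30.30.4:2379 1217751
```

The leader's store_id in the output can then be compared with the leaderStoreID=5 that the TiCDC log reports after the NotLeader error.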

813 tables, still below 1.5k.

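After removing and recreating the changefeed, a different regionID shows up. The time from getting stuck at "add a table" to the error appearing is probably close to tidb_gc_life_time, which is currently set to 30 minutes.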

Data is being written to the database continuously, and I have not seen any other abnormal messages.

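I have now restarted both cdc nodes and changed tidb_gc_life_time from 30m0s to 10m0s, and replication works. After changing it back to 30m0s, replication still works and the checkpoint advances. The heavy-handed approach solved the problem.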
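
For anyone hitting the same symptom, it may help to look at the GC state before and after such a change. A minimal sketch, assuming pd-ctl's `service-gc-safepoint` subcommand is available in this version and that a `mysql` client can reach TiDB; the function names are hypothetical, and the host, port, and user in the usage lines are placeholders:

```shell
#!/usr/bin/env bash
# Hypothetical helpers; host, port and user in the usage lines are placeholders.

# Show the GC safe point and per-service safe points tracked by PD.
# $1 = PD address.
show_service_gc_safepoint() {
  tiup ctl:v5.2.1 pd -u "$1" service-gc-safepoint
}

# Show the current GC life time. $1 = TiDB host, $2 = port, $3 = user.
show_gc_life_time() {
  mysql -h "$1" -P "$2" -u "$3" -e "SHOW GLOBAL VARIABLES LIKE 'tidb_gc_life_time';"
}

# Usage against a live cluster (not executed here):
#   show_service_gc_safepoint http://10.30.30.4:2379
#   show_gc_life_time 127.0.0.1 4000 root
```

If TiCDC has registered a service safe point, it should appear in the first command's output, which makes it possible to see whether it moves once the checkpoint starts advancing.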
