CDC task interrupted unexpectedly?

【TiDB Environment】Production / Test / PoC
【TiDB Version】
【Reproduction Path】What operations were performed when the problem appeared
【Problem Encountered: Symptoms and Impact】

Background: I have a TiDB v6.1.0 cluster running on CentOS 7. PD and tidb-server are co-located on three virtual machines (on a shared virtualization host with non-SSD disks), tikv-server runs on three physical machines (SSD disks), and the TiCDC cluster is also co-located on the same three TiKV machines.

The upstream is TiDB 6.1.0 and the downstream is MySQL (Percona Server 5.7).

The log from one of the TiCDC nodes is as follows:

[2023/07/18 09:32:10.889 +08:00] [ERROR] [processor.go:546] ["error on running processor"] [capture=10.116.172.206:8300] [changefeed=simple-replication-task] [error="[CDC:ErrFlowControllerEventLargerThanQuota]event is larger than the total memory quota, size: 12602807, quota: 10485760"] [errorVerbose="[CDC:ErrFlowControllerEventLargerThanQuota]event is larger than the total memory quota, size: 12602807, quota: 10485760\ngithub.com/pingcap/errors.AddStack\n\tgithub.com/pingcap/errors@v0.11.5-0.20211224045212-9687c2b0f87c/errors.go:174\ngithub.com/pingcap/errors.(*Error).GenWithStackByArgs\n\tgithub.com/pingcap/errors@v0.11.5-0.20211224045212-9687c2b0f87c/normalize.go:164\ngithub.com/pingcap/tiflow/cdc/sink/flowcontrol.(*tableMemoryQuota).consumeWithBlocking\n\tgithub.com/pingcap/tiflow/cdc/sink/flowcontrol/table_memory_quota.go:59\ngithub.com/pingcap/tiflow/cdc/sink/flowcontrol.(*TableFlowController).Consume\n\tgithub.com/pingcap/tiflow/cdc/sink/flowcontrol/flow_control.go:133\ngithub.com/pingcap/tiflow/cdc/processor/pipeline.(*sorterNode).start.func3\n\tgithub.com/pingcap/tiflow/cdc/processor/pipeline/sorter.go:250\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\tgolang.org/x/sync@v0.0.0-20220513210516-0976fa681c29/errgroup/errgroup.go:74\nruntime.goexit\n\truntime/asm_amd64.s:1571"]

[2023/07/18 09:32:10.890 +08:00] [ERROR] [processor.go:355] ["run processor failed"] [changefeed=simple-replication-task] [capture=10.116.172.206:8300] [error="[CDC:ErrFlowControllerEventLargerThanQuota]event is larger than the total memory quota, size: 12602807, quota: 10485760"] [errorVerbose="[CDC:ErrFlowControllerEventLargerThanQuota]event is larger than the total memory quota, size: 12602807, quota: 10485760\ngithub.com/pingcap/errors.AddStack\n\tgithub.com/pingcap/errors@v0.11.5-0.20211224045212-9687c2b0f87c/errors.go:174\ngithub.com/pingcap/errors.(*Error).GenWithStackByArgs\n\tgithub.com/pingcap/errors@v0.11.5-0.20211224045212-9687c2b0f87c/normalize.go:164\ngithub.com/pingcap/tiflow/cdc/sink/flowcontrol.(*tableMemoryQuota).consumeWithBlocking\n\tgithub.com/pingcap/tiflow/cdc/sink/flowcontrol/table_memory_quota.go:59\ngithub.com/pingcap/tiflow/cdc/sink/flowcontrol.(*TableFlowController).Consume\n\tgithub.com/pingcap/tiflow/cdc/sink/flowcontrol/flow_control.go:133\ngithub.com/pingcap/tiflow/cdc/processor/pipeline.(*sorterNode).start.func3\n\tgithub.com/pingcap/tiflow/cdc/processor/pipeline/sorter.go:250\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\tgolang.org/x/sync@v0.0.0-20220513210516-0976fa681c29/errgroup/errgroup.go:74\nruntime.goexit\n\truntime/asm_amd64.s:1571"]

The error reported by the TiUP tool is as follows:

# tiup cdc cli changefeed list --pd x.x.x.x:2379

Starting component `cdc`: /home/tidb/.tiup/components/cdc/v6.1.1/cdc cli changefeed list --pd x.x.x.x:2379
[
  {
    "id": "simple-replication-task",
    "summary": {
      "state": "failed",
      "tso": 442929853387243521,
      "checkpoint": "2023-07-18 09:21:40.580",
      "error": {
        "addr": "x.x.x.x:8300",
        "code": "CDC:ErrFlowControllerEventLargerThanQuota",
        "message": "[CDC:ErrFlowControllerEventLargerThanQuota]event is larger than the total memory quota, size: 12602807, quota: 10485760"
      }
    }
  }
]

How should I solve this problem? From the error it looks like memory usage exceeded the quota, right? Is it TiCDC's own memory usage?

【Resource Configuration】
【Attachments: Screenshots / Logs / Monitoring】

Add more memory to the CDC machines; it has run out of memory.

Someone has run into the same problem, see this for reference:
CDC replication reports ErrFlowControllerEventLargerThanQuota - TiDB Technical Questions / Deployment & Operations - TiDB Community Forum (asktug.com)

You also didn't say how much memory each of your nodes has. The error says there isn't enough memory.

per-table-memory-quota = 10485760 is the default value. It is a cdc-server parameter; change it.
It can be changed in the CDC section of the cluster's yaml configuration.

tiup cluster edit-config cluster-name
Increase this parameter.
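
For reference only, a minimal sketch of what that edit could look like; the 64 MiB value and placing it under server_configs are assumptions, adjust to your own cluster and workload:

tiup cluster edit-config <cluster-name>

  # in the topology, raise the per-table quota for the cdc component (value is in bytes)
  server_configs:
    cdc:
      per-table-memory-quota: 67108864   # assumed example: 64 MiB instead of the 10 MiB default

tiup cluster reload <cluster-name> -R cdc   # roll the change out to the CDC nodes only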

May I ask how large everyone usually sets per-table-memory-quota? I am about to migrate 1 TB of data. Can this value be changed mid-migration, and how should I estimate it in advance?

The machines have 128 GB of memory. tikv-server and the CDC cluster are co-located, with no separate memory or CPU isolation for these two components. Also, is CDC really this memory-hungry?

My task's state is already failed. After I modify this parameter, do I need to delete the task and create a new one to restart replication?

Delete - increase - recreate.
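
A rough sketch of that delete / increase / recreate sequence with tiup cdc cli (the sink URI is a placeholder; reusing the old checkpoint TSO as --start-ts only works while TiCDC's GC safepoint still covers it, gc-ttl is 24h by default):

tiup cdc cli changefeed remove --pd=http://x.x.x.x:2379 --changefeed-id=simple-replication-task

# recreate after per-table-memory-quota has been raised and the CDC nodes reloaded;
# starting from the failed changefeed's last checkpoint TSO avoids skipping data
tiup cdc cli changefeed create --pd=http://x.x.x.x:2379 \
  --changefeed-id=simple-replication-task \
  --sink-uri="mysql://user:password@downstream-host:3306/" \
  --start-ts=442929853387243521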
