TiCDC changefeed gets stuck

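[TiDB environment] Production
[TiDB version] v7.5.3
I ran into a problem: more than 100 tables are currently replicated through CDC. When some of the tables have a high write QPS, other changefeeds are affected. The symptom is that a task gets stuck and stays stuck; after the task is restarted it returns to normal. The logs are as follows: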

[2024/08/30 18:33:59.068 +08:00] [INFO] [shared_client.go:709] ["event feed starts to check locked regions"] [namespace=default] [changefeed=cdc-dw-patent-core-dwd]
[2024/08/30 18:34:29.065 +08:00] [INFO] [shared_client.go:709] ["event feed starts to check locked regions"] [namespace=default] [changefeed=cdc-dw-patent-core-dwd_processor_ddl_puller]
[2024/08/30 18:34:29.068 +08:00] [INFO] [shared_client.go:709] ["event feed starts to check locked regions"] [namespace=default] [changefeed=cdc-dw-patent-core-dwd]
[2024/08/30 18:34:36.768 +08:00] [INFO] [shared_stream.go:483] ["event feed receives a region error"] [namespace=default] [changefeed=cdc-dw-patent-core-dwd] [streamID=3465] [subscriptionID=2062] [regionID=193897] [stateIsNil=false] [error="not_leader:<region_id:193897 leader:<id:754119 store_id:625882 > > "]
[2024/08/30 18:34:36.768 +08:00] [INFO] [shared_region_worker.go:109] ["region worker get a region error"] [namespace=default] [changefeed=cdc-dw-patent-core-dwd] [streamID=3465] [subscriptionID=193897] [regionID=193897] [reschedule=true] [error="not_leader:<region_id:193897 leader:<id:754119 store_id:625882 > > "]
[2024/08/30 18:34:36.768 +08:00] [INFO] [shared_client.go:581] ["cdc region error"] [namespace=default] [changefeed=cdc-dw-patent-core-dwd] [subscriptionID=2062] [error="not_leader:<region_id:193897 leader:<id:754119 store_id:625882 > > "]
[2024/08/30 18:34:36.768 +08:00] [INFO] [shared_client.go:413] ["event feed will request a region"] [namespace=default] [changefeed=cdc-dw-patent-core-dwd] [streamID=3479] [subscriptionID=2062] [regionID=193897] [storeID=625882] [addr=10.9.113.86:20160]
[2024/08/30 18:34:41.068 +08:00] [WARN] [shared_client.go:787] ["event feed finds slow locked ranges"] [namespace=default] [changefeed=cdc-dw-patent-core-dwd] [subscriptionID=2062] [ranges="{\"LockedRegionCount\":8185,\"Holes\":null,\"FastestRegion\":{\"RegionID\":203545,\"ResolvedTs\":452202091239704392,\"Initialized\":true,\"Created\":\"2024-08-27T19:38:29.226996277+08:00\"},\"SlowestRegion\":{\"RegionID\":193897,\"ResolvedTs\":452202089889663383,\"Initialized\":false,\"Created\":\"2024-08-30T18:34:36.768708123+08:00\"}}"]
[2024/08/30 18:34:43.068 +08:00] [WARN] [shared_client.go:787] ["event feed finds slow locked ranges"] [namespace=default] [changefeed=cdc-dw-patent-core-dwd] [subscriptionID=2062] [ranges="{\"LockedRegionCount\":8185,\"Holes\":null,\"FastestRegion\":{\"RegionID\":144561,\"ResolvedTs\":452202091763993433,\"Initialized\":true,\"Created\":\"2024-08-27T19:38:29.226844261+08:00\"},\"SlowestRegion\":{\"RegionID\":193897,\"ResolvedTs\":452202089889663383,\"Initialized\":false,\"Created\":\"2024-08-30T18:34:36.768708123+08:00\"}}"]
[2024/08/30 18:34:45.068 +08:00] [WARN] [shared_client.go:787] ["event feed finds slow locked ranges"] [namespace=default] [changefeed=cdc-dw-patent-core-dwd] [subscriptionID=2062] [ranges="{\"LockedRegionCount\":8185,\"Holes\":null,\"FastestRegion\":{\"RegionID\":233937,\"ResolvedTs\":452202092275174770,\"Initialized\":true,\"Created\":\"2024-08-27T19:38:29.226847551+08:00\"},\"SlowestRegion\":{\"RegionID\":193897,\"ResolvedTs\":452202089889663383,\"Initialized\":false,\"Created\":\"2024-08-30T18:34:36.768708123+08:00\"}}"]
[2024/08/30 18:34:47.068 +08:00] [WARN] [shared_client.go:787] ["event feed finds slow locked ranges"] [namespace=default] [changefeed=cdc-dw-patent-core-dwd] [subscriptionID=2062] [ranges="{\"LockedRegionCount\":8185,\"Holes\":null,\"FastestRegion\":{\"RegionID\":214125,\"ResolvedTs\":452202092799462205,\"Initialized\":true,\"Created\":\"2024-08-27T19:38:29.226789682+08:00\"},\"SlowestRegion\":{\"RegionID\":193897,\"ResolvedTs\":452202089889663383,\"Initialized\":false,\"Created\":\"2024-08-30T18:34:36.768708123+08:00\"}}"]
[2024/08/30 18:34:49.068 +08:00] [WARN] [shared_client.go:787] ["event feed finds slow locked ranges"] [namespace=default] [changefeed=cdc-dw-patent-core-dwd] [subscriptionID=2062] [ranges="{\"LockedRegionCount\":8185,\"Holes\":null,\"FastestRegion\":{\"RegionID\":192973,\"ResolvedTs\":452202093323747791,\"Initialized\":true,\"Created\":\"2024-08-27T19:38:29.226927779+08:00\"},\"SlowestRegion\":{\"RegionID\":193897,\"ResolvedTs\":452202089889663383,\"Initialized\":false,\"Created\":\"2024-08-30T18:34:36.768708123+08:00\"}}"]
[2024/08/30 18:34:51.068 +08:00] [WARN] [shared_client.go:787] ["event feed finds slow locked ranges"] [namespace=default] [changefeed=cdc-dw-patent-core-dwd] [subscriptionID=2062] [ranges="{\"LockedRegionCount\":8185,\"Holes\":null,\"FastestRegion\":{\"RegionID\":164977,\"ResolvedTs\":452202093861144561,\"Initialized\":true,\"Created\":\"2024-08-27T19:38:29.227139324+08:00\"},\"SlowestRegion\":{\"RegionID\":193897,\"ResolvedTs\":452202089889663383,\"Initialized\":false,\"Created\":\"2024-08-30T18:34:36.768708123+08:00\"}}"]
[2024/08/30 18:34:53.068 +08:00] [WARN] [shared_client.go:787] ["event feed finds slow locked ranges"] [namespace=default] [changefeed=cdc-dw-patent-core-dwd] [subscriptionID=2062] [ranges="{\"LockedRegionCount\":8185,\"Holes\":null,\"FastestRegion\":{\"RegionID\":253205,\"ResolvedTs\":452202094398537840,\"Initialized\":true,\"Created\":\"2024-08-27T19:38:29.319740571+08:00\"},\"SlowestRegion\":{\"RegionID\":193897,\"ResolvedTs\":452202089889663383,\"Initialized\":false,\"Created\":\"2024-08-30T18:34:36.768708123+08:00\"}}"]
[2024/08/30 18:34:55.068 +08:00] [WARN] [shared_client.go:787] ["event feed finds slow locked ranges"] [namespace=default] [changefeed=cdc-dw-patent-core-dwd] [subscriptionID=2062] [ranges="{\"LockedRegionCount\":8185,\"Holes\":null,\"FastestRegion\":{\"RegionID\":228977,\"ResolvedTs\":452202094909721604,\"Initialized\":true,\"Created\":\"2024-08-27T19:38:29.227059906+08:00\"},\"SlowestRegion\":{\"RegionID\":193897,\"ResolvedTs\":452202089889663383,\"Initialized\":false,\"Created\":\"2024-08-30T18:34:36.768708123+08:00\"}}"]
[2024/08/30 18:34:57.068 +08:00] [WARN] [shared_client.go:787] ["event feed finds slow locked ranges"] [namespace=default] [changefeed=cdc-dw-patent-core-dwd] [subscriptionID=2062] [ranges="{\"LockedRegionCount\":8185,\"Holes\":null,\"FastestRegion\":{\"RegionID\":140109,\"ResolvedTs\":452202095434008620,\"Initialized\":true,\"Created\":\"2024-08-27T19:38:29.22687232+08:00\"},\"SlowestRegion\":{\"RegionID\":193897,\"ResolvedTs\":452202089889663383,\"Initialized\":false,\"Created\":\"2024-08-30T18:34:36.768708123+08:00\"}}"]
[2024/08/30 18:34:59.066 +08:00] [INFO] [shared_client.go:709] ["event feed starts to check locked regions"] [namespace=default] [changefeed=cdc-dw-patent-core-dwd_processor_ddl_puller]
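
The WARN lines above carry the most useful signal: in every one of them `SlowestRegion` is region 193897 with `Initialized` set to false and the same `ResolvedTs`, while `FastestRegion` keeps advancing. A minimal Python sketch for pulling that out of the log is given below. It assumes the `ranges` field is a quoted JSON string exactly as shown in these lines, and that `ResolvedTs` is a TSO whose low 18 bits are the logical counter; the function names are made up for illustration and are not part of TiCDC.

```python
import json
import re
from datetime import datetime, timezone

# A TSO stores a physical timestamp (milliseconds) in the high bits and a
# logical counter in the low 18 bits.
TSO_LOGICAL_BITS = 18

# Matches the trailing [ranges="..."] field of a "slow locked ranges" line.
RANGES_RE = re.compile(r'\[ranges="(.*)"\]\s*$')


def tso_to_datetime(tso):
    """Convert a TSO to a UTC datetime using only its physical part."""
    physical_ms = tso >> TSO_LOGICAL_BITS
    return datetime.fromtimestamp(physical_ms / 1000, tz=timezone.utc)


def parse_slow_ranges(line):
    """Return a summary dict for one WARN line, or None if it has no ranges field."""
    match = RANGES_RE.search(line)
    if match is None:
        return None
    # Assumption: the only escaping inside the field is \" for inner quotes.
    payload = json.loads(match.group(1).replace('\\"', '"'))
    fastest = payload["FastestRegion"]
    slowest = payload["SlowestRegion"]
    lag_ms = (fastest["ResolvedTs"] >> TSO_LOGICAL_BITS) - (
        slowest["ResolvedTs"] >> TSO_LOGICAL_BITS
    )
    return {
        "locked_regions": payload["LockedRegionCount"],
        "slowest_region": slowest["RegionID"],
        "slowest_initialized": slowest["Initialized"],
        "slowest_resolved_at": tso_to_datetime(slowest["ResolvedTs"]),
        "lag_ms": lag_ms,
    }


def summarize(lines):
    """Yield one summary per "slow locked ranges" line found in an iterable of lines."""
    for line in lines:
        info = parse_slow_ranges(line.rstrip("\n"))
        if info is not None:
            yield info
```

Feeding the TiCDC log file to `summarize` (for example `summarize(open(path))`, where `path` is wherever your cdc log lives) should show whether `lag_ms` keeps growing for one region that never becomes initialized, which is what the excerpt above suggests.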

Note this error: error="not_leader

How many PD nodes and how many TiKV nodes do you have? Are any of them down?

[Resource configuration] Go to TiDB Dashboard - Cluster Info - Hosts and post a screenshot of that page

That error is normal; tasks that are running fine log it too.
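
One way to check that claim is to count `not_leader` region errors per changefeed and region, then compare a stuck changefeed against a healthy one. The sketch below is only an illustration under the assumption that log lines look like the ones pasted above; `count_region_errors` is a hypothetical helper, not a TiCDC tool. It counts only the "event feed receives a region error" message, because the same error is logged again by the region worker and the client.

```python
import re
from collections import Counter

# Extracts the [changefeed=...] and [regionID=...] fields from a log line.
FIELD_RE = re.compile(r'\[(changefeed|regionID)=([^\]]+)\]')

# The message logged once per region error by the shared stream.
ERROR_MESSAGE = "event feed receives a region error"


def count_region_errors(lines, needle="not_leader"):
    """Count region errors containing `needle`, keyed by (changefeed, regionID)."""
    counts = Counter()
    for line in lines:
        if ERROR_MESSAGE not in line or needle not in line:
            continue
        fields = dict(FIELD_RE.findall(line))
        counts[(fields.get("changefeed"), fields.get("regionID"))] += 1
    return counts
```

If the stuck changefeed shows roughly the same counts as the healthy ones, the `not_leader` errors themselves are unlikely to be the cause, and the region that never finishes initializing after being re-requested is the more interesting lead.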

I think that as long as the entries are [INFO] and [WARN] rather than [ERROR], it is not a big problem.

Check in Grafana whether the TiCDC load is high during peak hours and whether it is running short of resources.

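Check whether TiDB is running frequent DDL operations: in my own tests, TiCDC handles DDL very slowly. My guess at what you are seeing: the DDL operations slow TiCDC down and also push CPU usage up. Once it slows down you stop it, the DDL finishes, you start TiCDC again and it speeds up, and the cycle repeats.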

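At that point, some tables in other tasks had a fairly high QPS, but TiCDC resource consumption was not high. The log just keeps printing "event feed starts to check locked regions".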

Unless the task is restarted, it stays stuck indefinitely and CDC makes no progress.

The CDC lag may be caused by a few hot regions. See whether you can scatter the hot regions first.