TiDB 4.0.11: high CDC memory usage leads to OOM

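【TiDB environment】
V4.0.11
3 PD + 3 TiKV + 3 CDC

【Overview】 Scenario + problem summary
Starting at 10:30 this morning, memory usage on the CDC nodes began to climb, and all 3 CDC nodes have been hitting OOM and restarting repeatedly.
【Background】 Operations performed
No slow queries and no changes during the relevant time window.
【Symptoms】 Application and database symptoms
The CDC replication nodes OOM.
【Problem】 Current issue

【Business impact】

【TiDB version】
4.0.11
【Application software and version】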

Please capture a pprof profile.

Does CDC have its own pprof?
This is the pprof from TiDB.

Looking at the CDC logs, most entries are WARN:
[2021/12/28 15:29:57.644 +08:00] [WARN] [client.go:1292] ["The resolvedTs is fallen back in kvclient"] ["Event Type"=RESOLVED] [resolvedTs=430093525501345805] [lastResolvedTs=430093525514452996] [regionID=70697]
[2021/12/28 15:29:57.644 +08:00] [WARN] [client.go:1292] ["The resolvedTs is fallen back in kvclient"] ["Event Type"=RESOLVED] [resolvedTs=430093525501345805] [lastResolvedTs=430093525514452996] [regionID=65665]
[2021/12/28 15:29:57.644 +08:00] [WARN] [client.go:1292] ["The resolvedTs is fallen back in kvclient"] ["Event Type"=RESOLVED] [resolvedTs=430093525501345805] [lastResolvedTs=430093525514452996] [regionID=80109]
[2021/12/28 15:29:57.644 +08:00] [WARN] [client.go:1292] ["The resolvedTs is fallen back in kvclient"] ["Event Type"=RESOLVED] [resolvedTs=430093525501345805] [lastResolvedTs=430093525514452996] [regionID=22877]
[2021/12/28 15:29:57.644 +08:00] [WARN] [client.go:1292] ["The resolvedTs is fallen back in kvclient"] ["Event Type"=RESOLVED] [resolvedTs=430093525501345805] [lastResolvedTs=430093525514452996] [regionID=18297]
[2021/12/28 15:29:57.644 +08:00] [WARN] [client.go:1292] ["The resolvedTs is fallen back in kvclient"] ["Event Type"=RESOLVED] [resolvedTs=430093525501345805] [lastResolvedTs=430093525514452996] [regionID=73689]
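
The resolvedTs and lastResolvedTs values in these warnings are TSO timestamps, which pack a physical time in milliseconds into the high bits and an 18-bit logical counter into the low bits. The following is a minimal sketch for decoding them so the size of the fall-back can be read in milliseconds. The function names are illustrative and not part of any TiCDC tooling.

```python
from datetime import datetime, timezone

LOGICAL_BITS = 18  # the low 18 bits of a TSO hold the logical counter


def split_tso(ts):
    """Split a TSO into (physical ms since the Unix epoch, logical counter)."""
    return ts >> LOGICAL_BITS, ts & ((1 << LOGICAL_BITS) - 1)


def tso_to_datetime(ts):
    """Return the physical part of a TSO as a timezone-aware UTC datetime."""
    physical_ms, _ = split_tso(ts)
    return datetime.fromtimestamp(physical_ms / 1000, tz=timezone.utc)


def fallback_ms(resolved_ts, last_resolved_ts):
    """Physical time, in ms, by which resolved_ts is behind last_resolved_ts."""
    return split_tso(last_resolved_ts)[0] - split_tso(resolved_ts)[0]


if __name__ == "__main__":
    # Values copied from the first warning above.
    resolved, last = 430093525501345805, 430093525514452996
    print(tso_to_datetime(resolved).isoformat())
    print(fallback_ms(resolved, last), "ms")
```

For the values in the first warning, the physical time decodes to 07:29:55 UTC (15:29:55 +08:00) and the fall-back is 50 ms, so these particular warnings describe a small regression rather than a large replication lag.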

[2021/12/28 15:32:05.266 +08:00] [WARN] [processor.go:1076] ["the local resolved ts is less than the global resolved ts"] [localResolvedTs=430093558295560198] [globalResolvedTs=430093558465953795]
[2021/12/28 15:32:05.267 +08:00] [WARN] [processor.go:1076] ["the local resolved ts is less than the global resolved ts"] [localResolvedTs=430093558295560198] [globalResolvedTs=430093558465953795]
[2021/12/28 15:32:05.271 +08:00] [WARN] [processor.go:1076] ["the local resolved ts is less than the global resolved ts"] [localResolvedTs=430093558295560198] [globalResolvedTs=430093558465953795]
[2021/12/28 15:32:05.274 +08:00] [WARN] [processor.go:1076] ["the local resolved ts is less than the global resolved ts"] [localResolvedTs=430093558295560198] [globalResolvedTs=430093558465953795]
[2021/12/28 15:32:05.274 +08:00] [WARN] [processor.go:1076] ["the local resolved ts is less than the global resolved ts"] [localResolvedTs=430093558295560198] [globalResolvedTs=430093558465953795]
[2021/12/28 15:32:05.276 +08:00] [WARN] [processor.go:1076] ["the local resolved ts is less than the global resolved ts"] [localResolvedTs=430093558295560198] [globalResolvedTs=430093558465953795]
[2021/12/28 15:32:05.280 +08:00] [WARN] [processor.go:1076] ["the local resolved ts is less than the global resolved ts"] [localResolvedTs=430093558295560198] [globalResolvedTs=430093558465953795]
[2021/12/28 15:32:05.280 +08:00] [WARN] [processor.go:1076] ["the local resolved ts is less than the global resolved ts"] [localResolvedTs=430093558295560198] [globalResolvedTs=430093558465953795]
[2021/12/28 15:32:05.284 +08:00] [WARN] [processor.go:1076] ["the local resolved ts is less than the global resolved ts"] [localResolvedTs=430093558295560198] [globalResolvedTs=430093558465953795]
[2021/12/28 15:32:05.285 +08:00] [WARN] [processor.go:1076] ["the local resolved ts is less than the global resolved ts"] [localResolvedTs=430093558295560198] [globalResolvedTs=430093558465953795]

[2021/12/28 15:56:47.416 +08:00] [WARN] [client.go:1393] ["The time cost of initializing is too mush"] [timeCost=1m9.290243314s] [regionID=192629]
[2021/12/28 15:56:49.180 +08:00] [WARN] [client.go:1393] ["The time cost of initializing is too mush"] [timeCost=21.283991718s] [regionID=192581]
[2021/12/28 15:56:51.607 +08:00] [WARN] [client.go:1393] ["The time cost of initializing is too mush"] [timeCost=30.278850034s] [regionID=192745]
[2021/12/28 15:56:56.217 +08:00] [WARN] [client.go:1393] ["The time cost of initializing is too mush"] [timeCost=28.088012043s] [regionID=185129]
[2021/12/28 15:57:02.454 +08:00] [WARN] [client.go:1393] ["The time cost of initializing is too mush"] [timeCost=29.290469943s] [regionID=192665]
[2021/12/28 15:57:03.856 +08:00] [WARN] [client.go:1393] ["The time cost of initializing is too mush"] [timeCost=1m25.729652276s] [regionID=189885]
[2021/12/28 15:57:04.624 +08:00] [WARN] [client.go:1393] ["The time cost of initializing is too mush"] [timeCost=42.713197595s] [regionID=192673]
[2021/12/28 15:57:10.232 +08:00] [WARN] [client.go:1393] ["The time cost of initializing is too mush"] [timeCost=1m32.105338822s] [regionID=189785]
[2021/12/28 15:57:11.048 +08:00] [WARN] [client.go:1393] ["The time cost of initializing is too mush"] [timeCost=1m33.070449967s] [regionID=192349]
[2021/12/28 15:57:11.076 +08:00] [WARN] [client.go:1393] ["The time cost of initializing is too mush"] [timeCost=1m32.385617767s] [regionID=185177]
[2021/12/28 15:57:12.002 +08:00] [WARN] [client.go:1393] ["The time cost of initializing is too mush"] [timeCost=1m33.812908108s] [regionID=192713]
[2021/12/28 15:57:13.105 +08:00] [WARN] [client.go:1361] ["region not receiving resolved event from tikv or resolved ts is not pushing for too long time, try to resolve lock"] [regionID=192297] [span="[74800000000000a9ff5e5f698000000000ff0000020380000000ff0000000103800000ff000000bf0e000000fc, 74800000000000a9ff5e5f698000000000ff0000020380000000ff0000000103800000ff0000011eac000000fc)"] [duration=1m24.649s] [resolvedTs=430093932335988737]
[2021/12/28 15:57:13.129 +08:00] [WARN] [client.go:1361] ["region not receiving resolved event from tikv or resolved ts is not pushing for too long time, try to resolve lock"] [regionID=185129] [span="[74800000000000a9ff5e5f728000000000ff001ca90000000000fa, 74800000000000a9ff5e5f728000000000ff0060460000000000fa)"] [duration=1m24.649s] [resolvedTs=430093932335988737]
[2021/12/28 15:57:13.164 +08:00] [WARN] [client.go:1361] ["region not receiving resolved event from tikv or resolved ts is not pushing for too long time, try to resolve lock"] [regionID=192665] [span="[74800000000000a9ff5e5f728000000000ff01786c0000000000fa, 74800000000000a9ff5e5f728000000000ff01a6880000000000fa)"] [duration=1m24.649s] [resolvedTs=430093932335988737]
[2021/12/28 15:57:13.274 +08:00] [WARN] [client.go:1361] ["region not receiving resolved event from tikv or resolved ts is not pushing for too long time, try to resolve lock"] [regionID=192749] [span="[74800000000000a9ff5e5f698000000000ff000005014c573433ff33423130ff344d31ff3031303334ff3700ff000000000000f803ff800000000000bceaff0000000000000000f7, 74800000000000a9ff5e5f698000000000ff000005014c573433ff33423131ff304d31ff3033383034ff3200ff000000000000f803ff8000000000012820ff0000000000000000f7)"] [duration=1m24.649s] [resolvedTs=430093932335988737]
[2021/12/28 15:57:13.666 +08:00] [WARN] [client.go:1361] ["region not receiving resolved event from tikv or resolved ts is not pushing for too long time, try to resolve lock"] [regionID=192313] [span="[74800000000000a9ff5e5f698000000000ff000005014c573433ff33423131ff304d31ff3033383034ff3200ff000000000000f803ff8000000000012820ff0000000000000000f7, 74800000000000a9ff5e5f698000000000ff000005014c573433ff33423131ff324d31ff3035373439ff3600ff000000000000f803ff800000000001740bff0000000000000000f7)"] [duration=1m24.649s] [resolvedTs=430093932335988737]
[2021/12/28 15:57:13.930 +08:00] [WARN] [client.go:1361] ["region not receiving resolved event from tikv or resolved ts is not pushing for too long time, try to resolve lock"] [regionID=189793] [span="[74800000000000a9ff5e5f728000000000ff0114080000000000fa, 74800000000000a9ff5e5f728000000000ff0155290000000000fa)"] [duration=1m24.649s] [resolvedTs=430093932335988737]
[2021/12/28 15:57:13.987 +08:00] [WARN] [client.go:1393] ["The time cost of initializing is too mush"] [timeCost=31.89262421s] [regionID=192289]
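
With this many repeated warnings, it can help to tally them by message and pull out the slowest region initialization instead of reading line by line. The following is a rough sketch, assuming the log lines follow the `[time] [LEVEL] [file:line] ["message"] [key=value]...` layout seen above. The helper names are hypothetical, and the duration parser only handles the h/m/s forms that appear in this thread.

```python
import re
from collections import Counter

# Header of a log line: [timestamp] [LEVEL] [file:line] ["message"] ...
# Curly quotes are accepted as well, because forum software often rewrites
# the straight quotes in pasted logs.
LOG_HEAD = re.compile(
    r'^\[(?P<ts>[^\]]+)\] \[(?P<level>[A-Z]+)\] \[(?P<src>[^\]]+)\] '
    r'\[["“”](?P<msg>[^"“”]*)["“”]\]'
)
TIME_COST = re.compile(r'\[timeCost=([^\]]+)\]')
GO_DURATION = re.compile(r'(?:(\d+)h)?(?:(\d+)m)?(?:(\d+(?:\.\d+)?)s)?')


def parse_go_duration(text):
    """Convert a duration such as '1m9.29s' to seconds; None if unsupported."""
    m = GO_DURATION.fullmatch(text)
    if not m or not any(m.groups()):
        return None
    hours, minutes, seconds = m.groups()
    return int(hours or 0) * 3600 + int(minutes or 0) * 60 + float(seconds or 0)


def summarize(lines):
    """Return (Counter keyed by (level, message), slowest timeCost in seconds)."""
    counts = Counter()
    slowest = None
    for line in lines:
        head = LOG_HEAD.match(line)
        if not head:
            continue  # wrapped continuation lines or non-log text
        counts[(head["level"], head["msg"])] += 1
        cost = TIME_COST.search(line)
        if cost:
            secs = parse_go_duration(cost.group(1))
            if secs is not None and (slowest is None or secs > slowest):
                slowest = secs
    return counts, slowest
```

Calling `summarize(open("cdc.log"))` (the file name is only an example) returns the counts and the largest timeCost, and `counts.most_common(5)` then lists the most frequent messages.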

Error logs:
[2021/12/28 15:43:55.421 +08:00] [INFO] [client.go:871] ["EventFeed disconnected"] [regionID=192729] [requestID=2550] [span="[74800000000000a9ff5e5f698000000000ff0000040380000000ff000007e403800000ff000000447f000000fc, 74800000000000a9ff5e5f698000000000ff0000040380000000ff000007e503800000ff000000cebc000000fc)"] [checkpoint=430093745361518608] [error="[CDC:ErrEventFeedEventError]not_leader:<region_id:192729 > "]
[2021/12/28 15:43:55.423 +08:00] [INFO] [client.go:871] ["EventFeed disconnected"] [regionID=192729] [requestID=2551] [span="[74800000000000a9ff5e5f698000000000ff0000040380000000ff000007e403800000ff000000447f000000fc, 74800000000000a9ff5e5f698000000000ff0000040380000000ff000007e503800000ff000000cebc000000fc)"] [checkpoint=430093745361518608] [error="[CDC:ErrEventFeedEventError]region_not_found:<region_id:192729 > "]
[2021/12/28 15:48:35.991 +08:00] [INFO] [client.go:871] ["EventFeed disconnected"] [regionID=192729] [requestID=1297] [span="[74800000000000a9ff5e5f698000000000ff0000040380000000ff000007e403800000ff000000447f000000fc, 74800000000000a9ff5e5f698000000000ff0000040380000000ff000007e503800000ff000000cebc000000fc)"] [checkpoint=430093819168161795] [error="[CDC:ErrEventFeedEventError]not_leader:<region_id:192729 > "]
[2021/12/28 15:48:35.993 +08:00] [INFO] [client.go:871] ["EventFeed disconnected"] [regionID=192729] [requestID=2441] [span="[74800000000000a9ff5e5f698000000000ff0000040380000000ff000007e403800000ff000000447f000000fc, 74800000000000a9ff5e5f698000000000ff0000040380000000ff000007e503800000ff000000cebc000000fc)"] [checkpoint=430093819168161795] [error="[CDC:ErrEventFeedEventError]not_leader:<region_id:192729 > "]
[2021/12/28 15:48:35.997 +08:00] [INFO] [client.go:871] ["EventFeed disconnected"] [regionID=192729] [requestID=2442] [span="[74800000000000a9ff5e5f698000000000ff0000040380000000ff000007e403800000ff000000447f000000fc, 74800000000000a9ff5e5f698000000000ff0000040380000000ff000007e503800000ff000000cebc000000fc)"] [checkpoint=430093819168161795] [error="[CDC:ErrEventFeedEventError]not_leader:<region_id:192729 > "]
[2021/12/28 15:48:35.998 +08:00] [INFO] [client.go:871] ["EventFeed disconnected"] [regionID=192729] [requestID=2443] [span="[74800000000000a9ff5e5f698000000000ff0000040380000000ff000007e403800000ff000000447f000000fc, 74800000000000a9ff5e5f698000000000ff0000040380000000ff000007e503800000ff000000cebc000000fc)"] [checkpoint=430093819168161795] [error="[CDC:ErrEventFeedEventError]not_leader:<region_id:192729 leader:<id:192730 store_id:1 > > "]

http://:/debug/pprof/heap
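
The host and port in the URL above are left blank and need to be filled in with the address of the CDC node being profiled. As a minimal sketch, the heap profile could be saved to a file like this. The host, port, and output path are placeholders chosen for illustration; port 8300 in the comment is an assumption about the CDC server's listen port and should be checked against the actual deployment.

```python
import urllib.request


def heap_profile_url(host, port):
    """Build the heap profile URL for a CDC node (host and port are supplied by the caller)."""
    return f"http://{host}:{port}/debug/pprof/heap"


def save_heap_profile(host, port, path, timeout=30):
    """Download the heap profile and write the raw bytes to `path`."""
    url = heap_profile_url(host, port)
    with urllib.request.urlopen(url, timeout=timeout) as resp, open(path, "wb") as out:
        out.write(resp.read())
    return path


# Example call with hypothetical values (port 8300 is an assumption):
#   save_heap_profile("10.0.0.1", 8300, "cdc-heap.pb.gz")
```

The saved file can then be inspected with `go tool pprof -top cdc-heap.pb.gz` to see which allocations dominate memory. Capturing it while memory is climbing, before the process hits OOM, is likely to be the most informative.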

The image I posted above is the result.
