【TiDB version】 (source)
V5.2.1
【TiCDC version】
V5.2.2
【TiDB version】 (target)
V5.2.2
【Problem description】
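- There are 3 changefeeds in total: 1 replicates to MySQL and 2 replicate to TiDB. For all of them, the checkpoint time and TSO are stuck at 2021-11-11 11:36:28.025 and 429025339120025644, so the replication lag is about 2 hours.
- The most recent DDL on the source ran at 11:05, and I can see that it was replicated to the target databases long ago.
- I checked the source: there are no large transactions running and no lock conflicts. The target databases carry no business workload.
- TiCDC has no ERROR logs, but the runtime log contains a large number of WARN messages: GetSnapshot is taking too long, DDL puller stuck?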
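The stuck TSO and the reported wall-clock time can be cross-checked by decoding the TSO. The following is a minimal sketch, assuming the usual TiDB TSO layout (a physical Unix timestamp in milliseconds in the high bits and an 18-bit logical counter in the low bits). `decode_tso` is a hypothetical helper name, and the +08:00 offset is taken from the timestamps in the TiCDC log.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

# Assumption: a TiDB TSO is (physical_ms << 18) | logical.
LOGICAL_BITS = 18
# Assumption: the cluster reports local time in +08:00, as the log lines do.
LOCAL_TZ = timezone(timedelta(hours=8))


def decode_tso(tso: int) -> Tuple[datetime, int]:
    """Split a TSO into its wall-clock time and its logical counter."""
    physical_ms = tso >> LOGICAL_BITS
    logical = tso & ((1 << LOGICAL_BITS) - 1)
    # Integer arithmetic keeps the millisecond part exact.
    wall = datetime.fromtimestamp(physical_ms // 1000, tz=LOCAL_TZ)
    wall += timedelta(milliseconds=physical_ms % 1000)
    return wall, logical


# The checkpoint TSO that all three changefeeds are stuck at.
checkpoint_time, checkpoint_logical = decode_tso(429025339120025644)

# Timestamp of the first WARN line in the TiCDC log, used as "now".
log_time = datetime(2021, 11, 11, 13, 56, 59, 913000, tzinfo=LOCAL_TZ)
lag = log_time - checkpoint_time

print(checkpoint_time.isoformat(), checkpoint_logical)
print("lag at log time:", lag)
```

The same helper can be applied to the `ts=` values in the WARN lines to see which point in time each waiting `GetSnapshot` call is asking for.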
【Troubleshooting materials】
1. TiCDC runtime log
[2021/11/11 13:56:59.913 +08:00] [WARN] [schema_storage.go:726] ["GetSnapshot is taking too long, DDL puller stuck?"] [ts=429025521952358428]
[2021/11/11 13:56:59.914 +08:00] [WARN] [schema_storage.go:726] ["GetSnapshot is taking too long, DDL puller stuck?"] [ts=429025556830617610]
[2021/11/11 13:56:59.916 +08:00] [WARN] [schema_storage.go:726] ["GetSnapshot is taking too long, DDL puller stuck?"] [ts=429025341059891213]
[2021/11/11 13:56:59.919 +08:00] [WARN] [schema_storage.go:726] ["GetSnapshot is taking too long, DDL puller stuck?"] [ts=429025339670790159]
[2021/11/11 13:56:59.920 +08:00] [WARN] [schema_storage.go:726] ["GetSnapshot is taking too long, DDL puller stuck?"] [ts=429025344074547264]
[2021/11/11 13:56:59.920 +08:00] [WARN] [schema_storage.go:726] ["GetSnapshot is taking too long, DDL puller stuck?"] [ts=429025560605491251]
[2021/11/11 13:56:59.933 +08:00] [WARN] [schema_storage.go:726] ["GetSnapshot is taking too long, DDL puller stuck?"] [ts=429025412638310434]
[2021/11/11 13:56:59.935 +08:00] [WARN] [schema_storage.go:726] ["GetSnapshot is taking too long, DDL puller stuck?"] [ts=429025359305375750]
[2021/11/11 13:56:59.937 +08:00] [WARN] [schema_storage.go:726] ["GetSnapshot is taking too long, DDL puller stuck?"] [ts=429025396175929366]
[2021/11/11 13:56:59.938 +08:00] [WARN] [schema_storage.go:726] ["GetSnapshot is taking too long, DDL puller stuck?"] [ts=429025343917522948]
[2021/11/11 13:56:59.938 +08:00] [WARN] [schema_storage.go:726] ["GetSnapshot is taking too long, DDL puller stuck?"] [ts=429025386030956550]
[2021/11/11 13:56:59.939 +08:00] [WARN] [schema_storage.go:726] ["GetSnapshot is taking too long, DDL puller stuck?"] [ts=429025439888179820]
[2021/11/11 13:56:59.940 +08:00] [WARN] [schema_storage.go:726] ["GetSnapshot is taking too long, DDL puller stuck?"] [ts=429025449888972871]
[2021/11/11 13:56:59.941 +08:00] [WARN] [schema_storage.go:726] ["GetSnapshot is taking too long, DDL puller stuck?"] [ts=429025398980870154]
[2021/11/11 13:56:59.941 +08:00] [WARN] [schema_storage.go:726] ["GetSnapshot is taking too long, DDL puller stuck?"] [ts=429025350798540831]
[2021/11/11 13:56:59.944 +08:00] [WARN] [schema_storage.go:726] ["GetSnapshot is taking too long, DDL puller stuck?"] [ts=429025396175929366]
[2021/11/11 13:56:59.946 +08:00] [WARN] [schema_storage.go:726] ["GetSnapshot is taking too long, DDL puller stuck?"] [ts=429025342423302158]
[2021/11/11 13:56:59.946 +08:00] [WARN] [schema_storage.go:726] ["GetSnapshot is taking too long, DDL puller stuck?"] [ts=429025521952358428]
[2021/11/11 13:56:59.949 +08:00] [WARN] [schema_storage.go:726] ["GetSnapshot is taking too long, DDL puller stuck?"] [ts=429025522437586972]
[2021/11/11 13:56:59.952 +08:00] [WARN] [schema_storage.go:726] ["GetSnapshot is taking too long, DDL puller stuck?"] [ts=429025443322527752]
[2021/11/11 13:56:59.955 +08:00] [WARN] [schema_storage.go:726] ["GetSnapshot is taking too long, DDL puller stuck?"] [ts=429025407329894432]
A rough count of the WARN messages by type:
$ grep WARN cdc.log | awk -F '[' '{print $5}' | awk -F ']' '{print $1}' | sort | uniq -c
828454 "GetSnapshot is taking too long, DDL puller stuck?"
285284 "region not receiving resolved event from tikv or resolved ts is not pushing for too long time, try to resolve lock"
1 "send request to stream failed"
6598 "The resolvedTs is fallen back in kvclient"
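The same tally can be produced without relying on awk field positions. The following is a rough sketch that mirrors the pipeline above, assuming each log line has the shape `[time] [LEVEL] [file:line] [message] ...` seen in the excerpt. `count_messages` is a hypothetical helper name, and the two sample lines are copied from the log excerpt.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed line shape: [time] [LEVEL] [file:line] [message] [fields...]
# Like the awk pipeline, the message is taken as everything up to the
# first closing bracket of the fourth bracketed field.
LINE_RE = re.compile(
    r"^\[[^\]]+\] \[(?P<level>\w+)\] \[[^\]]+\] \[(?P<msg>[^\]]*)\]"
)


def count_messages(lines: Iterable[str], level: str = "WARN") -> Counter:
    """Count log messages of the given level, grouped by message text."""
    counts: Counter = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group("level") == level:
            # Strip straight or typographic quotes around the message.
            counts[match.group("msg").strip('"\u201c\u201d')] += 1
    return counts


sample = [
    '[2021/11/11 13:56:59.913 +08:00] [WARN] [schema_storage.go:726] '
    '["GetSnapshot is taking too long, DDL puller stuck?"] '
    '[ts=429025521952358428]',
    '[2021/11/11 13:56:59.914 +08:00] [WARN] [schema_storage.go:726] '
    '["GetSnapshot is taking too long, DDL puller stuck?"] '
    '[ts=429025556830617610]',
]

for message, count in count_messages(sample).most_common():
    print(f"{count:8d} {message}")
```

To run it over a whole log file, pass an open file object instead of `sample`, for example `count_messages(open("cdc.log", encoding="utf-8", errors="replace"))`.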
Here are the logs for one day:
cdc.log1.tar.gz (14.6 MB)
cdc-2021-11-11T13-16-42.043.rar (10.6 MB)
cdc-2021-11-11T13-37-34.453.rar (10.2 MB)
cdc-2021-11-11T14-04-29.518.rar (10.0 MB)
cdc-2021-11-11T12-28-03.407.rar (11.4 MB)
cdc-2021-11-11T12-55-02.717.rar (10.9 MB)
2. TiCDC monitoring snapshot
tidb-ts-prd-TiCDC_2021-11-11T06_11_42.715Z.json (151.6 KB)
3. changefeed screenshot
4. capture screenshot
5. Source cluster monitoring snapshot
tidb-ts-prd-TiDB_2021-11-11T06_18_55.213Z.json (9.0 MB)
6. Target cluster monitoring snapshot
No workload is running on the target, so I am not submitting this one.