TiCDC一直报panic: unsigned value not in type uint64,进程一直重启

【 TiDB 使用环境】
生产环境
【 TiDB 版本】
v5.4.0
【遇到的问题】
从5.2.1升级到5.4.0版本,其中其中一个同步任务阻塞十几个小时,期间TiCDC一直重启,ERROR日志中有大量PANIC信息
【问题现象及影响】
cdc同步任务进度阻塞,其中一个任务一直停滞


最早出现问题的时间点有如下日志,panic之后cdc进程重启了,[PANIC] [canal.go:167] ["unsigned value not in type uint64"],之后每分钟都在重启

[2022/04/11 18:41:54.554 +08:00] [PANIC] [canal.go:167] ["unsigned value not in type uint64"] [column="{\"name\":\"sales_order_total\",\"type\":3,\"flag\":129,\"value\":\"0\"}"] [stack="github.com/pingcap/tiflow/cdc/sink/codec.getJavaSQLType\
\tgithub.com/pingcap/tiflow/cdc/sink/codec/canal.go:167\
github.com/pingcap/tiflow/cdc/sink/codec.(*canalEntryBuilder).buildColumn\
\tgithub.com/pingcap/tiflow/cdc/sink/codec/canal.go:280\
github.com/pingcap/tiflow/cdc/sink/codec.(*canalEntryBuilder).buildRowData\
\tgithub.com/pingcap/tiflow/cdc/sink/codec/canal.go:317\
github.com/pingcap/tiflow/cdc/sink/codec.(*CanalFlatEventBatchEncoder).newFlatMessageForDML\
\tgithub.com/pingcap/tiflow/cdc/sink/codec/canal_flat.go:178\
github.com/pingcap/tiflow/cdc/sink/codec.(*CanalFlatEventBatchEncoder).AppendRowChangedEvent\
\tgithub.com/pingcap/tiflow/cdc/sink/codec/canal_flat.go:321\
github.com/pingcap/tiflow/cdc/sink.(*mqSink).runWorker\
\tgithub.com/pingcap/tiflow/cdc/sink/mq.go:338\
github.com/pingcap/tiflow/cdc/sink.(*mqSink).run.func1\
\tgithub.com/pingcap/tiflow/cdc/sink/mq.go:267\
golang.org/x/sync/errgroup.(*Group).Go.func1\
\tgolang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57"]
[2022/04/11 18:42:11.193 +08:00] [INFO] [helper.go:52] ["init log"] [file=/data/tidb-deploy/cdc-8300/log/cdc.log] [level=info]
[2022/04/11 18:42:11.193 +08:00] [INFO] [version.go:47] ["Welcome to Change Data Capture (CDC)"] [release-version=v5.4.0] [git-hash=14db59be99371a0ef996708fa2867e73b2d7b8af] [git-branch=heads/refs/tags/v5.4.0] [utc-build-time="2022-01-25 07:14:04"] [go-version="go version go1.16.4 linux/amd64"] [failpoint-build=false]
[2022/04/11 18:42:11.193 +08:00] [INFO] [server.go:75] ["creating CDC server"] [pd-addrs="[http://10.49.1.30:2379,http://10.49.2.20:2379,http://10.49.1.187:2379]"] [config="{\"addr\":\"0.0.0.0:8300\",\"advertise-addr\":\"10.49.1.145:8300\",\"log-file\":\"/data/tidb-deploy/cdc-8300/log/cdc.log\",\"log-level\":\"info\",\"log\":{\"file\":{\"max-size\":300,\"max-days\":0,\"max-backups\":0},\"error-output\":\"stderr\"},\"data-dir\":\"/data/tidb-data/cdc-8300\",\"gc-ttl\":86400,\"tz\":\"System\",\"capture-session-ttl\":10,\"owner-flush-interval\":200000000,\"processor-flush-interval\":100000000,\"sorter\":{\"num-concurrent-worker\":4,\"chunk-size-limit\":134217728,\"max-memory-percentage\":30,\"max-memory-consumption\":17179869184,\"num-workerpool-goroutine\":16,\"sort-dir\":\"/tmp/sorter\"},\"security\":{\"ca-path\":\"\",\"cert-path\":\"\",\"key-path\":\"\",\"cert-allowed-cn\":null},\"per-table-memory-quota\":10485760,\"kv-client\":{\"worker-concurrent\":8,\"worker-pool-size\":0,\"region-scan-limit\":40},\"debug\":{\"enable-table-actor\":false,\"enable-db-sorter\":false,\"db\":{\"count\":8,\"concurrency\":128,\"max-open-files\":10000,\"block-size\":65536,\"block-cache-size\":4294967296,\"writer-buffer-size\":8388608,\"compression\":\"snappy\",\"target-file-size-base\":8388608,\"write-l0-slowdown-trigger\":2147483647,\"write-l0-pause-trigger\":2147483647,\"compaction-l0-trigger\":160,\"compaction-deletion-threshold\":160000,\"iterator-max-alive-duration\":10000,\"iterator-slow-read-duration\":256,\"cleanup-speed-limit\":10000},\"enable-new-scheduler\":false,\"messages\":{\"client-max-batch-interval\":200000000,\"client-max-batch-size\":8388608,\"client-max-batch-count
\":128,\"client-retry-rate-limit\":1,\"server-max-pending-message-count\":102400,\"server-ack-interval\":100000000,\"server-worker-pool-size\":4}}}"]
[2022/04/11 18:42:11.194 +08:00] [INFO] [client.go:352] ["[pd] create pd client with endpoints"] [pd-address="[http://10.49.1.30:2379,http://10.49.2.20:2379,http://10.49
.1.187:2379]"]

cdc_stderr.log日志中也一直有记录panic: unsigned value not in type uint64

goroutine 5181 [running]:
go.uber.org/zap/zapcore.(*CheckedEntry).Write(0xc038de8180, 0xc04a9db7c0, 0x1, 0x1)
        go.uber.org/zap@v1.19.1/zapcore/entry.go:232 +0x532
go.uber.org/zap.(*Logger).Panic(0xc04703d560, 0x3a201b8, 0x21, 0xc04a9db7c0, 0x1, 0x1)
        go.uber.org/zap@v1.19.1/logger.go:230 +0x85
github.com/pingcap/log.Panic(0x3a201b8, 0x21, 0xc04a9db7c0, 0x1, 0x1)
        github.com/pingcap/log@v0.0.0-20211207084639-71a2e5860834/global.go:54 +0x10b
github.com/pingcap/tiflow/cdc/sink/codec.getJavaSQLType(0xc02ce98580, 0xc094e32500, 0xc, 0x1)
        github.com/pingcap/tiflow/cdc/sink/codec/canal.go:167 +0x352
github.com/pingcap/tiflow/cdc/sink/codec.(*canalEntryBuilder).buildColumn(0xc01eac37e0, 0xc02ce98580, 0xc03fd78030, 0x11, 0x1, 0xc08e934480, 0x0, 0x0)
        github.com/pingcap/tiflow/cdc/sink/codec/canal.go:280 +0x57
github.com/pingcap/tiflow/cdc/sink/codec.(*canalEntryBuilder).buildRowData(0xc01eac37e0, 0xc036c2a880, 0xc0383f27a0, 0x9, 0xc03f041a20)
        github.com/pingcap/tiflow/cdc/sink/codec/canal.go:317 +0x285
github.com/pingcap/tiflow/cdc/sink/codec.(*CanalFlatEventBatchEncoder).newFlatMessageForDML(0xc0411f5980, 0xc036c2a880, 0xc043481e7a, 0x3, 0x3, 0x0)
        github.com/pingcap/tiflow/cdc/sink/codec/canal_flat.go:178 +0xd8
github.com/pingcap/tiflow/cdc/sink/codec.(*CanalFlatEventBatchEncoder).AppendRowChangedEvent(0xc0411f5980, 0xc036c2a880, 0x0, 0x0, 0x3)
        github.com/pingcap/tiflow/cdc/sink/codec/canal_flat.go:321 +0x39
github.com/pingcap/tiflow/cdc/sink.(*mqSink).runWorker(0xc03c5eff40, 0x40dc5a8, 0xc0411f5900, 0xc000000000, 0x0, 0x0)
        github.com/pingcap/tiflow/cdc/sink/mq.go:338 +0x3f1
github.com/pingcap/tiflow/cdc/sink.(*mqSink).run.func1(0x0, 0x2)
        github.com/pingcap/tiflow/cdc/sink/mq.go:267 +0x46
golang.org/x/sync/errgroup.(*Group).Go.func1(0xc041203950, 0xc03922ef00)
        golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57 +0x59
created by golang.org/x/sync/errgroup.(*Group).Go
        golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:54 +0x66
panic: unsigned value not in type uint64

【附件】
asinking-tidb-cluster-TiCDC_2022-04-12T06_02_19.463Z.json (27.3 MB)

panic的表字段信息,没有修改过表结构和字符集,之前一直好的
sales_order_total int(10) unsigned NOT NULL DEFAULT '0'

1 个赞

删掉阻塞的同步任务,其它同步任务回复正常,cdc服务正常
指定start-ts启动报错

"message": "[CDC:ErrSnapshotLostByGC]fail to create or maintain changefeed due to snapshot loss caused by GC. checkpoint-ts 432452065695367172 is earlier than or equal to GC safepoint at 432452065695367172"

start-ts对应的时间点

» tso 432452065695367172 
system:  2022-04-11 18:41:52.522 +0800 CST
logic:   4

gc_safe_point对应的时间点
tikv_gc_safe_point 20220411-18:41:52 +0800
指定start-ts为之后时间,可以启动,但是继续报错panic,cdc不停重启

最后,只好从当前时间启动该同步任务,恢复正常

1 个赞

您好:
这个是一个 bug,已经有了相关 issue 跟踪修复,相关链接 https://github.com/pingcap/tiflow/issues/4736

1 个赞

好的,今天又出现一样的问题了,只能等下个版本修复吗?

不确定在哪个版本能修复,可以关注一下这个 PR

好的呢

此话题已在最后回复的 1 分钟后被自动关闭。不再允许新回复。