TiCDC restarts frequently after upgrading to v6.1.5

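【TiDB environment】Production / Test / PoC
Production
【TiDB version】
v6.1.5
【Reproduction path】What operations led to the problem
Upgraded from v4.0.15 to v6.1.5
【Problem encountered: symptoms and impact】
The TiCDC process restarts frequently.
【Resource configuration】Go to TiDB Dashboard - Cluster Info - Hosts and take a screenshot of that page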

【Attachments: screenshots / logs / monitoring】

cdc1.zip (3.6 MB)

Please paste the error text.

Please also post your resource configuration.

[2023/05/15 22:12:29.282 +08:00] [WARN] [client.go:171] ["peer message client detected error, restarting"] [error="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 10.110.45.12:8300: connect: connection refused\""] [errorVerbose="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 10.110.45.12:8300: connect: connection refused\"\ngithub.com/pingcap/errors.AddStack\n\tgithub.com/pingcap/errors@v0.11.5-0.20211224045212-9687c2b0f87c/errors.go:174\ngithub.com/pingcap/errors.Trace\n\tgithub.com/pingcap/errors@v0.11.5-0.20211224045212-9687c2b0f87c/juju_adaptor.go:15\ngithub.com/pingcap/tiflow/pkg/p2p.(*MessageClient).launchStream\n\tgithub.com/pingcap/tiflow/pkg/p2p/client.go:187\ngithub.com/pingcap/tiflow/pkg/p2p.(*MessageClient).Run\n\tgithub.com/pingcap/tiflow/pkg/p2p/client.go:166\ngithub.com/pingcap/tiflow/pkg/p2p.(*messageRouterImpl).GetClient.func1\n\tgithub.com/pingcap/tiflow/pkg/p2p/message_router.go:144\nruntime.goexit\n\truntime/asm_amd64.s:1594"]

It looks like something is wrong with your CDC nodes. Are they all reachable? transport: Error while dialing dial tcp 10.110.45.12:8300: connect: connection refused
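One quick way to answer the reachability question is to try a plain TCP connection to the address in the error. The sketch below is only an illustration: `is_port_open` is a hypothetical helper name, and the host and port are the ones copied from the log line, not values confirmed elsewhere in this thread. A `connection refused` normally means nothing is listening on that port at that moment, which would also be the case while the cdc process is down between restarts.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration; it only checks that something
    accepts TCP connections, not that the TiCDC service itself is healthy.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False
```

Run from another TiCDC node, `is_port_open("10.110.45.12", 8300)` returning `False` only while the process is restarting would suggest the warning is a symptom of the restarts rather than their cause.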

TiCDC replicates to both MySQL and Kafka. After I stopped the process that replicates to Kafka, it went back to normal.

The CDC log doesn't show any specific error either; the cdc process just keeps restarting.

I don't see any specific error in the cdc log. Judging from the operating system logs, the cdc process simply keeps restarting.

Is there no obvious error in cdc.log before the restart?

No. I have uploaded cdc.log. There are errors in cdc_stderr.log.
cdc1.zip (3.6 MB)

[tidb@cnhw-vm-ticdc-am01 log]$ tail -f cdc_stderr.log
github.com/pingcap/tiflow/cdc/sink/mq/codec.(*MaxwellEventBatchEncoder).AppendRowChangedEvent(0xc004ee9cc8, {0xc0079cde94?, 0x0?}, {0x0?, 0xc0079cde68?}, 0x1?)
github.com/pingcap/tiflow/cdc/sink/mq/codec/maxwell.go:172 +0x25
github.com/pingcap/tiflow/cdc/sink/mq/codec.(*encoderGroup).runEncoder(0xc008c3de00, {0x3abadc8, 0xc019b96240}, 0x3)
github.com/pingcap/tiflow/cdc/sink/mq/codec/encoder_group.go:114 +0x35d
github.com/pingcap/tiflow/cdc/sink/mq/codec.(*encoderGroup).Run.func2()
github.com/pingcap/tiflow/cdc/sink/mq/codec/encoder_group.go:93 +0x2c
golang.org/x/sync/errgroup.(*Group).Go.func1()
golang.org/x/sync@v0.0.0-20220722155255-886fb9371eb4/errgroup/errgroup.go:75 +0x64
created by golang.org/x/sync/errgroup.(*Group).Go
golang.org/x/sync@v0.0.0-20220722155255-886fb9371eb4/errgroup/errgroup.go:72 +0xa5

There is no obvious error in cdc.log; the process just restarts. The specific error should be in stderr. Could you please send the complete cdc_stderr.log? Thanks.
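The `tail -f` output above shows only the end of a goroutine stack, while the line that names the actual failure sits at the top of each dump. As a rough sketch, assuming the crashes are ordinary Go runtime panics (whose output begins with `panic: ` or `fatal error: `), the header lines could be pulled out of the full file like this. `find_crash_headers` is a hypothetical helper, not part of TiCDC.

```python
import re

# Go runtime crash output begins with "panic: ..." or "fatal error: ...",
# usually followed by "goroutine N [running]:" and the stack frames.
_HEADER = re.compile(r"^(panic: |fatal error: )")


def find_crash_headers(lines, context=3):
    """Return (line_number, snippet) pairs for each crash header found.

    line_number is 1-based; snippet is the header line plus up to
    `context` following lines, with trailing newlines stripped.
    """
    lines = list(lines)
    hits = []
    for i, line in enumerate(lines):
        if _HEADER.match(line):
            snippet = [l.rstrip("\n") for l in lines[i:i + 1 + context]]
            hits.append((i + 1, snippet))
    return hits
```

Passing it an open file, for example `find_crash_headers(open("cdc_stderr.log", errors="replace"))`, would list each crash message with its position, which is easier to compare against a known issue than the tail of a stack trace.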

cdc_stderr.log (1.9 MB)

The error message is the same as in this bug: https://github.com/pingcap/tiflow/issues/2758. Has the bug reappeared in v6.1.5?