TiCDC panics after upgrade

[TiDB environment] Production / Test / POC
Production
[TiDB version]
Upgraded from 4.0.8 to 5.3.1
[Problem encountered]
cdc_stderr.log

  goroutine 705 [running]:
github.com/pingcap/tiflow/cdc/model.(*ChangeFeedInfo).FixIncompatible(0x0)
        github.com/pingcap/tiflow/cdc/model/changefeed.go:247 +0x37
github.com/pingcap/tiflow/cdc/owner.fixChangefeedInfos.func1(0x0, 0x203000, 0x203000, 0x203000, 0x90)
        github.com/pingcap/tiflow/cdc/owner/owner.go:284 +0x2b
github.com/pingcap/tiflow/pkg/orchestrator.(*ChangefeedReactorState).PatchInfo.func1(0x0, 0x0, 0x413ec2, 0xc00296f038, 0x5a96dcf, 0x3109e12f2e3c4a4b, 0x30)
        github.com/pingcap/tiflow/pkg/orchestrator/reactor_state.go:297 +0xa2
github.com/pingcap/tiflow/pkg/orchestrator.(*ChangefeedReactorState).patchAny.func1(0x0, 0x0, 0x0, 0x3e, 0x6379680, 0x33386c0, 0x1, 0xc002a42b70, 0xc00296f088)
        github.com/pingcap/tiflow/pkg/orchestrator/reactor_state.go:390 +0x13a
github.com/pingcap/tiflow/pkg/orchestrator.(*SingleDataPatch).Patch(0xc000918588, 0xc002a42570, 0xc002a42b40, 0x53, 0xc002566088)
        github.com/pingcap/tiflow/pkg/orchestrator/interfaces.go:55 +0x82
github.com/pingcap/tiflow/pkg/orchestrator.getChangedState(0xc002a42570, 0xc000659560, 0x1, 0x1, 0xc00133dd98, 0x4b2, 0x0, 0x0)
        github.com/pingcap/tiflow/pkg/orchestrator/batch.go:77 +0xa5
github.com/pingcap/tiflow/pkg/orchestrator.getBatchChangedState(0xc002a42570, 0xc00032cf00, 0x1a, 0x1a, 0x4, 0x4, 0xc0007ad380, 0xc00296f2e0, 0x2a4e7d3)
        github.com/pingcap/tiflow/pkg/orchestrator/batch.go:41 +0x17e
github.com/pingcap/tiflow/pkg/orchestrator.(*EtcdWorker).applyPatchGroups(0xc0005f0380, 0x7fe95e4d4028, 0xc0008b6040, 0xc00032cf00, 0x1a, 0x1a, 0x1, 0x1, 0x0, 0x2, ...)
        github.com/pingcap/tiflow/pkg/orchestrator/etcd_worker.go:345 +0xc5
github.com/pingcap/tiflow/pkg/orchestrator.(*EtcdWorker).Run(0xc0005f0380, 0x7fe95e4d4028, 0xc0008b6040, 0xc0008be060, 0xbebc200, 0x7fff34a0de3a, 0x11, 0x34d23cb, 0x5, 0x0, ...)
        github.com/pingcap/tiflow/pkg/orchestrator/etcd_worker.go:205 +0xb87
github.com/pingcap/tiflow/cdc/capture.(*Capture).runEtcdWorker(0xc00065a320, 0x3b9af98, 0xc0008b6040, 0x3b16340, 0xc001660000, 0x3b4c0b8, 0xc00278e990, 0xbebc200, 0x34d23cb, 0x5, ...)
        github.com/pingcap/tiflow/cdc/capture/capture.go:299 +0x185
github.com/pingcap/tiflow/cdc/capture.(*Capture).campaignOwner(0xc00065a320, 0x3b9af98, 0xc0008b6040, 0xc000085000, 0xc0027b6e88)
        github.com/pingcap/tiflow/cdc/capture/capture.go:271 +0x6ee
github.com/pingcap/tiflow/cdc/capture.(*Capture).run.func2(0xc000b38030, 0xc00065a320, 0x3b9af98, 0xc0008b6040, 0xc000b3e010)
        github.com/pingcap/tiflow/cdc/capture/capture.go:192 +0xb5
created by github.com/pingcap/tiflow/cdc/capture.(*Capture).run
        github.com/pingcap/tiflow/cdc/capture/capture.go:186 +0x2c8
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x98 pc=0x178a057]

[Reproduction steps] What operations led to the problem
[Symptoms and impact]
[Attachments]
image

Thanks for the feedback. We have opened a GitHub issue and will follow up as soon as possible: https://github.com/pingcap/tiflow/issues/5266

I could not reproduce this locally. Could you provide the cdc logs? Also, could you run ./cdc cli changefeed query -c changefeed-name to query the information of the failing changefeed?

2.log (151.8 KB)
This is the cdc log.

Has the cluster gone through any other upgrade operations? In the log I see that the changefeed's creation version is v5.0.0-dev-dirty. How was the changefeed originally created?

I checked. This cluster has been upgraded before: the cdc upgrade succeeded, but the other components (tidb, tikv) still seem to be on 4.0.8.

Could you run ./cdc cli changefeed query -c changefeed-name to query the information of the failing changefeed?

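Is the query still useful now? I have already modified the code myself and added a nil check at that point so that the cdc service could recover first.
I am not sure exactly which task (or tasks) caused the NPE. I can look into it if needed. As far as I can tell, the task information is all in the log above.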
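
For readers wondering what such a nil check looks like, here is a minimal sketch. The stack trace shows FixIncompatible being called with a nil receiver (0x0), so the guard has to sit before the method touches any field. The type changeFeedInfo, its fields, and the helper safeFix are hypothetical stand-ins for illustration; this is not the actual tiflow code or the patch that was applied.

```go
package main

import "fmt"

// changeFeedInfo is a hypothetical stand-in for changefeed metadata.
// It is NOT the real tiflow model.ChangeFeedInfo.
type changeFeedInfo struct {
	CreatorVersion string
	Fixed          bool
}

// fixIncompatible writes to a field of the receiver. Calling it on a nil
// pointer dereferences nil and panics, which matches the shape of the
// failure in the stack trace above.
func (info *changeFeedInfo) fixIncompatible() {
	info.Fixed = true
}

// safeFix is the guarded variant (hypothetical helper): it skips nil
// entries and reports whether the fix was actually applied.
func safeFix(info *changeFeedInfo) bool {
	if info == nil {
		return false
	}
	info.fixIncompatible()
	return true
}

func main() {
	infos := map[string]*changeFeedInfo{
		"feed-a": {CreatorVersion: "v4.0.8"},
		"feed-b": nil, // hypothetical entry whose metadata is missing
	}
	for id, info := range infos {
		if !safeFix(info) {
			fmt.Printf("skipping changefeed %s: info is nil\n", id)
		}
	}
}
```

Logging the skipped changefeed ID, as in main above, would also help identify which task triggered the NPE.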

This is a known issue. We had a fix, but it missed part of the code: https://github.com/pingcap/tiflow/issues/5266#issuecomment-1111711739. We will fix it again. Thanks for the feedback!
