TiCDC bidirectional replication is losing data

To help us work efficiently, please provide the following information; a clear problem description gets resolved faster:
[TiDB environment]
TiDB v4.0.11

[Overview] Scenario + problem summary
We use TiCDC to replicate two tables to each other, and occasionally some data is not replicated.

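How exactly did you test the bidirectional replication on your side? Do you have the test steps and the error symptoms? I would also suggest upgrading TiCDC to v4.0.14, which is currently more stable.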

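Hello, we ran the following statement:
select count(*) from table where …
and found that it returns different results on the two tables that replicate to each other in a cycle.
The difference between the results also changes over time, which suggests that some data keeps failing to replicate.
Overall the two tables differ only slightly; the replication loss happens occasionally.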

We are considering an upgrade to 5.0, but among the bug fixes listed for the 5.x releases I did not see one that matches our scenario.

[2021/08/29 12:55:46.231 +00:00] [INFO] [client.go:871] ["EventFeed disconnected"] [regionID=438222] [requestID=102536] [span="[748000000000002dff915f000000000000f9, 748000000000002dff9160000000000000f9)"] [checkpoint=427358092254248965] [error="[CDC:ErrEventFeedEventError]not_leader:<region_id:438222 > "]
[2021/08/29 12:55:46.271 +00:00] [INFO] [client.go:871] ["EventFeed disconnected"] [regionID=438222] [requestID=102540] [span="[748000000000002dff915f000000000000f9, 748000000000002dff9160000000000000f9)"] [checkpoint=427358092254248965] [error="[CDC:ErrEventFeedEventError]region_not_found:<region_id:438222 > "]
[2021/08/29 12:56:06.435 +00:00] [INFO] [client.go:871] ["EventFeed disconnected"] [regionID=427229] [requestID=102528] [span="[748000000000002cff8f5f000000000000f9, 748000000000002cff8f5f728000000000ff00ea520000000000fa)"] [checkpoint=427358097497128961] [error="[CDC:ErrEventFeedEventError]not_leader:<region_id:427229 > "]
[2021/08/29 12:56:06.474 +00:00] [INFO] [client.go:871] ["EventFeed disconnected"] [regionID=427229] [requestID=102542] [span="[748000000000002cff8f5f000000000000f9, 748000000000002cff8f5f728000000000ff00ea520000000000fa)"] [checkpoint=427358097497128961] [error="[CDC:ErrEventFeedEventError]region_not_found:<region_id:427229 > "]
[2021/08/29 12:56:08.707 +00:00] [INFO] [client.go:871] ["EventFeed disconnected"] [regionID=349897] [requestID=102537] [span="[748000000000002cff5b5f000000000000f9, 748000000000002cff5b60000000000000f9)"] [checkpoint=427358098021416973] [error="[CDC:ErrEventFeedEventError]not_leader:<region_id:349897 > "]
[2021/08/29 12:56:08.748 +00:00] [INFO] [client.go:871] ["EventFeed disconnected"] [regionID=349897] [requestID=102544] [span="[748000000000002cff5b5f000000000000f9, 748000000000002cff5b60000000000000f9)"] [checkpoint=427358098021416973] [error="[CDC:ErrEventFeedEventError]region_not_found:<region_id:349897 > "]
[2021/08/29 12:59:27.775 +00:00] [INFO] [client.go:871] ["EventFeed disconnected"] [regionID=349897] [requestID=102545] [span="[748000000000002cff5b5f000000000000f9, 748000000000002cff5b60000000000000f9)"] [checkpoint=427358150240501769] [error="[CDC:ErrEventFeedEventError]not_leader:<region_id:349897 > "]
[2021/08/29 12:59:27.815 +00:00] [INFO] [client.go:871] ["EventFeed disconnected"] [regionID=349897] [requestID=102546] [span="[748000000000002cff5b5f000000000000f9, 748000000000002cff5b60000000000000f9)"] [checkpoint=427358150240501769] [error="[CDC:ErrEventFeedEventError]region_not_found:<region_id:349897 > "]
[2021/08/29 12:59:48.964 +00:00] [INFO] [client.go:871] ["EventFeed disconnected"] [regionID=438222] [requestID=102541] [span="[748000000000002dff915f000000000000f9, 748000000000002dff9160000000000000f9)"] [checkpoint=427358155915919361] [error="[CDC:ErrEventFeedEventError]not_leader:<region_id:438222 > "]
[2021/08/29 12:59:49.005 +00:00] [INFO] [client.go:871] ["EventFeed disconnected"] [regionID=438222] [requestID=102548] [span="[748000000000002dff915f000000000000f9, 748000000000002dff9160000000000000f9)"] [checkpoint=427358155915919361] [error="[CDC:ErrEventFeedEventError]not_leader:<region_id:438222 leader:<id:15064270 store_id:1 > > "]
[2021/08/29 12:59:54.289 +00:00] [INFO] [client.go:871] ["EventFeed disconnected"] [regionID=12865082] [requestID=102539] [span="[748000000000002cff8f5f728000000000ff00ea520000000000fa, 748000000000002cff8f5f728000000000ff0584c00000000000fa)"] [checkpoint=427358157228736518] [error="[CDC:ErrEventFeedEventError]not_leader:<region_id:12865082 > "]

The logs contain a large number of messages like these; frequent leader switches may be what causes the data loss.
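To put a number on "a large number of messages", the disconnects can be tallied per region and error kind. The following is only a minimal Python sketch: the function name `count_disconnects` and the regular expression are illustrative and assume the log format shown in the lines above. It does not show whether data was lost, only how often each region's EventFeed was disconnected.

```python
import re
from collections import Counter

# Captures the region ID and the error kind (e.g. not_leader, region_not_found).
# Assumption: lines follow the "EventFeed disconnected" format pasted above.
PATTERN = re.compile(r"\[regionID=(\d+)\].*?\[CDC:ErrEventFeedEventError\](\w+):")

def count_disconnects(lines):
    """Count EventFeed disconnects keyed by (region_id, error_kind)."""
    counts = Counter()
    for line in lines:
        if "EventFeed disconnected" not in line:
            continue  # skip unrelated log lines
        match = PATTERN.search(line)
        if match:
            counts[(int(match.group(1)), match.group(2))] += 1
    return counts

# Three lines copied from the log excerpt above.
sample = [
    '[2021/08/29 12:55:46.231 +00:00] [INFO] [client.go:871] ["EventFeed disconnected"] [regionID=438222] [requestID=102536] [span="[748000000000002dff915f000000000000f9, 748000000000002dff9160000000000000f9)"] [checkpoint=427358092254248965] [error="[CDC:ErrEventFeedEventError]not_leader:<region_id:438222 > "]',
    '[2021/08/29 12:55:46.271 +00:00] [INFO] [client.go:871] ["EventFeed disconnected"] [regionID=438222] [requestID=102540] [span="[748000000000002dff915f000000000000f9, 748000000000002dff9160000000000000f9)"] [checkpoint=427358092254248965] [error="[CDC:ErrEventFeedEventError]region_not_found:<region_id:438222 > "]',
    '[2021/08/29 12:59:48.964 +00:00] [INFO] [client.go:871] ["EventFeed disconnected"] [regionID=438222] [requestID=102541] [span="[748000000000002dff915f000000000000f9, 748000000000002dff9160000000000000f9)"] [checkpoint=427358155915919361] [error="[CDC:ErrEventFeedEventError]not_leader:<region_id:438222 > "]',
]

if __name__ == "__main__":
    for (region, kind), n in sorted(count_disconnects(sample).items()):
        print(region, kind, n)
```

Running the same function over a full `cdc.log` (one line per list element) would show which regions are switching leaders most often.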

[ERROR] [owner.go:1170] ["watch owner campaign key failed, restart the watcher"] [error="etcdserver: mvcc: required revision has been compacted"]

[2021/09/25 03:01:42.740 +00:00] [ERROR] [client.go:1096] ["failed to receive from stream"] [addr=10.247.129.80:20160] [storeID=5] [error="rpc error: code = Unavailable desc = transport is closing"]

These two kinds of errors also appear.

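1. Cyclic replication is still an experimental feature and is not recommended for production environments;
2. I recommend upgrading both the cluster and TiCDC to the latest v4.0.15. TiCDC in v4.0.11 is not yet stable enough, and things should improve after the upgrade;
3. When a region leader switches, cdc reconnects to TiKV, which introduces latency but should not cause data loss. From your description above, this looks more like a difference in the upstream and downstream count results caused by replication lag.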
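One rough way to check the replication-lag explanation is to decode the `checkpoint` values in the log lines above and compare them with the time the line was logged. The sketch below is in Python and assumes the usual TiDB TSO layout (physical milliseconds in the high bits, an 18-bit logical counter in the low bits). The helper names are illustrative and are not part of any TiCDC tooling.

```python
from datetime import datetime, timedelta, timezone

TSO_LOGICAL_BITS = 18  # assumption: standard PD/TiDB TSO layout
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def split_tso(tso):
    """Split a TSO into (physical milliseconds since epoch, logical counter)."""
    return tso >> TSO_LOGICAL_BITS, tso & ((1 << TSO_LOGICAL_BITS) - 1)

def tso_to_datetime(tso):
    """Convert the physical part of a TSO to a UTC datetime."""
    physical_ms, _ = split_tso(tso)
    return EPOCH + timedelta(milliseconds=physical_ms)

def checkpoint_lag_seconds(log_time, checkpoint_tso):
    """Seconds between the log timestamp and the checkpoint it reports."""
    return (log_time - tso_to_datetime(checkpoint_tso)).total_seconds()

if __name__ == "__main__":
    # Values taken from the first log line above (logged at 12:55:46.231 +00:00).
    logged_at = datetime(2021, 8, 29, 12, 55, 46, 231000, tzinfo=timezone.utc)
    checkpoint = 427358092254248965
    print(tso_to_datetime(checkpoint).isoformat())
    print(checkpoint_lag_seconds(logged_at, checkpoint))
```

For that first line the checkpoint decodes to well under a second before the log timestamp. That fits a brief delay around the leader switch, but it does not by itself rule out loss; comparing row counts only after writes have stopped and the checkpoint has caught up gives a cleaner answer.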