CDC进行两个集群间数据同步测试,压测任务完成后数据不一致

【 TiDB 使用环境】POC
【 TiDB 版本】7.5.1
【复现路径】
源端是本地DC创建的一个1tidb 1pd 3tikv的TiDB集群,目的端是一个云主机创建的1tidb 1pd 1tikv单节点集群。

操作步骤如下:

源端关闭gc
MySQL [test]> SET GLOBAL tidb_gc_enable=FALSE;

源端进行全备
MySQL [(none)]> BACKUP DATABASE * TO ‘local:///newsftp/bakcup20240305_03’;

传送文件至目标端

目标端还原
MySQL [(none)]> RESTORE DATABASE * FROM ‘local:///opt/bakcup20240305_03’;

进行数据比较,比较结果如下
[tidb@localhost ~]$ sync_diff_inspector -C ./config.yaml
{“level”:“warn”,“ts”:“2024-03-05T20:26:06.99427+0800”,“logger”:“etcd-client”,“caller”:“v3@v3.5.10/retry_interceptor.go:62”,“msg”:“retrying of unary invoker failed”,“target”:“etcd-endpoints://0xc0010ee1c0/172.16.16.11:2379”,“attempt”:0,“error”:“rpc error: code = DeadlineExceeded desc = context deadline exceeded”}
A total of 11 tables need to be compared

Comparing the table structure of test`.`t … equivalent
Comparing the table structure of test`.`warehouse … equivalent
Comparing the table data of test`.`t … equivalent
Comparing the table data of test`.`warehouse … equivalent
Comparing the table structure of test`.`item … equivalent
Comparing the table structure of test`.`new_order … equivalent
Comparing the table structure of test`.`district … equivalent
Comparing the table data of test`.`district … equivalent
Comparing the table data of test`.`item … equivalent
Comparing the table data of test`.`new_order … equivalent
Comparing the table structure of test`.`orders … equivalent
Comparing the table structure of test_wdd`.`test_wdd … equivalent
Comparing the table structure of test`.`stock … equivalent
Comparing the table structure of test`.`customer … equivalent
Comparing the table structure of test`.`history … equivalent
Comparing the table structure of test`.`order_line … equivalent
Comparing the table data of test`.`orders … equivalent
Comparing the table data of test_wdd`.`test_wdd … equivalent
Comparing the table data of test`.`stock … equivalent
Comparing the table data of test`.`customer … equivalent
Comparing the table data of test`.`history … equivalent
Comparing the table data of test`.`order_line … equivalent


Progress [============================================================>] 100% 0/0
A total of 11 table have been compared and all are equal.

配置cdc任务
tiup cdc cli changefeed create --server=http://...:8300 --sink-uri=“mysql://root:@...:4000” --changefeed-id=“upstream-to-downstream” --start-ts=“448171971630858243”

查看任务状态
[tidb@localhost ~]$ tiup cdc cli changefeed list
Checking updates for component cdc… Timedout (after 2s)
Starting component cdc: /home/tidb/.tiup/components/cdc/v7.5.1/cdc cli changefeed list
[
{
“id”: “upstream-to-downstream”,
“namespace”: “default”,
“summary”: {
“state”: “normal”,
“tso”: 448172355724509186,
“checkpoint”: “2024-03-05 20:30:59.911”,
“error”: null
}
}
]

进行数据库压测
[tidb@localhost ~]$ tiup bench tpcc --warehouses 20 --time 10m run -p “*******”

【遇到的问题:问题现象及影响】

压测后对源目标端数据进行比较,数据量不一致:
源端:
MySQL [test]> select 'customer ',count() from customer union all select 'district ',count() from district union all select 'history ',count() from history union all select 'item ',count() from item union all select 'new_order ',count() from new_order union all select ‘order_line’,count() from order_line union all select 'orders ',count() from orders union all select 'stock ',count() from stock union all select 'warehouse ',count() from warehouse order by 1;
±-----------±---------+
| customer | count(
) |
±-----------±---------+
| customer | 600000 |
| district | 200 |
| history | 601935 |
| item | 100000 |
| new_order | 179874 |
| orders | 601934 |
| order_line | 6017699 |
| stock | 2000000 |
| warehouse | 20 |
±-----------±---------+
9 rows in set, 1 warning (0.28 sec)

目标端:
MySQL [test]> select 'customer ',count() from customer union all select 'district ',count() from district union all select 'history ',count() from history union all select 'item ',count() from item union all select 'new_order ',count() from new_order union all select ‘order_line’,count() from order_line union all select 'orders ',count() from orders union all select 'stock ',count() from stock union all select 'warehouse ',count() from warehouse order by 1;
±-----------±---------+
| customer | count(
) |
±-----------±---------+
| customer | 600000 |
| district | 200 |
| history | 600891 |
| item | 100000 |
| new_order | 179874 |
| orders | 601934 |
| order_line | 6017699 |
| stock | 2000000 |
| warehouse | 20 |
±-----------±---------+
9 rows in set, 1 warning (0.02 sec)

history表数据量不一致

history表结构如下:
MySQL [test]> show create table history\G
*************************** 1. row ***************************
Table: history
Create Table: CREATE TABLE history (
h_c_id int(11) NOT NULL,
h_c_d_id int(11) NOT NULL,
h_c_w_id int(11) NOT NULL,
h_d_id int(11) NOT NULL,
h_w_id int(11) NOT NULL,
h_date datetime DEFAULT NULL,
h_amount decimal(6,2) DEFAULT NULL,
h_data varchar(24) DEFAULT NULL,
KEY idx_h_w_id (h_w_id),
KEY idx_h_c_w_id (h_c_w_id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin
PARTITION BY HASH (h_w_id) PARTITIONS 20
1 row in set (0.00 sec)

【资源配置】进入到 TiDB Dashboard -集群信息 (Cluster Info) -主机(Hosts) 截图此页面

【附件:截图/日志/监控】

监控截图

开始任务命令截图

开始任务至发帖时间日志
cdc_download.log (1.2 MB)

用 diff 比较下会输出差异数据,然后去 ticdc 的日志里看看有没有线索

e,我的理解,压测产生的数据不是就应该不固定吗?

看看同步是不是报错了哦

有开建表带主键参数吗?

感觉可能是历史表没有主键的原因

默认配置部署的集群,没有单独设置,不过我在创建同步任务的时候没有提示这几张表无法同步,这些表是通过tiup bench压测的数据

默认配置部署的集群,没有单独设置,不过我在创建同步任务的时候没有提示这几张表无法同步,这些表是通过tiup bench压测的数据

看了下这个阶段的日志,没有明显报错

用tiup bench准备的20仓数据,然后压测10分钟后再进行的全备+恢复,然后配置的CDC

我补充下同步阶段的CDC日志,没有看到明显报错

少压点,看看差在哪里了,好查些

cdc对同步表的要求就是带主键或有效索引,你这个数据不一致是预期的,参考:https://docs.pingcap.com/zh/tidb/stable/ticdc-overview#最佳实践

楼上大佬说得对,TiCDC 同步的表需要至少存在一个有效索引的表

过程中有异常提示吗

原始表和数据不是用 tiup bench tpc prepare 的么?
https://github.com/pingcap/go-tpc/blob/master/tpcc/ddl.go#L196-L213

cdc对同步表的要求确实就是带主键或有效索引,以前吃过亏了

我看了对应阶段,没看到错误日志,目前cdc日志已经上传了

嗯嗯,是的,用tiup bench准备了20仓数据,并压测十分钟后,再进行备份恢复,并配置同步任务,配置完成后,再利用tiup bench压测十分钟

搭建任务的时候有个表啥索引都没有,提示了不能同步,但这部分表没提示,以为可以正常执行呢