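【TiDB Environment】Occurred in production; reproduced in the test environment
【TiDB Version】 v5.3.0
【Steps to Reproduce】After adding a varchar(64) column to each of 64 sharded tables, the problem occurs when rows are deleted
【Problem: Symptoms and Impact】
The 64 sharded tables are replicated to Kafka through TiCDC using the maxwell protocol (in testing, the other protocols do not show the problem). The following statements were run on these 64 tables: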
ALTER TABLE xxx_statinfo_0 ADD trace_id varchar(64) DEFAULT '' NOT NULL;
ALTER TABLE xxx_statinfo_1 ADD trace_id varchar(64) DEFAULT '' NOT NULL;
......
ALTER TABLE xxx_statinfo_63 ADD trace_id varchar(64) DEFAULT '' NOT NULL;
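Rows were then deleted from the 64 tables (looping over them and running delete xxx_statinfo_x where xxx limit 5000 on each one), after which the following happened:
1. The checkpoint shown by cdc cli changefeed list stops advancing, and the command occasionally reports an error, as follows: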
[root@localhost eric]# cdc cli changefeed list --pd=http://192.168.100.162:2379
[
{
"id": "socol-statinfo",
"summary": {
"state": "normal",
"tso": 440123479891640321,
"checkpoint": "2023-03-16 11:37:15.280",
"error": null
}
}
]
[root@localhost eric]# cdc cli changefeed list --pd=http://192.168.100.162:2379
[2023/03/16 11:37:32.650 +08:00] [WARN] [cli_changefeed_list.go:102] ["query changefeed info failed"] [error="Post \"http://192.168.100.166:8300/capture/owner/changefeed/query\": dial tcp 192.168.100.166:8300: connect: connection refused"]
[
{
"id": "socol-statinfo",
"summary": null
}
]
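2. cdc_stderr.log contains "panic: interface conversion: interface {} is string, not []uint8". This looks very similar to the thread "新增ticdc到kafka同步任务后ticdc组件不断重启" (TiCDC keeps restarting after adding a TiCDC-to-Kafka replication task), post #4 by LingJin. However, I am on v5.3.0, so that thread's claim that v5.0.4 and later versions fix the issue does not match what I am seeing.
3. Removing the changefeed and recreating it with the original tso does not help.
4. Removing the changefeed and recreating it with a tso taken after the delete had finished makes TiCDC work normally and the checkpoint advances again, but the problem comes back as soon as another delete is executed.
5. I also tried unsafe reset, as well as scaling in all TiCDC nodes, scaling them back out, and recreating the changefeed. The problem persists.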
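For readers unfamiliar with this class of Go panic, here is a minimal sketch of how a message of the form reported in cdc_stderr.log arises: an unchecked type assertion to []byte (which Go prints as []uint8) on an interface value that actually holds a string. This is not TiCDC code and does not locate the actual bug; the function names and the idea that the value is a column default carried as a string are assumptions made purely for illustration.

```go
package main

import "fmt"

// unsafeBytes mimics an unchecked type assertion. It panics when v does not
// hold a []byte; the panic is recovered here only so the message can be shown.
func unsafeBytes(v interface{}) (b []byte, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("%v", r)
		}
	}()
	return v.([]byte), nil
}

// safeBytes handles both representations with a type switch instead of
// assuming one of them.
func safeBytes(v interface{}) ([]byte, error) {
	switch x := v.(type) {
	case []byte:
		return x, nil
	case string:
		return []byte(x), nil
	default:
		return nil, fmt.Errorf("unsupported type %T", v)
	}
}

func main() {
	// Hypothetical input: an empty-string value stored as a Go string.
	var val interface{} = ""

	_, err := unsafeBytes(val)
	fmt.Println("unchecked assertion:", err)

	b, err := safeBytes(val)
	fmt.Printf("type switch: %q, err=%v\n", b, err)
}
```

The unchecked assertion fails for a string input while the type switch accepts both forms, which is the general shape of fix one would expect for a panic like this, wherever it sits in the encoder.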
【Resource Configuration】
The test environment is configured as follows:
[root@localhost eric]# tiup cluster display tidb-test
tiup is checking updates for component cluster ...
A new version of cluster is available:
The latest version: v1.11.3
Local installed version: v1.11.1
Update current component: tiup update cluster
Update all components: tiup update --all
Starting component `cluster`: /root/.tiup/components/cluster/v1.11.1/tiup-cluster /root/.tiup/components/cluster/v1.11.1/tiup-cluster display tidb-test
Cluster type: tidb
Cluster name: tidb-test
Cluster version: v5.3.0
Deploy user: tidb
SSH type: builtin
Dashboard URL: http://192.168.100.164:2379/dashboard
Grafana URL: http://192.168.100.161:3000
ID Role Host Ports OS/Arch Status Data Dir Deploy Dir
-- ---- ---- ----- ------- ------ -------- ----------
192.168.100.161:9093 alertmanager 192.168.100.161 9093/9094 linux/x86_64 Up /data/tidb-data/alertmanager-9093 /data/tidb-deploy/alertmanager-9093
192.168.100.161:8300 cdc 192.168.100.161 8300 linux/x86_64 Up /data/tidb-data/cdc-8300 /data/tidb-deploy/cdc-8300
192.168.100.166:8300 cdc 192.168.100.166 8300 linux/x86_64 Up /data/tidb-data/cdc-8300 /data/tidb-deploy/cdc-8300
192.168.100.161:3000 grafana 192.168.100.161 3000 linux/x86_64 Up - /data/tidb-deploy/grafana-3000
192.168.100.162:2379 pd 192.168.100.162 2379/2380 linux/x86_64 Up /data/tidb-data/pd-2379 /data/tidb-deploy/pd-2379
192.168.100.163:2379 pd 192.168.100.163 2379/2380 linux/x86_64 Up /data/tidb-data/pd-2379 /data/tidb-deploy/pd-2379
192.168.100.164:2379 pd 192.168.100.164 2379/2380 linux/x86_64 Up|L|UI /data/tidb-data/pd-2379 /data/tidb-deploy/pd-2379
192.168.100.161:9090 prometheus 192.168.100.161 9090 linux/x86_64 Up /data/tidb-data/prometheus-9090 /data/tidb-deploy/prometheus-9090
192.168.100.161:4000 tidb 192.168.100.161 4000/10080 linux/x86_64 Up - /data/tidb-deploy/tidb-4000
192.168.100.166:9000 tiflash 192.168.100.166 9000/8123/3930/20170/20292/8234 linux/x86_64 Up /data/tidb-data/tiflash-9000 /data/tidb-deploy/tiflash-9000
192.168.100.162:20160 tikv 192.168.100.162 20160/20180 linux/x86_64 Up /data/tidb-data/tikv-20160 /data/tidb-deploy/tikv-20160
192.168.100.163:20160 tikv 192.168.100.163 20160/20180 linux/x86_64 Up /data/tidb-data/tikv-20160 /data/tidb-deploy/tikv-20160
192.168.100.164:20160 tikv 192.168.100.164 20160/20180 linux/x86_64 Up /data/tidb-data/tikv-20160 /data/tidb-deploy/tikv-20160
Total nodes: 13
【Attachments: Screenshots/Logs/Monitoring】
cdc_log.tar.gz (223.5 KB)
cdc_stderr.log (18.0 KB)
【Other】
The table schema is as follows:
CREATE TABLE `xxx_statinfo_0` (
`id` int(10) NOT NULL AUTO_INCREMENT ,
`imei` varchar(30) NOT NULL DEFAULT '' ,
`device_no` varchar(128) NOT NULL DEFAULT '' ,
`action` tinyint(2) NOT NULL DEFAULT '0' ,
`seq` varchar(36) NOT NULL,
`source` tinyint(2) NOT NULL DEFAULT '0' ,
`img_size` int(11) unsigned NOT NULL DEFAULT '0' ,
`img_total` smallint(6) unsigned NOT NULL DEFAULT '0' ,
`vedio_duration` int(11) NOT NULL DEFAULT '0' ,
`vedio_size` int(11) NOT NULL DEFAULT '0' ,
`img_url` mediumtext NOT NULL , # stores base64 values
`vedio_url` varchar(256) NOT NULL DEFAULT '' ,
`upload_time` datetime NOT NULL DEFAULT '0000-00-00 00:00:00' ,
`create_time` datetime NOT NULL DEFAULT '0000-00-00 00:00:00' ,
`update_time` datetime NOT NULL DEFAULT '0000-00-00 00:00:00' ,
`append_size` int(11) unsigned NOT NULL DEFAULT '0' ,
`total_size` int(11) unsigned NOT NULL DEFAULT '0' ,
`apk_version` varchar(50) NOT NULL DEFAULT '0' ,
`is_compress` tinyint(2) unsigned NOT NULL DEFAULT '0' ,
`error_code` int(5) unsigned NOT NULL DEFAULT '0' ,
`mosaic_type` tinyint(1) NOT NULL DEFAULT '0' ,
`mosaic_size` int(10) NOT NULL DEFAULT '0' ,
`resolution` int(10) NOT NULL DEFAULT '0' ,
`isCut` tinyint(1) NOT NULL DEFAULT '0' , # columns added earlier caused no problems
`trace_id` varchar(64) NOT NULL DEFAULT '' , # the problem appeared after this column was recently added
PRIMARY KEY (`id`) /*T![clustered_index] CLUSTERED */,
KEY `idx_imei_source_seq` (`imei`,`source`,`seq`),
KEY `idx_device_source_seq` (`device_no`,`source`,`seq`),
KEY `idx_create_time` (`create_time`)
);
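For anyone who wants to repeat the reproduction steps described at the top of this post against the 64 shards, the statements can be generated rather than typed by hand. The following is a rough sketch that only prints SQL and does not connect to any database. The helper names are made up for illustration, and the `WHERE xxx` filter is the same placeholder used in the post, so it has to be replaced with a real condition before running.

```go
package main

import "fmt"

// shards is the number of sharded tables mentioned in the post.
const shards = 64

// alterStatements returns the ADD COLUMN statement for shards 0..n-1.
func alterStatements(n int) []string {
	out := make([]string, 0, n)
	for i := 0; i < n; i++ {
		out = append(out, fmt.Sprintf(
			"ALTER TABLE xxx_statinfo_%d ADD trace_id varchar(64) DEFAULT '' NOT NULL;", i))
	}
	return out
}

// deleteStatements returns one batched DELETE per shard.
// "xxx" is a placeholder for the real filter, which the post does not give.
func deleteStatements(n int) []string {
	out := make([]string, 0, n)
	for i := 0; i < n; i++ {
		out = append(out, fmt.Sprintf(
			"DELETE FROM xxx_statinfo_%d WHERE xxx LIMIT 5000;", i))
	}
	return out
}

func main() {
	// Print the DDL first, then the deletes, matching the order in the post.
	for _, s := range alterStatements(shards) {
		fmt.Println(s)
	}
	for _, s := range deleteStatements(shards) {
		fmt.Println(s)
	}
}
```

The output can be piped into a MySQL-compatible client pointed at the test cluster once the placeholder filter has been filled in.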