TiCDC replication breaks on a table with a non-primary-key auto-increment column, but the sync status still shows normal

【TiDB Environment】Testing
【TiDB Version】v7.2.0
When replicating from one TiDB cluster to another via TiCDC, creating a new table that contains a non-primary-key auto-increment column interrupts the entire changefeed, yet querying the task via the CLI still reports a normal state.

Upstream cluster: v7.2.0 192.168.31.201:4003 root:root (CDC installed)
Downstream cluster: v6.5.3 192.168.31.201:4000 root:root
Cluster details:

[tidb@host0 config_file]$ tiup cluster display tidb-test
tiup is checking updates for component cluster ...
Starting component `cluster`: /home/tidb/.tiup/components/cluster/v1.12.5/tiup-cluster display tidb-test
Cluster type:       tidb
Cluster name:       tidb-test
Cluster version:    v6.5.3
Deploy user:        tidb
SSH type:           builtin
Dashboard URL:      http://192.168.31.201:2379/dashboard
Grafana URL:        http://192.168.31.201:3000
ID                    Role          Host            Ports                            OS/Arch       Status   Data Dir                           Deploy Dir
--                    ----          ----            -----                            -------       ------   --------                           ----------
192.168.31.201:9093   alertmanager  192.168.31.201  9093/9094                        linux/x86_64  Up       /data/tidb-data/alertmanager-9093  /data/tidb-deploy/alertmanager-9093
192.168.31.201:3000   grafana       192.168.31.201  3000                             linux/x86_64  Up       -                                  /data/tidb-deploy/grafana-3000
192.168.31.201:2379   pd            192.168.31.201  2379/2380                        linux/x86_64  Up|L|UI  /data/tidb-data/pd-2379            /data/tidb-deploy/pd-2379
192.168.31.201:9090   prometheus    192.168.31.201  9090/12020                       linux/x86_64  Up       /data/tidb-data/prometheus-9090    /data/tidb-deploy/prometheus-9090
192.168.31.201:4000   tidb          192.168.31.201  4000/10080                       linux/x86_64  Up       -                                  /data/tidb-deploy/tidb-4000
192.168.31.201:9000   tiflash       192.168.31.201  9000/8123/3930/20170/20292/8234  linux/x86_64  Up       /data/tidb-data/tiflash-9000       /data/tidb-deploy/tiflash-9000
192.168.31.201:20160  tikv          192.168.31.201  20160/20180                      linux/x86_64  Up       /data/tidb-data/tikv-20160         /data/tidb-deploy/tikv-20160
Total nodes: 7
[tidb@host0 config_file]$ tiup cluster display tidb-test2
tiup is checking updates for component cluster ...
Starting component `cluster`: /home/tidb/.tiup/components/cluster/v1.12.5/tiup-cluster display tidb-test2
Cluster type:       tidb
Cluster name:       tidb-test2
Cluster version:    v7.2.0
Deploy user:        tidb
SSH type:           builtin
Dashboard URL:      http://192.168.31.201:2579/dashboard
Grafana URL:        http://192.168.31.201:3003
ID                    Role          Host            Ports                            OS/Arch       Status   Data Dir                           Deploy Dir
--                    ----          ----            -----                            -------       ------   --------                           ----------
192.168.31.201:9393   alertmanager  192.168.31.201  9393/9394                        linux/x86_64  Up       /data/tidb-data/alertmanager-9393  /data/tidb-deploy/alertmanager-9393
192.168.31.201:8300   cdc           192.168.31.201  8300                             linux/x86_64  Up       /data/tidb-data/cdc-8300           /data/tidb-deploy/cdc-8300
192.168.31.201:3003   grafana       192.168.31.201  3003                             linux/x86_64  Up       -                                  /data/tidb-deploy/grafana-3003
192.168.31.201:2579   pd            192.168.31.201  2579/2880                        linux/x86_64  Up|L|UI  /data/tidb-data/pd-2579            /data/tidb-deploy/pd-2579
192.168.31.201:9390   prometheus    192.168.31.201  9390/12324                       linux/x86_64  Up       /data/tidb-data/prometheus-9390    /data/tidb-deploy/prometheus-9390
192.168.31.201:4003   tidb          192.168.31.201  4003/10083                       linux/x86_64  Up       -                                  /data/tidb-deploy/tidb-4003
192.168.31.201:9003   tiflash       192.168.31.201  9003/8125/3933/20173/20295/8236  linux/x86_64  Up       /data/tidb-data/tiflash-9003       /data/tidb-deploy/tiflash-9003
192.168.31.201:20163  tikv          192.168.31.201  20163/20183                      linux/x86_64  Up       /data/tidb-data/tikv-20163         /data/tidb-deploy/tikv-20163
Total nodes: 8
[tidb@host0 config_file]$ 
# Initialize the test database on the upstream and downstream clusters
[tidb@host0 config_file]$ mysql -h 192.168.31.201 -P4003 -uroot -proot mysql -e "drop database test;create database test;"
mysql: [Warning] Using a password on the command line interface can be insecure.
[tidb@host0 config_file]$ mysql -h 192.168.31.201 -P4000 -uroot -proot mysql -e "drop database test;create database test;"
mysql: [Warning] Using a password on the command line interface can be insecure.
# Verify that the test database is empty on both clusters
[tidb@host0 config_file]$ mysql -h 192.168.31.201 -P4000 -uroot -proot test -e "show tables"
mysql: [Warning] Using a password on the command line interface can be insecure.
[tidb@host0 config_file]$ mysql -h 192.168.31.201 -P4003 -uroot -proot test -e "show tables"
mysql: [Warning] Using a password on the command line interface can be insecure.

Create a new changefeed task:
[tidb@host0 config_file]$ tiup cdc cli changefeed create --server=http://192.168.31.201:8300 --sink-uri="mysql://root:root@192.168.31.201:4000/" --changefeed-id="simple-replication-task" --no-confirm
tiup is checking updates for component cdc ...
Starting component `cdc`: /home/tidb/.tiup/components/cdc/v7.2.0/cdc cli changefeed create --server=http://192.168.31.201:8300 --sink-uri=mysql://root:root@192.168.31.201:4000/ --changefeed-id=simple-replication-task --no-confirm
[WARN] Some tables are not eligible to replicate, because they do not have a primary key or a not-null unique key: []v2.TableName{v2.TableName{Schema:"tpch1", Table:"customer_bak", TableID:246, IsPartition:false}}
Create changefeed successfully!
ID: simple-replication-task
Info: {"upstream_id":7248279335184372145,"namespace":"default","id":"simple-replication-task","sink_uri":"mysql://root:xxxxx@192.168.31.201:4000/","create_time":"2023-07-24T21:54:29.203804473+08:00","start_ts":443077589504098309,"config":{"memory_quota":1073741824,"case_sensitive":true,"enable_old_value":true,"force_replicate":false,"ignore_ineligible_table":true,"check_gc_safe_point":true,"enable_sync_point":false,"bdr_mode":false,"sync_point_interval":600000000000,"sync_point_retention":86400000000000,"filter":{"rules":["*.*"]},"mounter":{"worker_num":16},"sink":{"csv":{"delimiter":",","quote":"\"","null":"\\N","include_commit_ts":false},"encoder_concurrency":16,"terminator":"\r\n","date_separator":"day","enable_partition_separator":true,"enable_kafka_sink_v2":false,"only_output_updated_columns":false,"delete_only_output_handle_key_columns":false,"large_message_only_handle_key_columns":false},"consistent":{"level":"none","max_log_size":64,"flush_interval":2000,"use_file_backend":false},"scheduler":{"enable_table_across_nodes":false,"region_threshold":100000,"write_key_threshold":0},"integrity":{"integrity_check_level":"none","corruption_handle_level":"warn"}},"state":"normal","creator_version":"v7.2.0","resolved_ts":443077589504098309,"checkpoint_ts":443077589504098309,"checkpoint_time":"2023-07-24 21:54:29.141"}

# Check the changefeed status:
[tidb@host0 config_file]$ tiup cdc cli changefeed list --server=http://192.168.31.201:8300
tiup is checking updates for component cdc ...
Starting component `cdc`: /home/tidb/.tiup/components/cdc/v7.2.0/cdc cli changefeed list --server=http://192.168.31.201:8300
[
  {
    "id": "simple-replication-task",
    "namespace": "default",
    "summary": {
      "state": "normal",
      "tso": 443077599019663361,
      "checkpoint": "2023-07-24 21:55:05.440",
      "error": null
    }
  }
]

After the changefeed is created, the sync state is normal and error is null.

Create test table t on the upstream:

[tidb@host0 config_file]$ mysql -h 192.168.31.201 -P4003 -uroot -proot test -e "create table t (id int primary key)"
mysql: [Warning] Using a password on the command line interface can be insecure.
[tidb@host0 config_file]$ mysql -h 192.168.31.201 -P4003 -uroot -proot test -e "insert into t values (1),(2)"
mysql: [Warning] Using a password on the command line interface can be insecure.
[tidb@host0 config_file]$ mysql -h 192.168.31.201 -P4003 -uroot -proot test -e "select * from t"
mysql: [Warning] Using a password on the command line interface can be insecure.
+----+
| id |
+----+
|  1 |
|  2 |
+----+

Query test table t on the downstream (the data matches the upstream):

[tidb@host0 config_file]$ mysql -h 192.168.31.201 -P4000 -uroot -proot test -e "select * from t"
mysql: [Warning] Using a password on the command line interface can be insecure.
+----+
| id |
+----+
|  1 |
|  2 |
+----+

Create test table t1 on the upstream (this one fails to replicate):

[tidb@host0 config_file]$ mysql -h 192.168.31.201 -P4003 -uroot -proot test -e "create table t1(id int primary key,id_auto int auto_increment)"
mysql: [Warning] Using a password on the command line interface can be insecure.
[tidb@host0 config_file]$ mysql -h 192.168.31.201 -P4003 -uroot -proot test -e "insert into t1(id) values (1),(2)"
mysql: [Warning] Using a password on the command line interface can be insecure.
[tidb@host0 config_file]$ mysql -h 192.168.31.201 -P4003 -uroot -proot test -e "select * from t1"
mysql: [Warning] Using a password on the command line interface can be insecure.
+----+---------+
| id | id_auto |
+----+---------+
|  1 |       1 |
|  2 |       2 |
+----+---------+

Query the corresponding rows on the downstream:

[tidb@host0 config_file]$ mysql -h 192.168.31.201 -P4000 -uroot -proot test -e "select * from t1"
mysql: [Warning] Using a password on the command line interface can be insecure.
ERROR 1146 (42S02) at line 1: Table 'test.t1' doesn't exist
[tidb@host0 config_file]$ 

Table t1 was not replicated to the downstream.

Check the sync status:

[tidb@host0 config_file]$ tiup cdc cli changefeed list --server=http://192.168.31.201:8300
tiup is checking updates for component cdc ...
Starting component `cdc`: /home/tidb/.tiup/components/cdc/v7.2.0/cdc cli changefeed list --server=http://192.168.31.201:8300
[
  {
    "id": "simple-replication-task",
    "namespace": "default",
    "summary": {
      "state": "normal",
      "tso": 443077639481589762,
      "checkpoint": "2023-07-24 21:57:39.790",
      "error": null
    }
  }
]

The CLI still reports the sync state as normal, with error null.
However, the TiCDC log contains a warning:

[2023/07/24 21:59:09.674 +08:00] [WARN] [mysql_ddl_sink.go:125] ["Execute DDL with error, retry later"] [startTs=443077639468482581] [ddl="CREATE TABLE `t1` (`id` INT PRIMARY KEY,`id_auto` INT AUTO_INCREMENT)"] [namespace=default] [changefeed=simple-replication-task] [error="Error 1075 (42000): Incorrect table definition; there can be only one auto column and it must be defined as a key"]
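Error 1075 here is the downstream v6.5.3 refusing the CREATE TABLE, since it (like MySQL) requires an AUTO_INCREMENT column to be defined as a key. A minimal workaround sketch, assuming the upstream schema can be adjusted (`t1_fixed` is a hypothetical table name): declaring the auto-increment column as a key should make the same DDL valid on both versions.

```shell
# Hedged workaround sketch (t1_fixed is hypothetical, not from the thread):
# give the AUTO_INCREMENT column its own key so the DDL also passes the
# pre-v6.6.0 downstream's check.
ddl='create table t1_fixed(id int primary key, id_auto int auto_increment, key(id_auto))'
echo "$ddl"
# Against the clusters above it would be applied as:
#   mysql -h 192.168.31.201 -P4003 -uroot -proot test -e "$ddl"
```

With the column indexed, the downstream no longer rejects the DDL, so the changefeed should not stall on this statement.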

Is t1 the only table that stopped replicating, or are other tables affected too?
Insert more data into table t upstream and check whether it still replicates:

[root@localhost log]# mysql -h 192.168.31.201 -P4003 -uroot -proot test -e "insert into t values (3)"
mysql: [Warning] Using a password on the command line interface can be insecure.
[root@localhost log]# mysql -h 192.168.31.201 -P4003 -uroot -proot test -e "select * from t"
mysql: [Warning] Using a password on the command line interface can be insecure.
+----+
| id |
+----+
|  1 |
|  2 |
|  3 |
+----+

Query on the downstream:

[root@localhost log]# mysql -h 192.168.31.201 -P4000 -uroot -proot test -e "select * from t"
mysql: [Warning] Using a password on the command line interface can be insecure.
+----+
| id |
+----+
|  1 |
|  2 |
+----+

Table t has also stopped replicating. Check the task status again:

[tidb@host0 config_file]$ tiup cdc cli changefeed list --server=http://192.168.31.201:8300
tiup is checking updates for component cdc ...
Starting component `cdc`: /home/tidb/.tiup/components/cdc/v7.2.0/cdc cli changefeed list --server=http://192.168.31.201:8300
[
  {
    "id": "simple-replication-task",
    "namespace": "default",
    "summary": {
      "state": "normal",
      "tso": 443077639481589762,
      "checkpoint": "2023-07-24 21:57:39.790",
      "error": null
    }
  }
]

[tidb@host0 config_file]$ tiup cdc cli changefeed query --server=http://192.168.31.201:8300 --changefeed-id="simple-replication-task"
tiup is checking updates for component cdc ...
Starting component `cdc`: /home/tidb/.tiup/components/cdc/v7.2.0/cdc cli changefeed query --server=http://192.168.31.201:8300 --changefeed-id=simple-replication-task
{
  "upstream_id": 7248279335184372145,
  "namespace": "default",
  "id": "simple-replication-task",
  "sink_uri": "mysql://root:xxxxx@192.168.31.201:4000/",
  "config": {
    "memory_quota": 1073741824,
    "case_sensitive": true,
    "enable_old_value": true,
    "force_replicate": false,
    "ignore_ineligible_table": true,
    "check_gc_safe_point": true,
    "enable_sync_point": false,
    "bdr_mode": false,
    "sync_point_interval": 600000000000,
    "sync_point_retention": 86400000000000,
    "filter": {
      "rules": [
        "*.*"
      ]
    },
    "mounter": {
      "worker_num": 16
    },
    "sink": {
      "delete_only_output_handle_key_columns": null,
      "large_message_only_handle_key_columns": null
    },
    "consistent": {
      "level": "none",
      "max_log_size": 64,
      "flush_interval": 2000,
      "use_file_backend": false
    },
    "scheduler": {
      "enable_table_across_nodes": false,
      "region_threshold": 100000,
      "write_key_threshold": 0
    },
    "integrity": {
      "integrity_check_level": "none",
      "corruption_handle_level": "warn"
    }
  },
  "create_time": "2023-07-24 21:54:29.203",
  "start_ts": 443077589504098309,
  "resolved_ts": 443077719488200706,
  "target_ts": 0,
  "checkpoint_tso": 443077639481589762,
  "checkpoint_time": "2023-07-24 21:57:39.790",
  "state": "normal",
  "creator_version": "v7.2.0",
  "task_status": [
    {
      "capture_id": "6a014853-67f3-4b97-a8bf-8e17569b2601",
      "table_ids": [
        176,
        205,
        207,
        211,
        301,
        166,
        174,
        213,
        164,
        203,
        217,
        170,
        215,
        172,
        209,
        162,
        168
      ]
    }
  ]
}

The task state still shows normal (and Alertmanager raised no alert), but replication has in fact stopped. Is this a bug?
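Until the state reporting catches this, a stall like the one above can at least be detected from the checkpoint age instead of the `state` field. A minimal sketch, assuming GNU `date` and crude grep/sed parsing; the JSON literal is a hard-coded sample of the `changefeed list` output shown earlier (in practice it would come from the `tiup cdc cli` call):

```shell
# Stall-detection sketch: "state" stays "normal", so alert on checkpoint age.
# In practice the JSON would come from:
#   tiup cdc cli changefeed list --server=http://192.168.31.201:8300
json='[{"id":"simple-replication-task","summary":{"state":"normal","checkpoint":"2023-07-24 21:57:39.790"}}]'

# Extract the checkpoint timestamp (jq would be cleaner if available).
ckpt=$(printf '%s' "$json" | grep -o '"checkpoint":"[^"]*"' | sed 's/"checkpoint":"\(.*\)"/\1/')

# Compare against now; alert if the checkpoint lags more than 5 minutes.
# ${ckpt%.*} drops the fractional seconds so GNU date can parse it.
lag=$(( $(date +%s) - $(date -d "${ckpt%.*}" +%s) ))
[ "$lag" -gt 300 ] && echo "ALERT: checkpoint is ${lag}s behind"
```

The 300-second threshold is arbitrary; tune it to the workload's expected replication lag.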

Rule out environment issues first; try restarting and check again.

You are replicating from v7.2 to v6.5. Before TiDB v6.6.0 (and in MySQL), an AUTO_INCREMENT column must be defined as a key; only from v6.6.0 onward can an auto-increment column exist outside the primary key or any index. See the official docs:

https://docs.pingcap.com/zh/tidb/dev/auto-increment#mysql-兼容模式

That said, the changefeed state not turning to failed does look like a bug.

:thinking: File a bug report, definitely file a bug report.

Doesn't TiDB simply not support non-primary-key auto-increment columns?

OK. One more question about the TiDB-to-TiDB scenario (e.g., dual clusters in the same city, or a migration/upgrade). Suppose the standby cluster is initialized with BR, Dumpling + Lightning (logical import mode), or Lightning (physical/local import mode), and then kept in sync with TiCDC. What is the impact on tables with auto-increment primary keys? Do we need to reset the auto-increment base value (e.g., to the current value plus the cache size) to avoid conflicts between values inserted by TiCDC and values allocated by the standby's own auto-increment allocator?
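In other words, would the standby need something like the following after restore (a sketch only; the offset 100000 is illustrative, not taken from any documentation):

```shell
# Hedged sketch: bump the standby table's auto-increment base well past the
# upstream's current maximum, so locally allocated values cannot collide with
# rows replicated by TiCDC. The offset 100000 is illustrative.
stmt='alter table t1 auto_increment = 100000'
echo "$stmt"
# Applied on the downstream as:
#   mysql -h 192.168.31.201 -P4000 -uroot -proot test -e "$stmt"
```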