TiCDC never replicates data to the downstream MySQL

To keep things efficient, please provide the following information when asking; questions with a clear problem description get a faster response.

  • [TiDB version]: v4.0.6
  • [Problem description]:
    TiCDC starts a replication task to the downstream MySQL with a configuration file, but MySQL never receives any data. The errors reported in the log are shown below. I want to replicate all tables in the cleaner database. Thanks!
[tidb@r-tidb4cleaner-TiFlash-m220-238 conf]$ cat changefeed.toml

case-sensitive = true
enable-old-value = false

[filter]
#ignore-txn-start-ts = [1, 2]

rules = ['*.*', '!test.*']

[mounter]
worker-num = 16

[sink]
protocol = "default"

[cyclic-replication]
enable = false
replica-id = 1
filter-replica-ids = [2,3]
sync-ddl = true

# Create the task

../bin/cdc cli changefeed create --pd=http://10.105.130.102:2379 --sink-uri="mysql://cdc_dist:cdc_dist@10.110.9.218:6306/" --config changefeed.toml --changefeed-id="test-to-10-110-9-218-6"
Create changefeed successfully!
ID: test-to-10-110-9-218-6
Info: {"sink-uri":"mysql://cdc_dist:cdc_dist@10.110.9.218:6306/","opts":{},"create-time":"2020-09-21T14:52:07.727026086+08:00","start-ts":419606332104769537,"target-ts":0,"admin-job-type":0,"sort-engine":"memory","sort-dir":".","config":{"case-sensitive":true,"enable-old-value":false,"filter":{"rules":["*.*","!test.*"],"ignore-txn-start-ts":null,"ddl-allow-list":null},"mounter":{"worker-num":16},"sink":{"dispatchers":null,"protocol":"default"},"cyclic-replication":{"enable":false,"replica-id":1,"filter-replica-ids":[2,3],"id-buckets":0,"sync-ddl":true},"scheduler":{"type":"table-number","polling-time":-1}},"state":"normal","history":null,"error":null}

# Query the task

[tidb@r-tidb4cleaner-TiFlash-m220-238 conf]$ ../bin/cdc cli changefeed query -s  --pd=http://10.105.130.102:2379 -c test-to-10-110-9-218-6
{
 "state": "normal",
 "tso": 0,
 "checkpoint": "",
 "error": null
}

# The log keeps reporting

[2020/09/21 14:41:07.399 +08:00] [WARN] [processor.go:916] ["Ignore existing table"] [ID=2678]
[2020/09/21 14:41:07.449 +08:00] [WARN] [processor.go:916] ["Ignore existing table"] [ID=2678]
[2020/09/21 14:41:07.499 +08:00] [WARN] [processor.go:916] ["Ignore existing table"] [ID=2678]
[2020/09/21 14:41:07.549 +08:00] [WARN] [processor.go:916] ["Ignore existing table"] [ID=2678]
[2020/09/21 14:41:07.599 +08:00] [WARN] [processor.go:916] ["Ignore existing table"] [ID=2678]
[2020/09/21 14:41:07.649 +08:00] [WARN] [processor.go:916] ["Ignore existing table"] [ID=2678]
[2020/09/21 14:41:07.655 +08:00] [WARN] [processor.go:916] ["Ignore existing table"] [ID=2678]

# Details are as follows

[tidb@r-tidb4cleaner-TiFlash-m220-238 conf]$ ../bin/cdc cli changefeed query   --pd=http://10.105.130.102:2379 -c test-to-10-110-9-218-6
{
  "info": {
    "sink-uri": "mysql://cdc_dist:cdc_dist@10.110.9.218:6306/",
    "opts": {},
    "create-time": "2020-09-21T14:55:55.47812296+08:00",
    "start-ts": 419606391808065537,
    "target-ts": 0,
    "admin-job-type": 0,
    "sort-engine": "memory",
    "sort-dir": ".",
    "config": {
      "case-sensitive": true,
      "enable-old-value": false,
      "filter": {
        "rules": [
          "*.*",
          "!test.*"
        ],
        "ignore-txn-start-ts": null,
        "ddl-allow-list": null
      },
      "mounter": {
        "worker-num": 16
      },
      "sink": {
        "dispatchers": null,
        "protocol": "default"
      },
      "cyclic-replication": {
        "enable": false,
        "replica-id": 1,
        "filter-replica-ids": [
          2,
          3
        ],
        "id-buckets": 0,
        "sync-ddl": true
      },
      "scheduler": {
        "type": "table-number",
        "polling-time": -1
      }
    },
    "state": "normal",
    "history": null,
    "error": null
  },
  "status": {
    "resolved-ts": 419606391808065537,
    "checkpoint-ts": 419606391808065537,
    "admin-job-type": 0
  },
  "count": 0,
  "task-status": [
    {
      "capture-id": "8e8ce1be-6133-449c-8621-107176cdb23c",
      "status": {
        "tables": {
          "2656": {
            "start-ts": 419606391808065537,
            "mark-table-id": 0
          },
          "2666": {
            "start-ts": 419606391808065537,
            "mark-table-id": 0
          },
          "2668": {
            "start-ts": 419606391808065537,
            "mark-table-id": 0
          },
          "2676": {
            "start-ts": 419606391808065537,
            "mark-table-id": 0
          },
          "2680": {
            "start-ts": 419606391808065537,
            "mark-table-id": 0
          },
          "2684": {
            "start-ts": 419606391808065537,
            "mark-table-id": 0
          },
          "2686": {
            "start-ts": 419606391808065537,
            "mark-table-id": 0
          }
        },
        "operation": {
          "2656": {
            "delete": false,
            "boundary_ts": 419606391808065537,
            "done": true
          },
          "2666": {
            "delete": false,
            "boundary_ts": 419606391808065537,
            "done": true
          },
          "2668": {
            "delete": false,
            "boundary_ts": 419606391808065537,
            "done": false
          },
          "2676": {
            "delete": false,
            "boundary_ts": 419606391808065537,
            "done": true
          },
          "2680": {
            "delete": false,
            "boundary_ts": 419606391808065537,
            "done": true
          },
          "2684": {
            "delete": false,
            "boundary_ts": 419606391808065537,
            "done": true
          },
          "2686": {
            "delete": false,
            "boundary_ts": 419606391808065537,
            "done": true
          }
        },
        "admin-job-type": 0
      }
    },
    {
      "capture-id": "fef76f98-d46d-4b44-ab60-4662361973d9",
      "status": {
        "tables": {
          "2658": {
            "start-ts": 419606391808065537,
            "mark-table-id": 0
          },
          "2662": {
            "start-ts": 419606391808065537,
            "mark-table-id": 0
          },
          "2664": {
            "start-ts": 419606391808065537,
            "mark-table-id": 0
          },
          "2670": {
            "start-ts": 419606391808065537,
            "mark-table-id": 0
          },
          "2672": {
            "start-ts": 419606391808065537,
            "mark-table-id": 0
          },
          "2674": {
            "start-ts": 419606391808065537,
            "mark-table-id": 0
          },
          "2678": {
            "start-ts": 419606391808065537,
            "mark-table-id": 0
          },
          "2682": {
            "start-ts": 419606391808065537,
            "mark-table-id": 0
          }
        },
        "operation": {
          "2658": {
            "delete": false,
            "boundary_ts": 419606391808065537,
            "done": true
          },
          "2662": {
            "delete": false,
            "boundary_ts": 419606391808065537,
            "done": true
          },
          "2664": {
            "delete": false,
            "boundary_ts": 419606391808065537,
            "done": true
          },
          "2670": {
            "delete": false,
            "boundary_ts": 419606391808065537,
            "done": true
          },
          "2672": {
            "delete": false,
            "boundary_ts": 419606391808065537,
            "done": true
          },
          "2674": {
            "delete": false,
            "boundary_ts": 419606391808065537,
            "done": true
          },
          "2678": {
            "delete": false,
            "boundary_ts": 419606391808065537,
            "done": false
          },
          "2682": {
            "delete": false,
            "boundary_ts": 419606391808065537,
            "done": true
          }
        },
        "admin-job-type": 0
      }
    }
  ]
}

Can't the downstream MySQL be a brand-new empty database?

The upstream uses DM to replicate data into TiDB. Can TiDB use TiCDC to replicate data to the downstream MySQL at the same time?

You need to create the database downstream first; TiCDC only replicates data starting from the configured commit ts.
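For example, a rough sketch of preparing the downstream schema before creating the changefeed (the host, port, user, and database name are taken from this thread; adjust as needed):

    # TiCDC does not create databases or tables downstream, so create the schema first.
    mysql -h 10.110.9.218 -P 6306 -u cdc_dist -p -e "CREATE DATABASE IF NOT EXISTS cleaner;"
    # ...then create every table to be replicated, e.g. by applying the upstream schema dump.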

You mean mysql1 -> DM -> TiDB -> TiCDC -> mysql2? Yes, that works.

OK, I'm trying that now and it looks normal, but there are some errors in the replication log. Do these errors matter?

[2020/09/21 17:30:59.179 +08:00] [INFO] [client.go:838] ["EventFeed disconnected"] [regionID=52133] [requestID=1882] [span="[748000000000000aff9c5f72800000000bffbbb96d0000000000fa, 748000000000000aff9c5f72800000000cff4517640000000000fa)"] [checkpoint=419608830612340737] [error="[CDC:ErrEventFeedEventError]epoch_not_match:<current_regions:<id:53841 start_key:"t\200\000\000\000\000\000\n\377\234_r\200\000\000\000\013\377\273\271m\000\000\000\000\000\372" end_key:"t\200\000\000\000\000\000\n\377\234_r\200\000\000\000\014\3771\216\031\000\000\000\000\000\372" region_epoch:<conf_ver:5 version:862 > peers:<id:53842 store_id:1 > peers:<id:53843 store_id:5 > peers:<id:53844 store_id:4 > > current_regions:<id:52133 start_key:"t\200\000\000\000\000\000\n\377\234_r\200\000\000\000\014\3771\216\031\000\000\000\000\000\372" end_key:"t\200\000\000\000\000\000\n\377\234_r\200\000\000\000\014\377E\027d\000\000\000\000\000\372" region_epoch:<conf_ver:5 version:862 > peers:<id:52134 store_id:1 > peers:<id:52135 store_id:5 > peers:<id:52136 store_id:4 > > > "]

This is most likely because a Region split or was rescheduled, so the request failed while reading the Region leader. It is retried automatically by default.

Now it's in this state: the replication stopped on its own. What is the likely cause?

[tidb@r-tidb4cleaner-TiFlash-m220-238 log]$ ../bin/cdc cli changefeed query -s --pd=http://10.105.130.102:2379 -c cdc-dm
{
  "state": "stopped",
  "tso": 419610921813671937,
  "checkpoint": "2020-09-21 19:43:56.069",
  "error": {
    "addr": "10.108.220.238:8300",
    "code": "CDC:ErrProcessorUnknown",
    "message": "[CDC:ErrBufferReachLimit]puller mem buffer reach size limit"
  }
}

Could you provide the cdc server logs and a screenshot of the changefeed tab on the Grafana monitoring page?

cdc-24h.log.gz (4.3 MB) changefeed.pdf (2.2 MB)

Is there anything else I should provide?

  • The reason the following messages keep appearing in the log is that these two tables have not finished initializing.

    [2020/09/22 09:54:46.147 +08:00] [INFO] [changefeed.go:746] ["some operation is still unapplied"] …
    [2020/09/22 09:54:46.189 +08:00] [WARN] [processor.go:916] ["Ignore existing table"] [ID=2716]
    [2020/09/22 09:54:46.189 +08:00] [WARN] [processor.go:916] ["Ignore existing table"] [ID=2718]

  • When a table finishes initializing, a log line like the following appears:

    [2020/09/22 09:53:45.862 +08:00] [INFO] [puller.go:238] ["puller is initialized"] [duration=421.557801ms] [changefeedid=cdc-dm] [tableID=2710] [spans="["[748000000000000aff965f000000000000f9, 748000000000000aff9660000000000000f9)"]"] [resolvedTs=419610921813671937]

  • These two tables are probably slow to initialize because their data volume is large. For example, for the task started at this point, checkpoint ts = 419610921813671937 = 2020-09-21 19:43:56.069 +0800 CST, so the data is already lagging quite a bit.

    [2020/09/21 23:03:12.581 +08:00] [INFO] [owner.go:197] ["Find new changefeed"] [info="{"sink-uri":"**","opts":{},"create-time":"2020-09-21T16:05:50.016895904+08:00","start-ts":419607491371073537,"target-ts":0,"admin-job-type":0,"sort-engine":"memory","sort-dir":".","config":{"case-sensitive":true,"enable-old-value":false,"filter":{"rules":["cleaner.*"],"ignore-txn-start-ts":null,"ddl-allow-list":null},"mounter":{"worker-num":16},"sink":{"dispatchers":null,"protocol":"default"},"cyclic-replication":{"enable":false,"replica-id":1,"filter-replica-ids":[2,3],"id-buckets":0,"sync-ddl":true},"scheduler":{"type":"table-number","polling-time":-1}},"state":"normal","history":null,"error":null}"] [id=cdc-dm] ["checkpoint ts"=419610921813671937]

You can check how much data these two tables (table-id = 2716 and 2718) are lagging behind.
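A rough sketch of mapping those table IDs back to table names (assuming you can reach the upstream TiDB with the mysql client; the host is a placeholder, and TIDB_TABLE_ID is a TiDB-specific column in information_schema.tables):

    # Find which schema.table corresponds to TiCDC's internal table IDs 2716 and 2718.
    mysql -h <tidb-host> -P 4000 -u root -e \
      "SELECT table_schema, table_name, tidb_table_id
       FROM information_schema.tables
       WHERE tidb_table_id IN (2716, 2718);"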

  • If the backlog is within 40 GB (a tested value), you can try resuming the task with file sorting (see the sketch after this list).
  • If the accumulated backlog is already large, you can resume replication while ignoring these two tables for now; their data will need to be redone before replicating them again.
  • After resuming with a new replication task, watch whether the downstream can keep up with the replication speed.
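A rough sketch of resuming with file sorting (assuming the v4.0 `cdc cli changefeed create` flags `--sort-engine` and `--sort-dir`; the sort directory and changefeed ID below are placeholders):

    # Create a new changefeed that sorts pending events on disk instead of in memory,
    # which helps when the task starts far behind the current TSO.
    ../bin/cdc cli changefeed create \
      --pd=http://10.105.130.102:2379 \
      --sink-uri="mysql://cdc_dist:cdc_dist@10.110.9.218:6306/" \
      --config changefeed.toml \
      --sort-engine="file" \
      --sort-dir="/data/cdc-sort" \
      --changefeed-id="cdc-dm-filesort"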

PS: We are working on improving how replication catches up when it is far behind.

These two tables each have more than 600 million rows in the upstream MySQL. We are using the following model:

source mysql1 -> DM -> TiDB -> TiCDC -> target mysql2

The data was written into TiDB by DM (full load first, then incremental). Is it because of the full load that TiCDC cannot keep up when used this way?

Yes. TiCDC's support for full-load scenarios is still incomplete (it does not support data imported by Lightning with a non-tidb backend or by BR, and it cannot replicate normally once the backlog becomes large). We recommend doing the full load first with import/export tools such as Dumpling + Lightning or BR, and then starting TiCDC to replicate the incremental data.

Is the following approach feasible?

With source MySQL (data being updated) -> DM -> TiDB already replicating in real time, export the full data from TiDB with dumpling/mydumper, import it into the downstream MySQL with Lightning, and finally start the TiCDC replication task.

At that point, will TiCDC replicate incremental data from TiDB on top of the data already in the downstream MySQL (the mydumper data)?

Yes, as long as the amount of data TiCDC is behind when it starts is not too large (within 10 GB per table is fine).
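A rough sketch of that flow (assuming Dumpling records the snapshot TSO as `Pos` in its `metadata` file and that `cdc cli changefeed create` accepts `--start-ts`; hosts, paths, and the changefeed ID are placeholders):

    # 1. Export the full data from TiDB; the snapshot TSO is written to /mnt/dumping/metadata.
    ./dumpling -h <tidb-host> -P 4000 -u root -B cleaner -o /mnt/dumping
    # 2. Import the dump into the downstream MySQL (e.g. with loader, as discussed below).
    # 3. Start the changefeed from that TSO so only changes after the snapshot are replicated.
    grep Pos /mnt/dumping/metadata
    ../bin/cdc cli changefeed create \
      --pd=http://10.105.130.102:2379 \
      --sink-uri="mysql://cdc_dist:cdc_dist@10.110.9.218:6306/" \
      --config changefeed.toml \
      --start-ts=<Pos-from-metadata> \
      --changefeed-id="cdc-dm-incremental"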

Now that the dumpling export is done, Lightning reports the following error when importing into the downstream MySQL:

[2020/09/24 10:56:10.686 +08:00] [ERROR] [main.go:84] ["tidb lightning encountered error"] [error="Error 1193: Unknown system variable 'tidb_build_stats_concurrency'"]

The configuration is as follows:

[tidb@r-tidb4cleaner-m130-102 lightning]$ cat tidb-lightning.toml
[lightning]

# Concurrency for converting data. Defaults to the number of logical CPUs and normally
# does not need to be set. In a mixed deployment it can be set to 75% of the logical CPU count.
region-concurrency = 32

# Logging
level = "info"
file = "tidb-lightning.log"

[tikv-importer]
# Backend mode
backend = "tidb"
# Local temporary storage path
sorted-kv-dir = "/mnt/sorted-kv-dir"

[mydumper]
# Mydumper source data directory.
data-source-dir = "/mnt/dumping"

[tidb]
# Information about the target cluster. The listening address of tidb-server; one is enough.
host = "10.110.9.218"
port = 6306
user = ""
password = ""
# Table schema information is fetched from TiDB's "status port".
status-port = 10080

Lightning does not support importing data into MySQL. Depending on the export format, the data dumped by Dumpling can be imported with loader or other tools.

Use loader for the import for now. We'll file an issue and post it here later to make a compatibility fix.
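A rough sketch of the loader invocation (assuming the loader tool from tidb-enterprise-tools with its usual -h/-P/-u/-p/-t/-d flags; credentials and thread count are placeholders loosely based on this thread):

    # Import the Dumpling/Mydumper-format SQL files into the downstream MySQL.
    # -d is the dump directory, -t the number of import threads.
    ./loader -h 10.110.9.218 -P 6306 -u cdc_dist -p cdc_dist -t 16 -d /mnt/dumping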

Currently, importing the data with dumpling + loader fails with this error: Try adjusting the 'max_allowed_packet' variable on the server

2020/09/27 13:07:40 db.go:123: [warning] [exec][sql][USE cleaner; INSERT INTO cleaner_batch VALUES(1,'163.wjl.pre.8577.1.batch-201909031732','4b21d87d06ff4710f93ad4 UPDATE tidb_loader.checkpoint SET offset=108267197 WHERE id ='ac258c' AND filename='cleane][error]packet for query is too large. Try adjusting the 'max_allowed_packet' variable on the server
2020/09/27 13:07:40 loader.go:124: [fatal] packet for query is too large. Try adjusting the 'max_allowed_packet' variable on the server

On the server side, which is also the target, it has already been raised to the maximum of 1 GB.

(root@10.110.41.188:6306) [cleaner]> show variables like '%max_allowed_packet%';
+--------------------------+------------+
| Variable_name            | Value      |
+--------------------------+------------+
| max_allowed_packet       | 1073741824 |
| slave_max_allowed_packet | 1073741824 |
+--------------------------+------------+
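As a rough diagnostic sketch (the dump file naming pattern is an assumption based on Dumpling's usual <db>.<table>.<n>.sql convention), one way to check whether any single INSERT statement in the dump actually exceeds the 1 GiB packet limit:

    # Approximate the size of each SQL statement (statements end with ';') in the dump files
    # for the cleaner_batch table, and print the largest one per file to compare against
    # max_allowed_packet = 1073741824 bytes.
    for f in /mnt/dumping/cleaner.cleaner_batch.*.sql; do
      awk -v f="$f" '{ cur += length($0) + 1 }
                     /;[[:space:]]*$/ { if (cur > max) max = cur; cur = 0 }
                     END { print f, max }' "$f"
    done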