DM full sync from MySQL 5.7 to a TiDB cluster fails with "invalid connection" when the dump is around 50% done. Looking for help

[TiDB environment] Production
[TiDB version] 7.1.0
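[Reproduction path] What operations led to the problem
[Problem encountered]
DM is configured to run a full sync from MySQL 5.7.32-log to a TiDB 7.1.0 cluster.
The source is Alibaba Cloud RDS, and the database being synced is close to 1.8 TB.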

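The error message is invalid connection, but no specific failing SQL is reported. I checked the source MySQL slave and its resource usage is within a reasonable range (DM is configured to sync directly from the slave).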

The task status at the time of the error is as follows:

{
  "result": true,
  "msg": "",
  "sources": [
    {
      "result": true,
      "msg": "",
      "sourceStatus": {
        "source": "rds-product_a1_slave",
        "worker": "dm-192.168.1.24-8262",
        "result": null,
        "relayStatus": {
          "masterBinlog": "(mysql-bin.012998, 467060002)",
          "masterBinlogGtid": "1084ef92-9c21-11ed-8357-b8cef6c11980:1-611757482,399fc427-0550-11ec-9dbb-b8599ff21ca6:1-501266838,9a2173c4-1033-11ec-81d4-506b4b3863ee:1-3969981405,eba0744b-8ce4-11ed-a786-b8cef6c0fa40:1-4328917899",
          "relaySubDir": "2c81a2ea-dfe2-11ee-b930-043f72e56c5a.000001",
          "relayBinlog": "(mysql-bin.012998, 467110976)",
          "relayBinlogGtid": "1084ef92-9c21-11ed-8357-b8cef6c11980:1-611757484,399fc427-0550-11ec-9dbb-b8599ff21ca6:1-501266838,9a2173c4-1033-11ec-81d4-506b4b3863ee:1-3969981405,eba0744b-8ce4-11ed-a786-b8cef6c0fa40:1-4328917899",
          "relayCatchUpMaster": false,
          "stage": "Running",
          "result": null
        }
      },
      "subTaskStatus": [
        {
          "name": "rds-to-tidb-for-product_a1",
          "stage": "Paused",
          "unit": "Dump",
          "result": {
            "isCanceled": false,
            "errors": [
              {
                "ErrCode": 32001,
                "ErrClass": "dump-unit",
                "ErrScope": "internal",
                "ErrLevel": "high",
                "Message": "mydumper/dumpling runs with error, with output (may empty): ",
                "RawCause": "invalid connection",
                "Workaround": ""
              }
            ],
            "detail": null
          },
          "unresolvedDDLLockID": "",
          "dump": {
            "totalTables": "1844",
            "completedTables": 936,
            "finishedBytes": 534994389197,
            "finishedRows": 373565904,
            "estimateTotalRows": 517781666,
            "bps": "116741901",
            "progress": "63.54 %"
          },
          "validation": null
        }
      ]
    }
  ]
}
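The status output carries several different progress figures, which can be confusing when describing where the dump stopped. The following is a minimal sketch (the helper name `dump_ratios` is made up for illustration) that recomputes the table-based and row-based ratios from the `dump` object. It does not try to reproduce how DM derives the reported `progress` value, since that is not shown in this thread.

```python
import json

# Copy of the "dump" object from the status output above.
DUMP = json.loads("""
{
  "totalTables": "1844",
  "completedTables": 936,
  "finishedBytes": 534994389197,
  "finishedRows": 373565904,
  "estimateTotalRows": 517781666,
  "bps": "116741901",
  "progress": "63.54 %"
}
""")


def dump_ratios(dump):
    """Return (table_pct, row_pct) for a DM dump-status object.

    Some numeric fields arrive as strings, so everything is coerced with float().
    """
    table_pct = 100.0 * float(dump["completedTables"]) / float(dump["totalTables"])
    row_pct = 100.0 * float(dump["finishedRows"]) / float(dump["estimateTotalRows"])
    return round(table_pct, 2), round(row_pct, 2)


table_pct, row_pct = dump_ratios(DUMP)
print(f"tables: {table_pct} %, rows: {row_pct} %, reported: {DUMP['progress']}")
```

With the numbers from this task, the table ratio comes out near 50% (matching the title) and the row ratio matches the `estimated_progress` field that dumpling writes to its log.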

The log only shows the following:

[2024/08/26 17:57:22.490 +08:00] [WARN] [writer_util.go:194] ["fail to dumping table(chunk), will revert some metrics and start a retry if possible"] [task=rds-to-tidb-for-product_a1] [unit=dump] [database=product_a1] [table=itemSnapshot_10012] ["finished rows"=6145] ["finished size"=33578671] [error="invalid connection"]
[2024/08/26 17:57:22.490 +08:00] [WARN] [writer_util.go:194] ["fail to dumping table(chunk), will revert some metrics and start a retry if possible"] [task=rds-to-tidb-for-product_a1] [unit=dump] [database=product_a1] [table=itemSnapshot_10037] ["finished rows"=891] ["finished size"=4334101] [error="context canceled"]
[2024/08/26 17:57:22.491 +08:00] [WARN] [writer_util.go:194] ["fail to dumping table(chunk), will revert some metrics and start a retry if possible"] [task=rds-to-tidb-for-product_a1] [unit=dump] [database=product_a1] [table=itemSnapshot_10035] ["finished rows"=9991] ["finished size"=55302150] [error="context canceled"]
[2024/08/26 17:57:22.491 +08:00] [WARN] [writer_util.go:194] ["fail to dumping table(chunk), will revert some metrics and start a retry if possible"] [task=rds-to-tidb-for-product_a1] [unit=dump] [database=product_a1] [table=itemPic_9] ["finished rows"=17020] ["finished size"=30240289] [error="context canceled"]
[2024/08/26 17:57:22.492 +08:00] [INFO] [collector.go:220] ["units canceled"] [cancel-unit=0]

[2024/08/26 17:57:22.497 +08:00] [INFO] [collector.go:221] ["backup failed summary"] [task=rds-to-tidb-for-product_a1] [unit=dump] [total-ranges=1] [ranges-succeed=0] [ranges-failed=1] [unit-name="dump table data"] [error="invalid connection"] [errorVerbose="invalid connection\ngithub.com/pingcap/errors.AddStack\n\tgithub.com/pingcap/errors@v0.11.5-0.20221009092201-b66cddb77c32/errors.go:174\ngithub.com/pingcap/errors.Trace\n\tgithub.com/pingcap/errors@v0.11.5-0.20221009092201-b66cddb77c32/juju_adaptor.go:15\ngithub.com/pingcap/tidb/dumpling/export.(*rowIter).Error\n\tgithub.com/pingcap/tidb@v1.1.0-beta.0.20230420065519-eb77d3928398/dumpling/export/ir_impl.go:42\ngithub.com/pingcap/tidb/dumpling/export.WriteInsert\n\tgithub.com/pingcap/tidb@v1.1.0-beta.0.20230420065519-eb77d3928398/dumpling/export/writer_util.go:285\ngithub.com/pingcap/tidb/dumpling/export.FileFormat.WriteInsert\n\tgithub.com/pingcap/tidb@v1.1.0-beta.0.20230420065519-eb77d3928398/dumpling/export/writer_util.go:660\ngithub.com/pingcap/tidb/dumpling/export.(*Writer).tryToWriteTableData\n\tgithub.com/pingcap/tidb@v1.1.0-beta.0.20230420065519-eb77d3928398/dumpling/export/writer.go:243\ngithub.com/pingcap/tidb/dumpling/export.(*Writer).WriteTableData.func1\n\tgithub.com/pingcap/tidb@v1.1.0-beta.0.20230420065519-eb77d3928398/dumpling/export/writer.go:228\ngithub.com/pingcap/tidb/br/pkg/utils.WithRetry\n\tgithub.com/pingcap/tidb@v1.1.0-beta.0.20230420065519-eb77d3928398/br/pkg/utils/retry.go:53\ngithub.com/pingcap/tidb/dumpling/export.(*Writer).WriteTableData\n\tgithub.com/pingcap/tidb@v1.1.0-beta.0.20230420065519-eb77d3928398/dumpling/export/writer.go:192\ngithub.com/pingcap/tidb/dumpling/export.(*Writer).handleTask\n\tgithub.com/pingcap/tidb@v1.1.0-beta.0.20230420065519-eb77d3928398/dumpling/export/writer.go:115\ngithub.com/pingcap/tidb/dumpling/export.(*Writer).run\n\tgithub.com/pingcap/tidb@v1.1.0-beta.0.20230420065519-eb77d3928398/dumpling/export/writer.go:93\ngithub.com/pingcap/tidb/dumpling/export.(*Dumper).startWriters.func4\n\tgithub.com/pingcap/tidb@v1.1.0-beta.0.20230420065519-eb77d3928398/dumpling/export/dump.go:376\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\tgolang.org/x/sync@v0.1.0/errgroup/errgroup.go:75\nruntime.goexit\n\truntime/asm_amd64.s:1598"]

[2024/08/26 17:57:22.499 +08:00] [ERROR] [dumpling.go:214] ["dump data exits with error"] [task=rds-to-tidb-for-product_a1] [unit=dump] ["cost time"=1h15m13.433564019s] [error="ErrCode:32001 ErrClass:\"dump-unit\" ErrScope:\"internal\" ErrLevel:\"high\" Message:\"mydumper/dumpling runs with error, with output (may empty): \" RawCause:\"invalid connection\" "]
[2024/08/26 17:57:22.499 +08:00] [INFO] [dumpling.go:299] ["progress status of dumpling"] [task=rds-to-tidb-for-product_a1] [unit=dump] [total_tables=1844] [finished_tables=936] [estimated_total_rows=517781666] [finished_rows=373565904] [estimated_progress="72.15 %"] ["new progress"="63.54 %"] [bps=116741901]
[2024/08/26 17:57:22.499 +08:00] [INFO] [subtask.go:333] ["unit process returned"] [subtask=rds-to-tidb-for-product_a1] [unit=Dump] [stage=Paused] [status="{\"totalTables\":1844,\"completedTables\":936,\"finishedBytes\":534994389197,\"finishedRows\":373565904,\"estimateTotalRows\":517781666,\"bps\":116741901,\"progress\":\"63.54 %\"}"]
[2024/08/26 17:57:22.499 +08:00] [ERROR] [subtask.go:354] ["unit process error"] [subtask=rds-to-tidb-for-product_a1] [unit=Dump] ["error information"="ErrCode:32001 ErrClass:\"dump-unit\" ErrScope:\"internal\" ErrLevel:\"high\" Message:\"mydumper/dumpling runs with error, with output (may empty): \" RawCause:\"invalid connection\" "]
[2024/08/26 17:57:22.499 +08:00] [INFO] [subtask.go:356] [paused] [subtask=rds-to-tidb-for-product_a1] [unit=Dump]
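In the log above, only one writer reports invalid connection; the others report context canceled, which is likely just the remaining writers being stopped after the first failure. To pick out the table whose chunk actually lost its connection, the WARN lines can be grouped by their error field. This is a rough sketch: the regex and the helper names `parse_fields` and `tables_by_error` are assumptions for illustration, and the sample lines are abbreviated copies of the log with ASCII quotes.

```python
import re
from collections import defaultdict

# Lines abbreviated from the dump-unit log above; the long message field is dropped.
SAMPLE = """\
[2024/08/26 17:57:22.490 +08:00] [WARN] [writer_util.go:194] [table=itemSnapshot_10012] ["finished rows"=6145] [error="invalid connection"]
[2024/08/26 17:57:22.490 +08:00] [WARN] [writer_util.go:194] [table=itemSnapshot_10037] ["finished rows"=891] [error="context canceled"]
[2024/08/26 17:57:22.491 +08:00] [WARN] [writer_util.go:194] [table=itemSnapshot_10035] ["finished rows"=9991] [error="context canceled"]
[2024/08/26 17:57:22.491 +08:00] [WARN] [writer_util.go:194] [table=itemPic_9] ["finished rows"=17020] [error="context canceled"]
"""

# Matches [key=value] where key and value may each be bare or double-quoted.
FIELD = re.compile(r'\[(?:"([^"]+)"|([\w-]+))=(?:"([^"]*)"|([^\]]*))\]')


def parse_fields(line):
    """Extract the [key=value] pairs of one log line into a dict of strings."""
    fields = {}
    for quoted_key, bare_key, quoted_val, bare_val in FIELD.findall(line):
        fields[quoted_key or bare_key] = quoted_val or bare_val
    return fields


def tables_by_error(lines):
    """Group table names by the error text reported for them."""
    grouped = defaultdict(list)
    for line in lines:
        fields = parse_fields(line)
        if "table" in fields and "error" in fields:
            grouped[fields["error"]].append(fields["table"])
    return dict(grouped)


for error, tables in tables_by_error(SAMPLE.splitlines()).items():
    print(f"{error}: {tables}")
```

Running this over the logs of several failed attempts would show whether the invalid connection error keeps landing on the same tables or on different ones each time.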

After stop-task and rerunning the task, the same thing happens, only the progress reached before the failure is different.

Has anyone else run into this problem?

Do you see something similar when backing up manually with dumpling? Could it be a firewall limit on how long a connection may stay open?
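One rough way to probe the connection-duration idea is to compare the elapsed time at failure across reruns, using the "cost time" field that dumpling logs when it exits. If every run died after roughly the same elapsed time, a duration-based cutoff would look more plausible; if the times vary widely, it would look less so. The sketch below only converts a Go-style duration string into seconds. The function name `go_duration_seconds` is hypothetical, and the only input value used is the one from the log in this thread.

```python
import re

# Seconds per unit for Go-style duration strings such as "1h15m13.433564019s".
_UNITS = {"h": 3600.0, "m": 60.0, "s": 1.0,
          "ms": 1e-3, "us": 1e-6, "µs": 1e-6, "ns": 1e-9}

# "ms" is listed before "m" so that milliseconds are not read as minutes.
_PART = re.compile(r"(\d+(?:\.\d+)?)(ns|us|µs|ms|h|m|s)")


def go_duration_seconds(text):
    """Convert a Go duration string to seconds, rejecting malformed input."""
    parts = _PART.findall(text)
    if not parts or "".join(num + unit for num, unit in parts) != text:
        raise ValueError(f"not a Go duration: {text!r}")
    return sum(float(num) * _UNITS[unit] for num, unit in parts)


# "cost time" from the failed run logged above.
elapsed = go_duration_seconds("1h15m13.433564019s")
print(f"failed after {elapsed:.1f} s ({elapsed / 60:.1f} min)")
```

Collecting this value for each failed attempt, together with the progress reached, gives a small table that either supports or weakens the timeout explanation.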

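1.8 TB is rather large. DM's full migration is generally not recommended at this data volume. I suggest syncing the full data with dumpling + lightning first, then using DM to sync the incremental data.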
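As an illustration of the first step of that suggestion, here is a sketch that assembles a standalone dumpling command line. The host, user, output directory, and the thread, row, and file-size values are all placeholders rather than settings from this thread, and the helper `dumpling_argv` is made up for the example. The password is deliberately left out; it would be passed separately with `-p`.

```python
import shlex


def dumpling_argv(host, port, user, database, output_dir,
                  threads=4, rows=200000, filesize="256MiB"):
    """Build an argument list for a standalone dumpling export.

    All defaults here are illustrative placeholders, not tuned recommendations.
    """
    return [
        "dumpling",
        "-h", host,            # source MySQL host
        "-P", str(port),       # source MySQL port
        "-u", user,            # source MySQL user
        "-B", database,        # database to export
        "-o", output_dir,      # directory for the exported files
        "-t", str(threads),    # number of concurrent export threads
        "-r", str(rows),       # split tables into chunks of about this many rows
        "-F", filesize,        # maximum size of a single output file
        "--filetype", "sql",   # write SQL files
    ]


# Hypothetical connection details for demonstration only.
argv = dumpling_argv("rds-host.example.com", 3306, "dm_user",
                     "product_a1", "/data/dump/product_a1")
print(shlex.join(argv))
```

The exported files would then be imported with TiDB Lightning, and a DM task in incremental mode would be started from the binlog position recorded in the metadata file that dumpling writes alongside the export.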

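From testing so far, the problem seems to come from connecting to the slave. Running dumpling manually against the slave also produced an error, yet the failing SQL ran fine when executed on its own in MySQL. After switching the configuration to sync from the master, dumpling ran without problems.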
