【环境说明】
TiDB: v5.4.1
DM: v5.4.1
同步任务配置:
【问题说明】
DM在全量dump数据时异常停止,执行resume-task后已经导出的本地数据被清理,但是通过query-status查看导出情况,发现进度仍然是之前的进度。
具体如下:
异常退出
[2022/05/26 13:49:48.124 +08:00] [INFO] [status.go:31] [progress] [task=dm-rds_master] [unit=dump] [tables=“46/168 (27.4%)”] [“finished rows”=1426395870] [“estimate total rows”=1387484471] [“finish ed size”=666GB] [“average speed(MiB/s)”=72.45181706066245]
[2022/05/26 13:51:48.124 +08:00] [INFO] [status.go:31] [progress] [task=dm-rds_master] [unit=dump] [tables=“46/168 (27.4%)”] [“finished rows”=1449285519] [“estimate total rows”=1387484471] [“finish ed size”=675GB] [“average speed(MiB/s)”=71.97121670267033]
[2022/05/26 13:52:50.136 +08:00] [INFO] [collector.go:203] [“backup failed summary”] [task=dm-rds_master] [unit=dump] [total-ranges=0] [ranges-succeed=0] [ranges-failed=0]
[2022/05/26 13:52:50.155 +08:00] [INFO] [writer.go:77] [“context has been done, the writer will exit”] [task=dm-rds_master] [unit=dump] [“writer ID”=0]
[2022/05/26 13:52:50.162 +08:00] [ERROR] [dumpling.go:152] [“dump data exits with error”] [task=dm-rds_master] [unit=dump] [“cost time”=1h25m26.411221266s] [error=“ErrCode:32001 ErrClass:"dump-unit" ErrScope:"internal" ErrLevel:"high" Message:"mydumper/dumpling runs with error, with output (may empty): " RawCause:"sql: SHOW COLUMNS FROMmarketing_center
.mc_activity_record_32
: driver: bad connection" “]
[2022/05/26 13:52:50.163 +08:00] [INFO] [subtask.go:310] [“unit process returned”] [subtask=dm-rds_master] [unit=Dump] [stage=Paused] [status=”{"totalTables":168,"completedTables":46,"finishedBytes":679914782035,"finishedRows":1461745697,"estimateTotalRows":1387484471}”]
[2022/05/26 13:52:50.163 +08:00] [ERROR] [subtask.go:331] [“unit process error”] [subtask=dm-rds_master] [unit=Dump] [“error information”="ErrCode:32001 ErrClass:"dump-unit" ErrScope:"internal" ErrLevel:"high" Message:"mydumper/dumpling runs with error, with output (may empty): " RawCause:"sql: SHOW COLUMNS FROMmarketing_center
.mc_activity_record_32
: driver: bad connection" "]
[2022/05/26 13:52:50.163 +08:00] [INFO] [subtask.go:333] [paused] [subtask=dm-rds_master] [unit=Dump]
[2022/05/26 13:52:50.714 +08:00] [WARN] [task_checker.go:383] [“task can’t auto resume”] [component=“task checker”] [task=dm-rds_master] [“paused duration”=1h55m54.999266586s]
[2022/05/26 13:52:55.709 +08:00] [WARN] [task_checker.go:383] [“task can’t auto resume”] [component=“task checker”] [task=dm-rds_master] [“paused duration”=1h55m59.994170723s]
[2022/05/26 13:53:00.710 +08:00] [WARN] [task_checker.go:383] [“task can’t auto resume”] [component=“task checker”] [task=dm-rds_master] [“paused duration”=1h56m4.994740739s]
查看状态:query-status dm-rds_master
tiup is checking updates for component dmctl …
Starting componentdmctl
: /home/admin/.tiup/components/dmctl/v6.0.0/dmctl/dmctl /home/admin/.tiup/components/dmctl/v6.0.0/dmctl/dmctl --master-addr=172.19.58.205:8261 query-status dm-rds_master
{
“result”: true,
“msg”: “”,
“sources”: [
{
“result”: true,
“msg”: “”,
“sourceStatus”: {
“source”: “ds-rds_master”,
“worker”: “dm-xxxx-9262”,
“result”: null,
“relayStatus”: null
},
“subTaskStatus”: [
{
“name”: “dm-rds_master”,
“stage”: “Paused”,
“unit”: “Dump”,
“result”: {
“isCanceled”: false,
“errors”: [
{
“ErrCode”: 32001,
“ErrClass”: “dump-unit”,
“ErrScope”: “internal”,
“ErrLevel”: “high”,
“Message”: "mydumper/dumpling runs with error, with output (may empty): ",
“RawCause”: “sql: SHOW COLUMNS FROMmarketing_center
.mc_activity_record_32
: driver: bad connection”,
“Workaround”: “”
}
],
“detail”: null
},
“unresolvedDDLLockID”: “”,
“dump”: {
“totalTables”: “168”,
“completedTables”: 76,
“finishedBytes”: 780307764962,
“finishedRows”: 1587939832,
“estimateTotalRows”: 2104326425
}
}
]
}
]
}
执行resume-task dm-rds_master,并查看状态
tiup is checking updates for component dmctl …
Starting componentdmctl
: /home/admin/.tiup/components/dmctl/v6.0.0/dmctl/dmctl /home/admin/.tiup/components/dmctl/v6.0.0/dmctl/dmctl --master-addr=172.19.58.205:8261 query-status dm-rds_master
{
“result”: true,
“msg”: “”,
“sources”: [
{
“result”: true,
“msg”: “”,
“sourceStatus”: {
“source”: “ds-rds_master”,
“worker”: “dm-xxx-9262”,
“result”: null,
“relayStatus”: null
},
“subTaskStatus”: [
{
“name”: “dm-rds_master”,
“stage”: “Running”,
“unit”: “Dump”,
“result”: null,
“unresolvedDDLLockID”: “”,
“dump”: {
“totalTables”: “168”,
“completedTables”: 76,
“finishedBytes”: 780438448955,
“finishedRows”: 1589111472,
“estimateTotalRows”: 2106467643
}
}
]
}
]
}
查看磁盘使用
[admin@dm-cluster01 dm-worker-9262]$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/vdb1 1.5T 5.5G 1.4T 1% /ccdata
从上面的报错日志上可以知道,已经导出675G,但是磁盘只使用了5.5G,被清空了;重新恢复后,进度也是6/7百G:
“dump”: {
“totalTables”: “168”,
“completedTables”: 76,
“finishedBytes”: 780307764962,
“finishedRows”: 1587939832,
“estimateTotalRows”: 2104326425
}