DM 2.0.1 数据同步失败,dump过程失败

,

为提高效率,提问时请提供以下信息,问题描述清晰可优先响应。

  • 【TiDB 版本】:4.0.9
  • 【问题描述】:数据同步失败,dump过程失败,具体dm-worker错误日志如下

[2020/12/27 21:43:18.075 +08:00] [ERROR] [log.go:87] [“scanning from sql.Row failed”] [task=bolai] [unit=dump] [error=“context canceled”] [errorVerbose=“context canceled\ngithub.com/pingcap/errors.AddStack\ \t/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20201126102027-b0a155152ca3/errors.go:174\ github.com/pingcap/errors.Trace\ \t/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20201126102027-b0a155152ca3/juju_adaptor.go:15\ github.com/pingcap/dumpling/v4/export.decodeFromRows\ \t/go/pkg/mod/github.com/pingcap/dumpling@v0.0.0-20201224084933-34903c7475cf/v4/export/ir.go:67\ github.com/pingcap/dumpling/v4/export.(*rowIter).Decode\ \t/go/pkg/mod/github.com/pingcap/dumpling@v0.0.0-20201224084933-34903c7475cf/v4/export/ir_impl.go:37\ngithub.com/pingcap/dumpling/v4/export.WriteInsert\ \t/go/pkg/mod/github.com/pingcap/dumpling@v0.0.0-20201224084933-34903c7475cf/v4/export/writer_util.go:201\ github.com/pingcap/dumpling/v4/export.FileFormat.WriteInsert\ \t/go/pkg/mod/github.com/pingcap/dumpling@v0.0.0-20201224084933-34903c7475cf/v4/export/writer_util.go:591\ github.com/pingcap/dumpling/v4/export.(*Writer).tryToWriteTableData\ \t/go/pkg/mod/github.com/pingcap/dumpling@v0.0.0-20201224084933-34903c7475cf/v4/export/writer.go:204\ngithub.com/pingcap/dumpling/v4/export.(*Writer).WriteTableData.func1\ \t/go/pkg/mod/github.com/pingcap/dumpling@v0.0.0-20201224084933-34903c7475cf/v4/export/writer.go:190\ngithub.com/pingcap/br/pkg/utils.WithRetry\ \t/go/pkg/mod/github.com/pingcap/br@v5.0.0-rc.0.20201223100334-c344d1edf20c+incompatible/pkg/utils/retry.go:34\ github.com/pingcap/dumpling/v4/export.(*Writer).WriteTableData\ \t/go/pkg/mod/github.com/pingcap/dumpling@v0.0.0-20201224084933-34903c7475cf/v4/export/writer.go:161\ngithub.com/pingcap/dumpling/v4/export.(*Writer).handleTask\ \t/go/pkg/mod/github.com/pingcap/dumpling@v0.0.0-20201224084933-34903c7475cf/v4/export/writer.go:104\ngithub.com/pingcap/dumpling/v4/export.(*Writer).run\ \t/go/pkg/mod/github.com/pingcap/dumpling@v0.0.0-20201224084933-34903c7475cf/v4/export/writer.go:86\ngithub.com/pingcap/dumpling/v4/export.(*Dumper).startWriters.func4\ \t/go/pkg/mod/github.com/pingcap/dumpling@v0.0.0-20201224084933-34903c7475cf/v4/export/dump.go:236\ngolang.org/x/sync/errgroup.(*Group).Go.func1\ \t/go/pkg/mod/golang.org/x/sync@v0.0.0-20201020160332-67f06af15bc9/errgroup/errgroup.go:57\ runtime.goexit\ \t/usr/local/go/src/runtime/asm_amd64.s:1357”]

Hello ~ 根据报错看,是创建同步任务,在初始化导出数据报错。报错的 task 名称是: bolai ,可以尝试进一步排查该问题:

  1. 确认一下 DM 的版本是否为最新的版本,或者提供具体的 DM 版本;
  2. 可以根据 task 找到对应的 dm-worker 看一下报错的上下问有没有其他的 waring 和 error 报错也一起发一下;
  3. dm-worker 会记录具体的 dumpling 的命令,尝试通过 dumpling 命令在上游尝试导出命令,看是否相同的报错。

谢谢回复

  1. DM版本为v2.0.1
  2. 暂时没发现可疑的warning或error
  3. 没看懂,暂时没找到dumpling命令,求详情

从报错看,dumpling 按照 SQL.row 格式扫描数据失败,可以按照 dm 配置同步需求,通过 dumpling 命令尝试导出一下,看看能不能正常导出。如果今晚想要上线这个同步任务,可以将 DM full 模式调整 increment 模式,全量部分通过手工导出导入方式完成,使用 dumpling 和 lightning 工具。该问题需要明天工作时间,找相关同学确认一下。

感谢,今晚刚开始导入,不需要今晚上线,等明天确认具体问题吧,感谢

[2020/12/27 23:14:24.654 +08:00] [ERROR] [subtask.go:310] [“unit process error”] [subtask=bolai] [unit=Dump] [“error information”="{“ErrCode”:32001,“ErrClass”:“dump-unit”,“ErrScope”:“internal”,“ErrLevel”:“high”,“Message”:"mydumper/dumpling runs with error, with output (may empty): “,“RawCause”:“Error 3024: Query execution was interrupted, maximum statement execution time exceeded”}”]

发现了这个错,同时发现上游数据库很多慢查询,发现很多语句select出整张表了,怎么会这样?

请问一下

1 上游是MySQL还是tidb,上游表的主键是否有分布不均的情况
2 dm使用的任务配置有何更改吗

  1. 上游是MySQL 5.7,主键是自增id
  2. 没有特别更改,三个月前测试过,没问题,近期打算正式用

看到慢日志是这样的:
SELECT …
FROM …
ORDER BY id
没有分页

另外,dm v2.0.1 的压缩包,加压后mydumper版本是0.9.5,不是dumpling,但是我看release notes说已经将mydumper替换成dumping了,奇怪

mydumper包含在压缩包里,但是已经弃用了。

明天我们会联系研发看一下。可以在页面的“联系社区专家”提供您的联系方式吗

联系方式刚才提交过了

麻烦提供一下 dumpling 配置项,dumpling 的 log level 调整为 debug 模式,提供一下 log 日志,我们分析一下,感谢 ~

mydumpers:
  global:
    threads: 4                      
    chunk-filesize: 64               
    skip-tz-utc: true                
    extra-args: "--consistency none"

部分日志如下:

[2020/12/27 23:06:35.793 +08:00] [DEBUG] [log.go:92] ["tear down lazy file writer..."] [task=BL] [unit=dump] [path=file:/nas-log/tidb-dm-dump/dumped_data.BL/blp.action.000000027.sql]
[2020/12/27 23:06:35.804 +08:00] [DEBUG] [log.go:92] ["dumping table"] [task=BL] [unit=dump] [table=clearing_order] ["record counts"=88096]
[2020/12/27 23:06:35.804 +08:00] [DEBUG] [log.go:92] ["tear down lazy file writer..."] [task=BL] [unit=dump] [path=file:/nas-log/tidb-dm-dump/dumped_data.BL/blp.clearing_order.000000023.sql]
[2020/12/27 23:06:36.065 +08:00] [DEBUG] [log.go:92] ["opened file"] [task=BL] [unit=dump] [path=file:/nas-log/tidb-dm-dump/dumped_data.BL/blp.action.000000028.sql]
[2020/12/27 23:06:36.133 +08:00] [DEBUG] [server.go:209] ["etcd member list doesn't change"] ["client URLs"="[http://172.16.115.71:8261]"]
[2020/12/27 23:06:36.136 +08:00] [DEBUG] [log.go:92] ["opened file"] [task=BL] [unit=dump] [path=file:/nas-log/tidb-dm-dump/dumped_data.BL/blp.clearing_order.000000024.sql]
[2020/12/27 23:06:36.153 +08:00] [DEBUG] [log.go:92] ["dumping table"] [task=BL] [unit=dump] [table=action] ["record counts"=11442]
[2020/12/27 23:06:36.153 +08:00] [DEBUG] [log.go:92] ["tear down lazy file writer..."] [task=BL] [unit=dump] [path=file:/nas-log/tidb-dm-dump/dumped_data.BL/blp.action.000000028.sql]
[2020/12/27 23:06:36.193 +08:00] [DEBUG] [log.go:92] ["dumping table"] [task=BL] [unit=dump] [table=clearing_trade_item] ["record counts"=309757]
[2020/12/27 23:06:36.193 +08:00] [ERROR] [log.go:87] ["scanning from sql.Row failed"] [task=BL] [unit=dump] [error="context canceled"] [errorVerbose="context canceled\
github.com/pingcap/errors.AddStack\
\t/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20201126102027-b0a155152ca3/errors.go:174\
github.com/pingcap/errors.Trace\
\t/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20201126102027-b0a155152ca3/juju_adaptor.go:15\
github.com/pingcap/dumpling/v4/export.decodeFromRows\
\t/go/pkg/mod/github.com/pingcap/dumpling@v0.0.0-20201224084933-34903c7475cf/v4/export/ir.go:67\
github.com/pingcap/dumpling/v4/export.(*rowIter).Decode\
\t/go/pkg/mod/github.com/pingcap/dumpling@v0.0.0-20201224084933-34903c7475cf/v4/export/ir_impl.go:37\
github.com/pingcap/dumpling/v4/export.WriteInsert\
\t/go/pkg/mod/github.com/pingcap/dumpling@v0.0.0-20201224084933-34903c7475cf/v4/export/writer_util.go:201\
github.com/pingcap/dumpling/v4/export.FileFormat.WriteInsert\
\t/go/pkg/mod/github.com/pingcap/dumpling@v0.0.0-20201224084933-34903c7475cf/v4/export/writer_util.go:591\
github.com/pingcap/dumpling/v4/export.(*Writer).tryToWriteTableData\
\t/go/pkg/mod/github.com/pingcap/dumpling@v0.0.0-20201224084933-34903c7475cf/v4/export/writer.go:204\
github.com/pingcap/dumpling/v4/export.(*Writer).WriteTableData.func1\
\t/go/pkg/mod/github.com/pingcap/dumpling@v0.0.0-20201224084933-34903c7475cf/v4/export/writer.go:190\
github.com/pingcap/br/pkg/utils.WithRetry\
\t/go/pkg/mod/github.com/pingcap/br@v5.0.0-rc.0.20201223100334-c344d1edf20c+incompatible/pkg/utils/retry.go:34\
github.com/pingcap/dumpling/v4/export.(*Writer).WriteTableData\
\t/go/pkg/mod/github.com/pingcap/dumpling@v0.0.0-20201224084933-34903c7475cf/v4/export/writer.go:161\
github.com/pingcap/dumpling/v4/export.(*Writer).handleTask\
\t/go/pkg/mod/github.com/pingcap/dumpling@v0.0.0-20201224084933-34903c7475cf/v4/export/writer.go:104\
github.com/pingcap/dumpling/v4/export.(*Writer).run\
\t/go/pkg/mod/github.com/pingcap/dumpling@v0.0.0-20201224084933-34903c7475cf/v4/export/writer.go:86\
github.com/pingcap/dumpling/v4/export.(*Dumper).startWriters.func4\
\t/go/pkg/mod/github.com/pingcap/dumpling@v0.0.0-20201224084933-34903c7475cf/v4/export/dump.go:236\
golang.org/x/sync/errgroup.(*Group).Go.func1\
\t/go/pkg/mod/golang.org/x/sync@v0.0.0-20201020160332-67f06af15bc9/errgroup/errgroup.go:57\
runtime.goexit\
\t/usr/local/go/src/runtime/asm_amd64.s:1357"]
[2020/12/27 23:06:36.193 +08:00] [DEBUG] [log.go:92] ["tear down lazy file writer..."] [task=BL] [unit=dump] [path=file:/nas-log/tidb-dm-dump/dumped_data.BL/blp.device_power_record.000000014.sql]
[2020/12/27 23:06:36.193 +08:00] [DEBUG] [log.go:92] ["dumping table"] [task=BL] [unit=dump] [table=clearing_order] ["record counts"=12223]
[2020/12/27 23:06:36.193 +08:00] [DEBUG] [log.go:92] ["tear down lazy file writer..."] [task=BL] [unit=dump] [path=file:/nas-log/tidb-dm-dump/dumped_data.BL/blp.clearing_order.000000024.sql]
[2020/12/27 23:06:36.193 +08:00] [DEBUG] [log.go:92] ["tear down lazy file writer..."] [task=BL] [unit=dump] [path=file:/nas-log/tidb-dm-dump/dumped_data.BL/blp.clearing_trade_item.000000012.sql]
[2020/12/27 23:06:36.375 +08:00] [ERROR] [dumpling.go:137] ["dump data exits with error"] [task=BL] [unit=dump] ["cost time"=1m3.730767198s] [error="ErrCode:32001 ErrClass:\"dump-unit\" ErrScope:\"internal\" ErrLevel:\"high\" Message:\"mydumper/dumpling runs with error, with output (may empty): \" RawCause:\"Error 3024: Query execution was interrupted, maximum statement execution time exceeded\" "]
[2020/12/27 23:06:36.375 +08:00] [INFO] [subtask.go:291] ["unit process returned"] [subtask=BL] [unit=Dump] [stage=Paused] [status={}]
[2020/12/27 23:06:36.375 +08:00] [ERROR] [subtask.go:310] ["unit process error"] [subtask=BL] [unit=Dump] ["error information"="{\"ErrCode\":32001,\"ErrClass\":\"dump-unit\",\"ErrScope\":\"internal\",\"ErrLevel\":\"high\",\"Message\":\"mydumper/dumpling runs with error, with output (may empty): \",\"RawCause\":\"Error 3024: Query execution was interrupted, maximum statement execution time exceeded\"}"]

没人跟进解决吗?

为什么会出现dumpling过程出现语句是select全部而不分页的?

https://github.com/pingcap/dumpling/blob/34903c7475cfbfc715c03e0a4e5029c640f33234/v4/export/sql.go#L189

看到源头了,为什么会这么写?

https://docs.pingcap.com/zh/tidb/dev/dumpling-overview#dumpling-主要选项表
需要在 extra-args 里配置 “-r 200000” 或者配置 mydumpers.global.rows: 200000,dumpling 会根据 -r 参数划分 chunk,开启表内并发,生成分 chunk select 的语句。

谢谢,我试试,看到过了,我以为默认就应该分页,没想到真要这么用

好的。这里我们后续看看怎么优化一下