love-cat
(Love Cat)
1
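[TiDB environment] Production / Test / PoC
[TiDB version] 5.2.2
[Steps to reproduce] What operations led to the problem
[Problem: symptoms and impact] Data synchronization through DM fails with an error
[Resource configuration]
[Attachments: screenshots / logs / monitoring]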
[2023/06/02 09:50:08.422 +00:00] [ERROR] [subtask.go:311] [“unit process error”] [subtask=thc_1084_hazxy721] [unit=Dump] [“error information”=“{"ErrCode":32001,"ErrClass":"dump-unit","ErrScope":"internal","ErrLevel":"high","Message":"mydumper/dumpling runs with error, with output (may empty): ","RawCause":"invalid connection"}”]
2023/6/2 17:50:10
xfworld
(魔幻之翼)
2
This looks like a connection problem. Why would DM report the error "mydumper/dumpling runs with error, with output (may empty)"?
love-cat
(Love Cat)
3
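What I found online also points to a connection problem, but from the DM node I can connect to both the upstream (primary) database and TiDB (the downstream) without trouble. The logs show nothing else, only the error above.
Does the error only appear after the task has been running for a while? If so, try increasing the max-allowed-packet parameter.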
dba-kit
(张天师)
6
For this you need to increase the wait_timeout parameter on the MySQL side.
love-cat
(Love Cat)
8
tail -100 dm-worker_stderr.log
[mysql] 2023/06/08 18:22:24 packets.go:73: unexpected EOF
[mysql] 2023/06/08 18:22:24 packets.go:428: busy buffer
love-cat
(Love Cat)
10
I raised it to 128 MB and it still fails: set global max_allowed_packet = 134217728
love-cat
(Love Cat)
11
[2023/06/08 11:21:46.988 +00:00] [INFO] [collector.go:194] [“backup failed summary”] [task=thc_1084_hazxy721] [unit=dump] [total-ranges=1] [ranges-succeed=0] [ranges-failed=1] [unit-name=“dump table data”] [error=“invalid connection”] [errorVerbose=“invalid connection\n
github.com/pingcap/errors.AddStack
/nfs/cache/mod/github.com/pingcap/errors@v0.11.5-0.20210513014640-40f9a1999b3b/errors.go:174
github.com/pingcap/errors.Trace\n\t
/nfs/cache/mod/github.com/pingcap/errors@v0.11.5-0.20210513014640-40f9a1999b3b/juju_adaptor.go:15\n
github.com/pingcap/dumpling/v4/export.(*rowIter).Error\n\t
/nfs/cache/mod/github.com/pingcap/dumpling@v0.0.0-20210914144241-99aca9186bc8/v4/export/ir_impl.go:42
github.com/pingcap/dumpling/v4/export.WriteInsert\n\t
/nfs/cache/mod/github.com/pingcap/dumpling@v0.0.0-20210914144241-99aca9186bc8/v4/export/writer_util.go:271\n
github.com/pingcap/dumpling/v4/export.FileFormat.WriteInsert\n\t/nfs/cache/mod/github.com/pingcap/dumpling@v0.0.0-20210914144241-99aca9186bc8/v4/export/writer_util.go:623\ngithub.com/pingcap/dumpling/v4/export.(*Writer).tryToWriteTableData\n\t/nfs/cache/mod/github.com/pingcap/dumpling@v0.0.0-20210914144241-99aca9186bc8/v4/export/writer.go:204\ngithub.com/pingcap/dumpling/v4/export.(*Writer).WriteTableData.func1\n\t/nfs/cache/mod/github.com/pingcap/dumpling@v0.0.0-20210914144241-99aca9186bc8/v4/export/writer.go:189\ngithub.com/pingcap/tidb/br/pkg/utils.WithRetry\n\t/nfs/cache/mod/github.com/pingcap/tidb@v1.1.0-beta.0.20210914112841-6ebfe8aa4257/br/pkg/utils/retry.go:47\ngithub.com/pingcap/dumpling/v4/export.(*Writer).WriteTableData\n\t/nfs/cache/mod/github.com/pingcap/dumpling@v0.0.0-20210914144241-99aca9186bc8/v4/export/writer.go:160\ngithub.com/pingcap/dumpling/v4/export.(*Writer).handleTask\n\t/nfs/cache/mod/github.com/pingcap/dumpling@v0.0.0-20210914144241-99aca9186bc8/v4/export/writer.go:103\ngithub.com/pingcap/dumpling/v4/export.(*Writer).run\n\t/nfs/cache/mod/github.com/pingcap/dumpling@v0.0.0-20210914144241-99aca9186bc8/v4/export/writ
er.go:85\ngithub.com/pingcap/dumpling/v4/export.(*Dumper).startWriters.func4\n\t/nfs/cache/mod/github.com/pingcap/dumpling@v0.0.0-20210914144241-99aca9186bc8/v4/export/dump.go:281\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\t/nfs/cache/mod/golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1371”]
[2023/06/08 11:21:46.988 +00:00] [ERROR] [dumpling.go:142] [“dump data exits with error”] [task=thc_1084_hazxy721] [unit=dump] [“cost time”=1m40.122657856s] [error="ErrCode:32001 ErrClass:"dump-unit" ErrScope:"internal" ErrLevel:"high" Message:"mydumper/dumpling runs with error, with output (may empty): " RawCause:"invalid connection" “]
[2023/06/08 11:21:46.988 +00:00] [INFO] [subtask.go:292] [“unit process returned”] [subtask=thc_1084_hazxy721] [unit=Dump] [stage=Paused] [status={}]
[2023/06/08 11:21:46.988 +00:00] [ERROR] [subtask.go:311] [“unit process error”] [subtask=thc_1084_hazxy721] [unit=Dump] [“error information”=”{"ErrCode":32001,"ErrClass":"dump-unit","ErrScope":"internal","ErrLevel":"high","Message":"mydumper/dumpling runs with error, with output (may empty): ","RawCause":"invalid connection"}
love-cat
(Love Cat)
15
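I have set max-allowed-packet on the dm-worker to a very large value, but some environments still report the error. I can't make much sense of the logs. Could someone experienced help analyze them: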
[writer_util.go:181] [“fail to dumping table(chunk), will revert some metrics and start a retry if possible”] [task=thc_1093_testzmyl1a] [unit=dump] [database=thc_1093_testzmyl1a] [table=cpoe_advice_fees] [“finished rows”=16093] [“finished size”=11310126] [error=“invalid connection”]
[2023/06/09 12:43:53.979 +08:00] [WARN] [writer_util.go:181] [“fail to dumping table(chunk), will revert some metrics and start a retry if possible”] [task=thc_1093_testzmyl1a] [unit=dump] [database=thc_1093_testzmyl1a] [table=cpoe_advice_fees_history] [“finished rows”=37251] [“finished size”=27536266] [error=“context canceled”]
[2023/06/09 12:43:53.979 +08:00] [WARN] [writer_util.go:181] [“fail to dumping table(chunk), will revert some metrics and start a retry if possible”] [task=thc_1093_testzmyl1a] [unit=dump] [database=thc_1093_testzmyl1a] [table=cpoe_medical_technology_record] [“finished rows”=39871] [“finished size”=13062228] [error=“context canceled”]
[2023/06/09 12:43:53.979 +08:00] [INFO] [collector.go:194] [“backup failed summary”] [task=thc_1093_testzmyl1a] [unit=dump] [total-ranges=1] [ranges-succeed=0] [ranges-failed=1] [unit-name=“dump table data”] [error=“invalid connection”] [errorVerbose=“invalid connection\ngithub.com/pingcap/errors.AddStack\n\t/nfs/cache/mod/github.com/pingcap/errors@v0.11.5-0.20210513014640-40f9a1999b3b/errors.go:174\ngithub.com/pingcap/errors.Trace\n\t/nfs/cache/mod/github.com/pingcap/errors@v0.11.5-0.20210513014640-40f9a1999b3b/juju_adaptor.go:15\ngithub.com/pingcap/dumpling/v4/export.(*rowIter).Error\n\t/nfs/cache/mod/github.com/pingcap/dumpling@v0.0.0-20210914144241-99aca9186bc8/v4/export/ir_impl.go:42\ngithub.com/pingcap/dumpling/v4/export.WriteInsert\n\t/nfs/cache/mod/github.com/pingcap/dumpling@v0.0.0-20210914144241-99aca9186bc8/v4/export/writer_util.go:271\ngithub.com/pingcap/dumpling/v4/export.FileFormat.WriteInsert\n\t/nfs/cache/mod/github.com/pingcap/dumpling@v0.0.0-20210914144241-99aca9186bc8/v4/export/writer_util.go:623\ngithub.com/pingcap/dumpling/v4/export.(*Writer).tryToWriteTableData\n\t/nfs/cache/mod/github.com/pingcap/dumpling@v0.0.0-20210914144241-99aca9186bc8/v4/export/writer.go:204\ngithub.com/pingcap/dumpling/v4/export.(*Writer).WriteTableData.func1\n\t/nfs/cache/mod/github.com/pingcap/dumpling@v0.0.0-20210914144241-99aca9186bc8/v4/export/writer.go:189\ngithub.com/pingcap/tidb/br/pkg/utils.WithRetry\n\t/nfs/cache/mod/github.com/pingcap/tidb@v1.1.0-beta.0.20210914112841-6ebfe8aa4257/br/pkg/utils/retry.go:47\ngithub.com/pingcap/dumpling/v4/export.(*Writer).WriteTableData\n\t/nfs/cache/mod/github.com/pingcap/dumpling@v0.0.0-20210914144241-99aca9186bc8/v4/export/writer.go:160\ngithub.com/pingcap/dumpling/v4/export.(*Writer).handleTask\n\t/nfs/cache/mod/github.com/pingcap/dumpling@v0.0.0-20210914144241-99aca9186bc8/v4/export/writer.go:103\ngithub.com/pingcap/dumpling/v4/export.(*Writer).run\n\t/nfs/cache/mod/github.com/pingcap/dumpling@v0.0.0-20210914144241-99aca918
6bc8/v4/export/writer.go:85\ngithub.com/pingcap/dumpling/v4/export.(*Dumper).startWriters.func4\n\t/nfs/cache/mod/github.com/pingcap/dumpling@v0.0.0-20210914144241-99aca9186bc8/v4/export/dump.go:281\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\t/nfs/cache/mod/golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1371”]
[2023/06/09 12:43:53.979 +08:00] [ERROR] [dumpling.go:142] [“dump data exits with error”] [task=thc_1093_testzmyl1a] [unit=dump] [“cost time”=1m51.408481533s] [error=“ErrCode:32001 ErrClass:"dump-unit" ErrScope:"internal" ErrLevel:"high" Message:"mydumper/dumpling runs with error, with output (may empty): " RawCause:"invalid connection" “]
[2023/06/09 12:43:53.979 +08:00] [INFO] [subtask.go:292] [“unit process returned”] [subtask=thc_1093_testzmyl1a] [unit=Dump] [stage=Paused] [status={}]
[2023/06/09 12:43:53.979 +08:00] [ERROR] [subtask.go:311] [“unit process error”] [subtask=thc_1093_testzmyl1a] [unit=Dump] [“error information”=”{"ErrCode":32001,"ErrClass":"dump-unit","ErrScope":"internal","ErrLevel":"high","Message":"mydumper/dumpling runs with error, with output (may empty): ","RawCause":"invalid connection"}”]
If the data volume really is that large, see whether you can reduce the number of databases synchronized by a single task.
kkpeter
(Upstream889)
17
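According to the log, the error occurs in the dump stage, so the source database is the place to look. Maybe adjust the concurrency and the row count?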
love-cat
(Love Cat)
19
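I changed the dumper parameters and it still fails. I am not sure how to change the extra-args: "--consistency none" parameter; is the row count set directly with -r?
Do I need to change any MySQL parameters?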
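As a rough illustration of how the concurrency and row-count settings discussed above might fit together, the sketch below composes an extra-args string and renders a mydumpers section for the task file. It assumes that the dump unit in this DM version accepts Dumpling's --rows (-r) flag through extra-args, which should be checked against the documentation for 5.2.2. The helper names and the values 2 and 200000 are hypothetical.

```python
import shlex


def build_extra_args(consistency="none", rows=None):
    """Compose an extra-args string from a few Dumpling-style options."""
    args = ["--consistency", consistency]
    if rows is not None:
        # --rows is the long form of -r (rows per chunk); assumed to be
        # accepted by the DM dump unit via extra-args.
        args += ["--rows", str(rows)]
    return shlex.join(args)


def render_mydumpers(threads, extra_args, name="global"):
    """Render a mydumpers section of a DM task file as YAML text."""
    return "\n".join([
        "mydumpers:",
        f"  {name}:",
        f"    threads: {threads}",  # dump concurrency against the source
        f'    extra-args: "{extra_args}"',
    ])


# Hypothetical values: fewer threads and explicit row chunks.
print(render_mydumpers(2, build_extra_args(rows=200000)))
```

Lowering threads reduces the number of simultaneous connections to the source, and splitting by rows keeps each query shorter, which may help when connections are being dropped mid-dump.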
有猫万事足
20
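I think you could limit the number of databases/tables each task synchronizes. That also makes it easier to pinpoint which database or table fails during the dump.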
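To help locate the failing table, here is a minimal sketch that scans dm-worker log lines for the "fail to dumping table(chunk)" warnings shown earlier in this thread and groups the errors by database and table. The function name is hypothetical, and the field pattern is an assumption based only on the log lines pasted here. It accepts both straight and typographic quotes, since pasted logs often contain the latter.

```python
import re
from collections import defaultdict

# Matches bracketed key=value fields such as [table=cpoe_advice_fees] or
# ["finished rows"=16093]; quotes around keys and values are optional.
FIELD = re.compile(r'\[[“”"]?([^\]=“”"]+)[“”"]?=[“”"]?([^\]“”"]*)[“”"]?\]')


def summarize_dump_failures(lines):
    """Group the errors of failed dump chunks by (database, table)."""
    failures = defaultdict(list)
    for line in lines:
        if "fail to dumping table(chunk)" not in line:
            continue
        fields = dict(FIELD.findall(line))
        key = (fields.get("database", "?"), fields.get("table", "?"))
        failures[key].append(fields.get("error", "?"))
    return dict(failures)


# Sample lines abbreviated from the warnings in this thread.
sample = [
    '[WARN] [writer_util.go:181] ["fail to dumping table(chunk), will revert '
    'some metrics and start a retry if possible"] [database=thc_1093_testzmyl1a] '
    '[table=cpoe_advice_fees] ["finished rows"=16093] [error="invalid connection"]',
    '[WARN] [writer_util.go:181] ["fail to dumping table(chunk), will revert '
    'some metrics and start a retry if possible"] [database=thc_1093_testzmyl1a] '
    '[table=cpoe_advice_fees_history] ["finished rows"=37251] [error="context canceled"]',
]
for (db, table), errors in summarize_dump_failures(sample).items():
    print(db, table, errors)
```

In the logs above, the table whose error is "invalid connection" is the one that actually lost its connection; the "context canceled" entries appear to be other chunks stopped as a consequence.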
@Hacker007 The suggestion above is a good one.
Put an allow list directly in the task configuration and list the entries one by one, like this:
block-allow-list:
  balist-01:
    do-dbs:
    - "db1"
    do-tables:
    - db-name: "db1"
      tbl-name: "tab1"
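This also helps with later maintenance of DM: a problem with one table will not stall synchronization of the whole database. With the work split across several tasks, a failing table only blocks part of the tables.
Also note that if several tasks replicate from the same data source, you may need to enable relay-log on that data source so that the source database's performance is not affected. See the documentation for details.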
https://docs.pingcap.com/zh/tidb/stable/relay-log#dm-relay-log
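Writing the allow-list entries one by one gets tedious with many tables, so here is a small sketch that renders the block-allow-list section from a list of (database, table) pairs and splits the tables into batches, one batch per task. The function names, the rule names and the batch size are hypothetical; the table names are taken from the logs earlier in this thread.

```python
def render_allow_list(rule_name, tables):
    """Render one block-allow-list rule for (db, table) pairs as YAML text."""
    dbs = sorted({db for db, _ in tables})
    lines = ["block-allow-list:", f"  {rule_name}:", "    do-dbs:"]
    lines += [f'    - "{db}"' for db in dbs]
    lines.append("    do-tables:")
    for db, tbl in tables:
        lines.append(f'    - db-name: "{db}"')
        lines.append(f'      tbl-name: "{tbl}"')
    return "\n".join(lines)


def split_into_batches(tables, per_task):
    """Yield consecutive batches of at most per_task tables."""
    for start in range(0, len(tables), per_task):
        yield tables[start:start + per_task]


tables = [
    ("thc_1093_testzmyl1a", "cpoe_advice_fees"),
    ("thc_1093_testzmyl1a", "cpoe_advice_fees_history"),
    ("thc_1093_testzmyl1a", "cpoe_medical_technology_record"),
]
# Two tables per task is an arbitrary choice for illustration.
for index, batch in enumerate(split_into_batches(tables, 2), start=1):
    print(render_allow_list(f"balist-{index:02d}", batch))
    print()
```

Each printed section would go into its own task file, so a table that fails during the dump only pauses the task that contains it.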