使用Dumpling工具导出数据时发生 unexpected EOF busy buffer 错误

【 TiDB 使用环境】TiDB V4.0.0 Dumpling V4.0.2

【概述】 使用Dumpling工具导出数据,大约执行了20多分钟后,发生错误,具体如下:

[mysql] 2022/01/19 11:05:34 packets.go:427: busy buffe
[mysql] 2022/01/19 11:05:35 packets.go:36: unexpected EOF

[2022/01/19 11:06:03.619 +08:00] [ERROR] [main.go:191] ["dump failed error stack info"] [error="SELECT * FROM `db_pr`.`db_order`  WHERE (`id` >= 419858571988236251 AND `id` < 421896720395460626): invalid connection"] [errorVerbose="invalid connection\

SELECT * FROM db_pr.db_order WHERE (id >= 419858571988236251 AND id < 421896720395460626)"] [stack=“github.com/pingcap/log.Error
\t/go/pkg/mod/github.com/pingcap/log@v0.0.0-20200511115504-543df19646ad/global.go:42
main.main
\t/home/jenkins/agent/workspace/build-linux-amd64-4.0/go/src/github.com/pingcap/dumpling/cmd/dumpling/main.go:191
runtime.main
\t/usr/local/go/src/runtime/proc.go:203”]

dump failed: SELECT * FROM `db_pr`.`db_order`  WHERE (`id` >= 419858571988236251 AND `id` < 421896720395460626): invalid connection

【备份和数据迁移策略逻辑】 Dumpling导出数据时命令行参数如下:

dumpling -u db_user -h 172.17.10.10 --filetype sql -t 8 -o /data/backup/ -r 200000 -F 256MiB

【背景】 查了下当前正在导的表大小为 42968831 行数据。

【现象】 Dumping 刚开始运行时,有提示警告如下:

{"level":"warn","ts":"2022-01-19T10:43:48.769+0800","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-018a32b6-7c67-49ca-a4ed-58cdf417d3d5/172.17.10.10:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}

[2022/01/19 10:43:48.769 +08:00] [WARN] [dump.go:57] ["meet error while check whether fetched pd addr and TiDB belongs to one cluster"] [error="context deadline exceeded"] [pdAddrs="[172.17.10.10:2379,172.17.10.89:2379,172.17.10.58:2379]"

[2022/01/19 10:43:48.769 +08:00] [WARN] [dump.go:95] ["If the amount of data to dump is large, criteria: (data more than 60GB or dumped time more than 10 minutes)\

you’d better adjust the tikv_gc_life_time to avoid export failure due to TiDB GC during the dump process.
Before dumping: run sql update mysql.tidb set VARIABLE_VALUE = '720h' where VARIABLE_NAME = 'tikv_gc_life_time'; in tidb.
After dumping: run sql update mysql.tidb set VARIABLE_VALUE = '10m' where VARIABLE_NAME = 'tikv_gc_life_time'; in tidb.
"]`

[2022/01/19 10:43:48.973 +08:00] [WARN] [block_allow_list.go:15] ["unsupported dump schema in TiDB now"] [schema=INFORMATION_SCHEMA]

[2022/01/19 10:43:48.973 +08:00] [WARN] [block_allow_list.go:15] ["unsupported dump schema in TiDB now"] [schema=PERFORMANCE_SCHEMA]

[2022/01/19 10:43:48.973 +08:00] [WARN] [block_allow_list.go:15] ["unsupported dump schema in TiDB now"] [schema=mysql]

[2022/01/19 10:43:48.973 +08:00] [WARN] [block_allow_list.go:15] ["unsupported dump schema in TiDB now"] [schema=METRICS_SCHEMA]

麻烦大佬给指点。
非常感谢 :pray:

看一下这个的输出:

select VARIABLE_NAME,VARIABLE_VALUE from mysql.tidb where VARIABLE_NAME = ‘tikv_gc_life_time’;

怀疑是GC回收导致快照丢失,如果是默认值10分钟的话时间是能对上的

感谢您的回复,上述 SQL 的查询结果为:
±------------------±---------------+
| VARIABLE_NAME | VARIABLE_VALUE |
±------------------±---------------+
| tikv_gc_life_time | 10m0s |
±------------------±---------------+

这个默认值,没有修改过。

执行这个调大一点再导出试试。

或者这样也行:
set global tidb_gc_life_time=‘10h’;

感谢您的回复。

调大 tidb_gc_life_time 的值,对线上业务没影响吧?
另外,导完数据之后,再改回默认的 10m 吗?

gc_life_time调大会有一些影响,比如会占用更多存储空间,因为会保留更多的历史数据,这些历史数据会影响范围查询的效率。用完之后调回之前的值就行了,问题不大。

可以了解下TiDB的垃圾回收机制:
https://docs.pingcap.com/zh/tidb/dev/garbage-collection-overview
https://docs.pingcap.com/zh/tidb/dev/garbage-collection-configuration

收到,感谢您的回复。
我调整后再导一下试试,有问题再请教您。
再次感谢:pray:

我看你用的TiDB版本是TiDB 4.0,上面那个set global tidb_gc_life_time=‘10h’;估计用不了,还是用update mysql.tidb试试吧

好的,已经修改了,感谢。

您好,

把 gc 的时间调整为 3h 后,还是出现了这个错误:

[mysql] 2022/01/19 14:16:00 packets.go:72: unexpected EOF

[mysql] 2022/01/19 14:16:00 packets.go:427: busy buffer

[mysql] 2022/01/19 14:16:00 packets.go:72: unexpected EOF

[mysql] 2022/01/19 14:16:00 packets.go:427: busy buffer

[mysql] 2022/01/19 14:16:01 packets.go:72: unexpected EOF

[mysql] 2022/01/19 14:16:01 packets.go:427: busy buffer

[mysql] 2022/01/19 14:16:01 packets.go:36: unexpected EOF

[mysql] 2022/01/19 14:16:01 packets.go:36: unexpected EOF

[mysql] 2022/01/19 14:16:01 packets.go:72: unexpected EOF

[mysql] 2022/01/19 14:16:01 packets.go:427: busy buffer

[mysql] 2022/01/19 14:16:01 packets.go:72: unexpected EOF

[mysql] 2022/01/19 14:16:01 packets.go:427: busy buffer

[2022/01/19 14:16:30.307 +08:00] [ERROR] [main.go:191] ["dump failed error stack info"] [error="SELECT * FROM `db_pr`.`db_account`  WHERE (`id` >= 380029485835784686 AND `id` < 380686502629689542): invalid connection"] [errorVerbose="invalid connection\
SELECT * FROM `db_pr`.`db_account`  WHERE (`id` >= 380029485835784686 AND `id` < 380686502629689542)"] [stack="github.com/pingcap/log.Error\
\t/go/pkg/mod/github.com/pingcap/log@v0.0.0-20200511115504-543df19646ad/global.go:42\
main.main\
\t/home/jenkins/agent/workspace/build-linux-amd64-4.0/go/src/github.com/pingcap/dumpling/cmd/dumpling/main.go:191\
runtime.main\
\t/usr/local/go/src/runtime/proc.go:203"]

dump failed: SELECT * FROM `db_pr`.`db_account`  WHERE (`id` >= 380029485835784686 AND `id` < 380686502629689542): invalid connection

dumpling 的执行命令为:

dumpling -u db_user -h 172.17.10.10 --filetype sql -t 4 -o /data/backup/ -r 200000 -F 256MiB

看下Grafana的监控,TiDB SUmmary -> Server部分

您好,这是近一个小时的监控:

这个掉下来的时间点是不是导出报错的时间

从监控来看是导出的数据数据太大TiDB节点发生OOM了,处理方法参考这个:
https://docs.pingcap.com/zh/tidb/dev/dumpling-overview#控制导出-tidb-大表时的内存使用

1 个赞

感谢您的回复。

我把 Dumpling 里的线程数减小后,可以成功导出了。

再次感谢。

:+1::+1:

这种报错 invalid connection 大部分情况是 TiDB 挂掉了,Dumpling 并发调小之后应该是 TiDB 就不会发生 OOM 之类的情况。

感谢:pray:

此话题已在最后回复的 1 分钟后被自动关闭。不再允许新回复。