Dumpling backup causes TiDB OOM

[TiDB Environment] Production / Test / PoC
[TiDB Version]
[Reproduction Path] What operations were performed when the issue occurred
[Problem Encountered: symptoms and impact]
[Resource Configuration] Go to TiDB Dashboard - Cluster Info - Hosts and take a screenshot of that page
The dumpling command behaves the same whether tidb_mem_quota_query is specified or not. Why doesn't tidb_mem_quota_query take effect?

tiup dumpling -u root -P 4000 -h xxxx -p 'xxxxxxx' --filetype sql -o /data/export -F 500MiB  -t 1 --params "tidb_distsql_scan_concurrency=1,tidb_mem_quota_query=209715200" --consistency=auto

Error message:

[mysql] 2024/03/05 13:36:59 packets.go:122: closing bad idle connection: EOF
[2024/03/05 13:36:59.640 +00:00] [INFO] [collector.go:224] ["units canceled"] [cancel-unit=0]
[2024/03/05 13:36:59.668 +00:00] [INFO] [collector.go:225] ["backup failed summary"] [total-ranges=1] [ranges-succeed=0] [ranges-failed=1] [unit-name="dump table data"] [error="invalid connection; sql: START TRANSACTION: sql: connection is already closed; dial tcp xxx.xxx.xxx.xxx:4000: connect: connection refused"] [errorVerbose="the following errors occurred:\n -  invalid connection\n    github.com/pingcap/errors.AddStack\n    \tgithub.com/pingcap/errors@v0.11.5-0.20221009092201-b66cddb77c32/errors.go:174\n    github.com/pingcap/errors.Trace\n    \tgithub.com/pingcap/errors@v0.11.5-0.20221009092201-b66cddb77c32/juju_adaptor.go:15\n    github.com/pingcap/tidb/dumpling/export.(*multiQueriesChunkIter).nextRows.func1\n    \tgithub.com/pingcap/tidb/dumpling/export/ir_impl.go:87\n    github.com/pingcap/tidb/dumpling/export.(*multiQueriesChunkIter).nextRows\n    \tgithub.com/pingcap/tidb/dumpling/export/ir_impl.go:105\n    github.com/pingcap/tidb/dumpling/export.(*multiQueriesChunkIter).Next\n    \tgithub.com/pingcap/tidb/dumpling/export/ir_impl.go:153\n    github.com/pingcap/tidb/dumpling/export.WriteInsert\n    \tgithub.com/pingcap/tidb/dumpling/export/writer_util.go:247\n    github.com/pingcap/tidb/dumpling/export.FileFormat.WriteInsert\n    \tgithub.com/pingcap/tidb/dumpling/export/writer_util.go:660\n    github.com/pingcap/tidb/dumpling/export.(*Writer).tryToWriteTableData\n    \tgithub.com/pingcap/tidb/dumpling/export/writer.go:243\n    github.com/pingcap/tidb/dumpling/export.(*Writer).WriteTableData.func1\n    \tgithub.com/pingcap/tidb/dumpling/export/writer.go:228\n    github.com/pingcap/tidb/br/pkg/utils.WithRetry\n    \tgithub.com/pingcap/tidb/br/pkg/utils/retry.go:56\n    github.com/pingcap/tidb/dumpling/export.(*Writer).WriteTableData\n    \tgithub.com/pingcap/tidb/dumpling/export/writer.go:192\n    github.com/pingcap/tidb/dumpling/export.(*Writer).handleTask\n    \tgithub.com/pingcap/tidb/dumpling/export/writer.go:115\n    github.com/pingcap/tidb/dumpling/export.(*Writer).run\n    \tgithub.com/pingcap/tidb/dumpling/export/writer.go:93\n    github.com/pingcap/tidb/dumpling/export.(*Dumper).startWriters.func4\n    \tgithub.com/pingcap/tidb/dumpling/export/dump.go:376\n    golang.org/x/sync/errgroup.(*Group).Go.func1\n    \tgolang.org/x/sync@v0.1.0/errgroup/errgroup.go:75\n    runtime.goexit\n    \truntime/asm_amd64.s:1598\n -  sql: connection is already closed\n    sql: START TRANSACTION\n    github.com/pingcap/tidb/dumpling/export.createConnWithConsistency\n    \tgithub.com/pingcap/tidb/dumpling/export/sql.go:922\n    github.com/pingcap/tidb/dumpling/export.(*Dumper).Dump.func4\n    \tgithub.com/pingcap/tidb/dumpling/export/dump.go:234\n    github.com/pingcap/tidb/dumpling/export.(*Dumper).Dump.func5\n    \tgithub.com/pingcap/tidb/dumpling/export/dump.go:255\n    github.com/pingcap/tidb/dumpling/export.(*Writer).WriteTableData.func1\n    \tgithub.com/pingcap/tidb/dumpling/export/writer.go:204\n    github.com/pingcap/tidb/br/pkg/utils.WithRetry\n    \tgithub.com/pingcap/tidb/br/pkg/utils/retry.go:56\n    github.com/pingcap/tidb/dumpling/export.(*Writer).WriteTableData\n    \tgithub.com/pingcap/tidb/dumpling/export/writer.go:192\n    github.com/pingcap/tidb/dumpling/export.(*Writer).handleTask\n    \tgithub.com/pingcap/tidb/dumpling/export/writer.go:115\n    github.com/pingcap/tidb/dumpling/export.(*Writer).run\n    \tgithub.com/pingcap/tidb/dumpling/export/writer.go:93\n    github.com/pingcap/tidb/dumpling/export.(*Dumper).startWriters.func4\n    
\tgithub.com/pingcap/tidb/dumpling/export/dump.go:376\n    golang.org/x/sync/errgroup.(*Group).Go.func1\n    \tgolang.org/x/sync@v0.1.0/errgroup/errgroup.go:75\n    runtime.goexit\n    \truntime/asm_amd64.s:1598\n -  dial tcp xxx.xxx.xxx.xxx:4000: connect: connection refused\n    github.com/pingcap/errors.AddStack\n    \tgithub.com/pingcap/errors@v0.11.5-0.20221009092201-b66cddb77c32/errors.go:174\n    github.com/pingcap/errors.Trace\n    \tgithub.com/pingcap/errors@v0.11.5-0.20221009092201-b66cddb77c32/juju_adaptor.go:15\n    github.com/pingcap/tidb/dumpling/export.createConnWithConsistency\n    \tgithub.com/pingcap/tidb/dumpling/export/sql.go:903\n    github.com/pingcap/tidb/dumpling/export.(*Dumper).Dump.func4\n    \tgithub.com/pingcap/tidb/dumpling/export/dump.go:234\n    github.com/pingcap/tidb/dumpling/export.(*Dumper).Dump.func5\n    \tgithub.com/pingcap/tidb/dumpling/export/dump.go:255\n    github.com/pingcap/tidb/dumpling/export.(*Writer).WriteTableData.func1\n    \tgithub.com/pingcap/tidb/dumpling/export/writer.go:204\n    github.com/pingcap/tidb/br/pkg/utils.WithRetry\n    \tgithub.com/pingcap/tidb/br/pkg/utils/retry.go:56\n    github.com/pingcap/tidb/dumpling/export.(*Writer).WriteTableData\n    \tgithub.com/pingcap/tidb/dumpling/export/writer.go:192\n    github.com/pingcap/tidb/dumpling/export.(*Writer).handleTask\n    \tgithub.com/pingcap/tidb/dumpling/export/writer.go:115\n    github.com/pingcap/tidb/dumpling/export.(*Writer).run\n    \tgithub.com/pingcap/tidb/dumpling/export/writer.go:93\n    github.com/pingcap/tidb/dumpling/export.(*Dumper).startWriters.func4\n    \tgithub.com/pingcap/tidb/dumpling/export/dump.go:376\n    golang.org/x/sync/errgroup.(*Group).Go.func1\n    \tgolang.org/x/sync@v0.1.0/errgroup/errgroup.go:75\n    runtime.goexit\n    \truntime/asm_amd64.s:1598"]
[2024/03/05 13:36:59.708 +00:00] [INFO] [tso_dispatcher.go:214] ["exit tso dispatcher loop"]
[2024/03/05 13:36:59.708 +00:00] [INFO] [tso_dispatcher.go:162] ["exit tso requests cancel loop"]
[2024/03/05 13:36:59.708 +00:00] [INFO] [tso_dispatcher.go:375] ["[tso] stop fetching the pending tso requests due to context canceled"] [dc-location=global]
[2024/03/05 13:36:59.708 +00:00] [INFO] [tso_dispatcher.go:311] ["[tso] exit tso dispatcher"] [dc-location=global]

TiDB OOM report:

The 10 SQLs with the most memory usage for OOM analysis
SQL 0:
cost_time: 1.9202006489999999s
stats: xxx:448094468304535556
conn_id: 19
user: root
table_ids: [857]
txn_start_ts: 448171830564093953
mem_max: 295620 Bytes (288.7 KB)
sql: SELECT * FROM `xxx`.`xxx` WHERE `_tidb_rowid`>=70610415 and `_tidb_rowid`<70928387  ORDER BY `_tidb_rowid`

The 10 SQLs with the most time usage for OOM analysis
SQL 0:
cost_time: 1.9202006489999999s
stats: xxx:448094468304535556
conn_id: 19
user: root
table_ids: [857]
txn_start_ts: 448171830564093953
mem_max: 295620 Bytes (288.7 KB)
sql: SELECT * FROM `xxx`.`xxx` WHERE `_tidb_rowid`>=70610415 and `_tidb_rowid`<70928387  ORDER BY `_tidb_rowid`

Two suggestions; try them and see whether they help:
1. Add -r 200000
2. Write the memory quota differently: instead of putting it inside --params, pass it as --tidb-mem-quota-query 209715200 (see the example below)
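For example, something like this (a sketch based on your original command; host and password are placeholders):

# Same export, but with -r for chunking and the memory quota passed
# as a dedicated flag instead of inside --params
tiup dumpling -u root -P 4000 -h xxxx -p 'xxxxxxx' \
  --filetype sql -o /data/export -F 500MiB -t 1 -r 200000 \
  --tidb-mem-quota-query 209715200 \
  --params "tidb_distsql_scan_concurrency=1" \
  --consistency=auto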


Are you using it together with tidb_mem_oom_action?

No, I'm not using that.

  • Set the -r option to split the exported data into chunks, which reduces the memory overhead of TiDB scans and also enables intra-table concurrency to speed up the export. When the upstream is TiDB v3.0 or later, a -r value greater than 0 means TiDB region information is used for intra-table concurrency, and the specific value does not affect the split algorithm.
  • Reduce --tidb-mem-quota-query to 8589934592 (8 GB) or lower. It controls the memory usage of a single query statement in TiDB.
  • Adjust --params "tidb_distsql_scan_concurrency=5", which sets the session variable tidb_distsql_scan_concurrency during the export and thus reduces the concurrency of TiDB scan operations.
    Export Data Using Dumpling | PingCAP Docs

Even -r 1000 doesn't work; the problem persists. This tool is really unstable.
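It might also be worth checking how much memory the Dumpling connections actually use while the export runs, to see whether the quota is being applied at all. A rough sketch, run from a separate session (the LIKE filter is only a guess at matching Dumpling's chunk queries):

-- MEM is the per-statement memory tracked by TiDB, in bytes
SELECT INSTANCE, ID, USER, MEM, INFO
FROM INFORMATION_SCHEMA.CLUSTER_PROCESSLIST
WHERE INFO LIKE 'SELECT %`_tidb_rowid`%'
ORDER BY MEM DESC;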

Then the only option is to increase the server memory.

How much memory does the server actually have?

Originally 12 GB. I added a 16 GB machine to the cluster, but the problem persisted. I'm going to try a 32 GB one.

Try increasing the memory.

What is the size of the backup files now?

You need to set tidb_mem_oom_action on TiDB; otherwise the memory limit won't take effect.
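For example (a sketch; tidb_mem_oom_action only exists as a system variable in v6.1 and later, while on v5.x the equivalent is the oom-action entry in the TiDB config file):

-- v6.1+: cancel a statement once it exceeds its memory quota
SET GLOBAL tidb_mem_oom_action = 'CANCEL';
-- per-statement quota in bytes (200 MiB here); Dumpling sets the same
-- variable for its own sessions through --params
SET GLOBAL tidb_mem_quota_query = 209715200;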


I set oom-action, but it still OOMs.

The same command: here is the difference between versions.


The v5.x series does seem to have quite a few OOM issues.

Try upgrading to the latest version; it has optimizations. Just make sure your memory is large enough first.


Personally I'd suggest upgrading. Try 6.5.
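Something along these lines (a sketch; substitute your actual cluster name and pick whichever 6.5 patch release is current):

# find the cluster name
tiup cluster list
# rolling upgrade to a v6.5 release
tiup cluster upgrade <cluster-name> v6.5.8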

What does the table schema look like? The amount of data you're exporting doesn't seem that large.

Does reducing -r and also reducing -F help?

:+1: :+1: :+1: Looks like the newer versions really did get major optimizations.