BR restore fails with ["br failed"] [error="[kv:8004]Transaction is too large"]

【TiDB Environment】Test environment
【TiDB Version】8.5.0
【Operating System】Rocky 9.1
【Deployment Method】k8s
【Cluster Data Volume】10 GB
【Cluster Node Count】1
【Problem: Symptoms and Impact】
We are preparing to upgrade from 6.5.5 to 8.5.0. I installed a TiDB cluster locally using kind. When restoring data from S3 with BR, the restore fails with an error. The data comes from a snapshot backup taken on 6.5.5, and I am likewise doing a snapshot restore. The restore YAML file is as follows:
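In sketch form, with placeholder names, bucket, and secret (this assumes the tidb-operator pingcap.com/v1alpha1 Restore CRD, not my exact file):

```bash
# Hypothetical sketch of the Restore CR used for the snapshot restore.
# Cluster name, namespace, bucket, prefix, and secret are placeholders.
kubectl apply -f - <<'EOF'
apiVersion: pingcap.com/v1alpha1
kind: Restore
metadata:
  name: restore-s3
  namespace: tidb-cluster
spec:
  br:
    cluster: basic
    clusterNamespace: tidb-cluster
  s3:
    provider: aws
    region: us-west-2
    bucket: my-bucket
    prefix: backup-6.5.5
    secretName: s3-secret
EOF
```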

【Copy-Pasted ERROR Log】
[2025/03/11 06:41:45.051 +00:00] [ERROR] [main.go:38] ["br failed"] [error="[kv:8004]Transaction is too large, size: 105426498"] [errorVerbose="[kv:8004]Transaction is too large, size: 105426498\ngithub.com/pingcap/errors.AddStack\n\t/root/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20240318064555-6bd07397691f/errors.go:178\ngithub.com/pingcap/errors.(*Error).GenWithStackByArgs\n\t/root/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20240318064555-6bd07397691f/normalize.go:175\ngithub.com/pingcap/tidb/pkg/store/driver/error.ToTiDBErr\n\t/workspace/source/tidb/pkg/store/driver/error/error.go:92\ngithub.com/pingcap/tidb/pkg/store/driver/txn.(*memBuffer).SetWithFlags\n\t/workspace/source/tidb/pkg/store/driver/txn/unionstore_driver.go:96\ngithub.com/pingcap/tidb/pkg/table/tblctx.(*EncodeRowBuffer).WriteMemBufferEncoded\n\t/workspace/source/tidb/pkg/table/tblctx/buffers.go:82\ngithub.com/pingcap/tidb/pkg/table/tables.(*TableCommon).addRecord\n\t/workspace/source/tidb/pkg/table/tables/tables.go:868\ngithub.com/pingcap/tidb/pkg/table/tables.(*TableCommon).AddRecord\n\t/workspace/source/tidb/pkg/table/tables/tables.go:687\ngithub.com/pingcap/tidb/pkg/executor.(*InsertValues).addRecordWithAutoIDHint\n\t/workspace/source/tidb/pkg/executor/insert_common.go:1421\ngithub.com/pingcap/tidb/pkg/executor.(*InsertValues).addRecord\n\t/workspace/source/tidb/pkg/executor/insert_common.go:1406\ngithub.com/pingcap/tidb/pkg/executor.(*InsertExec).exec\n\t/workspace/source/tidb/pkg/executor/insert.go:113\ngithub.com/pingcap/tidb/pkg/executor.insertRows\n\t/workspace/source/tidb/pkg/executor/insert_common.go:254\ngithub.com/pingcap/tidb/pkg/executor.(*InsertExec).Next\n\t/workspace/source/tidb/pkg/executor/insert.go:359\ngithub.com/pingcap/tidb/pkg/executor/internal/exec.Next\n\t/workspace/source/tidb/pkg/executor/internal/exec/executor.go:456\ngithub.com/pingcap/tidb/pkg/executor.(*ExecStmt).next\n\t/workspace/source/tidb/pkg/executor/adapter.go:1266\ngithub.com/pingcap/tidb/pkg/executor.(*ExecStmt).handleNoDelayExecutor\n\t/workspace/source/tidb/pkg/executor/adapter.go:1015\ngithub.com/pingcap/tidb/pkg/executor.(*ExecStmt).handleNoDelay\n\t/workspace/source/tidb/pkg/executor/adapter.go:848\ngithub.com/pingcap/tidb/pkg/executor.(*ExecStmt).Exec\n\t/workspace/source/tidb/pkg/executor/adapter.go:611\ngithub.com/pingcap/tidb/pkg/session.runStmt\n\t/workspace/source/tidb/pkg/session/session.go:2288\ngithub.com/pingcap/tidb/pkg/session.(*session).ExecuteStmt\n\t/workspace/source/tidb/pkg/session/session.go:2150\ngithub.com/pingcap/tidb/pkg/session.(*session).ExecuteInternal\n\t/workspace/source/tidb/pkg/session/session.go:1523\ngithub.com/pingcap/tidb/pkg/ddl/session.(*Session).Execute\n\t/workspace/source/tidb/pkg/ddl/session/session.go:85\ngithub.com/pingcap/tidb/pkg/ddl.insertDDLJobs2Table\n\t/workspace/source/tidb/pkg/ddl/job_submitter.go:699\ngithub.com/pingcap/tidb/pkg/ddl.(*JobSubmitter).GenGIDAndInsertJobsWithRetry.func1\n\t/workspace/source/tidb/pkg/ddl/job_submitter.go:444\ngithub.com/pingcap/tidb/pkg/ddl.genGIDAndCallWithRetry.func1\n\t/workspace/source/tidb/pkg/ddl/job_submitter.go:614\ngithub.com/pingcap/tidb/pkg/ddl.genGIDAndCallWithRetry\n\t/workspace/source/tidb/pkg/ddl/job_submitter.go:618\ngithub.com/pingcap/tidb/pkg/ddl.(*JobSubmitter).GenGIDAndInsertJobsWithRetry\n\t/workspace/source/tidb/pkg/ddl/job_submitter.go:424\ngithub.com/pingcap/tidb/pkg/ddl.(*JobSubmitter).addBatchDDLJobs2Table\n\t/workspace/source/tidb/pkg/ddl/job_submitter.go:351\ngithub.com/pingcap/tidb/pkg/ddl.(*JobSubmitter).addBatchDDLJobs\n\t/workspace/source/tidb/pkg/ddl/job_submitter.go:119\ngithub.com/pingcap/tidb/pkg/ddl.(*JobSubmitter).submitLoop\n\t/workspace/source/tidb/pkg/ddl/job_submitter.go:93\ngithub.com/pingcap/tidb/pkg/ddl.(*ddl).Start.func1\n\t/workspace/source/tidb/pkg/ddl/ddl.go:790\ngithub.com/pingcap/tidb/pkg/util.(*WaitGroupWrapper).Run.func1\n\t/workspace/source/tidb/pkg/util/wait_group_wrapper.go:157\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1700"] [stack="main.main\n\t/workspace/source/tidb/br/cmd/br/main.go:38\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:272"]
Error: [kv:8004]Transaction is too large, size: 105426498
, err: exit status 1
Sleeping for 10 seconds before exit…
【Other Attachments: Screenshots/Logs/Monitoring】
Following the documentation, I changed the system variable tidb_mem_quota_query to 2 GB.
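Concretely, something like this (host, port, and user are placeholders for my kind setup):

```bash
# Raise tidb_mem_quota_query to 2 GiB for all new sessions
mysql -h 127.0.0.1 -P 4000 -u root -e \
  "SET GLOBAL tidb_mem_quota_query = 2147483648;"
```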

It still fails, though. The size in the error message is just over 100 MiB, so I looked at the txn-total-size-limit config item, whose default value is exactly 100 MiB (104857600). I changed that parameter as well and reinstalled the cluster, but the restore still reported the same error as above. I suspected my parameter might be configured incorrectly, and indeed, when I queried with the following SQL, nothing was found:


But my cluster configuration file definitely has it set:
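In sketch form, the relevant part of my TidbCluster manifest (names and values are placeholders; TiDB options are passed as TOML under spec.tidb.config):

```bash
# Hypothetical sketch of the TidbCluster CR carrying the setting;
# 10737418240 bytes (10 GiB) is an example value, not my exact number.
kubectl apply -f - <<'EOF'
apiVersion: pingcap.com/v1alpha1
kind: TidbCluster
metadata:
  name: basic
  namespace: tidb-cluster
spec:
  version: v8.5.0
  pd:
    baseImage: pingcap/pd
    replicas: 1
    requests:
      storage: "1Gi"
  tikv:
    baseImage: pingcap/tikv
    replicas: 1
    requests:
      storage: "10Gi"
  tidb:
    baseImage: pingcap/tidb
    replicas: 1
    config: |
      [performance]
      txn-total-size-limit = 10737418240
EOF
```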

And according to the documentation, this config item should not need to be set anymore; by default it is superseded by tidb_mem_quota_query.

From the log, the error comes from DDL-internal logic. First, could you confirm what the schema being restored looks like?
How many tables? Roughly how many columns in the table structures?

Our current suspicion is that the job info inside batch create table is too large, so the DDL-internal SQL exceeds the transaction size limit. You can try specifying --ddl-batch-size=1 during the restore as a workaround.
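In CLI form the workaround looks roughly like this (PD address, bucket, and region are placeholders); if you restore through tidb-operator, the flag can likely be passed via the Restore CR's spec.br.options list instead:

```bash
# Restore with DDL batching reduced to 1 so that each internal
# batch-create-table transaction stays under the size limit.
br restore full \
  --pd "127.0.0.1:2379" \
  --storage "s3://my-bucket/backup-6.5.5" \
  --s3.region "us-west-2" \
  --ddl-batch-size 1
```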

SHOW CONFIG WHERE NAME LIKE '%txn-total-size-limit%';
The full name of this parameter is performance.txn-total-size-limit. Also, from the documentation:

  • In v6.5.0 and later versions, this configuration item is no longer recommended. The memory size of a transaction is accumulated into the memory usage of its session, and when a session's memory exceeds the threshold, the tidb_mem_quota_query variable takes control actions. For forward compatibility, when upgrading from an earlier version to v6.5.0 or later, this configuration item behaves as follows:
    • If this configuration item is not set, or is set to the default value (104857600), after the upgrade the transaction's memory size is counted into its session's memory usage and is controlled by the tidb_mem_quota_query variable.
    • If this configuration item is set to a non-default value, it remains effective before and after the upgrade: the limit on the size of a single transaction is unchanged, and transaction memory is not controlled by tidb_mem_quota_query.
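To confirm which of the two behaviors your cluster is in, you can check both the effective config and the variable (connection details are placeholders):

```bash
# Effective tidb-server config; note the section-qualified full name
mysql -h 127.0.0.1 -P 4000 -u root -e \
  "SHOW CONFIG WHERE type = 'tidb' AND name = 'performance.txn-total-size-limit';"

# The session/global quota that takes over when the config is left at default
mysql -h 127.0.0.1 -P 4000 -u root -e \
  "SELECT @@global.tidb_mem_quota_query;"
```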

Using your query statement, I did find it; the change did take effect.


But I also tried what the documentation describes, leaving this value unset, as I explained above. It still fails with the same error, as if tidb_mem_quota_query were being ignored entirely.

A bit over 13,000 tables; as for table structure, the most complex one has about thirty columns.

I'll give it a try.

The schema data is not that large; are there any partitioned tables?
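A quick way to check (connection details are placeholders):

```bash
# List user tables that are partitioned; an empty result means none
mysql -h 127.0.0.1 -P 4000 -u root -e \
  "SELECT TABLE_SCHEMA, TABLE_NAME, COUNT(*) AS partition_count
   FROM information_schema.PARTITIONS
   WHERE PARTITION_NAME IS NOT NULL
   GROUP BY TABLE_SCHEMA, TABLE_NAME;"
```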