TiDB BR backup fails

【TiDB environment】Production
【TiDB version】8.5.1
【Deployment】Private cloud
【OS/CPU architecture/chip】CentOS 7.9
【Problem: symptoms and impact】
The BR backup of TiDB fails. Backup command:

tiup br backup full --pd "1.1.1.1:2379,1.1.1.2:2379,1.1.1.3:2379" -f '*.*' -f '!*_dev.*' -f '!*_test.*' -f '!*.T_Oms_Sync_Log' --storage "s3://tidbbk/$(date +%Y%m%d)" --send-credentials-to-tikv=true --s3.endpoint "http://1.1.1.4:9000" --log-file /tmp/tidb_backup.log --ratelimit 128

Error output:

[2026/01/05 23:50:02.008 +08:00] [WARN] [backup.go:311] ["setting `--ratelimit` and `--concurrency` at the same time, ignoring `--concurrency`: `--ratelimit` forces sequential (i.e. concurrency = 1) backup"] [ratelimit=134.2MB/s] [concurrency-specified=4]
[2026/01/06 01:16:58.106 +08:00] [INFO] [collector.go:77] ["Full Backup failed summary"] [total-ranges=11928] [ranges-succeed=11928] [ranges-failed=0] [backup-checksum=3m12.420542886s] [backup-fast-checksum=90.757067ms] [backup-total-ranges=26520]
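The two lines above can be sanity-checked with a little arithmetic. A minimal Python sketch, assuming `--ratelimit` is given in MiB/s (so 128 becomes the 134.2MB/s in the warning) and taking the warning's timestamp as a rough start time for the run:

```python
from datetime import datetime

MIB = 1024 * 1024  # --ratelimit is assumed to be in MiB/s


def ratelimit_to_mb_per_s(ratelimit_mib: int) -> float:
    """Convert a --ratelimit value (MiB/s) to the decimal MB/s shown in the log."""
    return ratelimit_mib * MIB / 1_000_000


def parse_br_ts(stamp: str) -> datetime:
    """Parse a BR log timestamp such as '[2026/01/05 23:50:02.008 +08:00]'."""
    return datetime.strptime(stamp.strip("[]"), "%Y/%m/%d %H:%M:%S.%f %z")


start = parse_br_ts("[2026/01/05 23:50:02.008 +08:00]")  # ratelimit warning, near the start
end = parse_br_ts("[2026/01/06 01:16:58.106 +08:00]")    # "Full Backup failed summary"

print(round(ratelimit_to_mb_per_s(128), 1))  # 134.2
print(end - start)                           # 1:26:56.098000
```

So this run took roughly 1 h 27 min, with concurrency forced down to 1 because `--ratelimit` was set.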

The backup error log. The database and table it reports are different every day:

[2026/01/07 01:17:11.018 +08:00] [ERROR] [validate.go:70] ["checksum mismatch"] [db=test] [table=T_test123] ["origin tidb crc64"=14892190027201354309] ["calculated crc64"=16947611913402066651] ["origin tidb total kvs"=396227232] ["calculated total kvs"=297170424] ["origin tidb total bytes"=60919859806] ["calculated total bytes"=44575500010] [stack="github.com/pingcap/tidb/br/pkg/checksum.FastChecksum\n\t/workspace/source/tidb/br/pkg/checksum/validate.go:70\ngithub.com/pingcap/tidb/br/pkg/task.RunBackup\n\t/workspace/source/tidb/br/pkg/task/backup.go:750\nmain.runBackupCommand\n\t/workspace/source/tidb/br/cmd/br/backup.go:56\nmain.newFullBackupCommand.func1\n\t/workspace/source/tidb/br/cmd/br/backup.go:148\ngithub.com/spf13/cobra.(*Command).execute\n\t/root/go/pkg/mod/github.com/spf13/cobra@v1.8.0/command.go:983\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/root/go/pkg/mod/github.com/spf13/cobra@v1.8.0/command.go:1115\ngithub.com/spf13/cobra.(*Command).Execute\n\t/root/go/pkg/mod/github.com/spf13/cobra@v1.8.0/command.go:1039\nmain.main\n\t/workspace/source/tidb/br/cmd/br/main.go:36\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:267"]
[2026/01/07 01:17:11.129 +08:00] [INFO] [collector.go:78] ["Full Backup failed summary"] [total-ranges=12000] [ranges-succeed=12000] [ranges-failed=0] [backup-checksum=3m16.879045192s] [backup-fast-checksum=98.473826ms] [backup-total-ranges=26520]
[2026/01/07 01:17:11.129 +08:00] [ERROR] [backup.go:57] ["failed to backup"] [error="[BR:Backup:ErrBackupChecksumMismatch]backup checksum mismatch"] [errorVerbose="[BR:Backup:ErrBackupChecksumMismatch]backup checksum mismatch\ngithub.com/pingcap/errors.AddStack\n\t/root/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20240318064555-6bd07397691f/errors.go:178\ngithub.com/pingcap/errors.Trace\n\t/root/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20240318064555-6bd07397691f/juju_adaptor.go:15\ngithub.com/pingcap/tidb/br/pkg/checksum.FastChecksum\n\t/workspace/source/tidb/br/pkg/checksum/validate.go:80\ngithub.com/pingcap/tidb/br/pkg/task.RunBackup\n\t/workspace/source/tidb/br/pkg/task/backup.go:750\nmain.runBackupCommand\n\t/workspace/source/tidb/br/cmd/br/backup.go:56\nmain.newFullBackupCommand.func1\n\t/workspace/source/tidb/br/cmd/br/backup.go:148\ngithub.com/spf13/cobra.(*Command).execute\n\t/root/go/pkg/mod/github.com/spf13/cobra@v1.8.0/command.go:983\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/root/go/pkg/mod/github.com/spf13/cobra@v1.8.0/command.go:1115\ngithub.com/spf13/cobra.(*Command).Execute\n\t/root/go/pkg/mod/github.com/spf13/cobra@v1.8.0/command.go:1039\nmain.main\n\t/workspace/source/tidb/br/cmd/br/main.go:36\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:267\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1650"] [stack="main.runBackupCommand\n\t/workspace/source/tidb/br/cmd/br/backup.go:57\nmain.newFullBackupCommand.func1\n\t/workspace/source/tidb/br/cmd/br/backup.go:148\ngithub.com/spf13/cobra.(*Command).execute\n\t/root/go/pkg/mod/github.com/spf13/cobra@v1.8.0/command.go:983\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/root/go/pkg/mod/github.com/spf13/cobra@v1.8.0/command.go:1115\ngithub.com/spf13/cobra.(*Command).Execute\n\t/root/go/pkg/mod/github.com/spf13/cobra@v1.8.0/command.go:1039\nmain.main\n\t/workspace/source/tidb/br/cmd/br/main.go:36\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:267"]
[2026/01/07 01:17:11.129 +08:00] [ERROR] [main.go:38] ["br failed"] [error="[BR:Backup:ErrBackupChecksumMismatch]backup checksum mismatch"] [errorVerbose="[BR:Backup:ErrBackupChecksumMismatch]backup checksum mismatch\ngithub.com/pingcap/errors.AddStack\n\t/root/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20240318064555-6bd07397691f/errors.go:178\ngithub.com/pingcap/errors.Trace\n\t/root/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20240318064555-6bd07397691f/juju_adaptor.go:15\ngithub.com/pingcap/tidb/br/pkg/checksum.FastChecksum\n\t/workspace/source/tidb/br/pkg/checksum/validate.go:80\ngithub.com/pingcap/tidb/br/pkg/task.RunBackup\n\t/workspace/source/tidb/br/pkg/task/backup.go:750\nmain.runBackupCommand\n\t/workspace/source/tidb/br/cmd/br/backup.go:56\nmain.newFullBackupCommand.func1\n\t/workspace/source/tidb/br/cmd/br/backup.go:148\ngithub.com/spf13/cobra.(*Command).execute\n\t/root/go/pkg/mod/github.com/spf13/cobra@v1.8.0/command.go:983\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/root/go/pkg/mod/github.com/spf13/cobra@v1.8.0/command.go:1115\ngithub.com/spf13/cobra.(*Command).Execute\n\t/root/go/pkg/mod/github.com/spf13/cobra@v1.8.0/command.go:1039\nmain.main\n\t/workspace/source/tidb/br/cmd/br/main.go:36\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:267\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1650"] [stack="main.main\n\t/workspace/source/tidb/br/cmd/br/main.go:38\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:267"]
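Because the reported table changes every day, it helps to pull every mismatch out of each day's log and see how far off the backup was. A rough Python sketch; the field names are copied from the `validate.go:70` line above, and the regex only handles the simple `[key=value]` / `["key"=value]` form seen there:

```python
import re
from typing import NamedTuple

# One [key=value] or ["key with spaces"=value] field of a BR log line.
FIELD = re.compile(r'\["?([^"\]=]+)"?=([^\]]+)\]')


class Mismatch(NamedTuple):
    db: str
    table: str
    kvs_ratio: float    # calculated total kvs / origin tidb total kvs
    bytes_ratio: float  # calculated total bytes / origin tidb total bytes


def parse_mismatches(log_text: str) -> list[Mismatch]:
    """Collect every "checksum mismatch" entry in a BR log."""
    found = []
    for line in log_text.splitlines():
        if '"checksum mismatch"' not in line:
            continue
        f = dict(FIELD.findall(line))
        found.append(Mismatch(
            db=f["db"],
            table=f["table"],
            kvs_ratio=int(f["calculated total kvs"]) / int(f["origin tidb total kvs"]),
            bytes_ratio=int(f["calculated total bytes"]) / int(f["origin tidb total bytes"]),
        ))
    return found


sample = ('[2026/01/07 01:17:11.018 +08:00] [ERROR] [validate.go:70] ["checksum mismatch"] '
          '[db=test] [table=T_test123] ["origin tidb crc64"=14892190027201354309] '
          '["calculated crc64"=16947611913402066651] ["origin tidb total kvs"=396227232] '
          '["calculated total kvs"=297170424] ["origin tidb total bytes"=60919859806] '
          '["calculated total bytes"=44575500010]')

for m in parse_mismatches(sample):
    print(m.db, m.table, f"kvs {m.kvs_ratio:.1%}", f"bytes {m.bytes_ratio:.1%}")
# test T_test123 kvs 75.0% bytes 73.2%
```

For the 2026/01/07 run, the backup files hold exactly 75% of the KVs (297170424 of 396227232) and about 73% of the bytes the checksum expected, so a large part of the table is missing from the backup rather than a few rows differing.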

Does anyone know how to fix this? :joy:

I don't see anything wrong with the backup parameters.

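Did the backup succeed before? Could the -f rules be conflicting somewhere? BR backs up all databases by default, so you only need to add the exclusion rules.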
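To check whether the -f rules conflict, here is a simplified Python model of TiDB table-filter matching. It assumes the documented behaviour that matching is case-insensitive, the last rule that matches a table decides, and a table matching no rule is skipped; it only handles wildcard rules (no `/regex/` or quoted names):

```python
from fnmatch import fnmatchcase


def table_included(db: str, table: str, rules: list[str]) -> bool:
    """Simplified table filter: last matching rule wins, no match means skipped."""
    included = False
    for rule in rules:
        exclude = rule.startswith("!")
        db_pat, _, tbl_pat = rule.lstrip("!").lower().partition(".")
        if fnmatchcase(db.lower(), db_pat) and fnmatchcase(table.lower(), tbl_pat):
            included = not exclude
    return included


rules = ["*.*", "!*_dev.*", "!*_test.*", "!*.T_Oms_Sync_Log"]  # from the backup command
print(table_included("test", "T_test123", rules))      # True: "test" does not match *_test
print(table_included("app_test", "orders", rules))     # False
print(table_included("app", "t_oms_sync_log", rules))  # False
```

With `*.*` first, the exclusions do not conflict. One detail worth noticing: the database in the error log is `test`, which `!*_test.*` does not exclude (the pattern needs an underscore), so `test.T_test123` is included in the backup, assuming the names in the log are the real ones.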

It used to succeed, then suddenly started failing. I just went through the log and found the key error, but I don't know how to fix it :joy:

[2026/01/07 01:17:11.018 +08:00] [ERROR] [validate.go:70] ["checksum mismatch"] [db=test] [table=T_test123] ["origin tidb crc64"=14892190027201354309] ["calculated crc64"=16947611913402066651] ["origin tidb total kvs"=396227232] ["calculated total kvs"=297170424] ["origin tidb total bytes"=60919859806] ["calculated total bytes"=44575500010] [stack="github.com/pingcap/tidb/br/pkg/checksum.FastChecksum\n\t/workspace/source/tidb/br/pkg/checksum/validate.go:70\ngithub.com/pingcap/tidb/br/pkg/task.RunBackup\n\t/workspace/source/tidb/br/pkg/task/backup.go:750\nmain.runBackupCommand\n\t/workspace/source/tidb/br/cmd/br/backup.go:56\nmain.newFullBackupCommand.func1\n\t/workspace/source/tidb/br/cmd/br/backup.go:148\ngithub.com/spf13/cobra.(*Command).execute\n\t/root/go/pkg/mod/github.com/spf13/cobra@v1.8.0/command.go:983\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/root/go/pkg/mod/github.com/spf13/cobra@v1.8.0/command.go:1115\ngithub.com/spf13/cobra.(*Command).Execute\n\t/root/go/pkg/mod/github.com/spf13/cobra@v1.8.0/command.go:1039\nmain.main\n\t/workspace/source/tidb/br/cmd/br/main.go:36\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:267"]

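Has anything in the cluster been changed recently? It looks like table T_test123 is what makes the backup fail. I'd focus on recent changes, or check whether TiKV is healthy.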

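There are three TiKV nodes in total, and they all look healthy. An AI assistant suggested the Region replicas might be inconsistent, but I couldn't find any Grafana panel showing this Region's replica information :joy: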
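Region replica health can also be read from PD instead of Grafana. A minimal sketch, assuming PD's HTTP API exposes the `/pd/api/v1/regions/check/{miss-peer,extra-peer,down-peer,pending-peer}` endpoints (the ones behind `pd-ctl region check`) and that each returns JSON with a `count` field. This only reports peer-level problems; it does not compare the data inside the replicas:

```python
import json
from urllib.request import urlopen

# Assumed PD HTTP API check names; verify them against your PD version.
CHECKS = ("miss-peer", "extra-peer", "down-peer", "pending-peer")


def fetch_check(pd_addr: str, kind: str) -> dict:
    """GET one region check from PD and decode the JSON body."""
    url = f"http://{pd_addr}/pd/api/v1/regions/check/{kind}"
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def summarize(results: dict[str, dict]) -> dict[str, int]:
    """Reduce each check's response to the number of Regions it reports."""
    return {kind: body.get("count", len(body.get("regions") or []))
            for kind, body in results.items()}


def check_pd(pd_addr: str) -> dict[str, int]:
    """Run all checks against one PD address, e.g. "1.1.1.1:2379"."""
    return summarize({kind: fetch_check(pd_addr, kind) for kind in CHECKS})
```

Calling `check_pd("1.1.1.1:2379")` should return all zeros on a healthy cluster; any non-zero count points at Regions worth inspecting one by one with `pd-ctl region <region_id>`.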


Is data being written during the backup?

Yes, but writes during a full backup should be normal, right? You can't expect downtime for every backup :joy:
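Writes during a full backup are expected: BR reads every table at one backup timestamp, and TiKV's MVCC keeps older versions around, so writes committed after that timestamp are simply invisible to the backup (as long as those old versions are not garbage-collected first). A toy Python model of that snapshot read, not TiKV's actual storage layout:

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Optional


class ToyMVCC:
    """Toy multi-version store: each write is kept alongside its commit timestamp."""

    def __init__(self) -> None:
        # key -> list of (commit_ts, value), appended in increasing commit_ts order
        self._versions = defaultdict(list)

    def write(self, key: str, value: str, commit_ts: int) -> None:
        # Assumes commit timestamps only grow, as if handed out by a timestamp oracle.
        self._versions[key].append((commit_ts, value))

    def snapshot_get(self, key: str, read_ts: int) -> Optional[str]:
        """Latest value committed at or before read_ts, or None if none existed yet."""
        versions = self._versions.get(key, [])
        idx = bisect_right([ts for ts, _ in versions], read_ts)
        return versions[idx - 1][1] if idx else None

    def snapshot_scan(self, read_ts: int) -> dict:
        """Everything visible at read_ts, which is what a snapshot backup copies."""
        visible = {k: self.snapshot_get(k, read_ts) for k in self._versions}
        return {k: v for k, v in visible.items() if v is not None}


store = ToyMVCC()
store.write("k1", "a", commit_ts=10)
store.write("k2", "b", commit_ts=20)
backup_ts = 25                          # the snapshot the backup reads at
store.write("k1", "a2", commit_ts=30)   # business writes keep coming in
store.write("k3", "c", commit_ts=31)
print(store.snapshot_scan(backup_ts))   # {'k1': 'a', 'k2': 'b'}
```

In this model, writes after `backup_ts` change neither the data read nor any checksum computed at the same timestamp, which is why ongoing writes alone should not explain a mismatch.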

Recommended approaches

Option 1 (safest)

Keep the cluster read-only during the backup:

SET GLOBAL tidb_super_read_only=ON;

Then rerun the BR backup. When it finishes, switch the cluster back with SET GLOBAL tidb_super_read_only=OFF;
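If you go with option 1, make sure the variable is switched back even when the backup fails, otherwise the cluster stays read-only. A minimal Python sketch; `conn` is assumed to be any DB-API connection to TiDB, and `backup` is a hypothetical callable that runs the BR command (for example via `subprocess.run(..., check=True)`):

```python
from typing import Any, Callable


def run_with_super_read_only(conn: Any, backup: Callable[[], None]) -> None:
    """Turn tidb_super_read_only on, run the backup, and always turn it off again."""
    cur = conn.cursor()
    cur.execute("SET GLOBAL tidb_super_read_only = ON")
    try:
        backup()
    finally:
        # Runs even if the backup raises, so the cluster is not left read-only.
        cur.execute("SET GLOBAL tidb_super_read_only = OFF")
        cur.close()
```

The try/finally is the point of the sketch: a failed backup (such as this checksum mismatch) still re-enables writes before the error propagates.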


Option 2 (no downtime; you only need the backup to be restorable)

Disable the checksum:

br backup full \
  --checksum=false \
  ...

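:warning: Risk: this only skips the verification step; it does not fix whatever causes the mismatch, so the backup may contain inconsistent data. Many production setups do run this way, though.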


Option 3 (check the table itself)

Run this against the table:

ADMIN CHECK TABLE test.T_test123;

If it reports an error, the table itself is corrupted (its data and indexes are inconsistent).
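Since a different table fails each day, it may be worth checking every table that has shown up in the mismatch logs, not just T_test123. A Python sketch assuming a DB-API connection `conn`; the table list is whatever you collected from the logs. ADMIN CHECK TABLE scans the data and indexes, so on a table with roughly 60 GB of KV data (per the log) expect it to be slow and to add load:

```python
from typing import Any, Iterable


def admin_check_tables(conn: Any, tables: Iterable[tuple[str, str]]) -> dict[str, str]:
    """Run ADMIN CHECK TABLE on each (db, table) and return {"db.table": error}."""
    failures: dict[str, str] = {}
    cur = conn.cursor()
    try:
        for db, table in sorted(set(tables)):
            try:
                # Backtick-quoted; assumes the names contain no backticks.
                cur.execute(f"ADMIN CHECK TABLE `{db}`.`{table}`")
            except Exception as exc:  # in practice, catch your driver's Error class
                failures[f"{db}.{table}"] = str(exc)
    finally:
        cur.close()
    return failures
```

Duplicates are dropped and tables are checked in a stable order, so repeated runs over several days' logs are easy to compare.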


Option 4 (confirm whether writes are happening)

Check for running transactions:

SELECT * FROM information_schema.tidb_trx;
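`TIDB_TRX` lists only transactions that are open right now (and, as far as I know, only on the TiDB node you are connected to; `CLUSTER_TIDB_TRX` covers all nodes), so it can show ongoing writes but cannot prove nothing was written during the backup window. A small Python filter over its rows, assuming they are fetched as dicts with the `ID`, `START_TIME` and `MEM_BUFFER_KEYS` columns; the sample rows are made up:

```python
from datetime import datetime
from typing import Any


def writing_transactions(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep open transactions that have buffered writes, oldest first."""
    writers = [r for r in rows if (r.get("MEM_BUFFER_KEYS") or 0) > 0]
    return sorted(writers, key=lambda r: r["START_TIME"])


rows = [
    {"ID": 3, "START_TIME": datetime(2026, 1, 7, 0, 30), "MEM_BUFFER_KEYS": 120},
    {"ID": 1, "START_TIME": datetime(2026, 1, 6, 23, 55), "MEM_BUFFER_KEYS": 0},
    {"ID": 2, "START_TIME": datetime(2026, 1, 6, 23, 58), "MEM_BUFFER_KEYS": 5},
]
print([r["ID"] for r in writing_transactions(rows)])  # [2, 3]
```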

Exclude the large table: if tables like T_test123 hold non-critical data, you can exclude them with the -f option.
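A sketch of the same backup command with that exclusion appended, built as an argument list in Python. All values are copied from the original command except the hypothetical extra rule `!test.T_test123`; since the last matching rule wins, putting it after `*.*` should be enough. Passing a list to `subprocess.run` also keeps the shell away from the `*` and `!` characters:

```python
import shlex
from datetime import date


def br_backup_args(filters: list[str], day: date) -> list[str]:
    """Rebuild the backup command from the thread with a custom filter list."""
    args = ["tiup", "br", "backup", "full",
            "--pd", "1.1.1.1:2379,1.1.1.2:2379,1.1.1.3:2379"]
    for rule in filters:
        args += ["-f", rule]
    args += ["--storage", f"s3://tidbbk/{day:%Y%m%d}",
             "--send-credentials-to-tikv=true",
             "--s3.endpoint", "http://1.1.1.4:9000",
             "--log-file", "/tmp/tidb_backup.log",
             "--ratelimit", "128"]
    return args


filters = ["*.*", "!*_dev.*", "!*_test.*", "!*.T_Oms_Sync_Log", "!test.T_test123"]
args = br_backup_args(filters, date.today())
print(shlex.join(args))  # paste into a shell, or run with subprocess.run(args, check=True)
```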

Have you checked S3?