TiDB BR backup failure

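【TiDB environment】Production
【TiDB version】v7.5.6
【Operating system】CentOS 7.9
【Deployment method】Deployed directly on machines
【Cluster data size】2.1 TB
【Cluster nodes】TiDB: 3, PD: 3, TiKV: 6
【Problem】Running a BR backup against the production cluster hits an error. The BR version is v7.5.6.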

The backup command is as follows:

/bin/br backup full --pd '{PD_Addr}' --storage '{OssBucket}/snapshot-{BackupDate}?access-key={AccessKey}&secret-access-key={SecretAccessKey}&endpoint={EndPoint}&provider=alibaba&region={Region}' --send-credentials-to-tikv=true --ratelimit 32 --checksum=false --log-file {BackupLogFile}

The symptom is that the later stages of the backup stay stuck, and the log keeps printing "start to flush the checkpoint lock". The detailed log is as follows:

[2025/06/18 17:43:11.361 +08:00] [INFO] [client.go:1504] ["try backup"] [range-sn=10202] [store-id=31669927] ["retry time"=0]
[2025/06/18 17:43:11.374 +08:00] [INFO] [client.go:1006] ["backup push down completed"] [range-sn=10202] [small-range-count=1]
[2025/06/18 17:43:11.374 +08:00] [INFO] [client.go:1023] ["transactional range backup completed"] [range-sn=10202] [StartTS=0] [EndTS=458808697839943754]
[2025/06/18 17:43:11.374 +08:00] [INFO] [client.go:943] ["backup range completed"] [range-sn=10202] [startKey=74800000000000112F5F720000000000000000] [endKey=74800000000000112F5F72FFFFFFFFFFFFFFFF00] [take=13.543405ms]
[2025/06/18 17:43:11.374 +08:00] [INFO] [client.go:893] ["Backup Ranges Completed"] [take=6h31m42.988799715s]
[2025/06/18 17:43:28.446 +08:00] [INFO] [checkpoint.go:611] ["start to flush the checkpoint lock"] [lock-at=1750239808395] [expire-at=1750240108395]
[2025/06/18 17:47:28.486 +08:00] [INFO] [checkpoint.go:611] ["start to flush the checkpoint lock"] [lock-at=1750240048345] [expire-at=1750240348345]
[2025/06/18 17:51:28.495 +08:00] [INFO] [checkpoint.go:611] ["start to flush the checkpoint lock"] [lock-at=1750240288345] [expire-at=1750240588345]
[2025/06/18 17:55:28.501 +08:00] [INFO] [checkpoint.go:611] ["start to flush the checkpoint lock"] [lock-at=1750240528346] [expire-at=1750240828346]
[2025/06/18 17:59:28.486 +08:00] [INFO] [checkpoint.go:611] ["start to flush the checkpoint lock"] [lock-at=1750240768346] [expire-at=1750241068346]
[2025/06/18 18:03:28.588 +08:00] [INFO] [checkpoint.go:611] ["start to flush the checkpoint lock"] [lock-at=1750241008345] [expire-at=1750241308345]
[2025/06/18 18:07:28.489 +08:00] [INFO] [checkpoint.go:611] ["start to flush the checkpoint lock"] [lock-at=1750241248345] [expire-at=1750241548345]
[2025/06/18 18:11:28.490 +08:00] [INFO] [checkpoint.go:611] ["start to flush the checkpoint lock"] [lock-at=1750241488346] [expire-at=1750241788346]
[2025/06/18 18:15:28.573 +08:00] [INFO] [checkpoint.go:611] ["start to flush the checkpoint lock"] [lock-at=1750241728345] [expire-at=1750242028345]
[2025/06/18 18:19:28.546 +08:00] [INFO] [checkpoint.go:611] ["start to flush the checkpoint lock"] [lock-at=1750241968345] [expire-at=1750242268345]
[2025/06/18 18:23:28.521 +08:00] [INFO] [checkpoint.go:611] ["start to flush the checkpoint lock"] [lock-at=1750242208346] [expire-at=1750242508346]
[2025/06/18 18:27:28.526 +08:00] [INFO] [checkpoint.go:611] ["start to flush the checkpoint lock"] [lock-at=1750242448345] [expire-at=1750242748345]
[2025/06/18 18:31:28.494 +08:00] [INFO] [checkpoint.go:611] ["start to flush the checkpoint lock"] [lock-at=1750242688345] [expire-at=1750242988345]
[2025/06/18 18:35:28.595 +08:00] [INFO] [checkpoint.go:611] ["start to flush the checkpoint lock"] [lock-at=1750242928345] [expire-at=1750243228345]
[2025/06/18 18:39:28.505 +08:00] [INFO] [checkpoint.go:611] ["start to flush the checkpoint lock"] [lock-at=1750243168345] [expire-at=1750243468345]
[2025/06/18 18:43:28.482 +08:00] [INFO] [checkpoint.go:611] ["start to flush the checkpoint lock"] [lock-at=1750243408345] [expire-at=1750243708345]
[2025/06/18 18:47:28.456 +08:00] [INFO] [checkpoint.go:611] ["start to flush the checkpoint lock"] [lock-at=1750243648345] [expire-at=1750243948345]
[2025/06/18 18:51:28.652 +08:00] [INFO] [checkpoint.go:611] ["start to flush the checkpoint lock"] [lock-at=1750243888345] [expire-at=1750244188345]
[2025/06/18 18:55:28.510 +08:00] [INFO] [checkpoint.go:611] ["start to flush the checkpoint lock"] [lock-at=1750244128345] [expire-at=1750244428345]
[2025/06/18 18:59:28.484 +08:00] [INFO] [checkpoint.go:611] ["start to flush the checkpoint lock"] [lock-at=1750244368346] [expire-at=1750244668346]
[2025/06/18 19:03:28.484 +08:00] [INFO] [checkpoint.go:611] ["start to flush the checkpoint lock"] [lock-at=1750244608345] [expire-at=1750244908345]
[2025/06/18 19:07:28.492 +08:00] [INFO] [checkpoint.go:611] ["start to flush the checkpoint lock"] [lock-at=1750244848345] [expire-at=1750245148345]
[2025/06/18 19:11:28.487 +08:00] [INFO] [checkpoint.go:611] ["start to flush the checkpoint lock"] [lock-at=1750245088345] [expire-at=1750245388345]
[2025/06/18 19:15:28.522 +08:00] [INFO] [checkpoint.go:611] ["start to flush the checkpoint lock"] [lock-at=1750245328345] [expire-at=1750245628345]
[2025/06/18 19:19:28.476 +08:00] [INFO] [checkpoint.go:611] ["start to flush the checkpoint lock"] [lock-at=1750245568345] [expire-at=1750245868345]
[2025/06/18 19:23:28.511 +08:00] [INFO] [checkpoint.go:611] ["start to flush the checkpoint lock"] [lock-at=1750245808345] [expire-at=1750246108345]
[2025/06/18 19:27:28.504 +08:00] [INFO] [checkpoint.go:611] ["start to flush the checkpoint lock"] [lock-at=1750246048345] [expire-at=1750246348345]
[2025/06/18 19:31:28.533 +08:00] [INFO] [checkpoint.go:611] ["start to flush the checkpoint lock"] [lock-at=1750246288345] [expire-at=1750246588345]
[2025/06/18 19:35:28.519 +08:00] [INFO] [checkpoint.go:611] ["start to flush the checkpoint lock"] [lock-at=1750246528346] [expire-at=1750246828346]
[2025/06/18 19:39:28.486 +08:00] [INFO] [checkpoint.go:611] ["start to flush the checkpoint lock"] [lock-at=1750246768345] [expire-at=1750247068345]
[2025/06/18 19:43:28.503 +08:00] [INFO] [checkpoint.go:611] ["start to flush the checkpoint lock"] [lock-at=1750247008345] [expire-at=1750247308345]
[2025/06/18 19:47:28.493 +08:00] [INFO] [checkpoint.go:611] ["start to flush the checkpoint lock"] [lock-at=1750247248345] [expire-at=1750247548345]
[2025/06/18 19:51:28.489 +08:00] [INFO] [checkpoint.go:611] ["start to flush the checkpoint lock"] [lock-at=1750247488345] [expire-at=1750247788345]
[2025/06/18 19:55:28.588 +08:00] [INFO] [checkpoint.go:611] ["start to flush the checkpoint lock"] [lock-at=1750247728345] [expire-at=1750248028345]
[2025/06/18 19:59:28.509 +08:00] [INFO] [checkpoint.go:611] ["start to flush the checkpoint lock"] [lock-at=1750247968345] [expire-at=1750248268345]

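The data backup itself should count as successful (the log already shows "Backup Ranges Completed"). BR is now periodically refreshing the checkpoint lock to keep the backup consistent, so just wait a while longer.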
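
For anyone who wants to check the "periodic refresh" reading against the log, here is a minimal Python sketch. It assumes that lock-at and expire-at are Unix timestamps in milliseconds (this is inferred from the fact that they line up with the log timestamps, not taken from the BR documentation), and the helper names are made up for illustration. It uses three lines copied from the log above.

```python
import re
from datetime import datetime, timedelta, timezone

# Three lines copied verbatim from the log in this thread.
LOG = """\
[2025/06/18 17:43:28.446 +08:00] [INFO] [checkpoint.go:611] ["start to flush the checkpoint lock"] [lock-at=1750239808395] [expire-at=1750240108395]
[2025/06/18 17:47:28.486 +08:00] [INFO] [checkpoint.go:611] ["start to flush the checkpoint lock"] [lock-at=1750240048345] [expire-at=1750240348345]
[2025/06/18 17:51:28.495 +08:00] [INFO] [checkpoint.go:611] ["start to flush the checkpoint lock"] [lock-at=1750240288345] [expire-at=1750240588345]
"""

PATTERN = re.compile(r"\[lock-at=(\d+)\] \[expire-at=(\d+)\]")


def parse_locks(text):
    """Return (lock_at_ms, expire_at_ms) pairs in log order."""
    return [(int(m.group(1)), int(m.group(2))) for m in PATTERN.finditer(text)]


def lock_ttls_ms(locks):
    """How long each lock stays valid after it is written."""
    return [expire - lock for lock, expire in locks]


def refresh_gaps_ms(locks):
    """Time between consecutive lock refreshes."""
    return [b[0] - a[0] for a, b in zip(locks, locks[1:])]


def to_local(ms, offset_hours=8):
    """Assumption: the value is epoch milliseconds; render it at UTC+8."""
    tz = timezone(timedelta(hours=offset_hours))
    return datetime.fromtimestamp(ms / 1000, tz)


locks = parse_locks(LOG)
print(lock_ttls_ms(locks))     # [300000, 300000, 300000]
print(refresh_gaps_ms(locks))  # [239950, 240000]
print(to_local(locks[0][0]).isoformat(timespec="seconds"))  # 2025-06-18T17:43:28+08:00
```

Under that assumption, each lock is valid for 5 minutes and gets rewritten about every 4 minutes, so it is always renewed before it expires. That pattern looks like a heartbeat from a BR process that is still running, rather than a retry loop after a failure.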

Does this cluster have a lot of metadata? Is the number of tables and indexes especially large?

Not that many tables, about 50.

Regular tables or partitioned tables?

Two of them are partitioned tables, each with roughly 360 partitions.

It's probably just the large number of partitions. The INFO-level log lines here are routine output; there is nothing unusual about them.