br backup fails with the error [BR:Backup:ErrBackupNoLeader]backup no leader

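[TiDB environment] Production / Test / POC
[TiDB version] 5.1.4, br v5.1.4
[Problem encountered] br backup fails ([BR:Backup:ErrBackupNoLeader]backup no leader)
[Steps to reproduce] Run a br backup:
[Symptoms and impact]
[Attachments]
Error message: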
[2022/08/26 05:45:19.489 +08:00] [ERROR] [backup.go:41] ["failed to backup"] [error="can not find leader: [BR:Backup:ErrBackupNoLeader]backup no leader"] [errorVerbose="[BR:Backup:ErrBackupNoLeader]backup no leader\ncan not find leader\ngithub.com/pingcap/br/pkg/backup.(*Client).findRegionLeader\n\tgithub.com/pingcap/br/pkg/backup/client.go:547\ngithub.com/pingcap/br/pkg/backup.(*Client).handleFineGrained\n\tgithub.com/pingcap/br/pkg/backup/client.go:756\ngithub.com/pingcap/br/pkg/backup.(*Client).fineGrainedBackup.func2\n\tgithub.com/pingcap/br/pkg/backup/client.go:608\nruntime.goexit\n\truntime/asm_amd64.s:1371"] [stack="main.runBackupCommand\n\tcommand-line-arguments/backup.go:41\nmain.newFullBackupCommand.func1\n\tcommand-line-arguments/backup.go:109\ngithub.com/spf13/cobra.(*Command).execute\n\tgithub.com/spf13/cobra@v1.0.0/command.go:842\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\tgithub.com/spf13/cobra@v1.0.0/command.go:950\ngithub.com/spf13/cobra.(*Command).Execute\n\tgithub.com/spf13/cobra@v1.0.0/command.go:887\nmain.main\n\tcommand-line-arguments/main.go:56\nruntime.main\n\truntime/proc.go:225"]
[2022/08/26 05:45:19.489 +08:00] [ERROR] [main.go:58] ["br failed"] [error="can not find leader: [BR:Backup:ErrBackupNoLeader]backup no leader"] [errorVerbose="[BR:Backup:ErrBackupNoLeader]backup no leader\ncan not find leader\ngithub.com/pingcap/br/pkg/backup.(*Client).findRegionLeader\n\tgithub.com/pingcap/br/pkg/backup/client.go:547\ngithub.com/pingcap/br/pkg/backup.(*Client).handleFineGrained\n\tgithub.com/pingcap/br/pkg/backup/client.go:756\ngithub.com/pingcap/br/pkg/backup.(*Client).fineGrainedBackup.func2\n\tgithub.com/pingcap/br/pkg/backup/client.go:608\nruntime.goexit\n\truntime/asm_amd64.s:1371"] [stack="main.main\n\tcommand-line-arguments/main.go:58\nruntime.main\n\truntime/proc.go:225"]



The problem has been found.

What caused it? Could you share?

Please post the solution so it can help others :grin:

This error message is not very readable. How should it be troubleshot?

This error is raised when br still cannot get the Region/leader for a given key from PD after 5 attempts.

https://github.com/pingcap/br/blob/v5.1.4/pkg/backup/client.go#L546-L547
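The retry behaviour described above can be sketched roughly as follows. This is only an illustration in Python, not the br source (which is Go): `get_region`, `BackupNoLeaderError`, the dict shape of a region, and the backoff delays are all assumptions made for the example.

```python
import time
from typing import Callable, Optional


class BackupNoLeaderError(Exception):
    """Hypothetical stand-in for br's ErrBackupNoLeader."""


def find_region_leader(
    get_region: Callable[[bytes], Optional[dict]],
    key: bytes,
    attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Ask PD (via the hypothetical get_region) for the region covering
    `key`, retrying until it reports a leader or attempts run out."""
    for i in range(attempts):
        try:
            region = get_region(key)
        except Exception:
            region = None  # treat a failed lookup like a missing region
        if region and region.get("leader"):
            return region["leader"]
        # Illustrative linear backoff; the real delays may differ.
        sleep(0.1 * i)
    raise BackupNoLeaderError("can not find leader")
```

If PD keeps returning a region without a leader for the key, every attempt fails the same way and the backup aborts, which matches the log above.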

Same question: how was it resolved?

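The cause: after we forcibly took some nodes offline, one node stayed in the Down state, and PD still showed one region recorded on it, so br could not find the leader information for that region. After we manually force-removed that node, backups recovered.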
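One way to spot a store like this is to look for stores that are not Up but still hold regions. The sketch below assumes PD's HTTP endpoint `/pd/api/v1/stores` and the field names `state_name` and `region_count` in its JSON response; the PD address is a placeholder, and the helper names are made up for this example.

```python
import json
import urllib.request


def fetch_stores(pd_addr: str) -> dict:
    """Fetch store info from PD, e.g. pd_addr='http://127.0.0.1:2379'."""
    url = f"{pd_addr}/pd/api/v1/stores"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def stores_not_up(payload: dict) -> list:
    """Return (id, address, state_name, region_count) for each store
    whose state_name is anything other than 'Up'."""
    result = []
    for item in payload.get("stores", []):
        store = item.get("store", {})
        status = item.get("status", {})
        state = store.get("state_name", "Unknown")
        if state != "Up":
            result.append(
                (
                    store.get("id"),
                    store.get("address"),
                    state,
                    status.get("region_count", 0),
                )
            )
    return result
```

For example, `stores_not_up(fetch_stores("http://127.0.0.1:2379"))` would list the candidates; a Down store with a non-zero region count is the situation described in this reply.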


So PD had not updated the metadata of the problematic region in time.

Yes, that's right.
