BR full backup fails with an error

Version: 4.0.10

BR error log:
[2022/04/19 08:37:43.868 +08:00] [PANIC] [safe_point.go:130] ["cannot pass gc safe point check, aborting"] [error="GC safepoint 432623436995756032 exceed TS 432615614369759285: [BR:Backup:ErrBackupGCSafepointExceeded]backup GC safepoint exceeded"] [errorVerbose="[BR:Backup:ErrBackupGCSafepointExceeded]backup GC safepoint exceeded
GC safepoint 432623436995756032 exceed TS 432615614369759285
github.com/pingcap/br/pkg/utils.CheckGCSafePoint
github.com/pingcap/br@/pkg/utils/safe_point.go:72
github.com/pingcap/br/pkg/utils.StartServiceSafePointKeeper.func1
github.com/pingcap/br@/pkg/utils/safe_point.go:129
runtime.goexit
runtime/asm_amd64.s:1357"] [safePoint="{ID=br-15272c00-761d-486d-809a-bc831a9c2089,TTL=5m0s,BackupTime="2022-04-19 00:00:01.203 +0800 CST",BackupTS=432615614369759285}"] [stack="github.com/pingcap/br/pkg/utils.StartServiceSafePointKeeper.func1
github.com/pingcap/br@/pkg/utils/safe_point.go:130"]
[2022/04/19 08:37:44.777 +08:00] [WARN] [base_client.go:194] ["[pd] cannot update leader"] [address=http://10.240.47.44:2379] [error="[PD:grpc:ErrGRPCDial]context deadline exceeded"] [errorVerbose="[PD:grpc:ErrGRPCDial]context deadline exceeded
github.com/pingcap/errors.AddStack
github.com/pingcap/errors@v0.11.5-0.20201126102027-b0a155152ca3/errors.go:174
github.com/pingcap/errors.(*Error).GenWithStackByCause
github.com/pingcap/errors@v0.11.5-0.20201126102027-b0a155152ca3/normalize.go:279
github.com/tikv/pd/pkg/grpcutil.GetClientConn
github.com/tikv/pd@v0.0.0-20210105112549-e5be7fd38659/pkg/grpcutil/grpcutil.go:100
github.com/tikv/pd/client.(*baseClient).getOrCreateGRPCConn
github.com/tikv/pd@v0.0.0-20210105112549-e5be7fd38659/client/base_client.go:280
github.com/tikv/pd/client.(*baseClient).getMembers
github.com/tikv/pd@v0.0.0-20210105112549-e5be7fd38659/client/base_client.go:212
github.com/tikv/pd/client.(*baseClient).updateLeader
github.com/tikv/pd@v0.0.0-20210105112549-e5be7fd38659/client/base_client.go:192
github.com/tikv/pd/client.(*baseClient).leaderLoop
github.com/tikv/pd@v0.0.0-20210105112549-e5be7fd38659/client/base_client.go:139
runtime.goexit
runtime/asm_amd64.s:1357"]
[2022/04/19 08:37:44.838 +08:00] [ERROR] [base_client.go:140] ["[pd] failed updateLeader"] [error="[PD:client:ErrClientGetLeader]get leader from [http://10.240.38.115:2379 http://10.240.38.141:2379 http://10.240.47.44:2379] error"] [stack="github.com/tikv/pd/client.(*baseClient).leaderLoop
github.com/tikv/pd@v0.0.0-20210105112549-e5be7fd38659/client/base_client.go:140"]
[2022/04/19 08:37:44.817 +08:00] [ERROR] [base_client.go:140] ["[pd] failed updateLeader"] [error="[PD:client:ErrClientGetLeader]get leader from [http://10.240.38.115:2379 http://10.240.38.141:2379 http://10.240.47.44:2379] error"] [stack="github.com/tikv/pd/client.(*baseClient).leaderLoop
github.com/tikv/pd@v0.0.0-20210105112549-e5be7fd38659/client/base_client.go:140"]

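Check whether the GC safe point has already moved past the backup TS. If it has, you can set a longer GC retention time using the variables described here: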

GC Configuration | PingCAP Docs

This is a full backup, and BR adjusts the GC time automatically, so in theory this should not happen.

Are you sure you are running v4.0.8 or later?

I suggest adjusting the GC time manually to verify.

The version is 4.0.10. Backups had always succeeded before; this one failed early this morning.

Judging from the error log, it may well be a GC issue. Try adjusting the GC time manually to verify.

I don't think this is a problem with the GC time (safe_time). The stack trace says:
github.com/pingcap/errors@v0.11.5-0.20201126102027-b0a155152ca3/errors.go:174
github.com/pingcap/errors.(*Error).GenWithStackByCause
github.com/pingcap/errors@v0.11.5-0.20201126102027-b0a155152ca3/normalize.go:279
github.com/tikv/pd/pkg/grpcutil.GetClientConn
github.com/tikv/pd@v0.0.0-20210105112549-e5be7fd38659/pkg/grpcutil/grpcutil.go:100
github.com/tikv/pd/client.(*baseClient).getOrCreateGRPCConn

So the direct cause is probably that GetClientConn failed while the GC safe point was being kept alive in PD.

I don't see what could make GetClientConn fail. As a different angle, it may be worth checking the PD leader's log for anything useful.

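Problem solved. Another program deployed on the host triggered an OOM, which caused the BR backup tool to be killed. The logs did also report that communication between TiKV and BR was interrupted. Thanks for the help.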

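In this situation, remember that you can adjust OOM-kill priorities so that, when memory runs out, the system kills programs ranked below BR first to free memory.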
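On Linux, that priority is set per process through /proc/<pid>/oom_score_adj, which accepts values from -1000 to 1000. Lower values make the OOM killer less likely to pick the process, and -1000 exempts it entirely. The Go sketch below writes the value for a given PID, such as the BR process. The program and helper names are hypothetical, any score you pass (for example -500) is an arbitrary illustration, and lowering the value generally requires root or CAP_SYS_RESOURCE.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// Valid range for /proc/<pid>/oom_score_adj on Linux.
const (
	minOOMScoreAdj = -1000 // -1000 disables OOM killing for the process
	maxOOMScoreAdj = 1000
)

// oomScoreAdjPath returns the procfs file that controls pid's OOM priority.
func oomScoreAdjPath(pid int) string {
	return fmt.Sprintf("/proc/%d/oom_score_adj", pid)
}

// setOOMScoreAdj validates score and writes it for pid. Lowering the value
// below its current setting generally needs CAP_SYS_RESOURCE (root).
func setOOMScoreAdj(pid, score int) error {
	if score < minOOMScoreAdj || score > maxOOMScoreAdj {
		return fmt.Errorf("oom_score_adj %d out of range [%d, %d]",
			score, minOOMScoreAdj, maxOOMScoreAdj)
	}
	return os.WriteFile(oomScoreAdjPath(pid), []byte(strconv.Itoa(score)), 0o644)
}

func main() {
	if len(os.Args) != 3 {
		fmt.Fprintln(os.Stderr, "usage: oomadj <pid> <score>")
		os.Exit(2)
	}
	pid, err1 := strconv.Atoi(os.Args[1])
	score, err2 := strconv.Atoi(os.Args[2])
	if err1 != nil || err2 != nil {
		fmt.Fprintln(os.Stderr, "pid and score must be integers")
		os.Exit(2)
	}
	if err := setOOMScoreAdj(pid, score); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Usage: `go run oomadj.go <br-pid> -500`. If BR runs under systemd, the OOMScoreAdjust= setting in its unit file achieves the same without a helper program. Either way this only changes which process gets killed; it does not add memory.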

Try starting the backup again.

After optimizing the program that triggered the OOM, today's backup ran without problems. Thanks a lot for the help.

Yes.

此话题已在最后回复的 1 分钟后被自动关闭。不再允许新回复。