TiDB: full data backup with BR fails

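【TiDB environment】Test
【TiDB version】
【Reproduction path】Which operations led to the problem
【Problem encountered: symptoms and impact】
【Resource configuration】Go to TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page
【Attachments: screenshots/logs/monitoring】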
Command used: ./br backup full --pd "10.33.65.73:2379" --storage "local:///tidb-backup/test-tidb/" --log-file test-backupfull.log
Detail BR log in test-backupfull.log
[2023/11/02 16:59:54.194 +08:00] [INFO] [collector.go:77] ["Full Backup failed summary"] [total-ranges=0] [ranges-succeed=0] [ranges-failed=0]
Error: running BR in incompatible version of cluster, if you believe it's OK, use --check-requirements=false to skip.: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp: lookup test-tidb-pd-1.test-tidb-pd-peer.base-server.svc on 127.0.0.53:53: no such host"
The local path is an NFS file system mounted on this machine.

TiDB version 7.1.0, BR version 7.1.0


Log information

Can the machine running the backup command reach port 2379 on 10.33.65.73?

This IP belongs to a load balancer I set up in front of PD; its backend listens on port 2379. Running nc -z 10.33.65.73 2379 reports no errors.

In a K8s environment, backing up through the Backup CRD is recommended.

Just add the --check-requirements=false parameter and skip the check.

curl http://10.33.65.73:2379

lookup test-tidb-pd-1.test-tidb-pd-peer.base-server.svc on 127.0.0.53:53: no such host. This part is what's wrong.
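
A likely reading of this error (an assumption, not confirmed in the thread) is that BR reaches PD through the load balancer, then learns the PD members' advertised in-cluster addresses and tries to dial them, which fails because the machine cannot resolve *.svc names. A minimal sketch for checking that from the machine that runs BR; host_of_url and check_resolvable are hypothetical helper names:

```shell
#!/usr/bin/env bash
# Extract the host part from a URL such as http://host:2379/path.
host_of_url() {
  local url="$1"
  url="${url#*://}"            # drop the scheme, if any
  url="${url%%/*}"             # drop any path
  printf '%s\n' "${url%%:*}"   # drop the port
}

# For each URL argument, report whether this machine can resolve its host.
check_resolvable() {
  local url host
  for url in "$@"; do
    host="$(host_of_url "$url")"
    if getent hosts "$host" >/dev/null 2>&1; then
      echo "OK   $host"
    else
      echo "FAIL $host"
    fi
  done
}
```

For example, curl http://10.33.65.73:2379/pd/api/v1/members should list each member's client_urls; passing those URLs to check_resolvable shows whether the BR machine can resolve them. A FAIL here would be consistent with the "no such host" error even though the load balancer port itself is reachable.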

Refer to this. Backing up on K8s works differently from on physical machines: https://docs.pingcap.com/zh/tidb-in-kubernetes/stable/backup-to-pv-using-br#备份-tidb-集群到持久卷
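
As a rough illustration of what the linked page describes, here is a sketch of a Backup CR that writes to an NFS-backed volume. The field layout follows the TiDB Operator documentation as I recall it, so verify it against your Operator version. The cluster name test-tidb and namespace base-server are inferred from the PD host name in the error; the Backup name, NFS server, and export path are placeholders, and write_backup_cr is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Write a sketch of a Backup CR manifest to the file given as $1.
write_backup_cr() {
  cat > "$1" <<'EOF'
apiVersion: pingcap.com/v1alpha1
kind: Backup
metadata:
  name: test-tidb-backup-full      # hypothetical name
  namespace: base-server
spec:
  br:
    cluster: test-tidb             # inferred TidbCluster name
    clusterNamespace: base-server
  local:
    prefix: test-tidb              # subdirectory for the backup (assumption)
    volume:
      name: nfs
      nfs:
        server: NFS_SERVER_IP      # placeholder, not from the thread
        path: /path/to/nfs/export  # placeholder, not from the thread
    volumeMount:
      name: nfs
      mountPath: /nfs
EOF
}

# Example: write_backup_cr backup-full.yaml
# Then apply it against the cluster: kubectl apply -f backup-full.yaml
```

With this approach the backup job runs inside the cluster, so the in-cluster PD and TiKV addresses resolve without a load balancer.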

That works fine.

I'll try going through the Backup CRD.

PD doesn't need a load balancer.

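At first I wanted to run the backup with SQL through PD, but PD is inside the cluster and I connect remotely, so I added a load balancer to make it reachable from outside. That doesn't seem to work well, so in the end I used the Backup CRD after all.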

Error from server (Forbidden): backups.pingcap.com "tidb-backup-full-2023-11-03t10-55-00" is forbidden: User "system:serviceaccount:base-server:tidb-backup-manager" cannot update resource "backups" in API group "pingcap.com" in the namespace "base-server"

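After starting the job through the CRD, this error shows up in the job log. It looks like the role lacks the permission; I re-bound the role, but it still doesn't work.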
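
The Forbidden message says the tidb-backup-manager service account may not update backups in the pingcap.com API group in namespace base-server. A sketch of a Role, ServiceAccount, and RoleBinding that would grant it, modeled on the backup RBAC manifest from the TiDB Operator documentation as I recall it (check the official file for your version); write_backup_rbac is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Write a sketch of the backup RBAC manifest to the file given as $1.
write_backup_rbac() {
  cat > "$1" <<'EOF'
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: tidb-backup-manager
  namespace: base-server
rules:
- apiGroups: [""]
  resources: ["events"]
  verbs: ["*"]
- apiGroups: ["pingcap.com"]
  resources: ["backups", "restores"]
  verbs: ["get", "watch", "list", "update"]
---
kind: ServiceAccount
apiVersion: v1
metadata:
  name: tidb-backup-manager
  namespace: base-server
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: tidb-backup-manager
  namespace: base-server
subjects:
- kind: ServiceAccount
  name: tidb-backup-manager
  namespace: base-server
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: tidb-backup-manager
EOF
}

# Example: write_backup_rbac backup-rbac.yaml && kubectl apply -f backup-rbac.yaml
```

To confirm the binding took effect, kubectl auth can-i update backups.pingcap.com --as=system:serviceaccount:base-server:tidb-backup-manager -n base-server should answer yes. If it still answers no, the Role and RoleBinding may have been created in a different namespace than the service account.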

Another problem came up while backing up with the CRD.
The error log is as follows:
[2023/11/03 11:55:12.102 +08:00] [ERROR] [backup.go:54] ["failed to backup"] [error="failed to backup to file:////test-tidb/test-tidb-pd.base-server-2379-2023-11-03t10-55-00, because the checkpoint mode is used, but the hashs of the configs are not the same. Please check the config: [BR:Common:ErrInvalidArgument]invalid argument"] [errorVerbose="[BR:Common:ErrInvalidArgument]invalid argument\nfailed to backup to file:////test-tidb/test-tidb-pd.base-server-2379-2023-11-03t10-55-00, because the checkpoint mode is used, but the hashs of the configs are not the same. Please check the config\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).CheckCheckpoint\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:267\ngithub.com/pingcap/tidb/br/pkg/task.RunBackup\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/task/backup.go:447\nmain.runBackupCommand\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/cmd/br/backup.go:53\nmain.newFullBackupCommand.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/cmd/br/backup.go:143\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:916\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:1044\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:968\nmain.main\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/cmd/br/main.go:58\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1598"] 
[stack="main.runBackupCommand\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/cmd/br/backup.go:54\nmain.newFullBackupCommand.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/cmd/br/backup.go:143\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:916\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:1044\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:968\nmain.main\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/cmd/br/main.go:58\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"]
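
The message says BR found checkpoint data at the target path whose recorded configuration hash differs from the current run. One plausible cause (an assumption; the thread does not confirm it) is leftover data from an earlier failed backup into the same directory with different settings. A minimal pre-flight sketch that checks whether a target directory is empty before reusing it; backup_dir_is_clean is a hypothetical helper and the path in the example is a placeholder:

```shell
#!/usr/bin/env bash
# Succeed (exit status 0) if the directory is missing or has no entries,
# fail (non-zero) if it already contains files from an earlier run.
backup_dir_is_clean() {
  local dir="$1"
  if [ ! -d "$dir" ]; then
    return 0
  fi
  [ -z "$(ls -A "$dir")" ]
}

# Example (placeholder path on the mounted backup volume):
# backup_dir_is_clean /nfs/test-tidb/some-backup-dir || echo "leftover data found"
```

If leftovers are found, backing up to a fresh directory, or clearing the old one once its contents are known to be disposable, should avoid the hash comparison against the stale checkpoint.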

Re-assigning the role and rolebinding resolved the permission problem.

Hi, the job now keeps hitting an error like this, which makes the backup task fail.


I can't figure it out.

This indicates a problem with the parameters. Could you share the YAML configuration file you are using now?

After I changed the configuration, it started reporting a different problem.