Wrong BR version is selected for backup, causing the TiDB backup to fail

[TiDB environment] Production / Test / PoC
Production
TiDB Operator
tidb-operator-1.1.6

Kubernetes version
GitVersion: "v1.16.4-12.8d683d9"

[TiDB version]
v5.4.0
[Problem encountered]

$ helm install tpaas-tidb-backup tidb-full-backup/ -n tpaas-new-tidb
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /home/tpaasbjopdba/.kube/config
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /home/tpaasbjopdba/.kube/config
NAME: tpaas-tidb-backup
LAST DEPLOYED: Tue Jun 28 20:10:47 2022
NAMESPACE: tpaas-new-tidb
STATUS: deployed
REVISION: 1
TEST SUITE: None
$ kubectl get pods -n tpaas-new-tidb
NAME READY STATUS RESTARTS AGE
backup-daemon-backup-s3-lktnr 0/1 Error 0 6s
tpaas-new-tidb-discovery-6557f7b5f6-5kdzg 1/1 Running 0 5h43m
tpaas-new-tidb-monitor-596ff78c4b-gnt6c 3/3 Running 0 4h7m
tpaas-new-tidb-pd-0 1/1 Running 0 3h53m
tpaas-new-tidb-pd-1 1/1 Running 0 4h2m
tpaas-new-tidb-pd-2 1/1 Running 0 4h7m
tpaas-new-tidb-ticdc-0 1/1 Running 0 3h32m
tpaas-new-tidb-tidb-0 2/2 Running 0 3h28m
tpaas-new-tidb-tidb-1 2/2 Running 0 3h30m
tpaas-new-tidb-tidb-2 2/2 Running 0 3h31m
tpaas-new-tidb-tidb-initializer-9h2rb 0/1 Completed 0 5h43m
tpaas-new-tidb-tikv-0 1/1 Running 0 3h32m
tpaas-new-tidb-tikv-1 1/1 Running 0 3h42m
tpaas-new-tidb-tikv-2 1/1 Running 0 3h47m

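I used tidb-backups-s3 (https://github.com/pingcap/tidb-operator/blob/master/manifests/backup/backup-s3-br.yaml) to run the backup, and the backup failed. The wrong BR version was selected: BR v4.0.7 was used against a v5.4.0 cluster.
[Steps to reproduce] What operations led to the problem
[Symptoms and impact]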
Create rclone.conf file.
/tidb-backup-manager backup --namespace=tpaas-new-tidb --backupName=daemon-backup-s3 --tikvVersion=v5.4.0
I0628 12:10:48.842586 1 backup.go:71] start to process backup tpaas-new-tidb/daemon-backup-s3
I0628 12:10:48.853024 1 backup_status_updater.go:64] Backup: [tpaas-new-tidb/daemon-backup-s3] updated successfully
I0628 12:10:48.875320 1 backup_status_updater.go:64] Backup: [tpaas-new-tidb/daemon-backup-s3] updated successfully
I0628 12:10:48.880814 1 manager.go:176] cluster tpaas-new-tidb/daemon-backup-s3 tikv_gc_life_time is 10m0s
I0628 12:10:48.891028 1 manager.go:240] set cluster tpaas-new-tidb/daemon-backup-s3 tikv_gc_life_time to 72h success
I0628 12:10:48.891063 1 backup.go:67] Running br command with args: [backup full --pd=tpaas-new-tidb-pd.tpaas-new-tidb:2379 --storage=s3://tpaas-tidb-backup/tpaas-tidb-new/backup/06281700/ --s3.region=cn-north-1 --s3.provider=ceph --s3.endpoint=http://s3-internal.cn-north-1.jdcloud-oss.com]
I0628 12:10:48.915795 1 backup.go:91] [2022/06/28 12:10:48.915 +00:00] [INFO] [version.go:35] ["Welcome to Backup & Restore (BR)"]
I0628 12:10:48.915820 1 backup.go:91] [2022/06/28 12:10:48.915 +00:00] [INFO] [version.go:36] [BR] [release-version=v4.0.7]
I0628 12:10:48.915830 1 backup.go:91] [2022/06/28 12:10:48.915 +00:00] [INFO] [version.go:37] [BR] [git-hash=4d29fcccaa12d6355a829a69b8df1594281a14e2]
I0628 12:10:48.915839 1 backup.go:91] [2022/06/28 12:10:48.915 +00:00] [INFO] [version.go:38] [BR] [git-branch=heads/refs/tags/v4.0.7]
I0628 12:10:48.915845 1 backup.go:91] [2022/06/28 12:10:48.915 +00:00] [INFO] [version.go:39] [BR] [go-version=go1.13]
I0628 12:10:48.915851 1 backup.go:91] [2022/06/28 12:10:48.915 +00:00] [INFO] [version.go:40] [BR] [utc-build-time="2020-09-29 06:52:02"]
I0628 12:10:48.915857 1 backup.go:91] [2022/06/28 12:10:48.915 +00:00] [INFO] [version.go:41] [BR] [race-enabled=false]
I0628 12:10:48.915868 1 backup.go:91] [2022/06/28 12:10:48.915 +00:00] [INFO] [common.go:378] [arguments] [pd="[tpaas-new-tidb-pd.tpaas-new-tidb:2379]"] [s3.endpoint=http://s3-internal.cn-north-1.jdcloud-oss.com] [s3.provider=ceph] [s3.region=cn-north-1] [storage=s3://tpaas-tidb-backup/tpaas-tidb-new/backup/06281700/]
I0628 12:10:48.915937 1 backup.go:91] [2022/06/28 12:10:48.915 +00:00] [INFO] [client.go:148] ["[pd] create pd client with endpoints"] [pd-address="[tpaas-new-tidb-pd.tpaas-new-tidb:2379]"]
I0628 12:10:48.923721 1 backup.go:91] [2022/06/28 12:10:48.923 +00:00] [INFO] [base_client.go:237] ["[pd] update member urls"] [old-urls="[http://tpaas-new-tidb-pd.tpaas-new-tidb:2379]"] [new-urls="[http://tpaas-new-tidb-pd-0.tpaas-new-tidb-pd-peer.tpaas-new-tidb.svc:2379,http://tpaas-new-tidb-pd-1.tpaas-new-tidb-pd-peer.tpaas-new-tidb.svc:2379,http://tpaas-new-tidb-pd-2.tpaas-new-tidb-pd-peer.tpaas-new-tidb.svc:2379]"]
I0628 12:10:48.923735 1 backup.go:91] [2022/06/28 12:10:48.923 +00:00] [INFO] [base_client.go:253] ["[pd] switch leader"] [new-leader=http://tpaas-new-tidb-pd-2.tpaas-new-tidb-pd-peer.tpaas-new-tidb.svc:2379] [old-leader=]
I0628 12:10:48.923753 1 backup.go:91] [2022/06/28 12:10:48.923 +00:00] [INFO] [base_client.go:103] ["[pd] init cluster id"] [cluster-id=7114173756406158756]
I0628 12:10:48.941415 1 backup.go:91] [2022/06/28 12:10:48.941 +00:00] [INFO] [client.go:148] ["[pd] create pd client with endpoints"] [pd-address="[tpaas-new-tidb-pd.tpaas-new-tidb:2379]"]
I0628 12:10:48.948296 1 backup.go:91] [2022/06/28 12:10:48.948 +00:00] [INFO] [base_client.go:237] ["[pd] update member urls"] [old-urls="[http://tpaas-new-tidb-pd.tpaas-new-tidb:2379]"] [new-urls="[http://tpaas-new-tidb-pd-0.tpaas-new-tidb-pd-peer.tpaas-new-tidb.svc:2379,http://tpaas-new-tidb-pd-1.tpaas-new-tidb-pd-peer.tpaas-new-tidb.svc:2379,http://tpaas-new-tidb-pd-2.tpaas-new-tidb-pd-peer.tpaas-new-tidb.svc:2379]"]
I0628 12:10:48.948326 1 backup.go:91] [2022/06/28 12:10:48.948 +00:00] [INFO] [base_client.go:253] ["[pd] switch leader"] [new-leader=http://tpaas-new-tidb-pd-2.tpaas-new-tidb-pd-peer.tpaas-new-tidb.svc:2379] [old-leader=]
I0628 12:10:48.948372 1 backup.go:91] [2022/06/28 12:10:48.948 +00:00] [INFO] [base_client.go:103] ["[pd] init cluster id"] [cluster-id=7114173756406158756]
I0628 12:10:48.954318 1 backup.go:91] [2022/06/28 12:10:48.954 +00:00] [INFO] [collector.go:187] ["Full backup Failed summary : total backup ranges: 0, total success: 0, total failed: 0"]
I0628 12:10:48.954515 1 backup.go:91] [2022/06/28 12:10:48.954 +00:00] [ERROR] [backup.go:25] ["failed to backup"] [error="running BR in incompatible version of cluster, if you believe it's OK, use --check-requirements=false to skip.: TiKV node tpaas-new-tidb-tikv-1.tpaas-new-tidb-tikv-peer.tpaas-new-tidb.svc:20160 version 5.4.0 and BR v4.0.7 major version mismatch, please use the same version of BR"] [errorVerbose="TiKV node tpaas-new-tidb-tikv-1.tpaas-new-tidb-tikv-peer.tpaas-new-tidb.svc:20160 version 5.4.0 and BR v4.0.7 major version mismatch, please use the same version of BR\ngithub.com/pingcap/br/pkg/utils.CheckClusterVersion\n\t/home/jenkins/agent/workspace/build_br_multi_branch_v4.0.7/go/src/github.com/pingcap/br/pkg/utils/version.go:123\ngithub.com/pingcap/br/pkg/conn.NewMgr\n\t/home/jenkins/agent/workspace/build_br_multi_branch_v4.0.7/go/src/github.com/pingcap/br/pkg/conn/conn.go:203\ngithub.com/pingcap/br/pkg/task.newMgr\n\t/home/jenkins/agent/workspace/build_br_multi_branch_v4.0.7/go/src/github.com/pingcap/br/pkg/task/common.go:312\ngithub.com/pingcap/br/pkg/task.RunBackup\n\t/home/jenkins/agent/workspace/build_br_multi_branch_v4.0.7/go/src/github.com/pingcap/br/pkg/task/backup.go:178\ngithub.com/pingcap/br/cmd.runBackupCommand\n\t/home/jenkins/agent/workspace/build_br_multi_branch_v4.0.7/go/src/github.com/pingcap/br/cmd/backup.go:24\ngithub.com/pingcap/br/cmd.newFullBackupCommand.func1\n\t/home/jenkins/agent/workspace/build_br_multi_branch_v4.0.7/go/src/github.com/pingcap/br/cmd/backup.go:84\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:842\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:950\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:887\nmain.main\n\t/home/jenkins/agent/workspace/build_br_multi_branch_v4.0.7/go/src/github.com/pingcap/br/main.go:57\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:203\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1357\nrunning BR in incompatible version of cluster, if you believe it's OK, use --check-requirements=false to skip."] [stack="github.com/pingcap/br/cmd.runBackupCommand\n\t/home/jenkins/agent/workspace/build_br_multi_branch_v4.0.7/go/src/github.com/pingcap/br/cmd/backup.go:25\ngithub.com/pingcap/br/cmd.newFullBackupCommand.func1\n\t/home/jenkins/agent/workspace/build_br_multi_branch_v4.0.7/go/src/github.com/pingcap/br/cmd/backup.go:84\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:842\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:950\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:887\nmain.main\n\t/home/jenkins/agent/workspace/build_br_multi_branch_v4.0.7/go/src/github.com/pingcap/br/main.go:57\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:203"]

How about trying again with TiDB and BR on the same version? It is best not to use BR across major versions.
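To make the failure in the log concrete, here is a minimal sketch of a major-version comparison like the one the error message describes ("version 5.4.0 and BR v4.0.7 major version mismatch"). It is only an illustration, not BR's actual `CheckClusterVersion` code; the helper names `major_version` and `same_major_version` are hypothetical.

```python
import re


def major_version(version: str) -> int:
    """Return the major component of a version such as 'v5.4.0' or '5.4.0'."""
    match = re.match(r"^v?(\d+)\.(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1))


def same_major_version(br_version: str, tikv_version: str) -> bool:
    """True when BR and TiKV share a major version (hypothetical helper)."""
    return major_version(br_version) == major_version(tikv_version)


if __name__ == "__main__":
    # Versions taken from the log above: BR v4.0.7 against TiKV 5.4.0.
    print(same_major_version("v4.0.7", "5.4.0"))
    # The combination suggested in the reply: BR and TiKV on the same version.
    print(same_major_version("v5.4.0", "5.4.0"))
```

Under this simplified rule, BR v4.0.7 against a 4.0.16 cluster would pass, because both are major version 4. The compatibility notes in the official documentation remain the authority on which combinations are actually supported.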

With TiDB 4.0.16 and Operator 1.1.6, BR still used 4.0.7 when the backup was created.

How does tidb-operator-1.1.6 determine which BR version to use?

You can refer to the version compatibility notes in the official documentation:
https://docs.pingcap.com/zh/tidb/v5.4/backup-and-restore-tool#使用限制


How do I specify the BR version with tidb-operator-1.1.6?

https://docs.pingcap.com/zh/tidb-in-kubernetes/v1.1/backup-restore-overview#:~:text=如果指定了%20BR%20的版本,例如%20.spec.toolImage%3A%20pingcap/br%3Av5.0.6,那么使用指定的版本镜像进行备份。
You can refer to this document.
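As a rough illustration of the `.spec.toolImage` setting that the linked document describes, the sketch below builds a Backup CR as a Python dict and prints it as JSON. The cluster, namespace, backup name, and S3 values are copied from the log earlier in this thread. The function name `build_backup_manifest` and the secret name `s3-secret` are assumptions, and the manifest is not complete (other fields may be required depending on the Operator version). The field only takes effect on an Operator release that supports `spec.toolImage`.

```python
import json


def build_backup_manifest(tikv_version: str) -> dict:
    """Build a Backup CR that pins the BR image to the cluster version."""
    return {
        "apiVersion": "pingcap.com/v1alpha1",
        "kind": "Backup",
        "metadata": {
            "name": "daemon-backup-s3",
            "namespace": "tpaas-new-tidb",
        },
        "spec": {
            # Pin BR to the same version as the TiKV nodes.
            "toolImage": f"pingcap/br:{tikv_version}",
            "br": {
                "cluster": "tpaas-new-tidb",
                "clusterNamespace": "tpaas-new-tidb",
            },
            "s3": {
                "provider": "ceph",
                "region": "cn-north-1",
                "endpoint": "http://s3-internal.cn-north-1.jdcloud-oss.com",
                "bucket": "tpaas-tidb-backup",
                "prefix": "tpaas-tidb-new/backup/06281700",
                # Hypothetical secret name; use the one that holds your S3 credentials.
                "secretName": "s3-secret",
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_backup_manifest("v5.4.0"), indent=2))
```

Since kubectl accepts JSON as well as YAML, the printed output could be saved to a file and applied with `kubectl apply -f`, or the same fields could be written directly into the YAML manifest used by the Helm chart.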

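TiDB Operator is an automated operations system for TiDB clusters on Kubernetes. It provides full life-cycle management for TiDB, including deployment, upgrades, scaling, backup and restore, and configuration changes.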

As I understand it, the BR version should be chosen according to the version of the TiDB database, not the version of the operations tool.

We'll try specifying the version.

Also, the error message shows version 5.4, but your reply mentions 4.0.16. You need to confirm which version the TiDB cluster is actually running.

We tested both 5.4 and 4.0.16; BR always uses 4.0.7.

For the 5.4 cluster, try a 5.0 version of BR.

Operator v1.1.6 does not support this configuration item; it is only supported from v1.1.9 onward.

The BR version cannot be specified; it is controlled by the Operator.

It should be supported starting from 1.1.7; see the relevant description in the release notes:
https://docs.pingcap.com/zh/tidb-in-kubernetes/stable/release-1.1.7#新功能:~:text=新增%20Backup%20和%20Restore%20CR%20的配置项%20spec.toolImage%20来指定%20BR%20工具使用的二进制镜像,默认使用%20pingcap/br%3A%24{tikv_version}%20(%233471%2C%20%40namco1992)

It looks like the only option is to upgrade the Operator.

In theory that shouldn't be the case. Operator 1.1.6 should also support 4.0.8 or 4.0.16, not only 4.0.7.

Could you describe your exact steps?
Did you deploy TiDB v4.0.7 and BR with TiDB Operator, and then upgrade TiDB to v5.4.0?
Or did you deploy TiDB v5.4.0 directly, and then deploy BR?

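With 1.1.6 there is no need to deploy BR. You only deploy the Operator and create TiDB instances. You can create 4.0.8 or 4.0.16 clusters, but BR always uses version 4.0.7.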
