br备份失败,报错 rpc error: code = Canceled desc

【 TiDB 使用环境】生产环境 /测试/ Poc
【 TiDB 版本】6.5.0
【复现路径】1T数据量,br备份4小时后失败
【遇到的问题:问题现象及影响】
【资源配置】进入到 TiDB Dashboard -集群信息 (Cluster Info) -主机(Hosts) 截图此页面
【附件:截图/日志/监控】

<-----------------------------------------------|…> 59.94%[2024/06/07 00:49:26.933 +08:00] [INFO] [collector.go:73] [“Full Backup failed summary”] [total-ranges=272] [ranges-succeed=0] [ranges-failed=272] [backup-total-ranges=679] [unit-name=“range start:7480000000000002bf5f69800000000000000100 end:7480000000000002bf5f698000000000000001fb”] [error=“rpc error: code = Canceled desc = context canceled”] [errorVerbose=“rpc error: code = Canceled desc = context canceled\ngithub.com/tikv/pd/client.(*client).respForErr\n\t/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20221031025758-80f0d8ca4d07/client.go:1927\ngithub.com/tikv/pd/client.(*client).GetAllStores\n\t/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20221031025758-80f0d8ca4d07/client.go:1599\ngithub.com/pingcap/tidb/br/pkg/conn/util.GetAllTiKVStores\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/util/util.go:39\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStoresWithRetry.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:81\ngithub.com/pingcap/tidb/br/pkg/utils.WithRetry\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/utils/retry.go:52\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStoresWithRetry\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:78\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRange\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:835\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRanges.func2\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:794\ngithub.com/pingcap/tidb/br/pkg/utils.(*WorkerPool).ApplyOnErrorGroup.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/utils/worker.go:76\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\t/go/pkg/mod/golang.org/x/sync@v0.1.0/errgroup/errgroup.go:75\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1594”] [unit-name=“range start:7480000000000003015f69800000000000000100 end:7480000000000003015f698000000000000001fb”] [error=“rpc error: code = Canceled desc = context canceled”] [errorVerbose=“rpc error: code = Canceled desc = context canceled\ngithub.com/tikv/pd/client.(*client).respForErr\n\t/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20221031025758-80f0d8ca4d07/client.go:1927\ngithub.com/tikv/pd/client.(*client).GetAllStores\n\t/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20221031025758-80f0d8ca4d07/client.go:1599\ngithub.com/pingcap/tidb/br/pkg/conn/util.GetAllTiKVStores\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/util/util.go:39\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStoresWithRetry.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:81\ngithub.com/pingcap/tidb/br/pkg/utils.WithRetry\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/utils/retry.go:52\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStoresWithRetry\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:78\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRange\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:835\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRanges.func2\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:794\ngithub.com/pingcap/tidb/br/pkg/utils.(*WorkerPool).ApplyOnErrorGroup.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/utils/worker.go:76\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\t/go/pkg/mod/golang.org/x/sync@v0.1.0/errgroup/errgroup.go:75\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1594”] [unit-name=“range start:7480000000000003075f69800000000000000100 end:7480000000000003075f698000000000000001fb”] [error=“rpc error: code = Canceled desc = context canceled”] [errorVerbose=“rpc error: code = Canceled desc = context canceled\ngithub.com/tikv/pd/client.(*client).respForErr\n\t/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20221031025758-80f0d8ca4d07/client.go:1927\ngithub.com/tikv/pd/client.(*client).GetAllStores\n\t/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20221031025758-80f0d8ca4d07/client.go:1599\ngithub.com/pingcap/tidb/br/pkg/conn/util.GetAllTiKVStores\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/util/util.go:39\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStoresWithRetry.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:81\ngithub.com/pingcap/tidb/br/pkg/utils.WithRetry\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/utils/retry.go:52\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStoresWithRetry\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:78\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRange\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:835\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRanges.func2\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:794\ngithub.com/pingcap/tidb/br/pkg/utils.(*WorkerPool).ApplyOnErrorGroup.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/utils/worker.go:76\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\t/go/pkg/mod/golang.org/x/sync@v0.1.0/errgroup/errgroup.go:75\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1594”] [unit-name=“range start:7480000000000003855f720000000000000000 end:7480000000000003855f72ffffffffffffffff00”] [error=“rpc error: code = Canceled desc = context canceled”] [errorVerbose="rpc error: code = Canceled desc = context

1T 数据要备份4 个小时? 什么配置… :rofl:

网络稳定吗

信息太少了,看看这个
https://github.com/pingcap/tidb/issues/53712

备份操作可能因为超时而被取消。检查你的 BR 备份命令中是否设置了超时参数,或者系统是否有默认的超时设置。或者是网络问题可能导致连接中断,进而取消备份

现在br备份命令里还有超时参数吗,我好像没找到,或者系统有什么超时参数吗

虚拟机,测试环境不是生产环境

你是放后台跑的吗?感觉像是session超时退出了

是放后台跑的,有啥阻止超时的办法吗

我本来以为是没放后台,session 界面超市导致的,看起来不是,你可以把备份命令,完整的日志发出来看下

这个问题是怎么解决的呢