BR on K8s fails with BR:Stream:ErrStreamLogTaskExist

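【TiDB environment】
Test
【TiDB version】
v6.5.0
【Steps to reproduce】
Deployed a BackupSchedule resource following the official documentation. The BackupSchedule contains one snapshot backup and one log backup.
【Problem: symptoms and impact】
The snapshot backup runs normally, but the log backup fails. According to the pod log, a log backup Job is already running, yet kubectl get bk -n tidb-dev lists only the current tasks (snapshot and log).
【Attachments: screenshots/logs/monitoring】
Error log from the log-backup Pod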

I0627 09:39:57.519515       9 backup.go:262] [2023/06/27 09:39:57.519 +00:00] [ERROR] [main.go:59] ["br failed"] [error="It supports single stream log task currently: [BR:Stream:ErrStreamLogTaskExist]stream task already exists"] [errorVerbose="[BR:Stream:ErrStreamLogTaskExist]stream task already exists\nIt supports single stream log task currently\ngithub.com/pingcap/tidb/br/pkg/task.RunStreamStart\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/task/stream.go:550\ngithub.com/pingcap/tidb/br/pkg/task.RunStreamCommand\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/task/stream.go:506\nmain.streamCommand\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/cmd/br/stream.go:231\nmain.newStreamStartCommand.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/cmd/br/stream.go:70\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:916\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:1044\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:968\nmain.main\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/cmd/br/main.go:57\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1594"] [stack="main.main\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/cmd/br/main.go:59\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"]
I0627 09:39:57.536465       9 backup.go:262] 
I0627 09:39:57.536481       9 backup.go:269] Error: It supports single stream log task currently: [BR:Stream:ErrStreamLogTaskExist]stream task already exists
E0627 09:39:57.536558       9 manager.go:479] Start log backup of cluster tidb-dev/log-kube-tidb-backup failed, err: cluster tidb-dev/log-kube-tidb-backup, wait pipe message failed, errMsg [2023/06/27 09:39:57.518 +00:00] [ERROR] [stream.go:507] ["failed to stream"] [command="log start"] [error="It supports single stream log task currently: [BR:Stream:ErrStreamLogTaskExist]stream task already exists"] [errorVerbose="[BR:Stream:ErrStreamLogTaskExist]stream task already exists\nIt supports single stream log task currently\ngithub.com/pingcap/tidb/br/pkg/task.RunStreamStart\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/task/stream.go:550\ngithub.com/pingcap/tidb/br/pkg/task.RunStreamCommand\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/task/stream.go:506\nmain.streamCommand\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/cmd/br/stream.go:231\nmain.newStreamStartCommand.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/cmd/br/stream.go:70\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:916\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:1044\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:968\nmain.main\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/cmd/br/main.go:57\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1594"] 
[stack="github.com/pingcap/tidb/br/pkg/task.RunStreamCommand\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/task/stream.go:507\nmain.streamCommand\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/cmd/br/stream.go:231\nmain.newStreamStartCommand.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/cmd/br/stream.go:70\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:916\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:1044\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:968\nmain.main\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/cmd/br/main.go:57\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"]
[2023/06/27 09:39:57.519 +00:00] [ERROR] [main.go:59] ["br failed"] [error="It supports single stream log task currently: [BR:Stream:ErrStreamLogTaskExist]stream task already exists"] [errorVerbose="[BR:Stream:ErrStreamLogTaskExist]stream task already exists\nIt supports single stream log task currently\ngithub.com/pingcap/tidb/br/pkg/task.RunStreamStart\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/task/stream.go:550\ngithub.com/pingcap/tidb/br/pkg/task.RunStreamCommand\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/task/stream.go:506\nmain.streamCommand\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/cmd/br/stream.go:231\nmain.newStreamStartCommand.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/cmd/br/stream.go:70\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:916\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:1044\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:968\nmain.main\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/cmd/br/main.go:57\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1594"] [stack="main.main\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/cmd/br/main.go:59\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"]
Error: It supports single stream log task currently: [BR:Stream:ErrStreamLogTaskExist]stream task already exists
, err: exit status 1
I0627 09:39:57.548654       9 backup_status_updater.go:123] Backup: [tidb-dev/log-tidb-backup] updated successfully
error: cluster tidb-dev/log-tidb-backup, wait pipe message failed, errMsg [2023/06/27 09:39:57.518 +00:00] [ERROR] [stream.go:507] ["failed to stream"] [command="log start"] [error="It supports single stream log task currently: [BR:Stream:ErrStreamLogTaskExist]stream task already exists"] [errorVerbose="[BR:Stream:ErrStreamLogTaskExist]stream task already exists\nIt supports single stream log task currently\ngithub.com/pingcap/tidb/br/pkg/task.RunStreamStart\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/task/stream.go:550\ngithub.com/pingcap/tidb/br/pkg/task.RunStreamCommand\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/task/stream.go:506\nmain.streamCommand\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/cmd/br/stream.go:231\nmain.newStreamStartCommand.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/cmd/br/stream.go:70\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:916\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:1044\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:968\nmain.main\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/cmd/br/main.go:57\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1594"] 
[stack="github.com/pingcap/tidb/br/pkg/task.RunStreamCommand\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/task/stream.go:507\nmain.streamCommand\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/cmd/br/stream.go:231\nmain.newStreamStartCommand.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/cmd/br/stream.go:70\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:916\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:1044\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:968\nmain.main\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/cmd/br/main.go:57\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"]
[2023/06/27 09:39:57.519 +00:00] [ERROR] [main.go:59] ["br failed"] [error="It supports single stream log task currently: [BR:Stream:ErrStreamLogTaskExist]stream task already exists"] [errorVerbose="[BR:Stream:ErrStreamLogTaskExist]stream task already exists\nIt supports single stream log task currently\ngithub.com/pingcap/tidb/br/pkg/task.RunStreamStart\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/task/stream.go:550\ngithub.com/pingcap/tidb/br/pkg/task.RunStreamCommand\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/task/stream.go:506\nmain.streamCommand\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/cmd/br/stream.go:231\nmain.newStreamStartCommand.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/cmd/br/stream.go:70\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:916\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:1044\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:968\nmain.main\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/cmd/br/main.go:57\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1594"] [stack="main.main\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/cmd/br/main.go:59\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"]
Error: It supports single stream log task currently: [BR:Stream:ErrStreamLogTaskExist]stream task already exists
, err: exit status 1
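
The same error is repeated several times in the output above, which makes the log hard to scan. A minimal sketch for reducing a pod log to its distinct BR error codes follows; br_error_codes is a hypothetical helper name, and the pod name in the usage line is a placeholder.

```shell
# Hypothetical helper: read log text on stdin and print each distinct
# BR error code (for example [BR:Stream:ErrStreamLogTaskExist]) with its count.
br_error_codes() {
  # \[ matches a literal "[", the bracket expression matches the code itself,
  # and the trailing "]" outside a bracket expression is a literal "]".
  grep -o '\[BR:[A-Za-z0-9:]*]' | sort | uniq -c
}

# Usage (placeholder pod name):
#   kubectl logs <log-backup-pod> -n tidb-dev | br_error_codes
```

For the log above this should collapse the repeated stack traces into a single line naming BR:Stream:ErrStreamLogTaskExist.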

Checking the existing Backup resources

kubectl get bk -n tidb-dev
NAME                              TYPE   MODE       STATUS     BACKUPPATH                                                                           BACKUPSIZE   COMMITTS                            LOGTRUNCATEUNTIL   AGE
log-tidb-backup                          log        Failed     s3://dev-us-east-1/tidb-log-backup/log-2023-06-27t09-39-34                                                                                               20m
tidb-backup-2023-06-27t09-40-00   full   snapshot   Complete   s3://dev-us-east-1/tidb-full-backup/tidb-perf-pd.tidb-dev-2379-2023-06-27t09-40-00   18 GB        442462065254989825                      19m
tidb-backup-2023-06-27t09-50-00   full   snapshot   Complete   s3://dev-us-east-1/tidb-full-backup/tidb-perf-pd.tidb-dev-2379-2023-06-27t09-50-00   18 GB        442462222279245825                      9m51s

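Based on your description, the previous log backup Job may not have been deleted properly, which would prevent a new log backup Job from running. You can try the following steps:

  1. Run kubectl get job -n <namespace> to list all Jobs in the namespace and find the ones related to log backup.
  2. If a log backup Job is still running, delete it manually with kubectl delete job <job-name> -n <namespace>.
  3. If log backup still fails after the Job is deleted, try deleting the CronJob and BackupSchedule resources related to log backup, then recreate them.

Before performing these steps, back up the relevant data so that a mistake does not cause data loss. If the problem persists, please provide more logs and configuration details so that we can help further.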
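
The first two steps above can be sketched as follows. The function names are hypothetical wrappers around the two kubectl commands quoted in the steps; the namespace and Job name are placeholders to fill in.

```shell
# Hypothetical wrappers around the kubectl commands quoted in the steps above.

# Step 1: list all Jobs in a namespace so the log backup Job can be spotted.
# $1 = namespace
list_jobs() {
  kubectl get job -n "$1"
}

# Step 2: delete one Job by name.
# $1 = namespace, $2 = Job name
delete_job() {
  kubectl delete job "$2" -n "$1"
}

# Usage:
#   list_jobs tidb-dev
#   delete_job tidb-dev <job-name>
```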

See https://docs.pingcap.com/zh/tidb-in-kubernetes/stable/backup-to-aws-s3-using-br#停止日志备份 and stop the current log backup first.
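
Stopping a log backup through the operator comes down to setting logStop: true on the Backup resource. A minimal sketch of doing that non-interactively, assuming the field sits at spec.logStop and that a JSON merge patch is an acceptable substitute for editing the resource by hand; stop_log_backup is a hypothetical helper name.

```shell
# Hypothetical helper: mark a log Backup resource as stopped.
# Assumption: the stop switch is the boolean field spec.logStop.
# $1 = namespace, $2 = Backup resource name
stop_log_backup() {
  kubectl patch backup "$2" -n "$1" --type merge -p '{"spec":{"logStop":true}}'
}

# Usage:
#   stop_log_backup tidb-dev log-tidb-backup
```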

Log from deleting the job

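On my side the log backup seems to have failed before it ever started, so it still fails after I stop it. The main problem is that I cannot find the already-existing log backup job mentioned in the log above. The following command shows only the failed log backup task and no other log backup task: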

kubectl get backup -n tidb-dev
NAME                                   TYPE   MODE       STATUS     BACKUPPATH                                                                                                   BACKUPSIZE   COMMITTS             LOGTRUNCATEUNTIL   AGE
log-kube-tidb-backup                          log        Failed     s3://dev-us-east-1/tidb-log-backup/log-2023-06-27t09-39-34                                                                                               17h

The controller-manager log shows the following:

E0628 02:54:25.593199       1 backup_manager.go:105] backup tidb-dev/log-kube-tidb-backup wait pre task done error log backup tidb-dev/log-kube-tidb-backup command log-stop should wait log backup start complete.

It does look like it failed before it started, so it cannot be stopped either.

The log backup is started with a name (https://github.com/pingcap/tidb-operator/blob/master/cmd/backup-manager/app/backup/backup.go#L154). If you still remember the name, just create a stop task with the same name and logStop: true. If you don't remember the name, well :rofl:

How about running br on its own, outside the operator, to check?
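
A sketch of what running br directly might look like, using its log status and log stop subcommands. The PD address is an assumption inferred from the backup paths earlier in the thread (tidb-perf-pd.tidb-dev on port 2379), the function names are hypothetical, and a TLS-enabled cluster would need certificate flags that are not shown here.

```shell
# Assumed PD address; set PD_ADDR beforehand to match the actual cluster.
PD_ADDR="${PD_ADDR:-tidb-perf-pd.tidb-dev:2379}"

# Show the log backup tasks registered in the cluster.
# Without --task-name, br is expected to report every task, which is
# useful when the name of the leftover task is unknown.
log_status() {
  br log status --pd "$PD_ADDR"
}

# Stop the task whose name is passed as $1 (taken from the status output).
log_stop() {
  br log stop --task-name "$1" --pd "$PD_ADDR"
}

# Usage:
#   log_status
#   log_stop <task-name>
```

br has to reach PD, so in a Kubernetes deployment this would typically be run from a pod inside the cluster rather than from a workstation.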

Or search the TiKV logs to see whether there is any related information :rofl:
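
One way to search the TiKV logs from kubectl, as a rough sketch. The helper name is hypothetical, the container name tikv and the pod name in the usage line are assumptions about this deployment, and the keyword is only a guess at what a related log line might contain.

```shell
# Hypothetical helper: search one TiKV pod's log for a keyword, ignoring case.
# $1 = namespace, $2 = TiKV pod name, $3 = keyword
grep_tikv_log() {
  # Assumption: the TiKV container inside the pod is named "tikv".
  kubectl logs "$2" -n "$1" -c tikv | grep -i -- "$3"
}

# Usage (pod name and keyword are guesses):
#   grep_tikv_log tidb-dev tidb-perf-tikv-0 stream
```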
