BR 4.0.0 backup fails with error IO:Custom kind: Other, error: failed to put object Request ID

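To speed things up, please provide the following information. A clearly described problem gets resolved faster:
【TiDB environment】
Cluster version v4.0.0, BR version v4.0.0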

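【Overview】 Scenario + problem summary
I ran a physical backup with a BR tool of the same version as the cluster, writing the backup to S3-compatible storage. The backup was interrupted partway through.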

【Backup and data migration strategy】
The backup command is:
AWS_ACCESS_KEY_ID=xxxxxx AWS_SECRET_ACCESS_KEY=xxxxxx /*****/tidb/tool/tidb-toolkit-v4.0.0-linux-amd64/bin/br backup full --pd="xx.xx.xx.xx:2379" --checksum=false --storage="xxxxx" --ratelimit=60 --s3.region="xxxxx" --send-credentials-to-tikv=true --log-file="/****/{clustername}_{timestamp}.log" --s3.endpoint="xxxxxx"

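【Problem】 The problem currently encountered
I could not find this error message on the official website, so I cannot determine the cause.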

【TiDB version】
v4.0.0
【Attachments】
The backup error log is as follows:
……
[2021/07/16 04:31:01.930 +08:00] [ERROR] [push.go:111] [“backup occur unknown error”] [error=“Io(Custom { kind: Other, error: “failed to put object Request ID: None Body: <?xml version=\\\"1.0\\\" encoding=\\\"UTF-8\\\"?>\ InternalErrorWe encountered an internal error, please try again./tidb/{ClusterName}/202107160430/1_3253579_1589_610cef689c4b6f7522be5aab417531464ecb6c46aff7d4c295213ea9b9fee0a0_write.sst388af3aeboss/KDwgKQ0LHFMSDQATEQQDQTZ+OiwhNiENWUc=” })”] [stack=“github.com/pingcap/log.Error\ \t/go/pkg/mod/github.com/pingcap/log@v0.0.0-20200117041106-d28c14d3b1cd/global.go:42\ github.com/pingcap/br/pkg/backup.(*pushDown).pushBackup\ \t/home/jenkins/agent/workspace/build_br_multi_branch_v4.0.0/go/src/github.com/pingcap/br/pkg/backup/push.go:111\ngithub.com/pingcap/br/pkg/backup.(*Client).BackupRange\ \t/home/jenkins/agent/workspace/build_br_multi_branch_v4.0.0/go/src/github.com/pingcap/br/pkg/backup/client.go:475\ngithub.com/pingcap/br/pkg/backup.(*Client).BackupRanges.func2\ \t/home/jenkins/agent/workspace/build_br_multi_branch_v4.0.0/go/src/github.com/pingcap/br/pkg/backup/client.go:381”]
[2021/07/16 04:31:01.930 +08:00] [INFO] [client.go:445] [“backup range finished”] [take=59.600345202s]
[2021/07/16 04:31:01.930 +08:00] [INFO] [client.go:373] [“Backup Ranges”] [take=1m0.101904845s]
[2021/07/16 04:31:01.931 +08:00] [INFO] [ddl.go:407] ["[ddl] DDL closed"] [ID=2235e29b-088f-4461-9821-662a8147d3ce] [“take time”=1.173022ms]
[2021/07/16 04:31:01.931 +08:00] [INFO] [ddl.go:301] ["[ddl] stop DDL"] [ID=2235e29b-088f-4461-9821-662a8147d3ce]
[2021/07/16 04:31:01.935 +08:00] [INFO] [domain.go:607] [“domain closed”] [“take time”=5.207706ms]
[2021/07/16 04:31:01.935 +08:00] [INFO] [collector.go:172] [“Full backup Failed summary : total backup ranges: 12, total success: 11, total failed: 1”] [“backup total regions”=35800] [unitName=“range start:7480000000000000cd5f720000000000000000 end:7480000000000000cd5f72ffffffffffffffff00”] [error="msg:“Io(Custom { kind: Other, error: \“failed to put object Request ID: None Body: <?xml version=\\\\\\\"1.0\\\\\\\" encoding=\\\\\\\"UTF-8\\\\\\\"?>\\ InternalErrorWe encountered an internal error, please try again./tidb/{ClusterName}/202107160430/1_3253579_1589_610cef689c4b6f7522be5aab417531464ecb6c46aff7d4c295213ea9b9fee0a0_write.sst388af3aeboss/KDwgKQ0LHFMSDQATEQQDQTZ+OiwhNiENWUc=\” })” "] [errorVerbose=“msg:“Io(Custom { kind: Other, error: \“failed to put object Request ID: None Body: <?xml version=\\\\\\\"1.0\\\\\\\" encoding=\\\\\\\"UTF-8\\\\\\\"?>\\ InternalErrorWe encountered an internal error, please try again./tidb/{ClusterName}/202107160430/1_3253579_1589_610cef689c4b6f7522be5aab417531464ecb6c46aff7d4c295213ea9b9fee0a0_write.sst388af3ae</RequestIdboss/KDwgKQ0LHFMSDQATEQQDQTZ+OiwhNiENWUc=\” })” \ngithub.com/pingcap/br/pkg/backup.*pushDown).pushBackup\ \t/home/jenkins/agent/workspace/build_br_multi_branch_v4.0.0/go/src/github.com/pingcap/br/pkg/backup/push.go:113\ngithub.com/pingcap/br/pkg/backup(*Client).BackupRange\ \t/home/jenkins/agent/workspace/build_br_multi_branch_v4.0.0/go/src/github.com/pingcap/br/pkg/backup/client.go:475\ github.com/pingcap/br/pkg/backup.(*Client).BackupRanges.func2\ \t/home/jenkins/agent/workspace/build_br_multi_branch_v4.0.0/go/src/github.com/pingcap/br/pkg/backup/client.go:381\ runtime.goexit\ \t/usr/local/go/src/runtime/asm_amd64.s:1357”]
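
When a BR log is as long as the one above, it can help to pull out just the error count and the range summary before reading the stack traces. The following is a minimal sketch, not part of BR itself: the function names `br_summary` and `br_error_count` are made up for illustration, and the patterns are taken only from the log lines quoted in this post, so other BR versions may format them differently.

```shell
#!/usr/bin/env bash

# Print the range-count summary from a BR log, if the log contains one.
# The pattern mirrors the "total backup ranges: ..." text in the log above.
br_summary() {
    grep -o 'total backup ranges: [0-9]*, total success: [0-9]*, total failed: [0-9]*' "$1"
}

# Count the lines in a BR log that carry the [ERROR] level tag.
br_error_count() {
    grep -c -F '[ERROR]' "$1"
}

# Example (the path is a placeholder):
#   br_summary /path/to/backup.log
#   br_error_count /path/to/backup.log
```

In the log above, the summary shows that 11 of 12 ranges succeeded, which already hints at a transient failure on a single upload rather than a misconfiguration.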


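If your question is about performance tuning or troubleshooting, please download the script and run it. Be sure to select all of the terminal output, then copy, paste, and upload it.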

This looks similar to this issue: https://github.com/pingcap/br/issues/1162
Try removing the trailing /.
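
If the URL values are built in a wrapper script, the trailing slash can be stripped before the value is passed to br. This is only a sketch of that suggestion: `strip_trailing_slashes` is a hypothetical helper, and the `STORAGE` variable in the usage comment is an assumption, not something taken from the original command.

```shell
#!/usr/bin/env bash

# Remove every trailing "/" from a URL-like string, but leave a bare
# scheme such as "s3://" untouched.
strip_trailing_slashes() {
    local s=$1
    while [ "${s%/}" != "$s" ]; do
        case $s in
            *://) break ;;   # nothing but the scheme is left, stop here
        esac
        s=${s%/}
    done
    printf '%s\n' "$s"
}

# Example (STORAGE is a hypothetical variable in your own script):
#   br backup full --storage="$(strip_trailing_slashes "$STORAGE")" ...
```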

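The value of the s3.endpoint parameter has no trailing /.
I back up several clusters of the same version with identical parameters, and this is the only one that fails.
Backups of this cluster have succeeded before.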

Could you share the complete command and log? Thanks.

Here is the br backup command. It uses object storage that supports the S3 protocol, and some values are replaced with xx:
AWS_ACCESS_KEY_ID=xxxxxx AWS_SECRET_ACCESS_KEY=xxxxxx /tidb/tool/tidb-toolkit-v4.0.0-linux-amd64/bin/br backup full --pd="xxx.xx.xx.xx:2379" --checksum=false --storage="s3://tidb/tidb_backup/202107210430" --ratelimit=60 --s3.region="xxxx" --send-credentials-to-tikv=true --log-file="/tidb_backup/log/tidb_backup/tidb_backup_202107210430.log" --s3.endpoint="http://
The complete log is attached, with some values replaced by xxx.

tidb_backup_202107210430.log (1.4 MB)

Thanks. Could you also share the corresponding logs from the S3 storage, and tell us exactly which S3 storage you are using? Thanks.

The S3 logs show that one task was rate-limited. I'll keep checking :sob:

OK. Please let us know once you find the cause, thanks.

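Root cause found: the S3 client in 4.0.0 does not retry failed requests. This is fixed in 4.0.1.
Documentation link: https://docs.pingcap.com/zh/tidb/v4.0/release-4.0.1
See the TiKV entry under Bug Fixes about improving the reliability of backup and restore file operations, #7917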
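
For anyone who cannot upgrade right away, one possible stopgap is to retry the whole br command from the calling script. The sketch below is an assumption-laden workaround, not what the 4.0.1 fix does (that fix is inside TiKV): `retry` is a hypothetical helper, the attempt count and delay are arbitrary, and this thread does not cover whether re-running a backup into the same storage path is safe, so verify that before relying on it.

```shell
#!/usr/bin/env bash

# retry MAX_ATTEMPTS INITIAL_DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached,
# doubling the delay after every failed attempt.
retry() {
    local max_attempts=$1 delay=$2 attempt=1
    shift 2
    while true; do
        "$@" && return 0
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        echo "attempt $attempt failed, retrying in ${delay}s" >&2
        sleep "$delay"
        delay=$((delay * 2))
        attempt=$((attempt + 1))
    done
}

# Example (placeholder values, as in the commands in this thread):
#   retry 3 30 br backup full --pd="xx.xx.xx.xx:2379" --storage="xxxxx" ...
```

Retrying at this level repeats the entire backup, so it is much coarser than a per-request retry inside the S3 client. Upgrading remains the actual fix.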

:+1: For tools, it is best to use the latest version.
