BR restore problem

【TiDB environment】New cluster
【TiDB version】v6.5.2
【Steps to reproduce】
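I have two TiDB clusters, both on an internal network and both running v6.5.2. The steps were as follows:
Old cluster: backup
Mounted an NFS disk. On the old cluster I created the directory /br on the BR machine and on every TiKV node, and mounted the NFS share on each of them:
mount -t nfs xxx.xxx.xxx.148:/data /br
Created a br_backup directory under the /data directory above. Read and write tests from every node passed.
Ran the following backup command as root (from the directory containing br):
./br backup full --pd "xx.xx.xx.39:2379" --storage "local:///br/br_backup" --ratelimit 128 --log-file brbackup.log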
When it finished, it showed
Full Backup <----->100.00%,
Checksum <…>100.00%
I checked the log and found no errors.

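Restore to the new cluster:
On the BR node (the control machine) and on all TiKV nodes, I created the directory /data and mounted the NFS share on it:
mount -t nfs xxx.xxx.xxx.xxx:/data /data
Read and write tests passed. I checked the br_backup folder and everything under it, and on every node of the new cluster set the folder's owner to root:
chown -R root:root /data/br_backup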
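
The read/write test mentioned for both clusters could be scripted roughly as below. This is only a sketch: `check_rw` and the probe file name are hypothetical, and it tests access for whichever user runs it, so it would need to be run as the user the TiKV process runs under to be meaningful.

```shell
# Hypothetical helper: check that DIR exists and that the current user can
# write a file into it and read it back. Prints a one-line result.
check_rw() {
  _dir="$1"
  _probe="$_dir/.br_rw_probe.$$"
  if [ ! -d "$_dir" ]; then
    echo "missing: $_dir"
    return 1
  fi
  # Write a probe file; the braces keep the shell's own error message quiet.
  if ! { echo ok > "$_probe"; } 2>/dev/null; then
    echo "not writable: $_dir"
    return 1
  fi
  # Read the probe back, then remove it.
  _got=$(cat "$_probe" 2>/dev/null)
  rm -f "$_probe"
  if [ "$_got" != "ok" ]; then
    echo "not readable: $_dir"
    return 1
  fi
  echo "rw ok: $_dir"
}

# Example (path taken from the restore command in this post):
# check_rw /data/br_backup
```

Running it on every node that takes part in the restore, not only on the BR node, is the point of the check.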

https://docs.pingcap.com/zh/tidb/stable/backup-and-restore-faq#遇到-permission-denied-或者-no-such-file-or-directory-错误即使用-root-运行-br-命令行工具也无法解决该如何处理
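Following the FAQ entry above, I checked which user the cluster runs as:
ps aux | grep tikv-server
root
Then I checked the deployment information:
tiup cluster list
The user column shows tidb, but I start the cluster as root every time, which is strange.
Ran the restore command as root (on the new cluster's BR node, the control machine):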
./br restore full -s local:///data/br_backup/ --pd xx.xx.xx.207:2379 --ratelimit 128 --log-file br-restore.log

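【Problem encountered: symptoms and impact】
After the BR restore started, the progress bar reached roughly 10.00% and then the restore failed with this error: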
[2025/02/26 11:53:05.067 +08:00] [ERROR] [main.go:59] [“br failed”] [error=“No such file or directory (os error 2): [BR:KV:ErrKVDownloadFailed]download sst failed; No such file or directory (os error 2): [BR:KV:ErrKVDownloadFailed]download sst failed; No such file or directory (os error 2): [BR:KV:ErrKVDownloadFailed]download sst failed; No such file or directory (os error 2): [BR:KV:ErrKVDownloadFailed]download sst failed; No such file or directory (os error 2): [BR:KV:ErrKVDownloadFailed]download sst failed; No such file or directory (os error 2): [BR:KV:ErrKVDownloadFailed]download sst failed; No such file or directory (os error 2): [BR:KV:ErrKVDownloadFailed]download sst failed; No such file or directory (os error 2): [BR:KV:ErrKVDownloadFailed]download sst failed”] [errorVerbose=“the following errors occurred:\n - [BR:KV:ErrKVDownloadFailed]download sst failed\n No such file or directory (os error 2)\n github.com/pingcap/tidb/br/pkg/restore.(*FileImporter).downloadSST.func1\n \t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/restore/import.go:663\n golang.org/x/sync/errgroup.(*Group).Go.func1\n \t/go/pkg/mod/golang.org/x/sync@v0.1.0/errgroup/errgroup.go:75\n runtime.goexit\n \t/usr/local/go/src/runtime/asm_amd64.s:1594\n - [BR:KV:ErrKVDownloadFailed]download sst failed\n No such file or directory (os error 2)\n github.com/pingcap/tidb/br/pkg/restore.(*FileImporter).downloadSST.func1\n \t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/restore/import.go:663\n golang.org/x/sync/errgroup.(*Group).Go.func1\n \t/go/pkg/mod/golang.org/x/sync@v0.1.0/errgroup/errgroup.go:75\n runtime.goexit\n \t/usr/local/go/src/runtime/asm_amd64.s:1594\n - [BR:KV:ErrKVDownloadFailed]download sst failed\n No such file or directory (os error 2)\n github.com/pingcap/tidb/br/pkg/restore.(*FileImporter).downloadSST.func1\n \t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/restore/import.go:663\n 
golang.org/x/sync/errgroup.(*Group).Go.func1\n \t/go/pkg/mod/golang.org/x/sync@v0.1.0/errgroup/errgroup.go:75\n runtime.goexit\n \t/usr/local/go/src/runtime/asm_amd64.s:1594\n - [BR:KV:ErrKVDownloadFailed]download sst failed\n No such file or directory (os error 2)\n github.com/pingcap/tidb/br/pkg/restore.(*FileImporter).downloadSST.func1\n \t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/restore/import.go:663\n golang.org/x/sync/errgroup.(*Group).Go.func1\n \t/go/pkg/mod/golang.org/x/sync@v0.1.0/errgroup/errgroup.go:75\n runtime.goexit\n \t/usr/local/go/src/runtime/asm_amd64.s:1594\n - [BR:KV:ErrKVDownloadFailed]download sst failed\n No such file or directory (os error 2)\n github.com/pingcap/tidb/br/pkg/restore.(*FileImporter).downloadSST.func1\n \t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/restore/import.go:663\n golang.org/x/sync/errgroup.(*Group).Go.func1\n \t/go/pkg/mod/golang.org/x/sync@v0.1.0/errgroup/errgroup.go:75\n runtime.goexit\n \t/usr/local/go/src/runtime/asm_amd64.s:1594\n - [BR:KV:ErrKVDownloadFailed]download sst failed\n No such file or directory (os error 2)\n github.com/pingcap/tidb/br/pkg/restore.(*FileImporter).downloadSST.func1\n \t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/restore/import.go:663\n golang.org/x/sync/errgroup.(*Group).Go.func1\n \t/go/pkg/mod/golang.org/x/sync@v0.1.0/errgroup/errgroup.go:75\n runtime.goexit\n \t/usr/local/go/src/runtime/asm_amd64.s:1594\n - [BR:KV:ErrKVDownloadFailed]download sst failed\n No such file or directory (os error 2)\n github.com/pingcap/tidb/br/pkg/restore.(*FileImporter).downloadSST.func1\n \t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/restore/import.go:663\n golang.org/x/sync/errgroup.(*Group).Go.func1\n \t/go/pkg/mod/golang.org/x/sync@v0.1.0/errgroup/errgroup.go:75\n runtime.goexit\n \t/usr/local/go/src/runtime/asm_amd64.s:1594\n - 
[BR:KV:ErrKVDownloadFailed]download sst failed\n No such file or directory (os error 2)\n github.com/pingcap/tidb/br/pkg/restore.(*FileImporter).downloadSST.func1\n \t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/restore/import.go:663\n golang.org/x/sync/errgroup.(*Group).Go.func1\n \t/go/pkg/mod/golang.org/x/sync@v0.1.0/errgroup/errgroup.go:75\n runtime.goexit\n \t/usr/local/go/src/runtime/asm_amd64.s:1594”] [stack=“main.main\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/cmd/br/main.go:59\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250”]
The full log is attached:
br-restore.log (15.2 MB)

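Everyone, I have a few questions about the problem above.
First:
[BR:KV:ErrKVDownloadFailed]download sst failed\n No such file or directory: this presumably refers to the NFS mount directory on each TiKV node, right? If so, which directory exactly is the one that cannot be found (No such file or directory)?
Second:
Note that the mount points differ between the two sides: the old cluster backed up through the mount point /br, while the new cluster uses /data. This locally created directory shouldn't matter, should it? It should be fine as long as the layout inside the NFS share is unchanged?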

Is the NFS share mounted on the TiFlash nodes?

It looks like it may be a permissions problem:

You could take a look at this article

No, it isn't mounted on TiFlash.

Mount it there and try the backup and restore again. As I recall, TiFlash needs the mount too, because it is used during the restore.

So if the old cluster has TiFlash data, does the backup include the TiFlash data as well?

No, but it seems the mount still needs to be there.

The new cluster is configured as follows:

OK, I'll try mounting it.

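You mean you ran it as root on the machine where TiUP lives, right? But during the actual restore, TiKV pulls the data directly from the storage, so the access happens as the user that started TiKV, which is most likely the tidb user.
Try changing the owner of the directory and the files to tidb.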
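
Because the nodes open the backup files themselves, the path inside the `local://` URL has to exist on each of them, not only on the machine where br is launched. A small sketch of that mapping follows; `storage_path` is a hypothetical helper for illustration, not part of BR.

```shell
# Hypothetical helper: turn a BR local storage URL into the filesystem path
# that each node would open locally.
storage_path() {
  case "$1" in
    local://*) printf '%s\n' "${1#local://}" ;;
    *) echo "not a local:// URL: $1" >&2; return 1 ;;
  esac
}

# Example with the URL from the restore command in this thread:
# storage_path "local:///data/br_backup/"
```

The printed path is the one to check for existence, mount status, and ownership on every node.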

On every TiKV node, the backup and restore directories need read and write permission for the tidb user.
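
One way to spot ownership problems is to list anything in the backup directory that is not owned by the expected user. A minimal sketch, assuming the TiKV process runs as tidb as suggested above; `not_owned_by` is a hypothetical helper name.

```shell
# Hypothetical helper: list entries under DIR that are NOT owned by USER.
# An empty result means every file and directory is owned by USER.
not_owned_by() {
  _user="$1"
  _dir="$2"
  find "$_dir" ! -user "$_user" -print
}

# Example (assumes TiKV runs as the tidb user):
# not_owned_by tidb /data/br_backup
```

Ownership alone does not prove read and write access, so this complements an actual read/write test rather than replacing it.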

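After mounting NFS on TiFlash that error went away, but at around 33% of the restore one of the TiFlash nodes was completely wrecked... and then the restore failed again.
I looked at the TiFlash log and found nothing abnormal. Sigh...
A bumpy BR journey...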

In the very end, I scaled in the faulty TiFlash node, ran tiup cluster clean, reran the restore, and it completed. Well.

Check whether the relevant permissions are in place, and fix them if they are not.

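The permissions were all fine: the permissions for the root and tidb users and on the NFS folder had all been set. The cause of the error was that the TiFlash nodes did not have the NFS disk mounted.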
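
A missing mount like this can be caught before starting the restore by checking the mounts table on each node. The sketch below assumes Linux, where /proc/mounts lists the device, mount point, and filesystem type; `is_nfs_mounted` is a hypothetical helper, and the optional second argument exists only so the check can be tried against a sample file.

```shell
# Hypothetical helper: succeed if MOUNT_POINT appears as an NFS mount in the
# mounts table (defaults to /proc/mounts on Linux).
is_nfs_mounted() {
  _mp="$1"
  _table="${2:-/proc/mounts}"
  # Fields per line: device, mount point, filesystem type, options, ...
  # The type is "nfs" or "nfs4" depending on the protocol version.
  awk -v mp="$_mp" '$2 == mp && $3 ~ /^nfs/ { found = 1 } END { exit !found }' "$_table"
}

# Example, to be run on every TiKV and TiFlash node:
# is_nfs_mounted /data && echo "NFS mounted" || echo "NFS NOT mounted"
```

In this thread the check would have failed on the TiFlash nodes, which is where the restore broke.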

Right, then it needs to be mounted.

That was a rough ride, mate.