BR restore fails

TiDB version 4.0.4, BR version 4.0.4. The backup completed without problems, but the restore fails.
log.rar (2.0 MB)

[2020/09/22 18:46:42.734 +08:00] [WARN] [backoff.go:92] ["unexcepted error, stop to retry"] [error="rpc error: code = Unavailable desc = transport is closing"] [errorVerbose="rpc error: code = Unavailable desc = transport is closing\
github.com/pingcap/errors.AddStack\
\t/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20190809092503-95897b64e011/errors.go:174\
github.com/pingcap/errors.Trace\
\t/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20190809092503-95897b64e011/juju_adaptor.go:15\
github.com/pingcap/br/pkg/restore.(*FileImporter).downloadSST\
\t/home/jenkins/agent/workspace/build_br_multi_branch_v4.0.4/go/src/github.com/pingcap/br/pkg/restore/import.go:390\
github.com/pingcap/br/pkg/restore.(*FileImporter).Import.func1.1\
\t/home/jenkins/agent/workspace/build_br_multi_branch_v4.0.4/go/src/github.com/pingcap/br/pkg/restore/import.go:250\
github.com/pingcap/br/pkg/utils.WithRetry\
\t/home/jenkins/agent/workspace/build_br_multi_branch_v4.0.4/go/src/github.com/pingcap/br/pkg/utils/retry.go:34\
github.com/pingcap/br/pkg/restore.(*FileImporter).Import.func1\
\t/home/jenkins/agent/workspace/build_br_multi_branch_v4.0.4/go/src/github.com/pingcap/br/pkg/restore/import.go:245\
github.com/pingcap/br/pkg/utils.WithRetry\
\t/home/jenkins/agent/workspace/build_br_multi_branch_v4.0.4/go/src/github.com/pingcap/br/pkg/utils/retry.go:34\
github.com/pingcap/br/pkg/restore.(*FileImporter).Import\
\t/home/jenkins/agent/workspace/build_br_multi_branch_v4.0.4/go/src/github.com/pingcap/br/pkg/restore/import.go:212\
github.com/pingcap/br/pkg/restore.(*Client).RestoreFiles.func2\
\t/home/jenkins/agent/workspace/build_br_multi_branch_v4.0.4/go/src/github.com/pingcap/br/pkg/restore/client.go:667\
github.com/pingcap/br/pkg/utils.(*WorkerPool).Apply.func1\
\t/home/jenkins/agent/workspace/build_br_multi_branch_v4.0.4/go/src/github.com/pingcap/br/pkg/utils/worker.go:44\
runtime.goexit\
\t/usr/local/go/src/runtime/asm_amd64.s:1357"]

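The error in the log means the RPC request took too long and the backoff retries timed out, mainly while downloading SST files. Please confirm that the BR backup files are complete and that all of them have been exported to the same directory.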

I checked with the br validate command and found no problems. Every TiKV node has the full set of backup files.

The BR log contains an error saying an SST file could not be found, for example "1_36336_1579_53f2c1918e1e351806413f8c1b66531742305153c930c83c08d6c14213e37b28_write.sst".

Please confirm whether the files being restored include it.

[2020/09/22 18:46:42.737 +08:00] [ERROR] [import.go:268] ["download file failed"] [file="{name=1_36336_1579_53f2c1918e1e351806413f8c1b66531742305153c930c83c08d6c14213e37b28_write.sst,CF=write,sha256=678117137b115ad2e
00a9da4040130fc561b267d85c6ef6f7bccdd4a877f0638,startKey=7480000000000009015f698000000000000001013230323030393039ff3135343933305f35ff3339373135393535ff5f31383035353339ff3433363900000000fb03800000000000000603800000000
0000000038000000004ccccca,endKey=7480000000000009015f698000000000000001013230323030393039ff3136303933395f35ff3339373135393535ff5f31383035353339ff3433363900000000fb038000000000000006038000000000000000038000000004d89d87,startVersion=0,endVersion=419632650803216385,totalKvs=774333,totalBytes=72012969,CRC64Xor=1562705409644359750}"] [region="{ID=30670,startKey=748000000000000cfff85f698000000000ff0000010132303230ff30393039ff313534ff3933305f35ff3339ff373135393535ff5fff31383035353339ffff3433363900000000fffb03800000000000ff0006038000000000ff0000000380000000ff04ccccca00000000fb,endKey=748000000000000cfff85f698000000000ff0000010132303230ff30393039ff313630ff3933395f35ff3339ff373135393535ff5fff31383035353339ffff3433363900000000fffb03800000000000ff0006038000000000ff0000000380000000ff04d89d8700000000fb,epoch=\"conf_ver:8 version:5795 \",peers=\"id:30671 store_id:1 ,id:30672 store_id:4 ,id:30673 store_id:12316 \"}"] [startKey=748000000000000cfff85f698000000000ff0000010132303230ff30393039ff313534ff3933305f35ff3339ff373135393535ff5fff31383035353339ffff3433363900000000fffb03800000000000ff0006038000000000ff0000000380000000ff04ccccca00000000fb] [endKey=748000000000000cfff85f698000000000ff0000010132303230ff30393039ff313630ff3933395f35ff3339ff373135393535ff5fff31383035353339ffff3433363900000000fffb03800000000000ff0006038000000000ff0000000380000000ff04d89d8700000000fb] [error="rpc error: code = Unavailable desc = transport is closing"] [errorVerbose="rpc error: code = Unavailable desc = transport is closing\
github.com/pingcap/errors.AddStack\
\t/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20190809092503-95897b64e011/errors.go:174\
github.com/pingcap/errors.Trace\
\t/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20190809092503-95897b64e011/juju_adaptor.go:15\
github.com/pingcap/br/pkg/restore.(*FileImporter).downloadSST\
\t/home/jenkins/agent/workspace/build_br_multi_branch_v4.0.4/go/src/github.com/pingcap/br/pkg/restore/import.go:390\
github.com/pingcap/br/pkg/restore.(*FileImporter).Import.func1.1\
\t/home/jenkins/agent/workspace/build_br_multi_branch_v4.0.4/go/src/github.com/pingcap/br/pkg/restore/import.go:250\
github.com/pingcap/br/pkg/utils.WithRetry\
\t/home/jenkins/agent/workspace/build_br_multi_branch_v4.0.4/go/src/github.com/pingcap/br/pkg/utils/retry.go:34\
github.com/pingcap/br/pkg/restore.(*FileImporter).Import.func1\
\t/home/jenkins/agent/workspace/build_br_multi_branch_v4.0.4/go/src/github.com/pingcap/br/pkg/restore/import.go:245\
github.com/pingcap/br/pkg/utils.WithRetry\
\t/home/jenkins/agent/workspace/build_br_multi_branch_v4.0.4/go/src/github.com/pingcap/br/pkg/utils/retry.go:34\
github.com/pingcap/br/pkg/restore.(*FileImporter).Import\
\t/home/jenkins/agent/workspace/build_br_multi_branch_v4.0.4/go/src/github.com/pingcap/br/pkg/restore/import.go:212\
github.com/pingcap/br/pkg/restore.(*Client).RestoreFiles.func2\
\t/home/jenkins/agent/workspace/build_br_multi_branch_v4.0.4/go/src/github.com/pingcap/br/pkg/restore/client.go:667\
github.com/pingcap/br/pkg/utils.(*WorkerPool).Apply.func1\
\t/home/jenkins/agent/workspace/build_br_multi_branch_v4.0.4/go/src/github.com/pingcap/br/pkg/utils/worker.go:44\
runtime.goexit\
\t/usr/local/go/src/runtime/asm_amd64.s:1357"] [stack="github.com/pingcap/log.Error\
\t/go/pkg/mod/github.com/pingcap/log@v0.0.0-20200511115504-543df19646ad/global.go:42\
github.com/pingcap/br/pkg/restore.(*FileImporter).Import.func1\
\t/home/jenkins/agent/workspace/build_br_multi_branch_v4.0.4/go/src/github.com/pingcap/br/pkg/restore/import.go:268\
github.com/pingcap/br/pkg/utils.WithRetry\
\t/home/jenkins/agent/workspace/build_br_multi_branch_v4.0.4/go/src/github.com/pingcap/br/pkg/utils/retry.go:34\
github.com/pingcap/br/pkg/restore.(*FileImporter).Import\
\t/home/jenkins/agent/workspace/build_br_multi_branch_v4.0.4/go/src/github.com/pingcap/br/pkg/restore/import.go:212\
github.com/pingcap/br/pkg/restore.(*Client).RestoreFiles.func2\
\t/home/jenkins/agent/workspace/build_br_multi_branch_v4.0.4/go/src/github.com/pingcap/br/pkg/restore/client.go:667\
github.com/pingcap/br/pkg/utils.(*WorkerPool).Apply.func1\
\t/home/jenkins/agent/workspace/build_br_multi_branch_v4.0.4/go/src/github.com/pingcap/br/pkg/utils/worker.go:44"]

Yes, the restore files include it.

Could you double-check? Use ls to see whether the absolute path of the SST file named in the error matches the absolute path of the backup.

The absolute paths match as well:

/root/tidb-toolkit-v4.0.4-linux-amd64/bin/br restore db --pd 'xxxx:xxxx' -s "local:/srv/nodes/ssd/1/tidb/tidb_backup/tidbbak" --db qq_voip_lpt --ratelimit 2
ls /srv/nodes/ssd/1/tidb/tidb_backup/tidbbak/1_36336_1579_53f2c1918e1e351806413f8c1b66531742305153c930c83c08d6c14213e37b28_write.sst
/srv/nodes/ssd/1/tidb/tidb_backup/tidbbak/1_36336_1579_53f2c1918e1e351806413f8c1b66531742305153c930c83c08d6c14213e37b28_write.sst
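Note that ls only shows that the file exists, not that its contents are intact. A minimal sketch of a stricter check follows, assuming the sha256 field in the "download file failed" log entry is the digest of the SST file itself. The helper names sha256_of_file and check_sst are hypothetical and are not part of BR.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large SST files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_sst(path, expected_sha256):
    """Classify a backup file as 'missing', 'mismatch', or 'ok'.

    expected_sha256 is assumed to be the value copied from the BR log entry.
    """
    p = Path(path)
    if not p.is_file():
        return "missing"
    if sha256_of_file(p) != expected_sha256.lower():
        return "mismatch"
    return "ok"
```

Running check_sst on every TiKV node with the path from the ls output above and the sha256 value from the log would separate "file absent" from "file present but different", which ls alone cannot do.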

https://docs.pingcap.com/zh/tidb/stable/backup-and-restore-faq#恢复的时候报错-could-not-read-localdownload-sst-failed该如何处理

During a restore, the error could not read local://…:download sst failed is reported. How should it be handled?

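During a restore, every node must be able to access all of the backup files (SST files). By default, with local storage, the backup files are scattered across the nodes and cannot be restored directly. The backup files from each TiKV node must first be copied to every other TiKV node.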

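It is recommended to mount an NFS network disk as the backup disk when backing up. For details, see "Back up single-table data to a network disk".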
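The requirement quoted above (every TiKV node sees every SST file) can be checked mechanically once a file listing has been collected from each node. This is a rough sketch only: the function name missing_per_node and the node names in the example are hypothetical, and how the listings are gathered (for example with ls on each host) is left open.

```python
def missing_per_node(node_listings, expected_files=None):
    """Map each node to the sorted list of backup files it lacks.

    node_listings: dict of node name -> iterable of file names found on that node.
    expected_files: the full backup file set; if omitted, the union of all
    listings is used, since with local storage each node may hold only a part.
    """
    if expected_files is None:
        expected = set()
        for listing in node_listings.values():
            expected.update(listing)
    else:
        expected = set(expected_files)

    report = {}
    for node, listing in node_listings.items():
        missing = sorted(expected - set(listing))
        if missing:
            report[node] = missing
    return report


if __name__ == "__main__":
    # Hypothetical listings: tikv-2 never received b_write.sst.
    listings = {
        "tikv-1": ["a_write.sst", "b_write.sst"],
        "tikv-2": ["a_write.sst"],
    }
    print(missing_per_node(listings))
```

An empty result means every node holds the same file set, which is the precondition for restoring from local storage.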

It turned out that too little memory had been allocated. After scaling up, the problem went away. Thanks :grinning:

Could you share how you tracked it down to a memory problem?

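Restoring some small tables worked fine, but restoring a table with 80 million rows failed. I then noticed that TiKV had restarted and that messages contained OOM entries, so I concluded that memory was the problem.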
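For anyone repeating this diagnosis, here is a small sketch that filters system log lines for OOM-killer entries that mention TiKV. It assumes the process is named tikv-server and that the kernel writes lines containing "invoked oom-killer" or "Out of memory: Kill process" / "Out of memory: Killed process". The exact wording varies by kernel version, so treat the pattern as a starting point. find_oom_lines is a hypothetical helper.

```python
import re

# Common kernel OOM-killer phrasings; wording differs between kernel versions.
OOM_PATTERN = re.compile(
    r"invoked oom-killer|Out of memory: Kill(?:ed)? process",
    re.IGNORECASE,
)


def find_oom_lines(lines, process_name="tikv-server"):
    """Return log lines that look like OOM-killer events mentioning process_name."""
    return [
        line.rstrip("\n")
        for line in lines
        if OOM_PATTERN.search(line) and process_name in line
    ]


if __name__ == "__main__":
    # /var/log/messages is the usual location on RHEL/CentOS; other systems differ.
    with open("/var/log/messages", errors="replace") as f:
        for hit in find_oom_lines(f):
            print(hit)
```

If the timestamps of the matching lines line up with the "transport is closing" errors in the BR log, that supports the conclusion that TiKV was killed during the download and the gRPC connection dropped as a result.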

:+1::+1::+1:
