BR restore: data restore fails with an insufficient-space error during checksum

[TiDB environment] Production
[TiDB version] 5.1.4
Backup log:

[2022/06/14 00:43:32.313 +08:00] [INFO] [collector.go:67] ["Full backup success summary"] [total-ranges=194082] [ranges-succeed=194082] [ranges-failed=0] [backup-checksum=2h26m16.859879769s] [backup-fast-checksum=397.738062ms] [backup-total-ranges=5842] [total-take=7h13m13.532734189s] [BackupTS=433877842372329483] [total-kv=10489753445] [total-kv-size=9.607TB] [average-speed=369.6MB/s] ["backup data size(after compressed)"=3.165TB]

My target instance has 10T of space, but I got the following error:

Error: Cannot read https://xxx/xxx-tidb-s3/200036/20220613/20220613173001/946723_5383469_12531_8d70ad2b732ec8342d75592faf84fd60086615b70caba39f48ab1fee0ee43ba1_1655115172579_write.sst into /home/tidb/data/import/.temp/3cdb4ce9-a932-4faa-949c-ddfd4604792f_782133_17_42321_write.sst: No space left on device (os error 28): [BR:KV:ErrKVDownloadFailed]download sst failed; Engine Engine("IO error: No space left on deviceWhile appending to file: /home/tidb/data/import/37ad790c-6584-4b85-a430-79fb46802684_782133_17_42321_write.sst: No space left on device"): [BR:KV:ErrKVDownloadFailed]download sst failed; Engine Engine("IO error: No space left on deviceWhile appending to file: /home/tidb/data/import/34a54e0e-168c-4fd6-823b-69000e95f8c4_782133_17_42321_write.sst: No space left on device"): [BR:KV:ErrKVDownloadFailed]download sst failed; Engine Engine("IO error: No space left on deviceWhile appending to file: /home/tidb/data/import/21172232-9c8a-4a70-84a3-ec9b27927808_782133_17_42321_write.sst: No space left on device"): [BR:KV:ErrKVDownloadFailed]download sst failed; Engine Engine("IO error: No space left on deviceWhile appending to file: /home/tidb/data/import/1982388f-4963-4afc-807d-6f7c99eb829b_782133_17_42321_write.sst: No space left on device"): [BR:KV:ErrKVDownloadFailed]download sst failed; Engine Engine("IO error: No space left on deviceWhile appending to file: /home/tidb/data/import/3f230b6a-49a6-4c30-8add-fe7bf9b8814a_782133_17_42321_write.sst: No space left on device"): [BR:KV:ErrKVDownloadFailed]download sst failed; Engine Engine("IO error: No space left on deviceWhile appending to file: /home/tidb/data/import/9c04e2f9-e2ba-446f-9f4e-13a4549895b7_782133_17_42321_write.sst: No space left on device"): [BR:KV:ErrKVDownloadFailed]download sst failed; Engine Engine("IO error: No space left on deviceWhile appending to file: /home/tidb/data/import/966b76ad-01b4-4f1e-b1c1-26320288d72a_782133_17_42321_write.sst: No space left on device"): 
[BR:KV:ErrKVDownloadFailed]download sst failed

Here is the restore log:

...
Full restore <---------------------------------------------------------------...............> 79.68%
Full restore <-----------------------------------------------------------------------------> 100.00%
[2022/06/14 04:03:00.076 +08:00] [INFO] [collector.go:66] ["Full restore failed summary"] [total-ranges=151252] [ranges-succeed=151251] [ranges-failed=1] [split-region=3h15m46.540126349s] [restore-checksum=5h24m55.116353053s] [restore-ranges=160555] [unit-name=file] [error="Cannot read https://xxx/xxx-tidb-s3/200036/20220613/20220613173001/946723_5383469_12531_8d70ad2b732ec8342d75592faf84fd60086615b70caba39f48ab1fee0ee43ba1_1655115172579_write.sst into /home/tidb/data/import/.temp/3cdb4ce9-a932-4faa-949c-ddfd4604792f_782133_17_42321_write.sst: No space left on device (os error 28): [BR:KV:ErrKVDownloadFailed]download sst failed; Engine Engine(\"IO error: No space left on deviceWhile appending to file: /home/tidb/data/import/37ad790c-6584-4b85-a430-79fb46802684_782133_17_42321_write.sst: No space left on device\"): [BR:KV:ErrKVDownloadFailed]download sst failed;

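Also, in the BR code I noticed that the final total-kv-size=9.607TB is printed as a human-readable size (humansize).
This humansize "TB" is decimal: 1 kB = 1000 bytes, 1 MB = 1000 kB, and so on, rather than steps of 1024.
When we work out the real size, should we convert using 1000 or 1024?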
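To make the unit question concrete, here is a minimal sketch that parses a decimal size string like the one BR prints and converts it to binary TiB. It assumes, as observed above, that BR's human-readable sizes use powers of 1000; `parse_decimal_size` and `to_tib` are hypothetical helpers, not part of BR.

```python
# Sketch, assuming BR's human-readable sizes are decimal (powers of 1000),
# as observed in the BR code discussed above.
DECIMAL_UNITS = {
    "B": 1,
    "kB": 1000,
    "KB": 1000,
    "MB": 1000**2,
    "GB": 1000**3,
    "TB": 1000**4,
}
TIB = 1024**4  # one binary tebibyte


def parse_decimal_size(text: str) -> float:
    """Convert a string such as '9.607TB' into a byte count using powers of 1000."""
    text = text.strip()
    # Try longer suffixes first so that 'TB' is not mistaken for a bare 'B'.
    for unit in sorted(DECIMAL_UNITS, key=len, reverse=True):
        if text.endswith(unit):
            return float(text[: -len(unit)]) * DECIMAL_UNITS[unit]
    raise ValueError(f"unrecognised size: {text!r}")


def to_tib(n_bytes: float) -> float:
    """Express a byte count in binary TiB (powers of 1024)."""
    return n_bytes / TIB


kv_bytes = parse_decimal_size("9.607TB")
print(f"{kv_bytes:.0f} bytes = {to_tib(kv_bytes):.2f} TiB")
```

Under that assumption, 9.607TB is 9.607 × 10^12 bytes, or about 8.74 TiB; the 1000-vs-1024 convention changes the figure by roughly 10%.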

This looks like the TiKV disks ran out of space; you need to scale out.
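The failure above is ENOSPC (os error 28) on the TiKV import directory. A minimal sketch for checking the remaining space on a TiKV host before retrying, assuming Python 3 is available on the host; the default path is taken from the error message, and `disk_headroom` / `has_room_for` are hypothetical helpers, not part of TiDB or BR.

```python
import shutil

# Hypothetical default: the TiKV data directory seen in the error message.
# Run this on each TiKV host, or pass a different path.
TIKV_DATA_DIR = "/home/tidb/data"


def disk_headroom(path: str = TIKV_DATA_DIR) -> dict:
    """Return total/used/free bytes and the used fraction for the filesystem holding `path`."""
    usage = shutil.disk_usage(path)  # named tuple (total, used, free), all in bytes
    used_ratio = usage.used / usage.total if usage.total else 0.0
    return {
        "total": usage.total,
        "used": usage.used,
        "free": usage.free,
        "used_ratio": used_ratio,
    }


def has_room_for(path: str, needed_bytes: int) -> bool:
    """True if the filesystem holding `path` has at least `needed_bytes` free."""
    return shutil.disk_usage(path).free >= needed_bytes
```

Because a restore writes to every TiKV node, checking each host separately is more informative than comparing a single cluster-wide total.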

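I understand it ran out of space, but the cluster clearly has more space than the 9.6T in the log, and it still failed. So I'd like to know how the size in the log should be interpreted.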

I've updated the problem description. Could someone take a look?

Roughly how large are the backup files, and how much space is left on the data disks?

The backup size is total-kv-size=9.607TB from the log; the cluster has 11T of space.

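Note that Total KV size is the sum of the uncompressed sizes of all KVs for a single replica (in other words, the result of selecting every KV and summing len(key) + len(value)), so this number most likely does not reflect the actual size of the cluster.
Because regions are not necessarily distributed perfectly evenly, this can indeed happen during a restore when space is tight. To confirm, check the region distribution and see whether any machine's disk usage is unusually high.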
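The definition of Total KV size can be written as a minimal sketch. `total_kv_size` is a hypothetical helper over an in-memory list of (key, value) pairs, not BR's implementation; `naive_replicated_size` multiplies by an assumed replica count (PD's default `max-replicas` is 3) and deliberately ignores RocksDB compression and space amplification, which push the real on-disk figure in opposite directions.

```python
from typing import Iterable, Tuple


def total_kv_size(pairs: Iterable[Tuple[bytes, bytes]]) -> int:
    """Sum of len(key) + len(value) over all KV pairs: one uncompressed replica."""
    return sum(len(k) + len(v) for k, v in pairs)


def naive_replicated_size(total_kv: int, replicas: int = 3) -> int:
    """Raw bytes if every replica stored the same uncompressed KVs.

    Assumption: ignores compression and space amplification, so this is only
    a reminder that the per-replica figure is not the cluster footprint.
    """
    return total_kv * replicas
```

For this thread, 9.607 TB × 3 ≈ 28.8 TB of raw replicated KV bytes before compression. Neither number is a reliable on-disk estimate; the point is that the per-replica total cannot be compared directly with the cluster's capacity.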
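A minimal sketch of that skew check, assuming you have already collected each store's capacity and available bytes (for example from PD via `pd-ctl store` or the Grafana dashboards, converted to bytes). `StoreUsage`, `skewed_stores`, and the 10% tolerance are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class StoreUsage:
    store_id: int
    capacity: int   # bytes, total size of the store's disk
    available: int  # bytes still free on the store

    @property
    def used_ratio(self) -> float:
        return 1 - self.available / self.capacity


def skewed_stores(stores: List[StoreUsage], tolerance: float = 0.10) -> List[StoreUsage]:
    """Stores whose used ratio exceeds the cluster-wide mean by more than `tolerance`."""
    if not stores:
        return []
    avg = mean(s.used_ratio for s in stores)
    return [s for s in stores if s.used_ratio - avg > tolerance]


# Hypothetical example: three equally sized stores, one much fuller than the rest.
stores = [
    StoreUsage(1, capacity=4_000_000_000_000, available=2_000_000_000_000),
    StoreUsage(2, capacity=4_000_000_000_000, available=1_800_000_000_000),
    StoreUsage(3, capacity=4_000_000_000_000, available=200_000_000_000),
]
for s in skewed_stores(stores):
    print(f"store {s.store_id}: {s.used_ratio:.0%} used")  # prints "store 3: 95% used"
```

Comparing against the mean rather than a fixed limit highlights imbalance; a store can hit "No space left on device" while the cluster as a whole still has free space.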


:+1: Liked
