【TiDB Usage Environment】Production
【TiDB Version】We are not using TiDB itself; we only use TiKV as a KV store, accessed through TxnClient.
【Problem Encountered】
UnsafeDestroyRange sometimes fails to remove data. In our tests, roughly 6% of the time, data that had already been destroyed with UnsafeDestroyRange could still be scanned out with txn.Iter. Why does this happen? I have skimmed the TiKV code: the current implementation first calls RocksDB::DeleteFilesInRange to quickly free space, and then iterates over the remaining keys in the range and writes a tombstone for each one, but I do not understand why residual data can still show up. Also, could the key-by-key Delete be replaced with RocksDB::DeleteRange, which writes a single RangeTombstone? Scanning and deleting key by key feels expensive, and the GCWorker queue fills up easily.
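To make the symptom concrete, here is a minimal sketch of the check our test performs (Go, tikv/client-go v2): destroy a range, then scan it again with txn.Iter. The unsafeDestroyRange helper below is hypothetical; in reality the UnsafeDestroyRange request has to be sent to every TiKV store, the way the TiDB GC worker does it.

```go
package main

import (
	"fmt"

	"github.com/tikv/client-go/v2/txnkv"
)

// unsafeDestroyRange is a hypothetical helper: it stands in for sending an
// UnsafeDestroyRange request to every TiKV store for [start, end), as the
// TiDB GC worker does. client-go does not expose this as a single call here.
func unsafeDestroyRange(client *txnkv.Client, start, end []byte) error {
	// ... send kvrpcpb.UnsafeDestroyRangeRequest to each store ...
	return nil
}

func main() {
	client, err := txnkv.NewClient([]string{"127.0.0.1:2379"})
	if err != nil {
		panic(err)
	}
	defer client.Close()

	start, end := []byte("prefix_a"), []byte("prefix_z")
	if err := unsafeDestroyRange(client, start, end); err != nil {
		panic(err)
	}

	// Re-scan the destroyed range; any key returned here is exactly the
	// residual data described above.
	txn, err := client.Begin()
	if err != nil {
		panic(err)
	}
	it, err := txn.Iter(start, end)
	if err != nil {
		panic(err)
	}
	defer it.Close()
	for it.Valid() {
		fmt.Printf("residual key: %q\n", it.Key())
		if err := it.Next(); err != nil {
			panic(err)
		}
	}
}
```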
Running UnsafeDestroyRange over a very large range (TB-scale and above) saturates TiKV's IO for several hours or more. Is there a good way to deal with this?
Hoping someone from the official team can help explain this. Many thanks!
jiyf
September 7, 2022, 11:32
When I was reading the TiDB GC code earlier, I also wondered why UnsafeDestroyRange needs to be called twice (a rough sketch follows the list):
1. The first call executes UnsafeDestroyRange based on the GC safepoint.
2. The second call happens 24 hours after the safepoint.
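Roughly, the two passes look like the following. This is a simplified sketch only: the names are made up, the timing is simplified, and the real logic lives in TiDB's gc_worker, which tracks pending ranges in the mysql.gc_delete_range and mysql.gc_delete_range_done system tables rather than in memory.

```go
// Simplified sketch of TiDB GC's two-pass UnsafeDestroyRange schedule.
package gc

import "time"

// deleteRange is one dropped/truncated key range waiting to be destroyed.
type deleteRange struct {
	start, end []byte
	doneAt     time.Time // when the first pass finished
}

// destroyRange stands in for sending UnsafeDestroyRange to every TiKV store.
func destroyRange(r deleteRange) error { return nil }

// firstPass runs for ranges whose drop/truncate timestamp is already older
// than the GC safepoint; successfully destroyed ranges are kept for a
// second check later.
func firstPass(pending []deleteRange) []deleteRange {
	done := make([]deleteRange, 0, len(pending))
	for _, r := range pending {
		if err := destroyRange(r); err != nil {
			continue // stays pending and is retried in the next GC round
		}
		r.doneAt = time.Now()
		done = append(done, r)
	}
	return done
}

// secondPass re-issues UnsafeDestroyRange roughly 24 hours later, to catch
// data that reappeared because a Region was being moved (or a snapshot was
// being applied) while the first pass ran.
func secondPass(done []deleteRange) {
	for _, r := range done {
		if time.Since(r.doneAt) >= 24*time.Hour {
			_ = destroyRange(r)
		}
	}
}
```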
# Unsafe Destroy Range
## Summary
Support RPC `UnsafeDestroyRange`. This call is on the whole TiKV rather than a
certain Region. When it is invoked, TiKV will use `delete_files_in_range` to
quickly free a large amount of space, and then scan and delete all remaining
keys in the range. Raft layer will be bypassed, and the range can cover
multiple Regions. The invoker should promise that after invoking
`UnsafeDestroyRange`, the range will **never** be accessed again. That is to
say, the range is permanently scrapped.
This interface is only designed for TiDB. It's used to clean up data after
dropping/truncating a huge table/index.
## Motivation
Currently, when TiDB drops/truncates a table/index, after GC lifetime it will
invoke `DeleteRange` on all affected Regions respectively. Though TiKV will
call `delete_files_in_range` to quickly free up the space for each Region, the
(This file has been truncated; see the original RFC for the full text.)
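For readers who do not want to open the full RFC, the flow its Summary describes is roughly the following. This is an illustrative sketch only (TiKV itself is written in Rust, and every name below is invented); it also spells out why DeleteFilesInRange alone cannot remove everything: it can only drop SST files that lie entirely inside the range, so keys in boundary SST files and in the memtable still need individual tombstones.

```go
// Illustrative only: not TiKV code. The names mirror the two steps the RFC
// summary describes for UnsafeDestroyRange on a single store.
package sketch

// deleteFilesInRange stands in for RocksDB's DeleteFilesInRange, which can
// only drop SST files that lie entirely inside [start, end).
func deleteFilesInRange(start, end []byte) error { return nil }

// scanRange / deleteKey stand in for iterating the remaining keys and
// writing a tombstone for each of them.
func scanRange(start, end []byte) [][]byte { return nil }
func deleteKey(key []byte) error           { return nil }

// destroyRangeOnStore: drop whole SST files first to reclaim most of the
// space cheaply, then delete whatever is left (keys in SST files that
// straddle the range boundary, plus keys still in the memtable), bypassing
// the Raft layer entirely.
func destroyRangeOnStore(start, end []byte) error {
	if err := deleteFilesInRange(start, end); err != nil {
		return err
	}
	for _, k := range scanRange(start, end) {
		if err := deleteKey(k); err != nil {
			return err
		}
	}
	return nil
}
```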
The RFC explains why the first pass cannot guarantee a complete cleanup:
And why do we need to check it one more time after 24 hours? After deleting the range the first time, if coincidentally PD is trying to move a Region or something, some data may still appear in the range. So check it one more time after a proper time to greatly reduce the possibility.
The saturated IO is likely caused by compaction:
After deleting all files in a very-large range, it may trigger RocksDB’s compaction (even it is not really needed) and even cause stalling.
system
Closed November 18, 2022, 02:45
This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.