Qiuchi
(Ti D Ber T Hwl2t Uf)
February 28, 2025, 05:46
1
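[TiDB environment] Production / Test / PoC
[TiDB version] 6.5.9
[Steps to reproduce] Enable pessimistic transactions, disable autocommit, and run highly concurrent inserts and updates.
[Problem: symptoms and impact] In monitoring, the "In-memory pessimistic locking result" panel shows a large number of `full` results. Is this memory limit adjustable?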
有猫万事足
February 28, 2025, 06:41
2
# RFC: In-memory Pessimistic Locks
- RFC PR: https://github.com/tikv/rfcs/pull/77
- Tracking Issue: https://github.com/tikv/tikv/issues/11452
## Motivation
Pessimistic locks are replicated via the Raft protocol and then applied to the Lock CF in RocksDB. Now, TiKV implements an optimization called "pipelined pessimistic lock", which returns the result to the user immediately after proposing the locking request successfully.
"Pipelined pessimistic lock" saves most of the latency spent on Raft replication. This optimization is mainly based on the fact that it is still safe if some pessimistic locks are lost at the time of committing the transaction.
We can take it a step further. It is feasible to only keep pessimistic locks in the memory of the leader and not replicate the locks via Raft. With appropriate handlings on region changes, the failure rate of transactions will not increase compared to "pipelined pessimistic lock".
This change expects to reduce disk write bandwidth by 20% and reduce the latency of pessimistic locking by 50% according to preliminary tests on the TPC-C workload.
## Detailed design
Here is the general idea:
- Pessimistic locks are written into a region-level lock table.
This is the design document.
Memory limit
To simplify handlings in region changes, we don’t allow the total size of the pessimistic locks in a single region to be too large. The limit for each region is 512 KiB, matching the 1 MiB Raft message limit.
There should also be a global limit. The default size is the minimum of 1 GiB and 5% of the system memory.
It is easy for the lock writer to maintain the total size of pessimistic locks in a single region. If the memory limit is exceeded, we have to fall back to propose pessimistic locks.
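The size cannot be adjusted. It is fixed at 512 KiB per Region, and the overall memory limit is the minimum of 1 GiB and 5% of system memory.
Separately, for write conflicts in general, I still think they should be addressed at the architecture level. If the write conflicts are not caused by an application bug and are unavoidable given the business logic, you should use a system such as ZooKeeper or Redis and keep the distributed lock outside the database. That way your writes never conflict, and you avoid consuming the resources of the DB, which is a core system.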
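To make the arithmetic concrete, here is a minimal Python sketch of the two limits and the fallback described in the RFC excerpt above. It is only an illustration of the quoted text, not TiKV's actual code: the class `RegionLockTable`, the function names, and the way sizes are counted are all hypothetical.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

# Per-Region limit as quoted from the RFC (not configurable in this discussion).
PER_REGION_LIMIT = 512 * KIB


def global_limit(system_memory_bytes: int) -> int:
    """Global limit as described in the RFC: min(1 GiB, 5% of system memory)."""
    return min(GIB, system_memory_bytes * 5 // 100)


class RegionLockTable:
    """Hypothetical stand-in for a Region-level in-memory lock table."""

    def __init__(self) -> None:
        self.used = 0  # bytes of pessimistic locks currently held in memory

    def try_insert(self, lock_size: int, global_used: int, global_cap: int) -> bool:
        """Return True if the lock fits in memory.

        Return False when either limit would be exceeded, in which case the
        lock has to fall back to being proposed through Raft (the "full" case).
        """
        if self.used + lock_size > PER_REGION_LIMIT:
            return False
        if global_used + lock_size > global_cap:
            return False
        self.used += lock_size
        return True


if __name__ == "__main__":
    cap = global_limit(16 * GIB)
    print(f"global cap on a 16 GiB host: {cap / MIB:.1f} MiB")
    table = RegionLockTable()
    print(table.try_insert(400 * KIB, global_used=0, global_cap=cap))
    print(table.try_insert(200 * KIB, global_used=400 * KIB, global_cap=cap))
```

In this sketch the second insert is rejected because 400 KiB + 200 KiB exceeds the 512 KiB per-Region limit, even though the global cap is far from reached. That is the situation that would show up as `full` when many locks land in one Region.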
Qiuchi
(Ti D Ber T Hwl2t Uf)
February 28, 2025, 07:01
3
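My scenario has no lock conflicts. In principle it should be using optimistic transactions, but for historical reasons all of our workloads run on pessimistic transactions, and data updates are not scheduled that carefully. I largely agree with your point, and I will test optimistic transactions next.
1 like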
有猫万事足
February 28, 2025, 07:10
4
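Also, although the per-Region memory is fixed, you can still find ways to raise the overall memory usage, namely by spreading writes across as many Regions as possible. This is essentially scattering the write hotspot.
If you really must increase the memory used by the in-memory-pessimistic-locks mechanism, then given that it is not configurable, this may be the only way.
1 like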
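As a rough illustration of why spreading writes helps, the Python sketch below estimates how much lock memory is usable in total when the writes land on a given number of Regions. It assumes the 512 KiB per-Region limit and a global cap as described in the RFC. The function name `usable_lock_memory` is hypothetical, and the model ignores how leaders are distributed across TiKV instances.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

PER_REGION_LIMIT = 512 * KIB  # per-Region limit quoted from the RFC


def usable_lock_memory(written_regions: int, global_cap: int) -> int:
    """Upper bound on in-memory lock bytes when writes hit `written_regions` Regions.

    Each Region contributes at most PER_REGION_LIMIT, and the sum is capped
    by the global limit (assumed here to apply to a single TiKV instance).
    """
    return min(written_regions * PER_REGION_LIMIT, global_cap)


if __name__ == "__main__":
    cap = 1 * GIB  # assumed global cap, i.e. a host with at least 20 GiB of memory
    for regions in (1, 64, 4096):
        usable = usable_lock_memory(regions, cap)
        print(f"{regions:>5} Regions -> {usable / MIB:.1f} MiB usable")
```

With a single hot Region the ceiling is 512 KiB. With 64 Regions it grows to 32 MiB, and only at a few thousand Regions does the global cap become the binding limit.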