TiKV performance test running into flow control (write throttling)

  1. Mixed write/read workload (write/read ratio 1:10), value size 1 KB, dataset of 500 million keys.
    Flow control is triggered at around 30k write QPS / 300k read QPS.

My read is that compaction pending bytes has reached the threshold.

Changing storage.flow-control.soft-pending-compaction-bytes-limit has no effect; is the maximum really only 192 GB?
The error is as follows:
[“self.rocksdb.defaultcf.soft-pending-compaction-bytes-limit is too large. Setting it to storage.flow-control.soft-pending-compaction-bytes-limit (206158430208)”] [thread_id=0x5]
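
To confirm that hypothesis, you can watch compaction pending bytes either in the TiKV Grafana dashboards or straight from the TiKV status port. A minimal check, assuming the default status port 20180 (the exact metric name may differ slightly by version, so the grep is deliberately loose):

curl -s http://<tikv-host>:20180/metrics | grep pending_compaction_bytes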

Please post more screenshots. I see your compaction is at 1.9 GB, so you have probably run into a disk problem: the disk cannot absorb the writes fast enough, it is simply too slow.

I think you changed the wrong setting. What this error means is that rocksdb.defaultcf.soft-pending-compaction-bytes-limit is larger than storage.flow-control.soft-pending-compaction-bytes-limit, so the value you set is ignored and capped to the flow-control limit.

Also, the flow control may well be expected behavior: either the writes are coming in too fast or the disk is too slow. Raising the threshold does not fix the underlying problem.
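
For reference, both limits live in the TiKV config; a minimal tikv.toml sketch assuming the documented defaults (the values here are only illustrative, and as noted above raising them does not remove the disk bottleneck):

[storage.flow-control]
enable = true
# these are the limits that actually gate foreground writes
soft-pending-compaction-bytes-limit = "192GB"
hard-pending-compaction-bytes-limit = "1024GB"

[rocksdb.defaultcf]
# must not exceed the flow-control soft limit above,
# otherwise TiKV caps it, which is exactly what the log line reports
soft-pending-compaction-bytes-limit = "192GB"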

In my experience the problem usually turns out to be the disk.

How about running sysbench random read/write tests against the disk to verify its performance?
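
A minimal sysbench fileio sketch (the file size, runtime, and /data path are placeholders; run it on the same disk that holds the TiKV data directory):

cd /data/sysbench-test
sysbench fileio --file-total-size=32G --file-num=16 prepare
sysbench fileio --file-total-size=32G --file-test-mode=rndrw --file-block-size=4096 \
  --file-io-mode=async --file-extra-flags=direct --time=60 --threads=4 run
sysbench fileio --file-total-size=32G cleanup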

Let me show you a case where disk performance was extremely poor.

[screenshot]

30 * 10000 / 1024 * 8 ≈ 2343
Working from the numbers you posted, the disk throughput already comes out to about 2343 M (see the back-of-envelope breakdown below).
And on top of that you still want 30k write QPS.
With 1 KB values, that write load is effectively already at your disk's bottleneck.
Two suggestions for improvement:
1. On the disk side, move to PCIe 5.0 drives; that class of drive can do around 15 GB/s of writes.
2. Check whether the network is a bottleneck: upgrade the 10 GbE NICs to 100 GbE, and the switches have to be replaced as well.
If this is a cloud deployment, consider switching to bare-metal machines and confirm the machine specs with your vendor.
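
As a rough sanity check on those numbers (back-of-envelope figures, not measurements from this thread):

300,000 reads/s x 1 KB ≈ 293 MiB/s of logical read traffic, before any RocksDB read amplification
293 MiB/s x 8 ≈ 2344, which is the "2343" above when the throughput is quoted in bits
30,000 writes/s x 1 KB ≈ 29 MiB/s of logical writes, before WAL and compaction write amplification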

1 KB * 1024 is exactly 1 MB.
For disk read/write you have to multiply by 8.

Random read performance (note: --time_based was passed without --runtime, hence the fio warning below):
sudo fio --ioengine=libaio --randrepeat=0 --stonewall --norandommap=1 --thread --time_based --direct=1 --name=randread --rw=randread --bs=4k --size=10g --numjobs=4 --iodepth=128 --group_reporting --filename=/data/datatest1 --percentile_list=1:5:10:25:50:75:90:95:95.5:96:96.5:97:97.5:98:98.5:99:99.5:99.9:99.99:99.999
fio: time_based requires a runtime/timeout setting
randread: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=128

fio-3.27
Starting 4 threads
Jobs: 1 (f=1): [_(1),r(1),_(2)][100.0%][r=1950MiB/s][r=499k IOPS][eta 00m:00s]
randread: (groupid=0, jobs=4): err= 0: pid=246669: Tue Jul 2 11:23:51 2024
read: IOPS=500k, BW=1952MiB/s (2047MB/s)(40.0GiB/20979msec)
slat (nsec): min=1034, max=5138.6k, avg=2047.18, stdev=20896.50
clat (usec): min=120, max=29649, avg=1005.85, stdev=1035.96
lat (usec): min=122, max=29651, avg=1007.98, stdev=1036.20
clat percentiles (usec):
| 1.000th=[ 265], 5.000th=[ 371], 10.000th=[ 457], 25.000th=[ 545],
| 50.000th=[ 676], 75.000th=[ 848], 90.000th=[ 2966], 95.000th=[ 4015],
| 95.500th=[ 4047], 96.000th=[ 4080], 96.500th=[ 4113], 97.000th=[ 4146],
| 97.500th=[ 4178], 98.000th=[ 4228], 98.500th=[ 4228], 99.000th=[ 4293],
| 99.500th=[ 4424], 99.900th=[ 4621], 99.990th=[ 5276], 99.999th=[28967]
bw ( MiB/s): min= 1485, max= 2490, per=100.00%, avg=1960.49, stdev=52.86, samples=163
iops : min=380162, max=637506, avg=501885.64, stdev=13533.09, samples=163
lat (usec) : 250=0.78%, 500=15.02%, 750=46.37%, 1000=22.13%
lat (msec) : 2=5.61%, 4=4.68%, 10=5.40%, 50=0.01%
cpu : usr=9.35%, sys=23.99%, ctx=1765476, majf=0, minf=516
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
issued rwts: total=10485760,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=128

Run status group 0 (all jobs):
READ: bw=1952MiB/s (2047MB/s), 1952MiB/s-1952MiB/s (2047MB/s-2047MB/s), io=40.0GiB (42.9GB), run=20979-20979msec

Disk stats (read/write):
nvme0n1: ios=10446576/69, merge=0/4, ticks=10264661/35, in_queue=10264696, util=99.53%


Random write performance:
sudo fio --percentile_list=1:5:10:25:50:75:90:95:95.5:96:96.5:97:97.5:98:98.5:99:99.5:99.9:99.99:99.999 --ioengine=libaio --direct=1 --thread --group_reporting --name=baseline --filename=/data/datatest1 --stonewall --size=100% --bs=4k --rw=randrw --rwmixread=0 --numjobs=4 --iodepth=128 --time_based --norandommap=1 --randrepeat=0
Jobs: 3 (f=3): [w(2),_(1),w(1)][97.4%][w=1095MiB/s][w=280k IOPS][eta 00m:01s]
baseline: (groupid=0, jobs=4): err= 0: pid=223330: Mon Jul 1 23:06:12 2024
write: IOPS=280k, BW=1094MiB/s (1147MB/s)(40.0GiB/37445msec); 0 zone resets
slat (nsec): min=1052, max=7477.6k, avg=2361.71, stdev=49843.70
clat (usec): min=14, max=11771, avg=1801.68, stdev=2446.56
lat (usec): min=20, max=11773, avg=1804.13, stdev=2446.93
clat percentiles (usec):
| 1.000th=[ 253], 5.000th=[ 355], 10.000th=[ 429], 25.000th=[ 545],
| 50.000th=[ 709], 75.000th=[ 922], 90.000th=[ 6980], 95.000th=[ 7111],
| 95.500th=[ 7177], 96.000th=[ 7177], 96.500th=[ 7177], 97.000th=[ 7242],
| 97.500th=[ 7242], 98.000th=[ 7308], 98.500th=[ 7308], 99.000th=[ 7373],
| 99.500th=[ 7504], 99.900th=[ 7832], 99.990th=[10552], 99.999th=[11338]
bw ( MiB/s): min= 747, max= 1476, per=100.00%, avg=1102.71, stdev=33.08, samples=293
iops : min=191372, max=377948, avg=282294.19, stdev=8468.37, samples=293
lat (usec) : 20=0.01%, 50=0.01%, 100=0.01%, 250=0.86%, 500=16.79%
lat (usec) : 750=37.51%, 1000=23.62%
lat (msec) : 2=3.08%, 4=0.10%, 10=18.01%, 20=0.02%
cpu : usr=5.29%, sys=13.61%, ctx=1156113, majf=0, minf=4
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
issued rwts: total=0,10485760,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=128

Run status group 0 (all jobs):
WRITE: bw=1094MiB/s (1147MB/s), 1094MiB/s-1094MiB/s (1147MB/s-1147MB/s), io=40.0GiB (42.9GB), run=37445-37445msec

Disk stats (read/write):
nvme0n1: ios=1/10472001, merge=0/6, ticks=1/18321315, in_queue=18321317, util=99.61%

What are the 30 and the 10000 in that formula?

"value size 1 KB, dataset of 500 million keys;
flow control is triggered at 30k write QPS / 300k read QPS"
I worked it out from those numbers you posted; take a look.

So 30 * 10000 / 1024 * 8 is how you calculated the data volume for the 300k reads?

The data volume needed for the reads should be about 30 * 10000 * 1000 bytes, roughly 300 MB. Is your 8 the RocksDB read amplification factor?

The 8 comes from how disk IO is counted: one byte is 8 bits.

To convert that into disk IO you have to multiply by 8.

And I can see your disk benchmark speed is about the same as the numbers I posted.

Scale out: add more TiKV nodes.

Pending bytes is already at 1 TB; that means the cluster has been pushed to its limit.
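
If you do scale out, a minimal TiUP sketch (the cluster name and host are placeholders):

cat > scale-out.yml <<'EOF'
tikv_servers:
  - host: 10.0.1.5    # placeholder: the new TiKV node
EOF
tiup cluster scale-out <cluster-name> scale-out.yml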

You could also consider trying the Titan component:
https://docs.pingcap.com/zh/tidb/v7.1/titan-overview#titan-介绍
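
Enabling Titan is a TiKV config change; a minimal sketch using the documented keys (whether 1 KB values actually land in blob files depends on min-blob-size, so treat the value below as an assumption to tune, not a recommendation):

[rocksdb.titan]
enabled = true

[rocksdb.defaultcf.titan]
# values at or above this size are stored in Titan blob files
min-blob-size = "1KB"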

Disk vendors do this all the time: capacity is counted in units of 1000, disk IO is quoted in bits, and bandwidth is quoted in bits as well.

Disk IO is the thing that needs the closest attention.