TiDB 5.4 benchmark performance is low, how do I tune it? Help wanted

【TiDB environment】
OS: CentOS 7 on all nodes
3 virtual machines, 32 cores + 32 GB RAM each
TiDB v5.4.1 installed with TiUP
Benchmark tool: sysbench
Official benchmark reference: https://docs.pingcap.com/zh/tidb/v5.4/benchmark-sysbench-v5.4.0-vs-v5.3.0
Cluster deployment was tuned following the official checklist: https://docs.pingcap.com/zh/tidb/v5.4/check-before-deployment
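For reference, the OS-level items in that checklist boil down to roughly the following (a sketch only; assumes CentOS 7 and an ext4 data disk):

# Disable swap and transparent huge pages (TiKV is sensitive to both)
swapoff -a && sysctl -w vm.swappiness=0
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
# Set the CPU frequency governor to performance
cpupower frequency-set --governor performance
# Mount the TiKV data disk with nodelalloc,noatime (ext4), e.g. in /etc/fstab:
#   UUID=<data-disk-uuid> /tidb-data ext4 defaults,nodelalloc,noatime 0 2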
【Overview】 Scenario + problem summary
Test command:
sysbench /usr/local/share/sysbench/tests/include/oltp_legacy/oltp.lua \
--threads=32 \
--time=120 \
--oltp-test-mode=complex \
--report-interval=1 \
--db-driver=mysql \
--mysql-db=test \
--mysql-host=127.0.0.1 \
--mysql-port=4000 \
--mysql-user=root \
--mysql-password='' \
run --tables=10 --table-size=1000000
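For reference, the data set would be loaded beforehand with the same script's prepare step; a sketch with the same connection options (note that, if memory serves, the legacy oltp.lua reads --oltp-tables-count/--oltp-table-size rather than the --tables/--table-size options of the newer oltp_*.lua scripts):

sysbench /usr/local/share/sysbench/tests/include/oltp_legacy/oltp.lua \
--db-driver=mysql \
--mysql-db=test \
--mysql-host=127.0.0.1 \
--mysql-port=4000 \
--mysql-user=root \
--mysql-password='' \
--oltp-tables-count=10 \
--oltp-table-size=1000000 \
--threads=32 \
prepare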

【Test results】
Running the test with following options:
Number of threads: 32
Report intermediate results every 1 second(s)
Initializing random number generator from current time

Initializing worker threads…

Threads started!

[ 1s ] thds: 32 tps: 416.98 qps: 8742.65 (r/w/o: 6179.07/1696.78/866.80) lat (ms,95%): 127.81 err/s: 1.99 reconn/s: 0.00
[ 2s ] thds: 32 tps: 466.10 qps: 9411.10 (r/w/o: 6590.47/1885.42/935.21) lat (ms,95%): 114.72 err/s: 4.00 reconn/s: 0.00
[ 3s ] thds: 32 tps: 456.06 qps: 9200.30 (r/w/o: 6464.91/1820.26/915.13) lat (ms,95%): 121.08 err/s: 1.00 reconn/s: 0.00
[ 4s ] thds: 32 tps: 453.04 qps: 8951.88 (r/w/o: 6248.62/1801.18/902.09) lat (ms,95%): 137.35 err/s: 0.00 reconn/s: 0.00
[ 5s ] thds: 32 tps: 471.86 qps: 9554.10 (r/w/o: 6705.97/1899.42/948.71) lat (ms,95%): 112.67 err/s: 1.00 reconn/s: 0.00
[ 6s ] thds: 32 tps: 475.09 qps: 9506.72 (r/w/o: 6650.20/1905.34/951.17) lat (ms,95%): 108.68 err/s: 1.00 reconn/s: 0.00
[ 7s ] thds: 32 tps: 472.97 qps: 9412.31 (r/w/o: 6585.52/1879.86/946.93) lat (ms,95%): 112.67 err/s: 1.00 reconn/s: 0.00
[ 8s ] thds: 32 tps: 464.41 qps: 9388.10 (r/w/o: 6588.65/1869.63/929.82) lat (ms,95%): 118.92 err/s: 1.00 reconn/s: 0.00
[ 9s ] thds: 32 tps: 446.45 qps: 8890.98 (r/w/o: 6213.27/1782.80/894.90) lat (ms,95%): 139.85 err/s: 2.00 reconn/s: 0.00
[ 10s ] thds: 32 tps: 467.03 qps: 9340.66 (r/w/o: 6539.46/1866.13/935.07) lat (ms,95%): 118.92 err/s: 1.00 reconn/s: 0.00
SQL statistics:
    queries performed:
        read:                            64932
        write:                           18522
        other:                           9262
        total:                           92716
    transactions:                        4624   (458.37 per sec.)
    queries:                             92716  (9190.89 per sec.)
    ignored errors:                      14     (1.39 per sec.)
    reconnects:                          0      (0.00 per sec.)

General statistics:
    total time:                          10.0853s
    total number of events:              4624

Latency (ms):
    min:                                 33.13
    avg:                                 69.44
    max:                                 342.81
    95th percentile:                     121.08
    sum:                                 321088.75

Threads fairness:
    events (avg/stddev):                 144.5000/3.81
    execution time (avg/stddev):         10.0340/0.02

【Problems】
1. Across the 3 nodes, only the TiDB instance that receives the connections runs at about 30% CPU; CPU usage on the other TiDB instances is in the single digits.
2. I/O utilization on all 3 nodes is 70-80%.
3. TPS is low, and the test also reports errors (err/s).

Are sysbench and the TiDB cluster running on the same node?

sysbench runs on a separate node; I also tested from the same node, and the difference was small.

Where TiDB's performance requirements differ from MySQL's is that it is much more demanding on disk: MySQL can get by on spinning HDDs, but TiDB needs SSDs. First check whether your SSD actually reaches around 500 MB/s for reads and writes.

1) Test disk write throughput. By default the filesystem write-caches and decides itself when to sync to disk, so the measured write speed is usually high:

time dd if=/dev/zero of=output.file bs=8k count=128000

2) Test disk read throughput. By default the filesystem read-caches, so reads are usually fast; if the data is not cached it is read directly from disk, but a second read will then be faster:

time dd if=output.file of=/dev/null bs=8k count=128000
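One caveat: dd from /dev/zero through the page cache mostly measures memory rather than the disk. For a number closer to what TiKV actually sees, a direct-I/O fio run on the data disk can be used; a sketch (file path and size are placeholders):

fio --name=tikv-disk-check --filename=/tidb-data/fio-test.file --size=10G \
    --direct=1 --ioengine=psync --bs=32k --rw=randrw --rwmixread=70 \
    --numjobs=4 --runtime=60 --time_based --group_reporting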

The disks are SSD-backed storage; read/write performance looks fine:
[root@mdw03 ~]# time dd if=/dev/zero of=output.file bs=8k count=128000
128000+0 records in
128000+0 records out
1048576000 bytes (1.0 GB) copied, 0.998806 s, 1.0 GB/s

real 0m1.001s
user 0m0.037s
sys 0m0.963s

I checked Grafana; the tikv-err panel shows these values, and I am not sure whether they are affecting performance.

First, change the parameters in the topology file as follows:

server_configs:
  pd:
    replication.enable-placement-rules: true
  tikv:
    server.grpc-concurrency: 8
    server.enable-request-batch: false
    storage.scheduler-worker-pool-size: 8
    raftstore.store-pool-size: 5
    raftstore.apply-pool-size: 5
    rocksdb.max-background-jobs: 12
    raftdb.max-background-jobs: 12
    rocksdb.defaultcf.compression-per-level: ["no","no","zstd","zstd","zstd","zstd","zstd"]
    raftdb.defaultcf.compression-per-level: ["no","no","zstd","zstd","zstd","zstd","zstd"]
    rocksdb.defaultcf.block-cache-size: 12GB
    raftdb.defaultcf.block-cache-size: 2GB
    rocksdb.writecf.block-cache-size: 6GB
    readpool.unified.min-thread-count: 8
    readpool.unified.max-thread-count: 16
    readpool.storage.normal-concurrency: 12
    raftdb.allow-concurrent-memtable-write: true
    pessimistic-txn.pipelined: true
  tidb:
    prepared-plan-cache.enabled: true
    tikv-client.max-batch-wait-time: 2000000

Then speed up the sysbench data load; you can also write your own data-loading program. Here is one I can share:
tidb_data_prepare-0.1 (5.3 MB)
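The attached program is not reproduced here; purely to illustrate the idea (multi-row batched INSERTs from several parallel sessions instead of row-by-row inserts), a hypothetical shell sketch might look like this. The table and column names follow the standard sysbench sbtest1 layout and are assumptions, not the attached tool:

#!/bin/bash
# Hypothetical loader sketch: 8 parallel mysql sessions, each sending 125 batches of 1000 rows
# into sbtest1 (created beforehand by sysbench prepare), for 1,000,000 rows in total.
for job in $(seq 1 8); do
  (
    for batch in $(seq 1 125); do
      {
        printf 'INSERT INTO sbtest1 (k, c, pad) VALUES '
        for row in $(seq 1 1000); do
          [ "$row" -gt 1 ] && printf ','
          printf "(%d,'%s','%s')" "$RANDOM" "filler-c-$RANDOM" "filler-pad-$RANDOM"
        done
        printf ';\n'
      } | mysql -h 127.0.0.1 -P 4000 -u root test
    done
  ) &
done
wait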

This kind of yaml change is not supported online; how should I apply the modification?

After adjusting and reloading, the results barely changed.
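(The adjust-and-reload step above is the usual TiUP workflow for applying topology-file changes; a minimal sketch, using the cluster name from the show-config output below:)

tiup cluster edit-config tidb-test             # edit server_configs in the topology that opens
tiup cluster reload tidb-test -R tikv,tidb,pd  # rolling-restart the affected roles to apply it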
【Test results】


【Updated configuration】
[root@mdw04 tidb]# tiup cluster show-config tidb-test
tiup is checking updates for component cluster …
Starting component cluster: /root/.tiup/components/cluster/v1.10.1/tiup-cluster show-config tidb-test
global:
  user: tidb
  ssh_port: 22
  ssh_type: builtin
  deploy_dir: /tidb-deploy
  data_dir: /tidb-data
  os: linux
monitored:
  node_exporter_port: 9100
  blackbox_exporter_port: 9115
  deploy_dir: /tidb-deploy/monitor-9100
  data_dir: /tidb-data/monitor-9100
  log_dir: /tidb-deploy/monitor-9100/log
server_configs:
  tidb:
    prepared-plan-cache.enabled: true
    tikv-client.max-batch-wait-time: 2000000
  tikv:
    pessimistic-txn.pipelined: true
    raftdb.allow-concurrent-memtable-write: true
    raftdb.defaultcf.block-cache-size: 2GB
    raftdb.defaultcf.compression-per-level:
    - "no"
    - "no"
    - zstd
    - zstd
    - zstd
    - zstd
    - zstd
    raftdb.max-background-jobs: 12
    raftstore.apply-pool-size: 5
    raftstore.store-pool-size: 5
    readpool.storage.normal-concurrency: 12
    readpool.unified.max-thread-count: 16
    readpool.unified.min-thread-count: 8
    rocksdb.defaultcf.block-cache-size: 12GB
    rocksdb.defaultcf.compression-per-level:
    - "no"
    - "no"
    - zstd
    - zstd
    - zstd
    - zstd
    - zstd
    rocksdb.max-background-jobs: 12
    rocksdb.writecf.block-cache-size: 6GB
    server.enable-request-batch: false
    server.grpc-concurrency: 8
    storage.scheduler-worker-pool-size: 8
  pd:
    replication.enable-placement-rules: true
  tiflash: {}
  tiflash-learner: {}
  pump: {}
  drainer: {}
  cdc: {}
  grafana: {}
tidb_servers:
- host: 10.33.0.21
  ssh_port: 22
  port: 4000
  status_port: 10080
  deploy_dir: /tidb-deploy/tidb-4000
  log_dir: /tidb-deploy/tidb-4000/log
  arch: amd64
  os: linux
- host: 10.33.0.22
  ssh_port: 22
  port: 4000
  status_port: 10080
  deploy_dir: /tidb-deploy/tidb-4000
  log_dir: /tidb-deploy/tidb-4000/log
  arch: amd64
  os: linux
- host: 10.33.0.23
  ssh_port: 22
  port: 4000
  status_port: 10080
  deploy_dir: /tidb-deploy/tidb-4000
  log_dir: /tidb-deploy/tidb-4000/log
  arch: amd64
  os: linux
tikv_servers:
- host: 10.33.0.21
  ssh_port: 22
  port: 20160
  status_port: 20180
  deploy_dir: /tidb-deploy/tikv-20160
  data_dir: /tidb-data/tikv-20160
  log_dir: /tidb-deploy/tikv-20160/log
  arch: amd64
  os: linux
- host: 10.33.0.22
  ssh_port: 22
  port: 20160
  status_port: 20180
  deploy_dir: /tidb-deploy/tikv-20160
  data_dir: /tidb-data/tikv-20160
  log_dir: /tidb-deploy/tikv-20160/log
  arch: amd64
  os: linux
- host: 10.33.0.23
  ssh_port: 22
  port: 20160
  status_port: 20180
  deploy_dir: /tidb-deploy/tikv-20160
  data_dir: /tidb-data/tikv-20160
  log_dir: /tidb-deploy/tikv-20160/log
  arch: amd64
  os: linux
tiflash_servers: []
pd_servers:
- host: 10.33.0.21
  ssh_port: 22
  name: pd-10.33.0.21-2379
  client_port: 2379
  peer_port: 2380
  deploy_dir: /tidb-deploy/pd-2379
  data_dir: /tidb-data/pd-2379
  log_dir: /tidb-deploy/pd-2379/log
  arch: amd64
  os: linux
- host: 10.33.0.22
  ssh_port: 22
  name: pd-10.33.0.22-2379
  client_port: 2379
  peer_port: 2380
  deploy_dir: /tidb-deploy/pd-2379
  data_dir: /tidb-data/pd-2379
  log_dir: /tidb-deploy/pd-2379/log
  arch: amd64
  os: linux
- host: 10.33.0.23
  ssh_port: 22
  name: pd-10.33.0.23-2379
  client_port: 2379
  peer_port: 2380
  deploy_dir: /tidb-deploy/pd-2379
  data_dir: /tidb-data/pd-2379
  log_dir: /tidb-deploy/pd-2379/log
  arch: amd64
  os: linux
monitoring_servers:
- host: 10.33.0.21
  ssh_port: 22
  port: 9090
  ng_port: 12020
  deploy_dir: /tidb-deploy/prometheus-9090
  data_dir: /tidb-data/prometheus-9090
  log_dir: /tidb-deploy/prometheus-9090/log
  external_alertmanagers: []
  arch: amd64
  os: linux
grafana_servers:
- host: 10.33.0.21
  ssh_port: 22
  port: 3000
  deploy_dir: /tidb-deploy/grafana-3000
  arch: amd64
  os: linux
  username: admin
  password: admin
  anonymous_enable: false
  root_url: ""
  domain: ""

OK. Start with https://asktug.com/t/topic/693931 and see whether that gets you the results you want. Honestly, performance tuning is not something that can be explained in a sentence or two; it requires real familiarity with TiDB's architecture, internals, and monitoring metrics. I'd suggest these courses:
https://learn.pingcap.com/learner/course/120005
https://learn.pingcap.com/learner/course/540005
https://learn.pingcap.com/learner/course/570012
In short: figure out which role (TiDB, PD, or TiKV) is actually the slow one, adjust the corresponding parameters, check whether the table schema is reasonable, and so on.

I looked into it: the errors occurred because I was using the official /usr/local/share/sysbench/tests/include/oltp_legacy/oltp.lua, and the script expects the sbtest database. After fixing that, performance improved and latency dropped to around 50 ms, but it still does not meet the requirement, which is to get below 20 ms. My current read is that this latency is what keeps TPS from climbing. Is there any way to deal with that?
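(As a sanity check on that reasoning: in a closed-loop benchmark, TPS is roughly threads / average latency, and 32 threads / 0.069 s ≈ 464, which matches the ~458 TPS measured above. So with per-transaction latency fixed, the TPS ceiling only moves by running more concurrent threads or by cutting the latency itself.)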
