作者 | Wentao Jin
一、BenchmarkSQL 介绍
TPC-C Benchmark是一个对OLTP(联机交易处理)系统进行测试的规范,使用一个商品销售模型对OLTP系统进行测试,其中包含五类事务:
-
NewOrder –新订单的生成
-
Payment –订单付款
-
OrderStatus –最近订单查询
-
Delivery –配送
-
StockLevel –库存缺货状态分析
在测试开始前,TPC-C Benchmark规定了数据库的初始状态,也就是数据库中数据生成的规则,其中ITEM表中固定包含10万种商品,仓库的数量可进行调整,假设WAREHOUSE表中有W条记录,那么:
-
STOCK表中应有W×10万条记录(每个仓库对应10万种商品的库存数据);
-
DISTRICT表中应有W×10条记录(每个仓库为10个地区提供服务);
-
CUSTOMER表中应有W×10×3000条记录(每个地区有3000个客户);
-
HISTORY表中应有W×10×3000条记录(每个客户一条交易历史);
-
ORDER表中应有W×10×3000条记录(每个地区3000个订单),并且最后生成的900个订单被添加到NEW-ORDER表中,每个订单随机生成5~15条ORDER-LINE记录。
每个 Warehouse 的数据量,其大小约为 76823.04KB,计算公式为:数据总量(KB)≈ Warehouse个数*76823.04KB
TPC-C使用tpmC值(Transactions per Minute)来衡量系统最大有效吞吐量(MQTh,Max Qualified Throughput),其中Transactions以NewOrder Transaction为准,即最终衡量单位为每分钟处理的新订单数。
二、TiDB 部署
2.1、官方部署建议
参考 TiDB 部署文档 部署 TiDB 集群。在 3 台服务器的条件下,建议每台机器部署 1 个 TiDB, 1 个 PD, 1 个 TiKV 实例。关于磁盘,以 32 张表、每张表 10M 行数据为例,建议 TiKV 的数据目录所在的磁盘空间大于 512 GB。
对于单个 TiDB 的并发连接数,建议控制在 500 以内,如需增加整个系统的并发压力,可以增加 TiDB 实例,具体增加的 TiDB 个数视测试压力而定。
对于写密集型测试,比如 Sysbench 的 oltp_update_index、oltp_update_non_index、oltp_write_only 或者 TPC-C 测试,建议增加 TiKV 实例数量,如每台机器 2 个 TiKV。
对于机器硬件要求,请参考文档,表格内为写密集测试的部署建议
TiDB | TiKV | PD | |
---|---|---|---|
node1 | 1 | 1(2) | 1 |
node2 | 1 | 1(2) | 1 |
node3 | 1 | 1(2) | 1 |
2.2、珠海工行部署
机器硬件配置: 3台机器
CPU 2*14*2
内存 256G
存储 800G SSD*4
网卡 万兆网卡
TiDB 集群部署:6 * TiDB、12 * TiKV、1 * PD
TiDB | TiKV | PD | |
---|---|---|---|
node1 | 2 | 4 | 1 |
node2 | 2 | 4 | 0 |
node3 | 2 | 4 | 0 |
三、TiDB 集群调优
3.1 TiDB 调优
升高日志级别,可以减少打印日志数量,对性能有积极影响。具体在 TiDB 配置文件中加入:
[log]
level = "error"
performance:
# Max CPUs to use, 0 use number of CPUs in the machine.
--根据机器的CPU核数设置
max-procs: 28
prepared_plan_cache:
--开启 TiDB 配置中的 prepared plan cache,以减少优化执行计划的开销
enabled: true
tikv_client:
# Max gRPC connections that will be established with each tikv-server.
--当tidb、tikv 负载比较低或者CPU空闲时,可适当调大该值,加大负载
grpc-connection-count: 4
txn_local_latches:
# Enable local latches for transactions. Enable it when
# there are lots of conflicts between transactions.
--并发压测建议开启,可减少事务的冲突
enabled: true
3.2 TiKV 调优
升高 TiKV 的日志级别同样有利于性能表现。由于 TiKV 是以集群形式部署,在 Raft 算法的作用下,能保证大多数节点已经写入数据。因此,除了对数据安全极端敏感的场景之外,raftstore 中的 sync-log 选项可以关闭。
TiKV 集群存在两个 Column Family(Default CF 和 Write CF)主要用于存储不同类型的数据,比例是:
Default CF : Write CF = 4 : 1
或者
Default CF : Write CF = 10: 1
数据多,索引少可采用10:1,数据与索引相差不大可采用4:1
在 TiKV 中需要根据机器内存大小配置 RocksDB 的 block cache,以充分利用内存。以 20 GB 内存的虚拟机部署一个 TiKV 为例,其 block cache 建议配置如下:
global:
log-level = "error"
[raftstore]
sync-log = false
[rocksdb.defaultcf]
block-cache-size = "12GB"
[rocksdb.writecf]
block-cache-size = "3GB"
TiKV 其他参数值调整说明值
通过 Grafana 的 TiKV Thread CPU 监控面板可以观察到其实际使用率。
- 如出现单线程模块瓶颈,可以通过扩展 TiKV 节点来进行负载均摊;
- 如出现多线程模块瓶颈,可以通过增加该模块并发度进行调整
-- 根据监控cluster面板中QPS主要用于点查kv_get还是coprocessor 调整对应的storage或者coprocessor
readpool:
-- 主要用于点查kv get api请求
storage:
high-concurrency: 6
normal-concurrency: 6
low-concurrency: 6
-- 根据监控coprocessor繁忙程度调整,主要用于查询下推coprocessor tikv
coprocessor:
high-concurrency: 8
normal-concurrency: 8
low-concurrency: 8
server:
## Size of the thread pool for the gRPC server.
-- 该参数主要用于处理tidb grpc连接调度处理,当 tidb、tikv 比较空闲,在 tidb 调整grpc-connection-count 连接数时,可适当调整该值
grpc-concurrency: 4
storage:
--调度程序的工作池大小,即写入线程的数量。应该小于总CPU核心数。当有频繁的写操作时,将其设置为更高的价值。更具体地说,你可以运行`top -H -p tikv-pid`来检查是否有线程`sched-worker-pool`很忙
scheduler-worker-pool-size: 4
raftstore:
--在 Raft 算法的作用下,能保证大多数节点已经写入数据。因此,除了对数据安全极端敏感的场景之外,raftstore 中的 sync-log 选项可以关闭,压测过程一般关闭
sync-log: false
## Use how many threads to handle log apply
-- TiDB 3.0 版本前 raftstore是个单线程,apply log容易出现瓶颈,之前通过增加 TiKV 节点处理,现在可以调整
apply-pool-size: 2
## Use how many threads to handle raft messages
-- TiDB 3.0 版本前 raftstore是个单线程,append log容易出现瓶颈,之前通过增加 TiKV 节点处理,现在可以调整
store-pool-size: 2
更详细的 TiKV 参数调优请参考TiKV 性能参数调优 。
四、测试准备工作
4.1、BenchmarkSQL 配置
安装 jdk 至少 1.8以上,下载链接
安装 ant
# yum install ant
因为 BenchmarlSQL 本身是不支持 Mysql ,所以需要更改源代码以支持 Mysql, BenchmarkSQL 官方下载链接
git clone -b 5.0-mysql-support-opt https://github.com/jackysp/benchmarksql.git
编译 benchmarkSQL 源码
$ cd benchmarksql
[benchmarksql]$ ant
Buildfile: /data/tidb/wentaojin/benchmarksql/build.xml
init:
[mkdir] Created dir: /data/tidb/wentaojin/benchmarksql/build
compile:
[javac] Compiling 11 source files to /data/tidb/wentaojin/benchmarksql/build
dist:
[mkdir] Created dir: /data/tidb/wentaojin/benchmarksql/dist
[jar] Building jar: /data/tidb/wentaojin/benchmarksql/dist/BenchmarkSQL-5.0.jar
BUILD SUCCESSFUL
Total time: 1 second
报错如下,是因为未 ant 编译 benchmarkSQL 源码
$ ./runSQL.sh props.mysql sql.mysql/tableCreates.sql
# ------------------------------------------------------------
# Loading SQL file sql.mysql/tableCreates.sql
# ------------------------------------------------------------
Error: Could not find or load main class ExecJDBC
benchmarkSQL 参考使用README链接
$ cd run
The test process for MySQL/TiDB
# Create the tables
$ ./runSQL.sh props.mysql sql.mysql/tableCreates.sql
# Add the indices (TiDB prefers to import data after adding the indices)
$ ./runSQL.sh props.mysql sql.mysql/indexCreates.sql
# Import Data
$ ./runLoader.sh props.mysql
参数配置文件 run 目录下 sql.mysql 见如下:
db=mysql
driver=com.mysql.jdbc.Driver
conn=jdbc:mysql://localhost:4000/tpcc?useSSL=false&useServerPrepStmts=true&useConfigs=maxPerformance&sessionVariables=tidb_batch_commit=1
## 一般压测去掉tidb_batch_commit性能更好些,导入加载数据加上tidb_batch_commit导入更快些
#conn=jdbc:mysql://localhost:4000/tpcc?useSSL=false&useServerPrepStmts=true&useConfigs=maxPerformance
user=root
password=
# 指定生成仓库数
warehouses=1
# 导入并发线程数
loadWorkers=4
# 压测并发数
terminals=1
//To run specified transactions per terminal- runMins must equal zero
runTxnsPerTerminal=0
//To run for specified minutes- runTxnsPerTerminal must equal zero
# 压测时间,默认10分钟
runMins=10
//Number of total transactions per minute
# 不限制事务,让其压测最高
limitTxnsPerMin=0
//Set to true to run in 4.x compatible mode. Set to false to use the
//entire configured database evenly.
terminalWarehouseFixed=true
//The following five values must add up to 100
//The default percentages of 45, 43, 4, 4 & 4 match the TPC-C spec
# 默认benchmarkSQL事务压测比例
newOrderWeight=45
paymentWeight=43
orderStatusWeight=4
deliveryWeight=4
stockLevelWeight=4
// Directory name to create for collecting detailed result data.
// Comment this out to suppress.
resultDirectory=my_result_%tY-%tm-%td_%tH%tM%tS
//osCollectorScript=./misc/os_collector_linux.py
//osCollectorInterval=1
//osCollectorSSHAddr=user@dbhost
//osCollectorDevices=net_eth0 blk_sda
4.2、统计信息收集
数据导入完成后,手动收集表统计信息,由于目前 TiDB 缺乏针对于库级别的收集,只能写脚本手动收集所有表统计信息
#!/bin/bash
host=122.71.221.112
port=4000
user=tidb
password='tidb'
dbname=tpch
### Get table name
tabName=`mysql -h ${host} -P ${port} -u${user} -p${password} -s -e "select table_name from information_schema.tables where table_schema='${dbname}'"
### Get table counts
#tabCounts=`mysql -h ${host} -P ${port} -u${user} -p${password} -e "select count(*) from information_schema.tables where table_schema='${dbname}' group by table_schema"`
### Convert table name array
declare -a tabArry
idx=0
for tab in ${tabName}
do
tabArry[idx]=${tab}
if [ $? -eq 0 ]; then
echo "table ${tab} add to tabArry success."
echo "------------------------------------"
echo "tabArry ${idx} table name: "
echo ""
echo ${tabArry[idx]}
echo ""
else
echo "table ${tab} add to tabArry failed."
echo ""
fi
let idx++
done
### Analyze table
for name in ${tabArry[*]}
do
mysql -h ${host} -P ${port} -u${user} -p${password} -D ${dbname} -e "analyze table ${dbname}.${name}"
if [ $? -eq 0 ]; then
echo "analyze table ${dbname}.${name} success."
else
echo "analyze table ${dbname}.${name} failed."
fi
done
4.3、数据库压测
首次压测,可调整参数配置文件runMins压测时间调低寻找最高临界点,然后针对临界点连续压测三次,每次持续10分钟即可
$ cd run/
# Run the benchmark
$ ./runBenchmark.sh props.mysql
4.4、一致性验证脚本
BenchmarkSQL 压测完之后,运行一致性验证脚本.(运行一次即可)
Check.sql
-- SET search_path TO TPCC;
-- Condition 1: W_YTD = sum(D_YTD)
SELECT * FROM (SELECT w.w_id, w.w_ytd, d.sum_d_ytd
FROM bmsql_warehouse w,
(SELECT d_w_id, SUM(d_ytd) sum_d_ytd
FROM bmsql_district
GROUP BY d_w_id) d
WHERE w.w_id = d.d_w_id) as x
WHERE w_ytd != sum_d_ytd;
-- Condition 2: D_NEXT_O_ID - 1 = max(O_ID) = max(NO_O_ID)
SELECT * FROM (SELECT d.d_w_id, d.d_id, d.d_next_o_id, o.max_o_id, no.max_no_o_id
FROM bmsql_district d,
(SELECT o_w_id, o_d_id, MAX(o_id) max_o_id
FROM bmsql_oorder
GROUP BY o_w_id, o_d_id) o,
(SELECT no_w_id, no_d_id, MAX(no_o_id) max_no_o_id
FROM bmsql_new_order
GROUP BY no_w_id, no_d_id) no
WHERE d.d_w_id = o.o_w_id AND d.d_w_id = no.no_w_id AND
d.d_id = o.o_d_id AND d.d_id = no.no_d_id) as x
WHERE d_next_o_id - 1 != max_o_id OR d_next_o_id - 1 != max_no_o_id;
-- Condition 3: max(NO_O_ID) - min(NO_O_ID) + 1
-- = [number of rows in the NEW-ORDER table for this bmsql_district]
SELECT * FROM (SELECT no_w_id, no_d_id, MAX(no_o_id) max_no_o_id,
MIN(no_o_id) min_no_o_id, COUNT(*) count_no
FROM bmsql_new_order
GROUP BY no_w_id, no_d_Id) as x
WHERE max_no_o_id - min_no_o_id + 1 != count_no;
-- Condition 4: sum(O_OL_CNT)
-- = [number of rows in the ORDER-LINE table for this bmsql_district]
SELECT * FROM (SELECT o.o_w_id, o.o_d_id, o.sum_o_ol_cnt, ol.count_ol
FROM (SELECT o_w_id, o_d_id, SUM(o_ol_cnt) sum_o_ol_cnt
FROM bmsql_oorder
GROUP BY o_w_id, o_d_id) o,
(SELECT ol_w_id, ol_d_id, COUNT(*) count_ol
FROM bmsql_order_line
GROUP BY ol_w_id, ol_d_id) ol
WHERE o.o_w_id = ol.ol_w_id AND
o.o_d_id = ol.ol_d_id) as x
WHERE sum_o_ol_cnt != count_ol;
-- Condition 5: For any row in the ORDER table, O_CARRIER_ID is set to a null
-- value if and only if there is a corresponding row in the
-- NEW-ORDER table
SELECT * FROM (SELECT o.o_w_id, o.o_d_id, o.o_id, o.o_carrier_id, no.count_no
FROM bmsql_oorder o,
(SELECT no_w_id, no_d_id, no_o_id, COUNT(*) count_no
FROM bmsql_new_order
GROUP BY no_w_id, no_d_id, no_o_id) no
WHERE o.o_w_id = no.no_w_id AND
o.o_d_id = no.no_d_id AND
o.o_id = no.no_o_id) as x
WHERE (o_carrier_id IS NULL AND count_no = 0) OR
(o_carrier_id IS NOT NULL AND count_no != 0);
-- Condition 6: For any row in the ORDER table, O_OL_CNT must equal the number
-- of rows in the ORDER-LINE table for the corresponding order
SELECT * FROM (SELECT o.o_w_id, o.o_d_id, o.o_id, o.o_ol_cnt, ol.count_ol
FROM bmsql_oorder o,
(SELECT ol_w_id, ol_d_id, ol_o_id, COUNT(*) count_ol
FROM bmsql_order_line
GROUP BY ol_w_id, ol_d_id, ol_o_id) ol
WHERE o.o_w_id = ol.ol_w_id AND
o.o_d_id = ol.ol_d_id AND
o.o_id = ol.ol_o_id) as x
WHERE o_ol_cnt != count_ol;
-- Condition 7: For any row in the ORDER-LINE table, OL_DELIVERY_D is set to
-- a null date/time if and only if the corresponding row in the
-- ORDER table has O_CARRIER_ID set to a null value
SELECT * FROM (SELECT ol.ol_w_id, ol.ol_d_id, ol.ol_o_id, ol.ol_delivery_d,
o.o_carrier_id
FROM bmsql_order_line ol,
bmsql_oorder o
WHERE ol.ol_w_id = o.o_w_id AND
ol.ol_d_id = o.o_d_id AND
ol.ol_o_id = o.o_id) as x
WHERE (ol_delivery_d IS NULL AND o_carrier_id IS NOT NULL) OR
(ol_delivery_d IS NOT NULL AND o_carrier_id IS NULL);
-- Condition 8: W_YTD = sum(H_AMOUNT)
SELECT *
FROM (SELECT w.w_id, w.w_ytd, h.sum_h_amount
FROM bmsql_warehouse w,
(SELECT h_w_id, SUM(h_amount) sum_h_amount FROM bmsql_history GROUP BY h_w_id) h
WHERE w.w_id = h.h_w_id) as x
WHERE w_ytd != sum_h_amount;
-- Condition 9: D_YTD = sum(H_AMOUNT)
SELECT *
FROM (SELECT d.d_w_id, d.d_id, d.d_ytd, h.sum_h_amount
FROM bmsql_district d,
(SELECT h_w_id, h_d_id, SUM(h_amount) sum_h_amount
FROM bmsql_history
GROUP BY h_w_id, h_d_id) h
WHERE d.d_w_id = h.h_w_id
AND d.d_id = h.h_d_id) as x
WHERE d_ytd != sum_h_amount;
4.5、BenchmarkSQL 测试脚本
#!/bin/bash
date=`date +'%Y-%m-%d_%H%M%S'`
### BechmarkSQL run dir,Can not with "/"
benchmarkDir=/data/tidb/wentaojin/benchmarksql/run
### BenchmarkSQL test script
benchmarkScript=/data/tidb/wentaojin/benchmarksql/run/runBenchmark.sh
### BechmarkSQL mysql configuration file name
benchmarkConfName=props.mysql
### Get benchmark runtime(Mins)
runStr=`grep 'runMins' ${benchmarkDir}/${benchmarkConfName}`
runTime=${runStr##*=}
### BechmarkSQL log
benchmarkLog=${benchmarkDir}/benchmark_${date}.log
if [ ! -f ${benchmarkLog} ]; then
touch ${benchmarkLog}
fi
###########################################
# Nmon Variable |
###########################################
# Whether to activate nmon, 1 show enabled Or 0 show disabeld
nmonState=1
# Confirm all machine nmon binary exist the same dir,Then set nmon program execute dir,Can not with "/"
nmonDir=/data/tidb/wentaojin
# How many seconds to set up to collect once (10s collect once)
# If you want Nmon run 10 mins,you can set 10s collect once and collect 66 times
timeInterval=10
# Set how many times to collect (collect 66 times)
collectTimes=13
### Set need run nmon process host IP in TiDB node
hostArry=(172.16.30.86 172.16.30.87 172.16.30.88 172.16.30.89)
### Enable Nmon processes
function enableNmonRun(){
echo "#--------------------------------------------------------------------------------"
echo " Start Enabled Nmon "
echo "#--------------------------------------------------------------------------------"
for ip in ${hostArry[*]}
do
# View remote machine whether existed nmon log dir and nmon binary whether exist
if `ssh ${ip} test -d ${nmonDir}`; then
if `ssh ${ip} test -x ${nmonDir}/nmon`; then
if `ssh ${ip} test -d ${nmonDir}/nmonLog/${date}`; then
ssh ${ip} "${nmonDir}/nmon -s ${timeInterval} -c ${collectTimes} -f -N -m ${nmonDir}/nmonLog/${date} &"
else
ssh ${ip} "mkdir -p ${nmonDir}/nmonLog/${date};${nmonDir}/nmon -s ${timeInterval} -c ${collectTimes} -f -N -m ${nmonDir}/nmonLog/${date} &"
fi
else
ssh ${ip} "chmod +x ${nmonDir}/nmon"
if `ssh ${ip} test -d ${nmonDir}/nmonLog/${date}`; then
ssh ${ip} "${nmonDir}/nmon -s ${timeInterval} -c ${collectTimes} -f -N -m ${nmonDir}/nmonLog/${date} &"
else
ssh ${ip} "mkdir -p ${nmonDir}/nmonLog/${date};${nmonDir}/nmon -s ${timeInterval} -c ${collectTimes} -f -N -m ${nmonDir}/nmonLog/${date} &"
fi
fi
else
ssh ${ip} "mkdir -p ${nmonDir}"
if `ssh ${ip} test -x ${nmonDir}/nmon`; then
if `ssh ${ip} test -d ${nmonDir}/nmonLog/${date}`; then
ssh ${ip} "${nmonDir}/nmon -s ${timeInterval} -c ${collectTimes} -f -N -m ${nmonDir}/nmonLog/${date} &"
else
ssh ${ip} "mkdir -p ${nmonDir}/nmonLog/${date};${nmonDir}/nmon -s ${timeInterval} -c ${collectTimes} -f -N -m ${nmonDir}/nmonLog/${date} &"
fi
else
scp ${nmonDir}/nmon ${ip}:${nmonDir}/
ssh ${ip} "chmod +x ${nmonDir}/nmon"
if `ssh ${ip} test -d ${nmonDir}/nmonLog/${date}`; then
ssh ${ip} "${nmonDir}/nmon -s ${timeInterval} -c ${collectTimes} -f -N -m ${nmonDir}/nmonLog/${date} &"
else
ssh ${ip} "mkdir -p ${nmonDir}/nmonLog/${date};${nmonDir}/nmon -s ${timeInterval} -c ${collectTimes} -f -N -m ${nmonDir}/nmonLog/${date} &"
fi
fi
fi
done
echo ""
if [ $? -eq 0 ]; then
echo "Enable nmon success."
else
echo "Enable nmon failed."
fi
}
### Start benchmarkSQL Test
function startBenchmarkSQLTest(){
nohup ${benchmarkScript} ${benchmarkDir}/${benchmarkConfName} > ${benchmarkLog} 2>&1 &
}
### Get Benchmarksql result
function getBenchmarkResult(){
echo "#--------------------------------------------------------------------------------"
echo " Start Gather Benchmark Results "
echo "#--------------------------------------------------------------------------------"
### Need to view the parameter values in the props.mysql configuration file
resultDir=`ls -ldrt ${benchmarkDir}/my_result_* | tail -1 | awk '{print $9}'`
if [ ${nmonState} -eq 1 ]; then
nmonDir=`ls -ldrt ${benchmarkDir}/nmonLog/* | tail -1 | awk '{print $9}'`
fi
rdir=${resultDir##*/}
echo ""
echo "Start display Original dir ${rdir} results."
echo "#<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<"
echo ""
echo "benchmark result name: ${resultDir}"
echo ""
echo "benchmark log output name: ${benchmarkLog}"
if [ ${nmonState} -eq 1 ]; then
echo ""
echo "nmon log dir name: ${nmonDir}"
fi
echo ""
c=`cat ${resultDir}/data/result.csv | awk -F "," '{print $1}' | uniq | tail -1`
echo "Concurrency: ${c}"
echo ""
cat ${resultDir}/data/result.csv | awk -F "," 'BEGIN{sum=0}{sum+=$4}END{print "dblatencySum: "sum; print "dblatencyAvg: " sum/NR}'
echo ""
cat ${resultDir}/data/result.csv | awk -F "," 'BEGIN{sum=0}{sum+=$3}END{print "latencySum: "sum; print "latencyAvg: " sum/NR}'
echo ""
nt=`grep 'Measured tpmC (NewOrders)' ${benchmarkLog} | awk '{print $11}'`
echo "Measured tpmC (NewOrders): ${nt}"
echo ""
nl=`grep 'Measured tpmTOTAL ' ${benchmarkLog} | awk '{print $10}'`
echo "Measured tpmTOTAL: ${nl}"
echo ""
nst=`grep 'Session Start' ${benchmarkLog} | awk '{print $10" "$11}'`
echo "tpc-c start run time: ${nst}"
echo ""
nse=`grep 'Session End' ${benchmarkLog} | awk '{print $10" "$11}'`
echo "tpc-c end run time: ${nse}"
echo ""
tc=`grep 'Transaction Count' ${benchmarkLog} | awk '{print $10}'`
echo "Transaction Count: ${tc}"
echo "#<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<"
echo ""
echo "Start redirecting the above sort out results to ${benchmarkDir}/results/result.result"
concurrency=`cat ${resultDir}/data/result.csv | awk -F "," '{print $1}' | uniq | tail -1`
dblatency=`cat ${resultDir}/data/result.csv | awk -F "," 'BEGIN{sum=0}{sum+=$4}END{print "dblatencyAvg: " sum/NR}'`
latency=`cat ${resultDir}/data/result.csv | awk -F "," 'BEGIN{sum=0}{sum+=$3}END{print "latencyAvg: " sum/NR}'`
tpmC=`grep 'Measured tpmC (NewOrders)' ${benchmarkLog} | awk '{print $11}'`
tpmcTotal=`grep 'Measured tpmTOTAL ' ${benchmarkLog} | awk '{print $10}'`
startTime=`grep 'Session Start' ${benchmarkLog} | awk '{print $10" "$11}'`
endTime=`grep 'Session End' ${benchmarkLog} | awk '{print $10" "$11}'`
tcounts=`grep 'Transaction Count' ${benchmarkLog} | awk '{print $10}'`
if [ ! -d "${benchmarkDir}/results" ]; then
mkdir -p ${benchmarkDir}/results
fi
echo "benchmarkResult dir name: ${resultDir}" >> ${benchmarkDir}/results/result.result
echo "benchmarkLog name: ${benchmarkLog}" >> ${benchmarkDir}/results/result.result
if [ ${nmonState} -eq 1 ]; then
echo "nmonLog dir name: ${nmonDir}" >> ${benchmarkDir}/results/result.result
fi
echo "concurrency: ${concurrency}" >> ${benchmarkDir}/results/result.result
echo "${dblatency}" >> ${benchmarkDir}/results/result.result
echo "${latency}" >> ${benchmarkDir}/results/result.result
echo "tpmC: ${tpmC}" >> ${benchmarkDir}/results/result.result
echo "tpmcTotal: ${tpmcTotal}" >> ${benchmarkDir}/results/result.result
echo "startTime: ${startTime}" >> ${benchmarkDir}/results/result.result
echo "endTime: ${endTime}" >> ${benchmarkDir}/results/result.result
echo "transaction counts: ${tcounts}">> ${benchmarkDir}/results/result.result
echo "" >> ${benchmarkDir}/results/result.result
b=''
i=0
while [ $i -le 100 ]
do
printf "[%-50s] %d%% \r" "$b" "$i";
sleep 0.02
((i=i+2))
b+='#'
done
echo
echo "Finished redirecting the above sort out results to ${benchmarkDir}/results/result.result"
echo ""
echo "#<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<"
echo "Please check result output,Or manual view ${benchmarkDir}/results/result.result"
echo ""
cat ${benchmarkDir}/results/result.result
}
### Scp Nmon log to benchmark dir
function scpNmonLog(){
echo "#--------------------------------------------------------------------------------"
echo " Nmon Log Start Scp From All TiDB Nodes To Benchmark Node Dir "
echo "#--------------------------------------------------------------------------------"
if [ ! -d "${benchmarkDir}/nmonLog" ]; then
mkdir -p ${benchmarkDir}/nmonLog/
fi
if [ ! -d "${benchmarkDir}/nmonLog/${date}" ]; then
mkdir -p ${benchmarkDir}/nmonLog/${date}
fi
for ip in ${hostArry[*]}
do
scp -r ${ip}:${nmonDir}/nmonLog/${date}/* ${benchmarkDir}/nmonLog/${date}/
done
}
### Stop benchmark Test and Nmon
function stopBenchmarkTest(){
# kill Benchmarksql processes in the benchmarksql node
echo "#--------------------------------------------------------------------------------"
echo " Start Kill runBenchmark Processes "
echo "#--------------------------------------------------------------------------------"
echo ""
ps -ef | grep runBenchmark.sh | grep -v grep | awk '{print $2}' | xargs sudo kill -9
ps -ef | grep runBenchmark.sh | grep -v grep
if [ $? -eq 0 ]; then
echo "Localhost processes runBechmark.sh has killed Failed,Please check and manual kill."
else
echo "Localhost processes runBechmark.sh has killed Success."
fi
# batch kill nmon processes in all tidb nodes
echo "#--------------------------------------------------------------------------------"
echo " Start Kill Nmon Processes "
echo "#--------------------------------------------------------------------------------"
echo ""
for ip in ${hostArry[*]}
do
ssh ${ip} "ps -ef | grep nmon | grep -v grep | awk '{print $2}' | xargs sudo kill -9"
result=`ssh ${ip} "ps -ef | grep nmon | grep -v grep"`
if [ ! "$result" = "" ]; then
echo "TiDB node ${ip} processes Nmon has killed Failed,Please check and manual kill."
else
echo "TiDB node ${ip} processes Nmon has killed Success."
fi
done
# kill benchmarkSQL processes in the benchamrsql node
echo "#--------------------------------------------------------------------------------"
echo " Start Kill benchmarkSQL Processes "
echo "#--------------------------------------------------------------------------------"
echo ""
ps -ef | grep benchmarkSQL.sh | grep -v grep | awk '{print $2}' | xargs sudo kill -9
ps -ef | grep benchmarkSQL.sh | grep -v grep
if [ $? -eq 0 ]; then
echo "Localhost processes bechmarkSQL.sh has killed Failed,Please check and manual kill."
else
echo "Localhost processes bechmarkSQL.sh has killed Success."
fi
}
### Query Benchmarksql and Nmon status
function statusBenchmarkTest(){
# query nodes benchmarkSQL processes
echo "#--------------------------------------------------------------------------------"
echo " Localhost Node BenchmarkSQL Processes "
echo "#--------------------------------------------------------------------------------"
echo ""
ps -ef | grep runBenchmark.sh | grep -v grep
if [ $? -eq 0 ]; then
echo "Localhost processes runBechmark.sh Existed and Running."
else
echo "Localhost processes runBechmark.sh Not existed,Please check and manual query."
fi
# query all nodes nmon processes
echo "#--------------------------------------------------------------------------------"
echo " All TiDB Nodes Nmon Processes "
echo "#--------------------------------------------------------------------------------"
echo ""
for ip in ${hostArry[*]}
do
result=`ssh ${ip} "ps -ef | grep nmon | grep -v grep"`
if [ ! "$result" = "" ]; then
echo ${result}
echo "TiDB node ${ip} processes Nmon Existed and Running."
echo ""
else
if [ ${nmonState} -eq 0 ];then
echo "Because variable nmonState equal 0,Show nmon need not turn on,That State is normal state."
else
echo "TiDB node ${ip} processes Nmon Not existed,Please manually check why nmon not be turned on,Or Manually start all tidb node nmon processed."
fi
fi
done
}
### Final run benchmark Test
function runBenchmarkTest(){
if [ ${nmonState} -eq 1 ]; then
enableNmonRun
startBenchmarkSQLTest
statusBenchmarkTest
# check all tidb node nmon process if existed
declare -a nmonArry
index=0
while true
do
for ip in ${hostArry[*]}
do
result=`ssh ${ip} "ps -ef | grep nmon | grep -v grep"`
if [ "$result" = "" ]; then
nmonArry[index]=${ip}
((index++))
fi
done
nmonarr=($(echo ${nmonArry[*]} | sed 's/ /
/g' | sort | uniq))
num=${#nmonarr[@]}
hostNums=${#hostArry[@]}
if [ ${num} -eq ${hostNums} ]; then
sleep 1
scpNmonLog
break;
else
echo "nmon process existed" > /dev/null 2>&1
fi
done
# check runBenchmark.sh processes if exited
while true
do
benchmarkPID=`ps -ef | grep runBenchmark.sh | grep -v grep | awk '{print $2}'`
if [ "$benchmarkPID" = "" ]; then
sleep 1
getBenchmarkResult
break;
else
echo "runBenchmark.sh process exsits" > /dev/null 2>&1
fi
done
else
startBenchmarkSQLTest
statusBenchmarkTest
# check runBenchmark.sh processes if exited
while true
do
benchmarkPID=`ps -ef | grep runBenchmark.sh | grep -v grep | awk '{print $2}'`
if [ "$benchmarkPID" = "" ]; then
sleep 1
getBenchmarkResult
break;
else
echo "runBenchmark.sh process exsits" > /dev/null 2>&1
fi
done
fi
}
### Main program entry
if [ ! -n "$1" ]; then
echo ""
echo "Parameter can not be Null,Please input parameter start Or stop Or status Or getresult."
echo ""
else
case "$1" in
(start)
runBenchmarkTest
;;
(stop)
stopBenchmarkTest
;;
(getresult)
getBenchmarkResult
;;
(status)
statusBenchmarkTest
;;
esac
fi
五、常见问题
- 在高并发压力下,TiDB、TiKV 的配置都合理,为什么整体性能还是偏低?
出现这种问题,多数情况下可能与使用了 proxy 有关。可以尝试直接对单个 TiDB 加压,将结果加和与使用 proxy 的进行对比。以 haproxy 为例,nbproc 参数可以增加其最大启动的进程数,较新版本的 haproxy 还支持 nbthread 和 cpu-map 等。都可以降低其对性能的不利影响。