How to Run a TiDB BenchmarkSQL Test

Author | Wentao Jin

1. Introduction to BenchmarkSQL

The TPC-C benchmark is a specification for testing OLTP (online transaction processing) systems. It models a wholesale goods business and exercises the system with five transaction types:

  • NewOrder – generate a new order

  • Payment – pay for an order

  • OrderStatus – query a customer's most recent order

  • Delivery – deliver orders

  • StockLevel – analyze stock shortage status

Before the test starts, the TPC-C benchmark defines the initial state of the database, i.e. the rules for generating the initial data. The ITEM table always contains 100,000 items, while the number of warehouses can be adjusted. If the WAREHOUSE table contains W rows, then:

  • the STOCK table contains W × 100,000 rows (each warehouse stocks all 100,000 items);

  • the DISTRICT table contains W × 10 rows (each warehouse serves 10 districts);

  • the CUSTOMER table contains W × 10 × 3,000 rows (each district has 3,000 customers);

  • the HISTORY table contains W × 10 × 3,000 rows (one transaction history row per customer);

  • the ORDER table contains W × 10 × 3,000 rows (3,000 orders per district); the last 900 orders generated in each district are added to the NEW-ORDER table, and each order has 5–15 randomly generated ORDER-LINE rows.

Each warehouse corresponds to roughly 76,823.04 KB of data, so the total data size can be estimated as: total data size (KB) ≈ number of warehouses × 76,823.04 KB.
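
For example, 1,000 warehouses work out to roughly 73 GB of data. A one-line shell check of that arithmetic (the warehouse count here is only an illustration):

warehouses=1000
awk -v w=${warehouses} 'BEGIN { printf "%.2f GB\n", w * 76823.04 / 1024 / 1024 }'   # ≈ 73.26 GB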

TPC-C uses tpmC (transactions per minute) to measure the system's maximum qualified throughput (MQTh). Only NewOrder transactions are counted, so the final metric is the number of new orders processed per minute.

2. TiDB Deployment

2.1 Official Deployment Recommendations

Deploy the TiDB cluster by following the TiDB deployment documentation. With 3 servers, the recommended topology is 1 TiDB, 1 PD, and 1 TiKV instance per machine. Regarding disks, for a workload of 32 tables with 10 million rows each, the disk holding the TiKV data directory should have more than 512 GB of space.

Keep the number of concurrent connections to a single TiDB instance below 500. To increase the overall concurrency of the system, add TiDB instances; how many to add depends on the test load.

For write-intensive tests, such as Sysbench oltp_update_index, oltp_update_non_index, oltp_write_only, or TPC-C, increase the number of TiKV instances, e.g. 2 TiKV per machine.

For hardware requirements, refer to the documentation. The table below shows the recommended deployment for write-intensive tests:

        TiDB    TiKV    PD
node1   1       1 (2)   1
node2   1       1 (2)   1
node3   1       1 (2)   1

2.2 ICBC Zhuhai Deployment

Hardware configuration (3 machines):

CPU      2 × 14 cores × 2 (hyper-threading)
Memory   256 GB
Storage  4 × 800 GB SSD
NIC      10 GbE

TiDB cluster topology: 6 × TiDB, 12 × TiKV, 1 × PD

        TiDB    TiKV    PD
node1   2       4       1
node2   2       4       0
node3   2       4       0

3. TiDB Cluster Tuning

3.1 TiDB Tuning

Raising the log level reduces the amount of logging, which has a positive effect on performance. Add the following to the TiDB configuration file:

log:
  level: "error"

performance:
  # Max CPUs to use, 0 use number of CPUs in the machine.
  # Set according to the number of CPU cores of the machine
  max-procs: 28

prepared_plan_cache:
  # Enable the prepared plan cache to reduce the cost of optimizing execution plans
  enabled: true

tikv_client:
  # Max gRPC connections that will be established with each tikv-server.
  # When TiDB and TiKV load is low or CPUs are idle, this value can be increased to push more load
  grpc-connection-count: 4

txn_local_latches:
  # Enable local latches for transactions. Enable it when
  # there are lots of conflicts between transactions.
  # Recommended for concurrent stress tests; it reduces transaction conflicts
  enabled: true
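
To get a rough feel for the effect of grpc-connection-count, you can count the TCP connections established from a TiDB instance to TiKV. This is only a sketch and assumes the default TiKV service port 20160 and that the ss utility is available:

# Count established connections from tidb-server to TiKV (assumes the default TiKV port 20160)
ss -tnp 2>/dev/null | grep tidb-server | grep ':20160' | wc -l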

3.2 TiKV Tuning

Raising the TiKV log level also helps performance. Because TiKV is deployed as a cluster, the Raft algorithm guarantees that data is written to a majority of nodes, so the sync-log option in raftstore can be disabled except in scenarios that are extremely sensitive to data safety.

A TiKV cluster has two Column Families (Default CF and Write CF), which mainly store different types of data. Their block cache size ratio should be:

Default CF : Write CF = 4 : 1
or
Default CF : Write CF = 10 : 1

Use 10 : 1 when there is much more data than index; use 4 : 1 when the data and index sizes are comparable.

In TiKV, the RocksDB block cache should be sized according to the machine's memory so that memory is fully utilized. Taking a 20 GB VM running one TiKV instance as an example, the recommended block cache configuration is:

global:
  log-level: "error"

raftstore:
  sync-log: false

rocksdb defaultcf:
  block-cache-size: "12GB"

rocksdb writecf:
  block-cache-size: "3GB"
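
The 12 GB / 3 GB values above are simply a 4 : 1 split of the memory budget given to the block cache, about 15 GB of the 20 GB machine in this example. A small shell sketch of that arithmetic (the 15 GB budget is read off the example above, not an official rule):

cache_budget_gb=15                              # 12 GB + 3 GB from the example above
default_cf=$(( cache_budget_gb * 4 / 5 ))       # -> 12
write_cf=$(( cache_budget_gb - default_cf ))    # -> 3
echo "defaultcf: ${default_cf}GB, writecf: ${write_cf}GB"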

Notes on adjusting other TiKV parameters:

The actual CPU utilization of each module can be observed on the TiKV Thread CPU panel in Grafana.

  • If a single-threaded module becomes the bottleneck, spread the load by adding TiKV nodes;
  • If a multi-threaded module becomes the bottleneck, increase that module's concurrency, as in the settings below.

readpool:
  # Check on the cluster panel whether QPS is dominated by point reads (kv_get) or by
  # coprocessor requests, then adjust the storage or coprocessor pool accordingly

  # Mainly serves point-get (kv get API) requests
  storage:
    high-concurrency: 6
    normal-concurrency: 6
    low-concurrency: 6

  # Mainly serves pushed-down queries (coprocessor requests); adjust according to how busy
  # the coprocessor appears in the monitoring panels
  coprocessor:
    high-concurrency: 8
    normal-concurrency: 8
    low-concurrency: 8

server:
  ## Size of the thread pool for the gRPC server.
  # Handles gRPC requests from TiDB. When TiDB and TiKV are relatively idle and
  # grpc-connection-count has been increased on the TiDB side, this value can be raised as well
  grpc-concurrency: 4

storage:
  # Size of the scheduler worker pool, i.e. the number of write threads. It should be smaller
  # than the total number of CPU cores. Set it higher when writes are frequent; run
  # `top -H -p <tikv-pid>` to check whether the sched-worker-pool threads are busy
  scheduler-worker-pool-size: 4

raftstore:
  # Raft guarantees that data is written to a majority of nodes, so sync-log can be disabled
  # except in scenarios that are extremely sensitive to data safety; it is usually disabled
  # during stress tests
  sync-log: false

  ## Number of threads used to apply Raft logs
  # Before TiDB 3.0, raftstore was single-threaded and applying logs easily became a bottleneck,
  # which used to be addressed by adding TiKV nodes; now this pool can be enlarged directly
  apply-pool-size: 2

  ## Number of threads used to handle Raft messages
  # Before TiDB 3.0, raftstore was single-threaded and appending logs easily became a bottleneck,
  # which used to be addressed by adding TiKV nodes; now this pool can be enlarged directly
  store-pool-size: 2

For more detailed TiKV parameter tuning, refer to the TiKV Performance Tuning documentation.
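
Before raising any of the concurrency values above, confirm which thread pool is actually saturated, either on the Grafana TiKV Thread CPU panel or directly on the host. A minimal sketch, assuming a single tikv-server process per host:

# Look for grpc-server, sched-worker-pool, raftstore/apply and coprocessor threads near 100% CPU
top -H -b -n 1 -p "$(pidof tikv-server)" | head -n 40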

4. Test Preparation

4.1 BenchmarkSQL Configuration

Install JDK 1.8 or later (download link).

Install ant:

# yum install ant

BenchmarkSQL does not support MySQL out of the box, so the source code must be modified to add MySQL support (official BenchmarkSQL download link):

git clone -b 5.0-mysql-support-opt https://github.com/jackysp/benchmarksql.git

Compile the BenchmarkSQL source:

$ cd benchmarksql
[benchmarksql]$ ant
Buildfile: /data/tidb/wentaojin/benchmarksql/build.xml

init:
    [mkdir] Created dir: /data/tidb/wentaojin/benchmarksql/build

compile:
    [javac] Compiling 11 source files to /data/tidb/wentaojin/benchmarksql/build

dist:
    [mkdir] Created dir: /data/tidb/wentaojin/benchmarksql/dist
      [jar] Building jar: /data/tidb/wentaojin/benchmarksql/dist/BenchmarkSQL-5.0.jar

BUILD SUCCESSFUL
Total time: 1 second

The following error occurs if the BenchmarkSQL source has not been compiled with ant:
$ ./runSQL.sh props.mysql sql.mysql/tableCreates.sql 
# ------------------------------------------------------------
# Loading SQL file sql.mysql/tableCreates.sql
# ------------------------------------------------------------
Error: Could not find or load main class ExecJDBC

For usage instructions, refer to the BenchmarkSQL README (link).

$ cd run

The test process for MySQL/TiDB
# Create the tables
$ ./runSQL.sh props.mysql sql.mysql/tableCreates.sql

# Add the indices (TiDB prefers to import data after adding the indices)
$ ./runSQL.sh props.mysql sql.mysql/indexCreates.sql

# Import Data
$ ./runLoader.sh props.mysql
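
After the load finishes, it is worth spot-checking the imported row counts against the initial-state rules in section 1 (ITEM = 100,000; STOCK = W × 100,000; DISTRICT = W × 10; CUSTOMER = W × 30,000). A sketch using the connection parameters from the props.mysql example below:

mysql -h 127.0.0.1 -P 4000 -u root -D tpcc -e "
  SELECT 'bmsql_item'     AS tbl, COUNT(*) AS cnt FROM bmsql_item     UNION ALL
  SELECT 'bmsql_stock',           COUNT(*)        FROM bmsql_stock    UNION ALL
  SELECT 'bmsql_district',        COUNT(*)        FROM bmsql_district UNION ALL
  SELECT 'bmsql_customer',        COUNT(*)        FROM bmsql_customer;"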

The props.mysql configuration file in the run directory is shown below:

db=mysql
driver=com.mysql.jdbc.Driver
conn=jdbc:mysql://localhost:4000/tpcc?useSSL=false&useServerPrepStmts=true&useConfigs=maxPerformance&sessionVariables=tidb_batch_commit=1
## For the benchmark itself, performance is usually better without tidb_batch_commit; keeping it makes the initial data load faster
#conn=jdbc:mysql://localhost:4000/tpcc?useSSL=false&useServerPrepStmts=true&useConfigs=maxPerformance
user=root
password=

# Number of warehouses to generate
warehouses=1
# Number of concurrent loader threads
loadWorkers=4

# Number of concurrent terminals during the benchmark
terminals=1
//To run specified transactions per terminal- runMins must equal zero
runTxnsPerTerminal=0
//To run for specified minutes- runTxnsPerTerminal must equal zero
# Benchmark duration in minutes (10 minutes here)
runMins=10
//Number of total transactions per minute
# 0 = no per-minute transaction limit; push for maximum throughput
limitTxnsPerMin=0

//Set to true to run in 4.x compatible mode. Set to false to use the
//entire configured database evenly.
terminalWarehouseFixed=true

//The following five values must add up to 100
//The default percentages of 45, 43, 4, 4 & 4 match the TPC-C spec
# Default BenchmarkSQL transaction mix
newOrderWeight=45
paymentWeight=43
orderStatusWeight=4
deliveryWeight=4
stockLevelWeight=4

// Directory name to create for collecting detailed result data.
// Comment this out to suppress.
resultDirectory=my_result_%tY-%tm-%td_%tH%tM%tS
//osCollectorScript=./misc/os_collector_linux.py
//osCollectorInterval=1
//osCollectorSSHAddr=user@dbhost
//osCollectorDevices=net_eth0 blk_sda
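
The warehouses=1 / terminals=1 values above are only minimal sample values. A hypothetical sketch of scaling the file up for a real run (the copied file name and the specific numbers are illustrative, not recommendations):

cd /data/tidb/wentaojin/benchmarksql/run
cp props.mysql props.mysql.scaled
sed -i 's/^warehouses=.*/warehouses=1000/' props.mysql.scaled
sed -i 's/^loadWorkers=.*/loadWorkers=32/' props.mysql.scaled
sed -i 's/^terminals=.*/terminals=500/'    props.mysql.scaled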

4.2 Collecting Statistics

After the data import finishes, collect table statistics manually. TiDB currently has no database-level statistics collection, so a script has to run ANALYZE TABLE on every table:

#!/bin/bash
host=122.71.221.112
port=4000
user=tidb
password='tidb'
dbname=tpcc

### Get table name
tabName=`mysql -h ${host} -P ${port} -u${user} -p${password} -s -N -e "select table_name from information_schema.tables where table_schema='${dbname}'"`

### Get table counts
#tabCounts=`mysql -h ${host} -P ${port} -u${user} -p${password} -e "select count(*) from information_schema.tables where table_schema='${dbname}' group by table_schema"`

### Convert table name array
declare -a tabArry
idx=0
for tab in ${tabName}
do
	tabArry[idx]=${tab}
	if [ $? -eq 0 ]; then
		echo "table ${tab} add to tabArry success."
		echo "------------------------------------"
		echo "tabArry ${idx} table name: "
		echo ""
		echo ${tabArry[idx]}
		echo ""
	else
		echo "table ${tab} add to tabArry failed."
		echo ""
	fi
	let idx++
done

### Analyze table
for name in ${tabArry[*]}
do
	mysql -h ${host} -P ${port} -u${user} -p${password} -D ${dbname} -e "analyze table ${dbname}.${name}"
	if [ $? -eq 0 ]; then
		echo "analyze table ${dbname}.${name} success."
	else
		echo "analyze table ${dbname}.${name} failed."
	fi
done
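
A more compact alternative sketch, reusing the same connection variables, is to let TiDB generate the ANALYZE statements and pipe them straight back into mysql:

mysql -h ${host} -P ${port} -u${user} -p${password} -N -s -e \
  "SELECT CONCAT('ANALYZE TABLE ', table_schema, '.', table_name, ';')
     FROM information_schema.tables WHERE table_schema='${dbname}'" \
  | mysql -h ${host} -P ${port} -u${user} -p${password}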

4.3 Running the Benchmark

For the first rounds of testing, lower runMins in the configuration file and use short runs to find the peak throughput point; then run the benchmark three consecutive times at that point, 10 minutes each.

$ cd run/
# Run the benchmark
$ ./runBenchmark.sh props.mysql
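
Once the peak point has been found, the three consecutive 10-minute runs mentioned above can be wrapped in a simple loop (a sketch; runMins=10 is assumed to already be set in props.mysql, and the log file names are arbitrary):

for i in 1 2 3; do
    ./runBenchmark.sh props.mysql | tee benchmark_run_${i}.log
done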

4.4 Consistency Check Script

After the BenchmarkSQL run finishes, run the consistency check script (running it once is enough).

Check.sql

-- SET search_path TO TPCC;
-- Condition 1: W_YTD = sum(D_YTD)
SELECT * FROM (SELECT w.w_id, w.w_ytd, d.sum_d_ytd
           FROM bmsql_warehouse w,
                (SELECT d_w_id, SUM(d_ytd) sum_d_ytd
                 FROM bmsql_district
                 GROUP BY d_w_id) d
           WHERE w.w_id = d.d_w_id) as x
WHERE w_ytd != sum_d_ytd;

-- Condition 2: D_NEXT_O_ID - 1 = max(O_ID) = max(NO_O_ID)
SELECT * FROM (SELECT d.d_w_id, d.d_id, d.d_next_o_id, o.max_o_id, no.max_no_o_id
           FROM bmsql_district d,
                (SELECT o_w_id, o_d_id, MAX(o_id) max_o_id
                 FROM bmsql_oorder
                 GROUP BY o_w_id, o_d_id) o,
                (SELECT no_w_id, no_d_id, MAX(no_o_id) max_no_o_id
                 FROM bmsql_new_order
                 GROUP BY no_w_id, no_d_id) no
           WHERE d.d_w_id = o.o_w_id AND d.d_w_id = no.no_w_id AND
                 d.d_id = o.o_d_id AND d.d_id = no.no_d_id) as x
WHERE d_next_o_id - 1 != max_o_id OR d_next_o_id - 1 != max_no_o_id;

-- Condition 3: max(NO_O_ID) - min(NO_O_ID) + 1
-- = [number of rows in the NEW-ORDER table for this bmsql_district]
SELECT * FROM (SELECT no_w_id, no_d_id, MAX(no_o_id) max_no_o_id,
                  MIN(no_o_id) min_no_o_id, COUNT(*) count_no
           FROM bmsql_new_order
           GROUP BY no_w_id, no_d_Id) as x
WHERE max_no_o_id - min_no_o_id + 1 != count_no;

-- Condition 4: sum(O_OL_CNT)
-- = [number of rows in the ORDER-LINE table for this bmsql_district]
SELECT * FROM (SELECT o.o_w_id, o.o_d_id, o.sum_o_ol_cnt, ol.count_ol
           FROM (SELECT o_w_id, o_d_id, SUM(o_ol_cnt) sum_o_ol_cnt
                 FROM bmsql_oorder
                 GROUP BY o_w_id, o_d_id) o,
                (SELECT ol_w_id, ol_d_id, COUNT(*) count_ol
                 FROM bmsql_order_line
                 GROUP BY ol_w_id, ol_d_id) ol
           WHERE o.o_w_id = ol.ol_w_id AND
                 o.o_d_id = ol.ol_d_id) as x
WHERE sum_o_ol_cnt != count_ol;

-- Condition 5: For any row in the ORDER table, O_CARRIER_ID is set to a null
-- value if and only if there is a corresponding row in the
-- NEW-ORDER table
SELECT * FROM (SELECT o.o_w_id, o.o_d_id, o.o_id, o.o_carrier_id, no.count_no
           FROM bmsql_oorder o,
                (SELECT no_w_id, no_d_id, no_o_id, COUNT(*) count_no
                 FROM bmsql_new_order
                 GROUP BY no_w_id, no_d_id, no_o_id) no
           WHERE o.o_w_id = no.no_w_id AND
                 o.o_d_id = no.no_d_id AND
                 o.o_id = no.no_o_id) as x
WHERE (o_carrier_id IS NULL AND count_no = 0) OR
      (o_carrier_id IS NOT NULL AND count_no != 0);

-- Condition 6: For any row in the ORDER table, O_OL_CNT must equal the number
-- of rows in the ORDER-LINE table for the corresponding order
SELECT * FROM (SELECT o.o_w_id, o.o_d_id, o.o_id, o.o_ol_cnt, ol.count_ol
           FROM bmsql_oorder o,
                (SELECT ol_w_id, ol_d_id, ol_o_id, COUNT(*) count_ol
                 FROM bmsql_order_line
                 GROUP BY ol_w_id, ol_d_id, ol_o_id) ol
           WHERE o.o_w_id = ol.ol_w_id AND
                 o.o_d_id = ol.ol_d_id AND
                 o.o_id = ol.ol_o_id) as x
WHERE o_ol_cnt != count_ol;

-- Condition 7: For any row in the ORDER-LINE table, OL_DELIVERY_D is set to
-- a null date/time if and only if the corresponding row in the
-- ORDER table has O_CARRIER_ID set to a null value
SELECT * FROM (SELECT ol.ol_w_id, ol.ol_d_id, ol.ol_o_id, ol.ol_delivery_d,
                  o.o_carrier_id
           FROM bmsql_order_line ol,
                bmsql_oorder o
           WHERE ol.ol_w_id = o.o_w_id AND
                 ol.ol_d_id = o.o_d_id AND
                 ol.ol_o_id = o.o_id) as x
WHERE (ol_delivery_d IS NULL AND o_carrier_id IS NOT NULL) OR
      (ol_delivery_d IS NOT NULL AND o_carrier_id IS NULL);

-- Condition 8: W_YTD = sum(H_AMOUNT)
SELECT *
FROM (SELECT w.w_id, w.w_ytd, h.sum_h_amount
      FROM bmsql_warehouse w,
           (SELECT h_w_id, SUM(h_amount) sum_h_amount FROM bmsql_history GROUP BY h_w_id) h
      WHERE w.w_id = h.h_w_id) as x
WHERE w_ytd != sum_h_amount;

-- Condition 9: D_YTD = sum(H_AMOUNT)
SELECT *
FROM (SELECT d.d_w_id, d.d_id, d.d_ytd, h.sum_h_amount
      FROM bmsql_district d,
           (SELECT h_w_id, h_d_id, SUM(h_amount) sum_h_amount
            FROM bmsql_history
            GROUP BY h_w_id, h_d_id) h
      WHERE d.d_w_id = h.h_w_id
        AND d.d_id = h.h_d_id) as x
WHERE d_ytd != sum_h_amount;
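
Each query above returns only the rows that violate its condition, so a clean run produces empty result sets for all nine checks. A sketch of running the whole file, with connection parameters following the props.mysql example:

mysql -h 127.0.0.1 -P 4000 -u root -D tpcc < Check.sql
# Any rows printed indicate a consistency violation.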

4.5 BenchmarkSQL Test Script

#!/bin/bash

date=`date +'%Y-%m-%d_%H%M%S'`

### BenchmarkSQL run dir (must not end with "/")
benchmarkDir=/data/tidb/wentaojin/benchmarksql/run
### BenchmarkSQL test script
benchmarkScript=/data/tidb/wentaojin/benchmarksql/run/runBenchmark.sh
### BenchmarkSQL MySQL configuration file name
benchmarkConfName=props.mysql

### Get benchmark runtime(Mins)
runStr=`grep 'runMins' ${benchmarkDir}/${benchmarkConfName}`
runTime=${runStr##*=}

### BenchmarkSQL log
benchmarkLog=${benchmarkDir}/benchmark_${date}.log
if [ ! -f ${benchmarkLog} ]; then
	touch ${benchmarkLog}
fi

###########################################
#	Nmon Variable	                      |
###########################################	
# Whether to enable nmon: 1 = enabled, 0 = disabled
nmonState=1
# Make sure the nmon binary exists in the same dir on every machine, then set that dir here (must not end with "/")
nmonDir=/data/tidb/wentaojin
# Collection interval in seconds (collect once every 10 s)
# To cover a 10-minute run at a 10 s interval, collect roughly 66 samples; adjust together with runMins
timeInterval=10
# Number of samples to collect
collectTimes=13
### IPs of the TiDB cluster nodes on which to run nmon
hostArry=(172.16.30.86 172.16.30.87 172.16.30.88 172.16.30.89)



### Enable Nmon processes

function enableNmonRun(){
        echo "#--------------------------------------------------------------------------------"
        echo "                            Start Enabled Nmon                                   "
        echo "#--------------------------------------------------------------------------------"
        for ip in ${hostArry[*]}
        do
                # Check whether the nmon log dir and the nmon binary exist on the remote machine
		if `ssh ${ip} test -d ${nmonDir}`; then
			if `ssh ${ip} test -x ${nmonDir}/nmon`; then
				if `ssh ${ip} test -d  ${nmonDir}/nmonLog/${date}`; then
					ssh ${ip} "${nmonDir}/nmon -s ${timeInterval} -c ${collectTimes} -f -N -m ${nmonDir}/nmonLog/${date} &"
				else
					ssh ${ip} "mkdir -p ${nmonDir}/nmonLog/${date};${nmonDir}/nmon -s ${timeInterval} -c ${collectTimes} -f -N -m ${nmonDir}/nmonLog/${date} &"
				fi
			else
				ssh ${ip} "chmod +x ${nmonDir}/nmon"
				if `ssh ${ip} test -d  ${nmonDir}/nmonLog/${date}`; then
					ssh ${ip} "${nmonDir}/nmon -s ${timeInterval} -c ${collectTimes} -f -N -m ${nmonDir}/nmonLog/${date} &"
				else
					ssh ${ip} "mkdir -p ${nmonDir}/nmonLog/${date};${nmonDir}/nmon -s ${timeInterval} -c ${collectTimes} -f -N -m ${nmonDir}/nmonLog/${date} &"
				fi
			fi
                else
			ssh ${ip} "mkdir -p ${nmonDir}"
			if `ssh ${ip} test -x ${nmonDir}/nmon`; then
                        	if `ssh ${ip} test -d  ${nmonDir}/nmonLog/${date}`; then
                               		ssh ${ip} "${nmonDir}/nmon -s ${timeInterval} -c ${collectTimes} -f -N -m ${nmonDir}/nmonLog/${date} &"
                       		else
                                	ssh ${ip} "mkdir -p ${nmonDir}/nmonLog/${date};${nmonDir}/nmon -s ${timeInterval} -c ${collectTimes} -f -N -m ${nmonDir}/nmonLog/${date} &"
                        	fi
			else
				scp ${nmonDir}/nmon ${ip}:${nmonDir}/
				ssh ${ip} "chmod +x ${nmonDir}/nmon"
				if `ssh ${ip} test -d  ${nmonDir}/nmonLog/${date}`; then
					ssh ${ip} "${nmonDir}/nmon -s ${timeInterval} -c ${collectTimes} -f -N -m ${nmonDir}/nmonLog/${date} &"
				else
					ssh ${ip} "mkdir -p ${nmonDir}/nmonLog/${date};${nmonDir}/nmon -s ${timeInterval} -c ${collectTimes} -f -N -m ${nmonDir}/nmonLog/${date} &"
				fi
			fi

                fi
        done
	
	echo ""
	if [ $? -eq 0 ]; then
		echo "Enable nmon success."
	else
		echo "Enable nmon failed."

	fi

}



###	Start benchmarkSQL Test
function startBenchmarkSQLTest(){
	
	nohup ${benchmarkScript} ${benchmarkDir}/${benchmarkConfName} > ${benchmarkLog} 2>&1 &
}

### Get Benchmarksql result
function getBenchmarkResult(){

	echo "#--------------------------------------------------------------------------------"
        echo "                     Start Gather Benchmark Results                              "
        echo "#--------------------------------------------------------------------------------"
	

	### Need to view the parameter  values in the props.mysql configuration file
	resultDir=`ls -ldrt ${benchmarkDir}/my_result_* | tail -1 | awk '{print $9}'`
	
	if [ ${nmonState} -eq 1 ]; then
		nmonDir=`ls -ldrt ${benchmarkDir}/nmonLog/* | tail -1 | awk '{print $9}'`	
	fi
	
	rdir=${resultDir##*/}
	echo ""
	echo "Start display Original dir ${rdir} results."
	echo "#<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<" 
	echo ""
	echo "benchmark result name: ${resultDir}"

	echo ""
	echo "benchmark log output name: ${benchmarkLog}"		

	if [ ${nmonState} -eq 1 ]; then
		echo ""
		echo "nmon log dir name: ${nmonDir}"
	fi

	echo ""
	c=`cat ${resultDir}/data/result.csv | awk -F "," '{print $1}' | uniq | tail -1`
	echo "Concurrency: ${c}"
				
	echo ""
	cat ${resultDir}/data/result.csv | awk -F "," 'BEGIN{sum=0}{sum+=$4}END{print "dblatencySum: "sum; print "dblatencyAvg: " sum/NR}'
				
	echo ""
	cat ${resultDir}/data/result.csv | awk -F "," 'BEGIN{sum=0}{sum+=$3}END{print "latencySum: "sum; print "latencyAvg: " sum/NR}'

	echo ""
	nt=`grep 'Measured tpmC (NewOrders)' ${benchmarkLog} | awk '{print $11}'`
	echo "Measured tpmC (NewOrders): ${nt}"		
		
	echo ""
	nl=`grep 'Measured tpmTOTAL ' ${benchmarkLog} | awk '{print $10}'`
	echo "Measured tpmTOTAL: ${nl}"
				
	echo ""
	nst=`grep 'Session Start' ${benchmarkLog} | awk '{print $10" "$11}'`
	echo "tpc-c start run time: ${nst}"
				
	echo ""
	nse=`grep 'Session End' ${benchmarkLog} | awk '{print $10" "$11}'`
	echo "tpc-c end run time: ${nse}"
                
	echo ""
	tc=`grep 'Transaction Count' ${benchmarkLog} | awk '{print $10}'`
	echo "Transaction Count: ${tc}"
	echo "#<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<"	
		
	echo ""
		

	echo "Start redirecting the above sort out results to ${benchmarkDir}/results/result.result"	
	concurrency=`cat ${resultDir}/data/result.csv | awk -F "," '{print $1}' | uniq | tail -1`
	dblatency=`cat ${resultDir}/data/result.csv | awk -F "," 'BEGIN{sum=0}{sum+=$4}END{print "dblatencyAvg: " sum/NR}'`
	latency=`cat ${resultDir}/data/result.csv | awk -F "," 'BEGIN{sum=0}{sum+=$3}END{print "latencyAvg: " sum/NR}'`
	tpmC=`grep 'Measured tpmC (NewOrders)' ${benchmarkLog} | awk '{print $11}'`
	tpmcTotal=`grep 'Measured tpmTOTAL ' ${benchmarkLog} | awk '{print $10}'`
	startTime=`grep 'Session Start' ${benchmarkLog} | awk '{print $10" "$11}'`
	endTime=`grep 'Session End' ${benchmarkLog} | awk '{print $10" "$11}'`
	tcounts=`grep 'Transaction Count' ${benchmarkLog} | awk '{print $10}'`

	if [ ! -d "${benchmarkDir}/results" ]; then
		mkdir -p ${benchmarkDir}/results
	fi

		
	echo "benchmarkResult dir name: ${resultDir}" >> ${benchmarkDir}/results/result.result
	echo "benchmarkLog name: ${benchmarkLog}" >> ${benchmarkDir}/results/result.result
	
        if [ ${nmonState} -eq 1 ]; then
                echo "nmonLog dir name: ${nmonDir}" >> ${benchmarkDir}/results/result.result
        fi

	echo "concurrency: ${concurrency}"   >> ${benchmarkDir}/results/result.result
	echo "${dblatency}"		     >> ${benchmarkDir}/results/result.result
	echo "${latency}"		     >> ${benchmarkDir}/results/result.result
	echo "tpmC: ${tpmC}"                 >> ${benchmarkDir}/results/result.result
	echo "tpmcTotal: ${tpmcTotal}"       >> ${benchmarkDir}/results/result.result
	echo "startTime: ${startTime}"       >> ${benchmarkDir}/results/result.result
	echo "endTime: ${endTime}"           >> ${benchmarkDir}/results/result.result
	echo "transaction counts: ${tcounts}">> ${benchmarkDir}/results/result.result
	echo ""				     >> ${benchmarkDir}/results/result.result		

	b=''
	i=0
	while [ $i -le 100 ]
	do
		printf "[%-50s] %d%% \r" "$b" "$i";
		sleep 0.02
		((i=i+2))
		b+='#'
	done
	echo
	echo "Finished redirecting the above sort out results to ${benchmarkDir}/results/result.result"

	echo ""
		
	echo "#<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<"
	echo "Please check result output,Or manual view ${benchmarkDir}/results/result.result"
	echo ""
	cat ${benchmarkDir}/results/result.result
}

### Scp Nmon log to benchmark dir 
function scpNmonLog(){
	echo "#--------------------------------------------------------------------------------"
	echo "       Nmon Log Start Scp From All TiDB Nodes To Benchmark Node Dir              "
	echo "#--------------------------------------------------------------------------------"

	if [ ! -d "${benchmarkDir}/nmonLog" ]; then
		mkdir -p ${benchmarkDir}/nmonLog/
	fi

	if [ ! -d "${benchmarkDir}/nmonLog/${date}" ]; then
		mkdir -p ${benchmarkDir}/nmonLog/${date}
	fi

	for ip in ${hostArry[*]}
   	do
		scp -r ${ip}:${nmonDir}/nmonLog/${date}/* ${benchmarkDir}/nmonLog/${date}/
	done
}


### Stop benchmark Test and Nmon 
function stopBenchmarkTest(){

	# kill Benchmarksql processes in the benchmarksql node
        echo "#--------------------------------------------------------------------------------"
        echo "                         Start Kill runBenchmark Processes                       "
        echo "#--------------------------------------------------------------------------------"

	echo ""
	ps -ef | grep runBenchmark.sh | grep -v grep | awk '{print $2}' | xargs sudo kill -9
	ps -ef | grep runBenchmark.sh | grep -v grep
	if [ $? -eq 0 ]; then
		echo "Localhost processes runBechmark.sh has killed Failed,Please check and manual kill."
	else
		echo "Localhost processes runBechmark.sh has killed Success."
		
	fi
	
	# batch kill nmon processes in all tidb nodes
        echo "#--------------------------------------------------------------------------------"
        echo "                         Start Kill Nmon Processes                               "
        echo "#--------------------------------------------------------------------------------"

	echo ""
	for ip in ${hostArry[*]}
	do
		ssh ${ip} "ps -ef | grep nmon | grep -v grep | awk '{print $2}' | xargs sudo kill -9"
		result=`ssh ${ip} "ps -ef | grep nmon | grep -v grep"`
		if [ ! "$result" = "" ]; then
			echo "TiDB node ${ip} processes Nmon has killed Failed,Please check and manual kill."
		else
			echo "TiDB node ${ip} processes Nmon has killed Success."
		
		fi
	done

        # kill benchmarkSQL processes in the benchamrsql node
        echo "#--------------------------------------------------------------------------------"
        echo "                         Start Kill benchmarkSQL Processes                       "
        echo "#--------------------------------------------------------------------------------"
        echo ""
        ps -ef | grep benchmarkSQL.sh | grep -v grep | awk '{print $2}' | xargs sudo kill -9
        ps -ef | grep benchmarkSQL.sh | grep -v grep
        if [ $? -eq 0 ]; then
                echo "Localhost processes bechmarkSQL.sh has killed Failed,Please check and manual kill."
        else
                echo "Localhost processes bechmarkSQL.sh has killed Success."

        fi
}


### Query Benchmarksql and Nmon status
function statusBenchmarkTest(){
	# query nodes benchmarkSQL processes
        echo "#--------------------------------------------------------------------------------"
        echo "                  Localhost Node BenchmarkSQL Processes                          "
        echo "#--------------------------------------------------------------------------------"
	echo ""

	ps -ef | grep runBenchmark.sh | grep -v grep
	if [ $? -eq 0 ]; then
		echo "Localhost processes runBechmark.sh Existed and Running."
	else
		echo "Localhost processes runBechmark.sh Not existed,Please check and manual query."
	fi
	
	# query all nodes nmon processes

	echo "#--------------------------------------------------------------------------------"
        echo "                       All TiDB Nodes Nmon Processes                             "
        echo "#--------------------------------------------------------------------------------"
	echo ""
	for ip in ${hostArry[*]}
	do	
		result=`ssh ${ip} "ps -ef | grep nmon | grep -v grep"`
		if [ ! "$result" = "" ]; then
			echo ${result}
			echo "TiDB node ${ip} processes Nmon Existed and Running."
			echo ""
		else
			if [ ${nmonState} -eq 0 ];then
				echo "Because variable nmonState equal 0,Show nmon need not turn on,That State is normal state."
			else
				echo "TiDB node ${ip} processes Nmon Not existed,Please manually check why nmon not be turned on,Or Manually start all tidb node nmon processed."
			fi
		fi
	done
}


### Final run benchmark Test
function runBenchmarkTest(){
	if [ ${nmonState} -eq 1 ]; then
		enableNmonRun
		startBenchmarkSQLTest
		statusBenchmarkTest
		

		# check all tidb node nmon process if existed
		declare -a nmonArry
		index=0
		while true
		do
			for ip in ${hostArry[*]}
			do
				result=`ssh ${ip} "ps -ef | grep nmon | grep -v grep"`
				if [ "$result" = "" ]; then
					nmonArry[index]=${ip}
					((index++))
                        	fi		
        		done
		
			nmonarr=($(echo ${nmonArry[*]} | tr ' ' '\n' | sort | uniq))
			num=${#nmonarr[@]}
			hostNums=${#hostArry[@]}
			
                	if [ ${num} -eq ${hostNums} ]; then
				sleep 1
                        	scpNmonLog
				break;
			else
				echo "nmon process existed" > /dev/null 2>&1
			fi
		done

		# check runBenchmark.sh processes if exited
                while true
                do
                        benchmarkPID=`ps -ef | grep runBenchmark.sh | grep -v grep | awk '{print $2}'`
                        if [ "$benchmarkPID" = "" ]; then
                                sleep 1
                                getBenchmarkResult
                                break;
                        else
                                echo "runBenchmark.sh process exsits" > /dev/null 2>&1
                        fi
                done
		
		
	else
		startBenchmarkSQLTest
		statusBenchmarkTest
		
		# check runBenchmark.sh processes if exited
		while true
		do
 			benchmarkPID=`ps -ef | grep runBenchmark.sh | grep -v grep | awk '{print $2}'` 
        		if [ "$benchmarkPID" = "" ]; then
                		sleep 1
                		getBenchmarkResult
				break;
        		else
               			echo "runBenchmark.sh process exsits" > /dev/null 2>&1
        		fi
		done	
	
	fi
}

### Main program entry
if [ ! -n "$1" ]; then
	echo ""
	echo "Parameter can not be Null,Please input parameter start Or stop Or status Or getresult."
	echo ""
else
	case "$1" in
		(start)
			runBenchmarkTest
		;;
		(stop)
			stopBenchmarkTest
		;;

		(getresult)
			getBenchmarkResult
		;;
		(status)
			statusBenchmarkTest
		;;
	esac
fi
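
Assuming the script above is saved as, say, benchmark_test.sh (the file name is arbitrary), it is driven by a single argument matching the case statement at the end:

chmod +x benchmark_test.sh
./benchmark_test.sh start       # start nmon (if enabled) and the BenchmarkSQL run
./benchmark_test.sh status      # check the runBenchmark.sh and nmon processes
./benchmark_test.sh getresult   # summarize the latest result into results/result.result
./benchmark_test.sh stop        # kill the benchmark and nmon processes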

5. Common Issues

  1. Under high concurrency, with both TiDB and TiKV configured reasonably, why is the overall performance still low?

In most cases this is related to the use of a proxy. Try applying load directly to each TiDB instance and sum the results, then compare the total with the result obtained through the proxy. Taking HAProxy as an example, the nbproc parameter increases the maximum number of processes it starts, and newer HAProxy versions also support nbthread and cpu-map; all of these reduce the proxy's negative impact on performance.
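
A hypothetical sketch of that comparison: run one BenchmarkSQL instance directly against each TiDB server (props.tidb1 / props.tidb2 are assumed to be copies of props.mysql pointing at different TiDB addresses), then sum the tpmC values and compare the total with the result obtained through the proxy:

cd /data/tidb/wentaojin/benchmarksql/run
./runBenchmark.sh props.tidb1 > direct_tidb1.log 2>&1 &
./runBenchmark.sh props.tidb2 > direct_tidb2.log 2>&1 &
wait
grep 'Measured tpmC (NewOrders)' direct_tidb*.log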
