sysbench 高并发压测 tidb4.0 写入冲突

OS: CentOS7.3
TiDB:4.0

使用sysbench oltp_update_index场景压测,到128并发时,报错写入冲突。
并且压测前已经设置了全局变量:
set global tidb_disable_txn_auto_retry = off;

sysbench 1.1.0 (using bundled LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 128
Report intermediate results every 10 second(s)
Initializing random number generator from current time

Initializing worker threads…

Threads started!

[ 10s ] thds: 128 tps: 771.32 qps: 771.32 (r/w/o: 0.00/771.32/0.00) lat (ms,95%): 176.73 err/s: 0.00 reconn/s: 0.00
[ 20s ] thds: 128 tps: 649.30 qps: 649.30 (r/w/o: 0.00/649.30/0.00) lat (ms,95%): 308.84 err/s: 0.00 reconn/s: 0.00
[ 30s ] thds: 128 tps: 680.50 qps: 680.50 (r/w/o: 0.00/680.50/0.00) lat (ms,95%): 186.54 err/s: 0.00 reconn/s: 0.00
[ 40s ] thds: 128 tps: 556.80 qps: 556.80 (r/w/o: 0.00/556.80/0.00) lat (ms,95%): 179.94 err/s: 0.00 reconn/s: 0.00
[ 50s ] thds: 128 tps: 708.20 qps: 708.20 (r/w/o: 0.00/708.20/0.00) lat (ms,95%): 170.48 err/s: 0.00 reconn/s: 0.00
[ 60s ] thds: 128 tps: 750.25 qps: 750.25 (r/w/o: 0.00/750.25/0.00) lat (ms,95%): 183.21 err/s: 0.00 reconn/s: 0.00
[ 70s ] thds: 128 tps: 452.67 qps: 452.67 (r/w/o: 0.00/452.67/0.00) lat (ms,95%): 142.39 err/s: 0.00 reconn/s: 0.00
FATAL: mysql_stmt_execute() returned error 9007 (Write conflict, txnStartTS=417406711560405230, conflictStartTS=417406711560405163, conflictCommitTS=417406712477908998, key={tableID=245, indexID=1, indexValues={30729, 5008, }} primary={tableID=245, indexID=1, indexValues={30729, 5008, }} [try again later]) for query ‘UPDATE sbtest1 SET k=k+1 WHERE id=?’
FATAL: thread_run' function failed: ./oltp_common.lua:465: SQL error, errno = 9007, state = 'HY000': Write conflict, txnStartTS=417406711560405230, conflictStartTS=417406711560405163, conflictCommitTS=417406712477908998, key={tableID=245, indexID=1, indexValues={30729, 5008, }} primary={tableID=245, indexID=1, indexValues={30729, 5008, }} [try again later] FATAL: mysql_stmt_execute() returned error 9007 (Write conflict, txnStartTS=417406712504123404, conflictStartTS=417406712504123402, conflictCommitTS=417406712504123405, key={tableID=245, handle=5008} primary={tableID=245, indexID=1, indexValues={30732, 5008, }} [try again later]) for query 'UPDATE sbtest1 SET k=k+1 WHERE id=?' FATAL:thread_run’ function failed: ./oltp_common.lua:465: SQL error, errno = 9007, state = ‘HY000’: Write conflict, txnStartTS=417406712504123404, conflictStartTS=417406712504123402, conflictCommitTS=417406712504123405, key={tableID=245, handle=5008} primary={tableID=245, indexID=1, indexValues={30732, 5008, }} [try again later]
FATAL: mysql_stmt_execute() returned error 9007 (Write conflict, txnStartTS=417406712595873795, conflictStartTS=417406712595873794, conflictCommitTS=417406712595873796, key={tableID=245, indexID=1, indexValues={30755, 5008, }} primary={tableID=245, indexID=1, indexValues={30755, 5008, }} [try again later]) for query ‘UPDATE sbtest1 SET k=k+1 WHERE id=?’
FATAL: `thread_run’ function failed: ./oltp_common.lua:465: SQL error, errno = 9007, state = ‘HY000’: Write conflict, txnStartTS=417406712595873795, conflictStartTS=417406712595873794, conflictCommitTS=417406712595873796, key={tableID=245, indexID=1, indexValues={30755, 5008, }} primary={tableID=245, indexID=1, indexValues={30755, 5008, }} [try again later]

  1. 检查是否之前测试过,导致数据重复?
  2. 查看over-view监控,tidb,detail-tikv等是否有性能瓶颈

1.oltp_update_index场景,根据自增主键id更新索引字段K,非insert操作,应该和之前测试没有关系吧。并且并发<128的时候,压测不会报错。

2.这个场景一般看哪个监控指标

  1. 如果小于128没有问题,那么主要看集群的性能瓶颈
  2. 可以先查看内存,cpu,io等是否某个服务器有瓶颈;
    可以参考性能调优地图查看 https://book.tidb.io/session3/chapter4/performance-map.html#41-性能调优地图

现在压测并发调小,也会写入冲突报错,并且是同一张表。 这种情况怎么排查

FATAL: mysql_stmt_execute() returned error 9007 (Write conflict, txnStartTS=417429178076365314, conflictStartTS=417429179006976113, conflictCommitTS=417429179006976132, key={tableID=245, handle=5022} primary={tableID=245, indexID=1, indexValues={31901, 5022, }} [try again later]) for query ‘UPDATE sbtest1 SET k=k+1 WHERE id=?’
FATAL: thread_run' function failed: ./oltp_common.lua:465: SQL error, errno = 9007, state = 'HY000': Write conflict, txnStartTS=417429178076365314, conflictStartTS=417429179006976113, conflictCommitTS=417429179006976132, key={tableID=245, handle=5022} primary={tableID=245, indexID=1, indexValues={31901, 5022, }} [try again later] FATAL: mysql_stmt_execute() returned error 9007 (Write conflict, txnStartTS=417429179046297683, conflictStartTS=417429179033190489, conflictCommitTS=417429179832729601, key={tableID=245, indexID=1, indexValues={31924, 5022, }} primary={tableID=245, indexID=1, indexValues={31924, 5022, }} [try again later]) for query 'UPDATE sbtest1 SET k=k+1 WHERE id=?' FATAL:thread_run’ function failed: ./oltp_common.lua:465: SQL error, errno = 9007, state = ‘HY000’: Write conflict, txnStartTS=417429179046297683, conflictStartTS=417429179033190489, conflictCommitTS=417429179832729601, key={tableID=245, indexID=1, indexValues={31924, 5022, }} primary={tableID=245, indexID=1, indexValues={31924, 5022, }} [try again later]
FATAL: mysql_stmt_execute() returned error 9007 (Write conflict, txnStartTS=417429179976908808, conflictStartTS=417429179990016001, conflictCommitTS=0, key=tor query ‘UPDATE sbtest1 SET k=k+1 WHERE id=?’

试试调大参数 tidb_retry_limit

https://docs.pingcap.com/zh/tidb/v4.0/tidb-faq#311-error-8005-hy000--write-conflict-txnstartts-is-stale

请提供完整的 sysbench 执行命令。

sysbench --config-file=tidb.conf oltp_update_index.lua --threads=128 --time=600 run

–tidb.conf

mysql-host={TIDB_HOST}
mysql-port=4000
mysql-user=root
mysql-password=
mysql-db=sbtest
tables=32
table-size=10000000
report-interval=10
db-driver=mysql

此话题已在最后回复的 1 分钟后被自动关闭。不再允许新回复。