GrpcException: Txn commit primary key failed

To speed things up, please provide the following information. A clear problem description helps get the issue resolved faster:

[TiDB version]
v5.0.1

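[Problem description]
When TiSpark writes to a large table, the write fails with GrpcException: Txn commit primary key failed. Writes to small tables succeed.
Note: the data written to the table is roughly 80,000,000 rows × 19 columns.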

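[Steps to reproduce]
import org.apache.spark.sql.SaveMode

// Append df to an existing TiDB table through TiSpark's "tidb" data source
df.write
  .format("tidb")
  .mode(SaveMode.Append)
  .option("table", tableName)
  .save()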

[Error log]

21/06/01 17:05:59.825 CST ttl-manager-pool-1 WARN KVErrorHandler: Unable to handle KeyExceptions other than LockException
com.pingcap.tikv.exception.KeyException: unexpected key error meets and it is txn_not_found {
  start_ts: 425338380747538433
  primary_key: "t\200\000\000\000\000\000\002\375_r\200\000\000\000\000\000\000\001"
}

	at com.pingcap.tikv.txn.AbstractLockResolverClient.extractLockFromKeyErr(AbstractLockResolverClient.java:68)
	at com.pingcap.tikv.operation.KVErrorHandler.handleResponseError(KVErrorHandler.java:287)
	at com.pingcap.tikv.policy.RetryPolicy.callWithRetry(RetryPolicy.java:65)
	at com.pingcap.tikv.AbstractGRPCClient.callWithRetry(AbstractGRPCClient.java:77)
	at com.pingcap.tikv.region.RegionStoreClient.txnHeartBeat(RegionStoreClient.java:539)
	at com.pingcap.tikv.txn.TxnKVClient.txnHeartBeat(TxnKVClient.java:124)
	at com.pingcap.tikv.TTLManager.sendTxnHeartBeat(TTLManager.java:124)
	at com.pingcap.tikv.TTLManager.sendTxnHeartBeat(TTLManager.java:139)
	... (the frame above repeats 41 times in total)
	at com.pingcap.tikv.TTLManager.doKeepAlive(TTLManager.java:112)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
21/06/01 17:06:00.123 CST pool-3-thread-3 WARN ConcreteBackOffer: BackOffer.maxSleep 30000ms is exceeded, errors:
59.com.pingcap.tikv.exception.GrpcException: Txn commit primary key failed, regionId=118385
60.com.pingcap.tikv.exception.GrpcException: Txn commit primary key failed, regionId=118385
61.com.pingcap.tikv.exception.GrpcException: Txn commit primary key failed, regionId=118385
21/06/01 17:06:00.124 CST pool-3-thread-3 INFO MapPartitionsRDD: Removing RDD 13267 from persistence list
21/06/01 17:06:00.130 CST pool-3-thread-3 INFO MapPartitionsRDD: Removing RDD 13294 from persistence list
21/06/01 17:06:00.130 CST block-manager-storage-async-thread-pool-6334 INFO BlockManager: Removing RDD 13267
21/06/01 17:06:00.135 CST block-manager-storage-async-thread-pool-6337 INFO BlockManager: Removing RDD 13294
21/06/01 17:06:00.163 CST pool-3-thread-3 ERROR JobManagerActor: Got NonFatal Exception: 
com.pingcap.tikv.exception.GrpcException: retry is exhausted.
	at com.pingcap.tikv.util.ConcreteBackOffer.doBackOffWithMaxSleep(ConcreteBackOffer.java:148)
	at com.pingcap.tikv.util.ConcreteBackOffer.doBackOff(ConcreteBackOffer.java:119)
	at com.pingcap.tikv.TwoPhaseCommitter.doCommitPrimaryKeyWithRetry(TwoPhaseCommitter.java:217)
	at com.pingcap.tikv.TwoPhaseCommitter.doCommitPrimaryKeyWithRetry(TwoPhaseCommitter.java:223)
	... (the frame above repeats 61 times in total)
	at com.pingcap.tikv.TwoPhaseCommitter.commitPrimaryKey(TwoPhaseCommitter.java:199)
	at com.pingcap.tispark.write.TiBatchWrite.commitPrimaryKey(TiBatchWrite.scala:432)
	at com.pingcap.tispark.write.TiBatchWrite.commitPrimaryKeyWithRetry(TiBatchWrite.scala:386)
	at com.pingcap.tispark.write.TiBatchWrite.doWrite(TiBatchWrite.scala:319)
	at com.pingcap.tispark.write.TiBatchWrite.com$pingcap$tispark$write$TiBatchWrite$$write(TiBatchWrite.scala:88)
	at com.pingcap.tispark.write.TiBatchWrite$.write(TiBatchWrite.scala:45)
	at com.pingcap.tispark.write.TiDBWriter$.write(TiDBWriter.scala:49)
	at com.pingcap.tispark.TiDBDataSource.createRelation(TiDBDataSource.scala:65)
	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:132)
	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:131)
	at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989)
	at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438)
	at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:415)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:301)
	at org.apache.spark.sql.SparkJobSession.writeTable(SparkJobSession.scala:128)
	at com.nascent.quantbi.etl.nodes.OutputDsNode._doExecute(OutputDsNode.scala:151)
	at com.nascent.quantbi.etl.nodes.OutputDsNode.doExecute(OutputDsNode.scala:110)
	at com.nascent.quantbi.etl.nodes.OutputDsNode.executeWithCount(OutputDsNode.scala:71)
	at com.nascent.quantbi.etl.JobExecutor$.$anonfun$runETL$2(JobExecutor.scala:84)
	at com.nascent.quantbi.etl.JobExecutor$.$anonfun$runETL$2$adapted(JobExecutor.scala:79)
	at scala.collection.immutable.List.foreach(List.scala:392)
	at com.nascent.quantbi.etl.JobExecutor$.runETL(JobExecutor.scala:79)
	at com.nascent.quantbi.etl.EtlDataJob$.runSparkJob(EtlDataJob.scala:39)
	at org.apache.spark.sql.SparkSQLJob.runJob(SparkJobSessionFactory.scala:82)
	at org.apache.spark.sql.SparkSQLJob.runJob$(SparkJobSessionFactory.scala:76)
	at com.nascent.quantbi.etl.EtlDataJob$.runJob(EtlDataJob.scala:25)
	at spark.jobserver.JobManagerActor.$anonfun$getJobFuture$2(JobManagerActor.scala:363)
	at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659)
	at scala.util.Success.$anonfun$map$1(Try.scala:255)
	at scala.util.Success.map(Try.scala:213)
	at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
	at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)
	at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)
	at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: com.pingcap.tikv.exception.GrpcException: Txn commit primary key failed, regionId=118385
	at com.pingcap.tikv.TwoPhaseCommitter.doCommitPrimaryKeyWithRetry(TwoPhaseCommitter.java:221)
	... 113 more
Caused by: com.pingcap.tikv.exception.KeyException: Key exception occurred and the reason is retryable: "Txn(Mvcc(TxnLockNotFound { start_ts: TimeStamp(425338380747538433), commit_ts: TimeStamp(425338687927877633), key: [116, 128, 0, 0, 0, 0, 0, 2, 253, 95, 114, 128, 0, 0, 0, 0, 0, 0, 1] }))"

	at com.pingcap.tikv.region.RegionStoreClient.handleCommitResponse(RegionStoreClient.java:618)
	at com.pingcap.tikv.region.RegionStoreClient.commit(RegionStoreClient.java:595)
	at com.pingcap.tikv.txn.TxnKVClient.commit(TxnKVClient.java:158)
	at com.pingcap.tikv.TwoPhaseCommitter.doCommitPrimaryKeyWithRetry(TwoPhaseCommitter.java:211)
	... 113 more
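The primary key in both errors is a raw TiKV key: the octal-escaped string in the heartbeat warning (`t\200\000...\375_r\200...\001`) and the byte array in the TxnLockNotFound error are the same 19 bytes. Assuming the standard TiDB record-key layout for a table with an integer row handle (`t` + table ID + `_r` + handle, each integer stored as 8 big-endian bytes with the sign bit flipped), the Scala sketch below decodes it. The object name `RecordKeyDecoder` is hypothetical and not part of TiSpark.

```scala
import java.nio.ByteBuffer

// Sketch (assumption: integer-handle record key, standard TiDB layout):
//   't' | tableID (8 bytes) | '_' 'r' | handle (8 bytes)
// Each integer is big-endian with its sign bit flipped (memcomparable int64).
object RecordKeyDecoder {
  // Only the sign bit set, i.e. 0x8000000000000000.
  private val SignMask: Long = Long.MinValue

  // Reads 8 big-endian bytes starting at `offset` and flips the sign bit back.
  private def decodeComparableInt64(key: Array[Byte], offset: Int): Long =
    ByteBuffer.wrap(key, offset, 8).getLong ^ SignMask

  /** Returns (tableId, handle) for a 19-byte integer-handle record key. */
  def decode(key: Array[Byte]): (Long, Long) = {
    require(key.length == 19, s"expected 19 bytes, got ${key.length}")
    require(key(0) == 't'.toByte, "record keys start with 't'")
    require(key(9) == '_'.toByte && key(10) == 'r'.toByte, "missing _r separator")
    (decodeComparableInt64(key, 1), decodeComparableInt64(key, 11))
  }

  def main(args: Array[String]): Unit = {
    // Byte values copied from the TxnLockNotFound error (printed there as unsigned).
    val key = Array(116, 128, 0, 0, 0, 0, 0, 2, 253, 95, 114, 128, 0, 0, 0, 0, 0, 0, 1).map(_.toByte)
    val (tableId, handle) = decode(key)
    println(s"tableId=$tableId handle=$handle") // prints tableId=765 handle=1
  }
}
```

Under that assumption the primary key is row handle 1 of table ID 765. To check that this is the table being written, the ID can be compared with the `TIDB_TABLE_ID` column of `information_schema.tables`.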

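If your question is about performance tuning or troubleshooting, please download and run the script, then select all of its terminal output and copy-paste it here.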

Could you please upload the tikv-detail monitoring for this time period? Thanks.

OK, I've taken the screenshots below for now. Let me know which other panels you need.

[tikv-detail screenshot: Cluster]

[tikv-detail screenshot: Errors]

[tikv-detail screenshot: gRPC]


The 17:05 error time should roughly line up with the monitoring. You can use this document to troubleshoot:

https://docs.pingcap.com/zh/tidb/stable/tidb-troubleshooting-map#43-客户端报-server-is-busy-错误
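To line the error up with the monitoring more precisely, the TSO values in the TxnLockNotFound error can be converted to wall-clock time. A PD TSO stores physical milliseconds since the Unix epoch shifted left by 18 bits, with an 18-bit logical counter in the low bits. The Scala sketch below assumes the log's CST means UTC+8 (`Asia/Shanghai`). The object name `TsoDecoder` is hypothetical.

```scala
import java.time.{Instant, LocalDateTime, ZoneId}

// Sketch: a PD TSO packs (physical ms since the Unix epoch) << 18 | logical counter.
object TsoDecoder {
  private val LogicalBits = 18
  private val LogicalMask = (1L << LogicalBits) - 1

  def physicalMillis(ts: Long): Long = ts >> LogicalBits
  def logical(ts: Long): Long = ts & LogicalMask

  // Wall-clock time of the physical part in `zone` (works on Java 8).
  def toLocal(ts: Long, zone: ZoneId): LocalDateTime =
    Instant.ofEpochMilli(physicalMillis(ts)).atZone(zone).toLocalDateTime

  def main(args: Array[String]): Unit = {
    val cst = ZoneId.of("Asia/Shanghai") // assumption: the log's CST is UTC+8
    val startTs = 425338380747538433L    // start_ts from the TxnLockNotFound error
    val commitTs = 425338687927877633L   // commit_ts from the same error
    println(s"start_ts  -> ${toLocal(startTs, cst)}")  // 2021-06-01T16:45:58.003
    println(s"commit_ts -> ${toLocal(commitTs, cst)}") // 2021-06-01T17:05:29.803
    val elapsedMs = physicalMillis(commitTs) - physicalMillis(startTs)
    println(s"elapsed: $elapsedMs ms")                 // elapsed: 1171800 ms
  }
}
```

Under these assumptions, commit_ts falls at about 17:05:29.8 CST. That is roughly 30 s before the `BackOffer.maxSleep 30000ms is exceeded` warning at 17:06:00. start_ts falls at about 16:45:58, so the transaction had been open for about 19.5 minutes when the primary key commit was attempted. This does not identify the root cause by itself, but it narrows the window worth inspecting in the tikv-detail panels.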