TiSpark reports an error when executing INSERT

TiDB version: 7.1.2
TiSpark version: tispark-assembly-3.3_2.12-3.2.2.jar
Spark version: 3.3.1

I ran an `INSERT INTO ... SELECT` statement three times; it succeeded only once. For context, the statement was of this general shape (a minimal sketch: the catalog, database, and table names are placeholders, and the PD address is illustrative):
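```scala
import org.apache.spark.sql.SparkSession

// Minimal TiSpark session setup; the PD address and catalog name are placeholders.
val spark = SparkSession.builder()
  .appName("tispark-insert-select")
  .config("spark.sql.extensions", "org.apache.spark.sql.TiExtensions")
  .config("spark.sql.catalog.tidb_catalog", "org.apache.spark.sql.catalyst.catalog.TiCatalog")
  .config("spark.sql.catalog.tidb_catalog.pd.addresses", "pd0:2379")
  .config("spark.tispark.pd.addresses", "pd0:2379")
  .getOrCreate()

// Hypothetical schema: copy rows from one TiDB table into another.
spark.sql(
  """INSERT INTO tidb_catalog.demo_db.target_table
    |SELECT * FROM tidb_catalog.demo_db.source_table""".stripMargin)
```

The other two runs failed with the following error: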

operating ExecuteStatement: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 46.0 failed 4 times, most recent failure: Lost task 1.3 in stage 46.0 (TID 2248) (hadoop-0003 executor 2): org.tikv.common.exception.TiBatchWriteException: Execution exception met.
at org.tikv.txn.TwoPhaseCommitter.doPrewriteSecondaryKeys(TwoPhaseCommitter.java:308)
at org.tikv.txn.TwoPhaseCommitter.prewriteSecondaryKeys(TwoPhaseCommitter.java:259)
at com.pingcap.tispark.utils.TwoPhaseCommitHepler.$anonfun$prewriteSecondaryKeyByExecutors$1(TwoPhaseCommitHepler.scala:102)
at com.pingcap.tispark.utils.TwoPhaseCommitHepler.$anonfun$prewriteSecondaryKeyByExecutors$1$adapted(TwoPhaseCommitHepler.scala:90)
at org.apache.spark.rdd.RDD.$anonfun$foreachPartition$2(RDD.scala:1011)
at org.apache.spark.rdd.RDD.$anonfun$foreachPartition$2$adapted(RDD.scala:1011)
at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2268)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:136)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.ExecutionException: org.tikv.common.exception.TiBatchWriteException: > max retry number 3, oldRegion={Region[9855557] ConfVer[7425] Version[419822] Store[2853733] KeyRange[t\200\000\000\000\000\020\262\335_i\200\000\000\000\000\000\000\001\001\0009\0001\0001\0001\377\0000\0001\0000\0005\377\000M\000A\0000\0002\377\0000\0009\000G\0006\377\0004\000J\000\000\000\000\373]:[t\200\000\000\000\000\020\262\335_i\200\000\000\000\000\000\000\001\001\0009\0001\0001\0001\377\0000\0001\0000\0008\377\0007\0002\0003\0009\377\0006\0009\0003\0007\377\0006\000Y\000\000\000\000\373]}, currentRegion={Region[9857771] ConfVer[7425] Version[419823] Store[2853733] KeyRange[t\200\000\000\000\000\020\262\335_i\200\000\000\000\000\000\000\001\001\0009\0001\0001\0001\377\0000\0001\0000\0005\377\000M\000A\0000\0002\377\0000\0009\000G\0006\377\0004\000J\000\000\000\000\373]:[t\200\000\000\000\000\020\262\335_i\200\000\000\000\000\000\000\001\001\0009\0001\0001\0001\377\0000\0001\0000\0005\377\000M\000A\0000\0005\377\000D\000A\000B\0008\377\0005\000Y\000\000\000\000\373]}
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at org.tikv.txn.TwoPhaseCommitter.doPrewriteSecondaryKeys(TwoPhaseCommitter.java:285)
… 14 more
Caused by: org.tikv.common.exception.TiBatchWriteException: > max retry number 3, oldRegion={Region[9855557] ConfVer[7425] Version[419822] Store[2853733] KeyRange[t\200\000\000\000\000\020\262\335_i\200\000\000\000\000\000\000\001\001\0009\0001\0001\0001\377\0000\0001\0000\0005\377\000M\000A\0000\0002\377\0000\0009\000G\0006\377\0004\000J\000\000\000\000\373]:[t\200\000\000\000\000\020\262\335_i\200\000\000\000\000\000\000\001\001\0009\0001\0001\0001\377\0000\0001\0000\0008\377\0007\0002\0003\0009\377\0006\0009\0003\0007\377\0006\000Y\000\000\000\000\373]}, currentRegion={Region[9857771] ConfVer[7425] Version[419823] Store[2853733] KeyRange[t\200\000\000\000\000\020\262\335_i\200\000\000\000\000\000\000\001\001\0009\0001\0001\0001\377\0000\0001\0000\0005\377\000M\000A\0000\0002\377\0000\0009\000G\0006\377\0004\000J\000\000\000\000\373]:[t\200\000\000\000\000\020\262\335_i\200\000\000\000\000\000\000\001\001\0009\0001\0001\0001\377\0000\0001\0000\0005\377\000M\000A\0000\0005\377\000D\000A\000B\0008\377\0005\000Y\000\000\000\000\373]}
at org.tikv.txn.TwoPhaseCommitter.doPrewriteSecondaryKeysInBatchesWithRetry(TwoPhaseCommitter.java:361)
at org.tikv.txn.TwoPhaseCommitter.retryPrewriteBatch(TwoPhaseCommitter.java:390)
at org.tikv.txn.TwoPhaseCommitter.doPrewriteSecondaryKeysInBatchesWithRetry(TwoPhaseCommitter.java:369)
at org.tikv.txn.TwoPhaseCommitter.retryPrewriteBatch(TwoPhaseCommitter.java:390)
at org.tikv.txn.TwoPhaseCommitter.doPrewriteSecondaryKeysInBatchesWithRetry(TwoPhaseCommitter.java:369)
at org.tikv.txn.TwoPhaseCommitter.retryPrewriteBatch(TwoPhaseCommitter.java:390)
at org.tikv.txn.TwoPhaseCommitter.doPrewriteSecondaryKeysInBatchesWithRetry(TwoPhaseCommitter.java:369)
at org.tikv.txn.TwoPhaseCommitter.retryPrewriteBatch(TwoPhaseCommitter.java:390)
at org.tikv.txn.TwoPhaseCommitter.doPrewriteSecondaryKeysInBatchesWithRetry(TwoPhaseCommitter.java:369)
at org.tikv.txn.TwoPhaseCommitter.lambda$doPrewriteSecondaryKeys$0(TwoPhaseCommitter.java:292)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
… 3 more

Judging from the docs, this hasn't been updated yet, so it's probably not supported for now (hang on and wait for a version update).

It should be supported: an `INSERT INTO ... VALUES` statement succeeds for me. My problem looks similar to the one below.

I read that thread. Its log only contains the retry messages, with nothing about why the retries happened. I searched my own TiSpark logs and didn't find any more detail either. Is there anywhere I can find more detailed information?

Looking at the source code, there seems to be an INFO-level message:

23/12/15 14:28:51 INFO TwoPhaseCommitter: prewrite secondary key fail, will backoff and retry
23/12/15 14:28:52 INFO TwoPhaseCommitter: oldRegion={Region[9870443] ConfVer[7635] Version[420123] Store[9507300] KeyRange[t\200\000\000\000\000\020\263\336_i\200\000\000\000\000\000\000\001\001\0009\0001\0003\0007\377\0001\0007\0000\0002\377\000M\000A\0003\000R\377\000H\000F\0006\000C\377\0007\000C\000\000\000\000\373]:[t\200\000\000\000\000\020\263\336_i\200\000\000\000\000\000\000\001\001\0009\0001\0004\0001\377\0000\0001\0000\0005\377\0000\0008\0008\0007\377\0009\0004\0000\0007\377\0008\000N\000\000\000\000\373]} != currentRegion={Region[9872486] ConfVer[7647] Version[420124] Store[2853787] KeyRange[t\200\000\000\000\000\020\263\336_i\200\000\000\000\000\000\000\001\001\0009\0001\0003\0007\377\0001\0007\0000\0002\377\000M\000A\0003\000R\377\000H\000F\0006\000C\377\0007\000C\000\000\000\000\373]:[t\200\000\000\000\000\020\263\336_i\200\000\000\000\000\000\000\001\001\0009\0001\0004\0001\377\0000\0001\0000\0000\377\000M\000A\0009\000K\377\000W\0009\000F\000C\377\0004\000B\000\000\000\000\373]}, will re-fetch region info and retry

That seems to be about all the useful information there is.
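In case it helps anyone else dig for these messages: a minimal sketch for surfacing the `org.tikv` INFO lines from the driver (this assumes your deployment's default log level is above INFO; executor logs may additionally need a log4j configuration change):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// Lower the driver-side log level so INFO messages such as
// "prewrite secondary key fail, will backoff and retry" are emitted.
spark.sparkContext.setLogLevel("INFO")
```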

Check whether this is caused by the table lacking a primary key or a unique key.

The table does have a primary key.

This error looks like an exception in the two-phase commit of the distributed transaction. It usually happens because the data being committed runs into a transaction conflict or fails the version check; internally the client backs off and retries, and finally reports the error once the retries are exhausted. In your log the retries are triggered by a Region epoch change (oldRegion != currentRegion), i.e. the target Region split or moved while the prewrite was in flight.
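To make the retry behaviour in the stack trace concrete, here is a simplified illustration of the pattern (this is NOT the actual org.tikv client code; the names and types are invented for illustration):

```scala
import scala.annotation.tailrec

// Simplified illustration of the prewrite retry loop implied by the stack
// trace above -- not the actual org.tikv client code. A batch is retried
// with freshly fetched Region metadata until it succeeds or the cap is hit.
final case class RegionInfo(id: Long, version: Long)

def prewriteWithRetry(
    fetchRegion: () => RegionInfo,        // re-fetch Region metadata (e.g. from PD)
    prewriteBatch: RegionInfo => Boolean, // true = the batch prewrite succeeded
    maxRetry: Int = 3): Unit = {
  @tailrec
  def loop(attempt: Int, region: RegionInfo): Unit = {
    if (prewriteBatch(region)) {
      () // batch written, done
    } else if (attempt >= maxRetry) {
      // Corresponds to "> max retry number 3, oldRegion=..., currentRegion=..."
      throw new RuntimeException(s"> max retry number $maxRetry, region=$region")
    } else {
      // Corresponds to "will re-fetch region info and retry"
      loop(attempt + 1, fetchRegion())
    }
  }
  loop(0, fetchRegion())
}
```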

Try reducing the batch size the Spark job submits in a single commit.
If that works, also reduce the Spark job's parallelism a little; see the sketch below.
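For reference, a minimal sketch of lowering the write concurrency (these are standard Spark settings and the values are illustrative; I am deliberately not naming a TiSpark-specific batch-size option here, since the exact key should be checked against the configuration docs for your TiSpark version):

```scala
import org.apache.spark.sql.SparkSession

// Fewer partitions mean fewer concurrent prewrite tasks hitting TiKV,
// which reduces the chance of racing a Region split during 2PC.
val spark = SparkSession.builder()
  .appName("tispark-insert-select-low-concurrency")
  .config("spark.default.parallelism", "8")    // illustrative value; tune for your cluster
  .config("spark.sql.shuffle.partitions", "8") // controls partitions of the SELECT side
  .getOrCreate()
```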

Reducing the parallelism (spark.default.parallelism) seems to help; I'll keep an eye on it for a few more days.

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.