TiSpark write through the DataSource API fails with org.tikv.shade.io.grpc.StatusRuntimeException: UNIMPLEMENTED

The TiSpark jar is tispark-assembly-3.1_2.12-3.3.0.jar
The Spark version is 3.1.1
I write to the table through the DataSource API with the following code:

df.write().format("tidb")
        .option("database", database)
        .option("table", table)
        .option("replace", "true")
        .mode("append")
        .save();

For some tables the write works fine, but for others it fails with:

org.tikv.common.exception.TiBatchWriteException: Execution exception met.
	at org.tikv.txn.TwoPhaseCommitter.doPrewriteSecondaryKeys(TwoPhaseCommitter.java:308)
	at org.tikv.txn.TwoPhaseCommitter.prewriteSecondaryKeys(TwoPhaseCommitter.java:259)
	at com.pingcap.tispark.utils.TwoPhaseCommitHepler.$anonfun$prewriteSecondaryKeyByExecutors$1(TwoPhaseCommitHepler.scala:102)
	at com.pingcap.tispark.utils.TwoPhaseCommitHepler.$anonfun$prewriteSecondaryKeyByExecutors$1$adapted(TwoPhaseCommitHepler.scala:90)
	at org.apache.spark.rdd.RDD.$anonfun$foreachPartition$2(RDD.scala:1020)
	at org.apache.spark.rdd.RDD.$anonfun$foreachPartition$2$adapted(RDD.scala:1020)
	at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2242)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:131)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
Caused by: java.util.concurrent.ExecutionException: org.tikv.common.exception.TiBatchWriteException: prewrite secondary key error
	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
	at org.tikv.txn.TwoPhaseCommitter.doPrewriteSecondaryKeys(TwoPhaseCommitter.java:285)
	... 14 more
Caused by: org.tikv.common.exception.TiBatchWriteException: prewrite secondary key error
	at org.tikv.txn.TwoPhaseCommitter.doPrewriteSecondaryKeySingleBatchWithRetry(TwoPhaseCommitter.java:426)
	at org.tikv.txn.TwoPhaseCommitter.doPrewriteSecondaryKeysInBatchesWithRetry(TwoPhaseCommitter.java:357)
	at org.tikv.txn.TwoPhaseCommitter.retryPrewriteBatch(TwoPhaseCommitter.java:390)
	at org.tikv.txn.TwoPhaseCommitter.doPrewriteSecondaryKeySingleBatchWithRetry(TwoPhaseCommitter.java:439)
	at org.tikv.txn.TwoPhaseCommitter.doPrewriteSecondaryKeysInBatchesWithRetry(TwoPhaseCommitter.java:357)
	at org.tikv.txn.TwoPhaseCommitter.lambda$doPrewriteSecondaryKeys$0(TwoPhaseCommitter.java:292)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	... 3 more
Caused by: org.tikv.common.exception.GrpcException: org.tikv.shade.io.grpc.StatusRuntimeException: UNIMPLEMENTED
	at org.tikv.common.policy.RetryPolicy.rethrowNotRecoverableException(RetryPolicy.java:70)
	at org.tikv.common.policy.RetryPolicy.callWithRetry(RetryPolicy.java:94)
	at org.tikv.common.AbstractGRPCClient.callWithRetry(AbstractGRPCClient.java:88)
	at org.tikv.common.region.RegionStoreClient.prewrite(RegionStoreClient.java:486)
	at org.tikv.common.region.RegionStoreClient.prewrite(RegionStoreClient.java:435)
	at org.tikv.txn.TxnKVClient.prewrite(TxnKVClient.java:104)
	at org.tikv.txn.TwoPhaseCommitter.doPrewriteSecondaryKeySingleBatchWithRetry(TwoPhaseCommitter.java:416)
	... 11 more
Caused by: org.tikv.shade.io.grpc.StatusRuntimeException: UNIMPLEMENTED
	at org.tikv.shade.io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:287)
	at org.tikv.shade.io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:268)
	at org.tikv.shade.io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:175)
	at org.tikv.common.AbstractGRPCClient.lambda$callWithRetry$0(AbstractGRPCClient.java:91)
	at org.tikv.common.policy.RetryPolicy.callWithRetry(RetryPolicy.java:88)
	... 16 more

How should I go about troubleshooting this?

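Compare the schemas of the tables that succeed with those that fail, and check whether tables with a simpler structure are more likely to write successfully.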

What do you mean by a simple table structure? Fewer columns?

No. The error in the log is a fairly rare one:
prewrite secondary key error

So which tables succeed and which fail? You need to list them so they can be compared.

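We have multiple jobs. Each one runs SQL and writes the result into its own TiDB table.
So far, the tables have anywhere from 20 to 80 columns, and no two are alike.
What we can observe is that a failing table does not always fail; it may succeed after a few retries.
However, failures and retries have become more and more frequent recently, and yesterday 2 tables kept failing no matter how many times we retried.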

Also, should we start from the root cause of the prewrite secondary key error?
org.tikv.common.region.RegionStoreClient.prewrite(RegionStoreClient.java:486)

org.tikv.shade.io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:175)
The exception is thrown here:
org.tikv.shade.io.grpc.StatusRuntimeException: UNIMPLEMENTED
Could this be related to the state of the TiKV servers or to some metric?

prewrite secondary key error means the write failed completely.

A write generally has two steps:
primary key write
secondary key write

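These steps are what control and manage the distributed transaction commit. If the transaction fails it is rolled back, and normally no error like this is reported...
It looks like either a region or a TiKV node has a problem. You will need to dig into that yourself.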
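
For reference, the two-step write described above can be sketched roughly as follows. This is a toy in-memory model, not client-java's TwoPhaseCommitter: the class ToyTwoPhaseCommit and its fields and methods are hypothetical names made up for illustration, and the real protocol (timestamps, locks stored in TiKV, retries) is considerably more involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: none of these names are client-java or TiKV APIs.
class ToyTwoPhaseCommit {
    final Set<String> locked = new HashSet<>();      // keys prewritten but not yet committed
    final Set<String> committed = new HashSet<>();   // keys visible to readers
    final Set<String> failingKeys = new HashSet<>(); // keys whose prewrite is made to fail

    void prewrite(String key) {
        if (failingKeys.contains(key)) {
            throw new IllegalStateException("prewrite failed for " + key);
        }
        locked.add(key);
    }

    /** Returns true if every key was committed, false if the write was rolled back. */
    boolean write(String primaryKey, List<String> secondaryKeys) {
        List<String> prewritten = new ArrayList<>();
        try {
            // Phase 1: prewrite the primary key first, then the secondary keys.
            prewrite(primaryKey);
            prewritten.add(primaryKey);
            for (String key : secondaryKeys) {
                prewrite(key);
                prewritten.add(key);
            }
        } catch (IllegalStateException e) {
            // Any prewrite failure aborts the write: release the locks, commit nothing.
            locked.removeAll(prewritten);
            return false;
        }
        // Phase 2: commit the primary key, then the secondary keys.
        for (String key : prewritten) {
            locked.remove(key);
            committed.add(key);
        }
        return true;
    }

    public static void main(String[] args) {
        ToyTwoPhaseCommit txn = new ToyTwoPhaseCommit();
        txn.failingKeys.add("k3");
        System.out.println(txn.write("k1", Arrays.asList("k2")));
        System.out.println(txn.write("k4", Arrays.asList("k3", "k5")));
        System.out.println(txn.committed);
    }
}
```

In this model a failed secondary key prewrite leaves previously committed data untouched, which matches the behaviour discussed in this thread.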

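The write is made transactional through 2PC. An exception during secondary key prewrite makes the write fail but does not affect existing data, so that part is fine.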

If a region or TiKV node is misbehaving, is there a good way to track it down?
I did not find anything related to UNIMPLEMENTED in the TiKV logs.

Just check the cluster's monitoring service, Grafana.

There are many metrics. Apart from very complex performance problems, simple problems are easy to spot.

But TiKV has no error logs? Then could it be caused by TiSpark's poor compatibility with 7.1.X?

I guess we would have to look at the update notes on the TiSpark GitHub to know. Lately I have not been able to access GitHub...

Thanks. I can't rule out a compatibility issue with 7.1.
Also, the TiSpark git repository has not been updated for a long time...

Oh dear. In that case, if you need to use Spark, I would recommend staying on a 6.5.X version.

An update: the cause of the error is that the tables being written to have TiFlash replicas.

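The TiSpark DataSource API write path uses org.tikv.txn.TwoPhaseCommitter from client-java.
Its doPrewriteSecondaryKeysInBatchesWithRetry method, which is the secondary key prewrite stage, groups the keys being written by region,
and each group is written to every store of the corresponding region.
Because TiFlash replicas are configured, some of these writes go to a TiFlash store. TiFlash does not support prewrite, so the gRPC request fails with org.tikv.shade.io.grpc.StatusRuntimeException: UNIMPLEMENTED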

Verified as follows:

The fix is to check store.isTiFlash() and skip the write when it returns true.
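
A minimal sketch of that check, under the assumption that each region's batch is sent to a list of stores as described above. The Store class, sendPrewrite and prewriteBatch here are hypothetical stand-ins written for illustration, not the real client-java classes; only the isTiFlash() name is taken from the fix itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified model of the behaviour described in this thread.
// Store, sendPrewrite and prewriteBatch are hypothetical, not client-java APIs.
class SkipTiFlashSketch {
    static class Store {
        final String address;
        final boolean tiFlash;

        Store(String address, boolean tiFlash) {
            this.address = address;
            this.tiFlash = tiFlash;
        }

        boolean isTiFlash() {
            return tiFlash;
        }
    }

    /** Simulates the gRPC call: a TiFlash store does not implement prewrite. */
    static void sendPrewrite(Store store, List<String> keys) {
        if (store.isTiFlash()) {
            throw new UnsupportedOperationException("UNIMPLEMENTED");
        }
    }

    /** Prewrites one region's batch, skipping TiFlash stores; returns the addresses written to. */
    static List<String> prewriteBatch(List<Store> regionStores, List<String> keys) {
        List<String> written = new ArrayList<>();
        for (Store store : regionStores) {
            if (store.isTiFlash()) {
                continue; // the fix: never send prewrite to a TiFlash store
            }
            sendPrewrite(store, keys);
            written.add(store.address);
        }
        return written;
    }

    public static void main(String[] args) {
        List<Store> stores = Arrays.asList(
                new Store("tikv-1", false),
                new Store("tiflash-1", true),
                new Store("tikv-2", false));
        System.out.println(prewriteBatch(stores, Arrays.asList("k1", "k2")));
    }
}
```

Without the isTiFlash() check, the loop would reach sendPrewrite for the TiFlash store and fail, which mirrors the UNIMPLEMENTED error in the stack trace.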