TiSpark error: WARN TaskSetManager: Lost task 27.0 in stage 2.0 (TID 249, 10.5.63.103, executor 1): com.pingcap.tikv.exception.TiClientInternalException: Error reading region:

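TiSpark reports an error while executing SQL, but the query still runs to completion. It is very slow, though, slower than running the same query directly in TiDB. The error log is attached:
[Stage 2:===> (2 + 16) / 32]21/11/25 11:43:51 WARN TaskSetManager: Lost task 27.0 in stage 2.0 (TID 249, 10.5.63.103, executor 1): com.pingcap.tikv.exception.TiClientInternalException: Error reading region: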
at com.pingcap.tikv.operation.iterator.DAGIterator.doReadNextRegionChunks(DAGIterator.java:189)
at com.pingcap.tikv.operation.iterator.DAGIterator.readNextRegionChunks(DAGIterator.java:166)
at com.pingcap.tikv.operation.iterator.DAGIterator.hasNext(DAGIterator.java:112)
at org.apache.spark.sql.tispark.TiRowRDD$$anon$1.hasNext(TiRowRDD.scala:69)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage0.coprocessorrdd_nextBatch_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage0.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:636)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.agg_doAggregateWithKeys_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:636)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
at org.apache.spark.scheduler.Task.run(Task.scala:121)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.ExecutionException: com.pingcap.tikv.exception.RegionTaskException: Handle region task failed:
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at com.pingcap.tikv.operation.iterator.DAGIterator.doReadNextRegionChunks(DAGIterator.java:184)
... 22 more
Caused by: com.pingcap.tikv.exception.RegionTaskException: Handle region task failed:
at com.pingcap.tikv.operation.iterator.DAGIterator.process(DAGIterator.java:232)
at com.pingcap.tikv.operation.iterator.DAGIterator.lambda$submitTasks$1(DAGIterator.java:90)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
... 3 more
Caused by: com.pingcap.tikv.exception.GrpcException: shade.io.grpc.StatusRuntimeException: CANCELLED: Failed to read message.
at com.pingcap.tikv.policy.RetryPolicy.rethrowNotRecoverableException(RetryPolicy.java:45)
at com.pingcap.tikv.policy.RetryPolicy.callWithRetry(RetryPolicy.java:55)
at com.pingcap.tikv.AbstractGRPCClient.callWithRetry(AbstractGRPCClient.java:77)
at com.pingcap.tikv.region.RegionStoreClient.coprocess(RegionStoreClient.java:663)
at com.pingcap.tikv.operation.iterator.DAGIterator.process(DAGIterator.java:219)
... 7 more
Caused by: shade.io.grpc.StatusRuntimeException: CANCELLED: Failed to read message.
at shade.io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:244)
at shade.io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:225)
at shade.io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:142)
at com.pingcap.tikv.AbstractGRPCClient.lambda$callWithRetry$0(AbstractGRPCClient.java:80)
at com.pingcap.tikv.policy.RetryPolicy.callWithRetry(RetryPolicy.java:53)
... 10 more
Caused by: java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOfRange(Arrays.java:3520)
at shade.com.google.protobuf.ByteString$ArraysByteArrayCopier.copyFrom(ByteString.java:126)
at shade.com.google.protobuf.ByteString.copyFrom(ByteString.java:362)
at shade.com.google.protobuf.ByteString.copyFrom(ByteString.java:372)
at shade.com.google.protobuf.CodedInputStream$StreamDecoder.readBytesSlowPath(CodedInputStream.java:2978)
at shade.com.google.protobuf.CodedInputStream$StreamDecoder.readBytes(CodedInputStream.java:2386)
at org.tikv.kvproto.Coprocessor$Response.<init>(Coprocessor.java:2158)
at org.tikv.kvproto.Coprocessor$Response.<init>(Coprocessor.java:2107)
at org.tikv.kvproto.Coprocessor$Response$1.parsePartialFrom(Coprocessor.java:3906)
at org.tikv.kvproto.Coprocessor$Response$1.parsePartialFrom(Coprocessor.java:3901)
at shade.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:86)
at shade.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:48)
at shade.io.grpc.protobuf.lite.ProtoLiteUtils$MessageMarshaller.parseFrom(ProtoLiteUtils.java:223)
at shade.io.grpc.protobuf.lite.ProtoLiteUtils$MessageMarshaller.parse(ProtoLiteUtils.java:215)
at shade.io.grpc.protobuf.lite.ProtoLiteUtils$MessageMarshaller.parse(ProtoLiteUtils.java:118)
at shade.io.grpc.MethodDescriptor.parseResponse(MethodDescriptor.java:273)
at shade.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInternal(ClientCallImpl.java:658)
at shade.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInContext(ClientCallImpl.java:643)
at shade.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
at shade.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
at shade.io.grpc.stub.ClientCalls$ThreadlessExecutor.waitAndDrain(ClientCalls.java:694)
at shade.io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:135)
at com.pingcap.tikv.AbstractGRPCClient.lambda$callWithRetry$0(AbstractGRPCClient.java:80)
at com.pingcap.tikv.AbstractGRPCClient$$Lambda$59/1545725695.call(Unknown Source)
at com.pingcap.tikv.policy.RetryPolicy.callWithRetry(RetryPolicy.java:53)
at com.pingcap.tikv.AbstractGRPCClient.callWithRetry(AbstractGRPCClient.java:77)
at com.pingcap.tikv.region.RegionStoreClient.coprocess(RegionStoreClient.java:663)
at com.pingcap.tikv.operation.iterator.DAGIterator.process(DAGIterator.java:219)
at com.pingcap.tikv.operation.iterator.DAGIterator.lambda$submitTasks$1(DAGIterator.java:90)
at com.pingcap.tikv.operation.iterator.DAGIterator$$Lambda$26/2005495955.call(Unknown Source)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)

+------------+------------+------------+---------------+-----------------+-------------------+---------+------------+--------+-----------+
|l_returnflag|l_linestatus|     sum_qty| sum_base_price|   sum_disc_price|         sum_charge|  avg_qty|   avg_price|avg_disc|count_order|
+------------+------------+------------+---------------+-----------------+-------------------+---------+------------+--------+-----------+
|           A|           F| 75478173.00|113197331346.02|107536408207.3092|111838898769.617614|25.505699|38251.814164|0.050004|    2959267|
|           N|           F|  1966480.00|  2946114826.74|  2798796636.1564|  2911030163.068588|25.530081|38248.316500|0.049996|      77026|
|           N|           O|148642120.00|222903562619.26|211762318145.5768|220235782971.622552|25.495192|38232.562546|0.049981|    5830202|
|           R|           F| 75577628.00|113351914218.17|107688081811.4887|111994307866.220439|25.512150|38263.321544|0.049980|    2962417|
+------------+------------+------------+---------------+-----------------+-------------------+---------+------------+--------+-----------+

Caused by: java.lang.OutOfMemoryError: Java heap space

The heap overflowed outright, so this effectively counts as a failure.
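To see why the OutOfMemoryError is singled out, it helps to walk the "Caused by:" chain: in a Java stack trace the last "Caused by:" entry is the innermost cause. The following is a minimal sketch of that reading. The helper names `cause_chain` and `root_cause` are made up for illustration, and `sample` is just the exception lines from the log above with the frames left out.

```python
import re

# Matches lines such as "Caused by: java.lang.OutOfMemoryError: Java heap space".
CAUSE_RE = re.compile(r"^\s*Caused by:\s*(?P<cls>[\w.$]+)(?::\s*(?P<msg>.*))?$")


def cause_chain(trace):
    """Return (exception class, message) pairs in the order they appear."""
    chain = []
    for line in trace.splitlines():
        m = CAUSE_RE.match(line)
        if m:
            chain.append((m.group("cls"), (m.group("msg") or "").strip()))
    return chain


def root_cause(trace):
    """The last 'Caused by:' entry is the innermost (root) cause, or None."""
    chain = cause_chain(trace)
    return chain[-1] if chain else None


# Exception lines copied from the log in this thread (stack frames omitted).
sample = """\
com.pingcap.tikv.exception.TiClientInternalException: Error reading region:
Caused by: java.util.concurrent.ExecutionException: com.pingcap.tikv.exception.RegionTaskException: Handle region task failed:
Caused by: com.pingcap.tikv.exception.RegionTaskException: Handle region task failed:
Caused by: com.pingcap.tikv.exception.GrpcException: shade.io.grpc.StatusRuntimeException: CANCELLED: Failed to read message.
Caused by: shade.io.grpc.StatusRuntimeException: CANCELLED: Failed to read message.
Caused by: java.lang.OutOfMemoryError: Java heap space
"""

print(root_cause(sample))
# → ('java.lang.OutOfMemoryError', 'Java heap space')
```

Read this way, the gRPC CANCELLED status and the "Error reading region" message are wrappers around the heap exhaustion that occurred while decoding the coprocessor response.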

But the result was still displayed normally.

Spark follows an actor-style model.
If one task fails, it does not affect the other tasks, and Spark retries the lost task (so you still see a result).
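The behaviour described in this reply can be sketched with a toy model. This is not Spark's scheduler, only an illustration under stated assumptions: `run_job` and `flaky` are hypothetical names, `MemoryError` stands in for the Java heap error, and `max_failures=4` mirrors the default of Spark's `spark.task.maxFailures` setting.

```python
def run_job(tasks, max_failures=4):
    """Toy model: run each task independently, retrying a failed task.

    `tasks` maps a task id to a function that takes the attempt number.
    Returns (results, lost_attempts). Raises RuntimeError if any single
    task fails `max_failures` times, which aborts the whole job.
    """
    results, lost = {}, []
    for task_id, fn in tasks.items():
        for attempt in range(max_failures):
            try:
                results[task_id] = fn(attempt)
                break  # this task succeeded; move on to the next one
            except MemoryError:
                # Comparable to "Lost task 27.0": task 27, attempt 0.
                lost.append(f"{task_id}.{attempt}")
        else:
            raise RuntimeError(
                f"task {task_id} failed {max_failures} times; job aborted"
            )
    return results, lost


def flaky(attempt):
    # Hypothetical task that runs out of memory on its first attempt only.
    if attempt == 0:
        raise MemoryError("Java heap space (simulated)")
    return 10


tasks = {26: lambda attempt: 1, 27: flaky, 28: lambda attempt: 2}
results, lost = run_job(tasks)
print(sum(results.values()), lost)
# → 13 ['27.0']
```

In this model the final result is complete because the lost attempt is rerun, and the rerun costs extra time, which may be part of why the query feels slow.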

This error means that this piece of data was never fully read:
shade.io.grpc.StatusRuntimeException: CANCELLED: Failed to read message.