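[TiDB environment]
TiDB version v5.2.1
[Overview] Scenario and problem summary
When I pull data into Spark for complex processing, TiDB reports an error.
[Background] Operations performed
None
[Symptoms] Application and database behavior
None
[Problem] Current issue
Very large data sets cannot be pulled into Spark for processing.
[Business impact]
The job cannot proceed.
[TiDB version]
v5.2.1
I could of course add some filter conditions so that less data is pulled into Spark. My question, though, is this: if I fetch too much data into Spark, I would expect Spark to run out of memory while processing it. Why does TiKV run out of memory while the data is still being fetched from TiDB?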
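The kind of restriction I mean is reading only the columns and rows the job needs rather than the whole table. A minimal sketch, assuming a TiSpark-enabled session is available elsewhere; the helper `build_scan_query` and the table and column names are hypothetical and only illustrate the shape of the query:

```python
def build_scan_query(table, columns, predicate=None):
    """Build a SELECT that reads only the named columns and, optionally,
    only the rows matching `predicate`.

    Inputs are assumed to be trusted identifiers and expressions;
    this sketch does no escaping.
    """
    if not columns:
        raise ValueError("name the columns explicitly instead of using SELECT *")
    query = "SELECT {} FROM {}".format(", ".join(columns), table)
    if predicate:
        query += " WHERE {}".format(predicate)
    return query


# Hypothetical table and column names, for illustration only.
query = build_scan_query(
    "orders",
    ["user_id", "amount"],
    "created_at >= '2021-10-01'",
)
print(query)
```

With a working session the string would be passed to `spark.sql(query)`. Whether the filter and the column pruning are actually pushed down to TiKV depends on the TiSpark planner, so this is a way to reduce the data requested, not a guarantee.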
[Attachments] Relevant logs and monitoring (https://metricstool.pingcap.com/)
21/10/14 11:22:08 WARN TaskSetManager: Lost task 5.0 in stage 0.0 (TID 4, 202.38.228.229, executor 0): com.pingcap.tikv.exception.TiClientInternalException: Error reading region:
at com.pingcap.tikv.operation.iterator.DAGIterator.doReadNextRegionChunks(DAGIterator.java:189)
at com.pingcap.tikv.operation.iterator.DAGIterator.readNextRegionChunks(DAGIterator.java:166)
at com.pingcap.tikv.operation.iterator.DAGIterator.hasNext(DAGIterator.java:112)
at org.apache.spark.sql.tispark.TiRowRDD$$anon$1.hasNext(TiRowRDD.scala:69)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage0.coprocessorrdd_nextBatch_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage0.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:636)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.agg_doAggregateWithKeys_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:636)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
at org.apache.spark.scheduler.Task.run(Task.scala:121)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.ExecutionException: com.pingcap.tikv.exception.RegionTaskException: Handle region task failed:
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at com.pingcap.tikv.operation.iterator.DAGIterator.doReadNextRegionChunks(DAGIterator.java:184)
... 22 more
Caused by: com.pingcap.tikv.exception.RegionTaskException: Handle region task failed:
at com.pingcap.tikv.operation.iterator.DAGIterator.process(DAGIterator.java:232)
at com.pingcap.tikv.operation.iterator.DAGIterator.lambda$submitTasks$1(DAGIterator.java:90)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
... 3 more
Caused by: com.pingcap.tikv.exception.GrpcException: shade.io.grpc.StatusRuntimeException: CANCELLED: Failed to read message.
at com.pingcap.tikv.policy.RetryPolicy.rethrowNotRecoverableException(RetryPolicy.java:45)
at com.pingcap.tikv.policy.RetryPolicy.callWithRetry(RetryPolicy.java:55)
at com.pingcap.tikv.AbstractGRPCClient.callWithRetry(AbstractGRPCClient.java:77)
at com.pingcap.tikv.region.RegionStoreClient.coprocess(RegionStoreClient.java:663)
at com.pingcap.tikv.operation.iterator.DAGIterator.process(DAGIterator.java:219)
... 7 more
Caused by: shade.io.grpc.StatusRuntimeException: CANCELLED: Failed to read message.
at shade.io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:244)
at shade.io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:225)
at shade.io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:142)
at com.pingcap.tikv.AbstractGRPCClient.lambda$callWithRetry$0(AbstractGRPCClient.java:80)
at com.pingcap.tikv.policy.RetryPolicy.callWithRetry(RetryPolicy.java:53)
... 10 more
Caused by: java.lang.OutOfMemoryError: Java heap space
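In a Java stack trace each `Caused by:` line wraps the one after it, so the last `Caused by:` line is the root cause. A minimal sketch of pulling that line out of a trace, assuming the trace is available as plain text; the function name `root_cause` is hypothetical and the sample is an abbreviated copy of the trace above:

```python
def root_cause(trace):
    """Return the last 'Caused by:' entry of a Java stack trace.

    Falls back to the first non-empty line when the trace has no
    'Caused by:' entries, and returns None for an empty trace.
    """
    prefix = "Caused by:"
    lines = [line.strip() for line in trace.splitlines() if line.strip()]
    if not lines:
        return None
    causes = [line[len(prefix):].strip() for line in lines if line.startswith(prefix)]
    return causes[-1] if causes else lines[0]


# Abbreviated copy of the trace above.
sample = """\
com.pingcap.tikv.exception.TiClientInternalException: Error reading region:
at com.pingcap.tikv.operation.iterator.DAGIterator.doReadNextRegionChunks(DAGIterator.java:189)
Caused by: com.pingcap.tikv.exception.GrpcException: shade.io.grpc.StatusRuntimeException: CANCELLED: Failed to read message.
at com.pingcap.tikv.policy.RetryPolicy.callWithRetry(RetryPolicy.java:55)
Caused by: java.lang.OutOfMemoryError: Java heap space
"""
print(root_cause(sample))
```

On the trace above this yields the `java.lang.OutOfMemoryError: Java heap space` entry.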