To help us respond faster, please give detailed background information when asking a question; clearly described problems get priority. Please provide as many of the following details as possible:
- 【OS & kernel version】CentOS Linux release 7.6.1810 (Core)
- 【TiDB version】5.7.25-TiDB-v3.0.2
- 【Spark version】2.3.2
- 【TiSpark version】2.1.2 (details below)
- 【TiSpark deployment】Set up following the official documentation: https://pingcap.com/docs-cn/v3.0/reference/tispark/
- 【Disk type】SSD
- 【Cluster topology】TiDB cluster: 2 TiDB, 3 TiKV, 3 PD; Spark cluster: 1 master, 1 slave
- 【Data volume & region count & replica count】100,717,483 rows; 1,195 regions; 3 replicas; the table is partitioned, with 1,024 partitions
- 【Problem description (what I did)】Ran spark-sql> select count(1) from ods_pms_order_detail;
- 【Keywords】TiSpark, partitioned table, StackOverflowError

Running the statement above fails with the error below (see the attachment for the full log). If I do not press Ctrl+C, the session hangs and the job never finishes on its own.
spark-sql> select count(1) from ods_pms_order_detail;
19/10/31 17:52:30 INFO HiveMetaStore: 0: get_database: default
19/10/31 17:52:30 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_database: default
19/10/31 17:52:30 INFO HiveMetaStore: 0: get_database: ods_qz
19/10/31 17:52:30 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_database: ods_qz
19/10/31 17:52:31 INFO HiveMetaStore: 0: get_database: ods_qz
19/10/31 17:52:31 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_database: ods_qz
19/10/31 17:52:35 WARN Utils: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.debug.maxToStringFields' in SparkEnv.conf.
19/10/31 17:52:36 INFO ContextCleaner: Cleaned accumulator 1
19/10/31 17:52:38 INFO CodeGenerator: Code generated in 193.303257 ms
19/10/31 17:52:38 INFO CodeGenerator: Code generated in 17.684594 ms
19/10/31 17:52:38 INFO ContextCleaner: Cleaned accumulator 2
19/10/31 17:52:40 INFO SparkContext: Starting job: processCmd at CliDriver.java:376
19/10/31 17:52:40 INFO DAGScheduler: Registering RDD 4096 (processCmd at CliDriver.java:376)
19/10/31 17:52:40 INFO DAGScheduler: Got job 0 (processCmd at CliDriver.java:376) with 1 output partitions
19/10/31 17:52:40 INFO DAGScheduler: Final stage: ResultStage 1 (processCmd at CliDriver.java:376)
19/10/31 17:52:40 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 0)
19/10/31 17:52:40 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 0)
19/10/31 17:52:40 INFO DAGScheduler: Submitting ShuffleMapStage 0 (MapPartitionsRDD[4096] at processCmd at CliDriver.java:376), which has no missing parents
Exception in thread "dag-scheduler-event-loop" java.lang.StackOverflowError
at org.apache.spark.rdd.RDD.preferredLocations(RDD.scala:274)
at org.apache.spark.rdd.UnionPartition.preferredLocations(UnionRDD.scala:49)
at org.apache.spark.rdd.UnionRDD.getPreferredLocations(UnionRDD.scala:109)
at org.apache.spark.rdd.RDD$$anonfun$preferredLocations$2.apply(RDD.scala:275)
at org.apache.spark.rdd.RDD$$anonfun$preferredLocations$2.apply(RDD.scala:275)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.preferredLocations(RDD.scala:274)
at org.apache.spark.rdd.UnionPartition.preferredLocations(UnionRDD.scala:49)
at org.apache.spark.rdd.UnionRDD.getPreferredLocations(UnionRDD.scala:109)
at org.apache.spark.rdd.RDD$$anonfun$preferredLocations$2.apply(RDD.scala:275)
at org.apache.spark.rdd.RDD$$anonfun$preferredLocations$2.apply(RDD.scala:275)
at scala.Option.getOrElse(Option.scala:121)
... (the six-frame cycle above repeats for the remainder of the truncated trace)
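The trace suggests the failure mode: for a partitioned table, TiSpark appears to build one RDD per partition and union them, so RDD.preferredLocations walks a UnionRDD chain roughly as deep as the partition count (1,024 here), overflowing the dag-scheduler-event-loop thread's stack. The toy model below (plain Python, no Spark; the class and method names are invented for illustration, not Spark's real API) reproduces the same mechanism:

```python
import sys

# Toy model of the recursion in the stack trace: each union level asks its
# parent for preferred locations, so a union nested N levels deep needs
# roughly N stack frames.
class Rdd:
    def __init__(self, parent=None):
        self.parent = parent

    def preferred_locations(self):
        if self.parent is None:
            return ["tikv-node"]                   # leaf RDD: a concrete location
        return self.parent.preferred_locations()   # one stack frame per level

# Nest one level per table partition (1,024 in this report).
rdd = Rdd()
for _ in range(1024):
    rdd = Rdd(rdd)

sys.setrecursionlimit(600)  # small limit, standing in for the JVM's fixed thread stack
try:
    rdd.preferred_locations()
    result = "ok"
except RecursionError:      # Python's analogue of java.lang.StackOverflowError
    result = "stack overflow"

print(result)  # -> stack overflow
```

If this is indeed the cause, reducing the number of partitions scanned or enlarging the driver's thread stack would be the directions to explore.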
Version information queried as follows:
> select ti_version();
19/10/31 19:41:11 INFO HiveMetaStore: 0: get_database: global_temp
19/10/31 19:41:11 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_database: global_temp
19/10/31 19:41:12 INFO PDClient: Switched to new leader: [leaderInfo: 10.1.0.9:2379]
19/10/31 19:41:14 INFO HiveMetaStore: 0: get_database: default
19/10/31 19:41:14 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_database: default
19/10/31 19:41:15 INFO CodeGenerator: Code generated in 181.849495 ms
19/10/31 19:41:15 INFO SparkContext: Starting job: processCmd at CliDriver.java:376
19/10/31 19:41:15 INFO DAGScheduler: Got job 0 (processCmd at CliDriver.java:376) with 1 output partitions
19/10/31 19:41:15 INFO DAGScheduler: Final stage: ResultStage 0 (processCmd at CliDriver.java:376)
19/10/31 19:41:15 INFO DAGScheduler: Parents of final stage: List()
19/10/31 19:41:15 INFO DAGScheduler: Missing parents: List()
19/10/31 19:41:15 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[3] at processCmd at CliDriver.java:376), which has no missing parents
19/10/31 19:41:15 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 7.6 KB, free 4.1 GB)
19/10/31 19:41:15 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 3.8 KB, free 4.1 GB)
19/10/31 19:41:15 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 0.0.0.0:37828 (size: 3.8 KB, free: 4.1 GB)
19/10/31 19:41:15 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1039
19/10/31 19:41:15 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[3] at processCmd at CliDriver.java:376) (first 15 tasks are for partitions Vector(0))
19/10/31 19:41:15 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
19/10/31 19:41:15 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 0.0.0.0, executor 0, partition 0, PROCESS_LOCAL, 8071 bytes)
19/10/31 19:41:15 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 0.0.0.0:45705 (size: 3.8 KB, free: 8.4 GB)
19/10/31 19:41:16 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 721 ms on 0.0.0.0 (executor 0) (1/1)
19/10/31 19:41:16 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
19/10/31 19:41:16 INFO DAGScheduler: ResultStage 0 (processCmd at CliDriver.java:376) finished in 0.852 s
19/10/31 19:41:16 INFO DAGScheduler: Job 0 finished: processCmd at CliDriver.java:376, took 0.906137 s
Release Version: 2.1.2
Supported Spark Version: spark-2.3
Git Commit Hash: b465052cb5d5d273a084cb868fcfc5546849fafd
Git Branch: release-2.1.2
UTC Build Time: 2019-07-31 07:30:26
Time taken: 5.178 seconds, Fetched 1 row(s)
19/10/31 19:41:16 INFO SparkSQLCLIDriver: Time taken: 5.178 seconds, Fetched 1 row(s)

Attachment: tispark执行分区表报错.txt (104.9 KB)
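Since the overflow happens in the driver's dag-scheduler-event-loop thread, one mitigation that may be worth trying (an assumption on my part, not a confirmed fix) is launching spark-sql with a larger JVM thread stack:

```shell
# Hypothetical workaround, not a confirmed fix: give the driver (and executor)
# threads a larger stack so the deep UnionRDD recursion can complete.
# The -Xss value of 8m is a guess; tune as needed.
spark-sql \
  --driver-java-options "-Xss8m" \
  --conf "spark.executor.extraJavaOptions=-Xss8m"
```

In client mode the driver JVM is already running by the time SparkConf is read, which is why the stack size goes through --driver-java-options rather than a spark.driver.* conf.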