TiSpark throws an error when executing count on a partitioned table

To help us respond efficiently, please provide as much background information as possible; clearly described problems are prioritized. Please include the following details where you can:

  • [OS & kernel version] CentOS Linux release 7.6.1810 (Core)
  • [TiDB version] 5.7.25-TiDB-v3.0.2
  • [Spark version] 2.3.2
  • [TiSpark version] 2.1.2 (details below)
  • [How TiSpark is used] As described in the official documentation: https://pingcap.com/docs-cn/v3.0/reference/tispark/
  • [Disk type] SSD
  • [Cluster topology] TiDB cluster: 2 TiDB, 3 TiKV, 3 PD; Spark cluster: 1 master, 1 slave
  • [Data volume & region count & replica count] 100,717,483 rows; 1,195 regions; 3 replicas; the table is partitioned, with 1,024 partitions
  • [Problem description (what I did)] spark-sql> select count(1) from ods_pms_order_detail;
  • [Key information] The statement above fails with the error shown below (see the attachment for the full log). If I don't press Ctrl+C, the spark-sql session hangs and the job never finishes on its own.

spark-sql> select count(1) from ods_pms_order_detail;
19/10/31 17:52:30 INFO HiveMetaStore: 0: get_database: default
19/10/31 17:52:30 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_database: default
19/10/31 17:52:30 INFO HiveMetaStore: 0: get_database: ods_qz
19/10/31 17:52:30 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_database: ods_qz
19/10/31 17:52:31 INFO HiveMetaStore: 0: get_database: ods_qz
19/10/31 17:52:31 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_database: ods_qz
19/10/31 17:52:35 WARN Utils: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.debug.maxToStringFields' in SparkEnv.conf.
19/10/31 17:52:36 INFO ContextCleaner: Cleaned accumulator 1
19/10/31 17:52:38 INFO CodeGenerator: Code generated in 193.303257 ms
19/10/31 17:52:38 INFO CodeGenerator: Code generated in 17.684594 ms
19/10/31 17:52:38 INFO ContextCleaner: Cleaned accumulator 2
19/10/31 17:52:40 INFO SparkContext: Starting job: processCmd at CliDriver.java:376
19/10/31 17:52:40 INFO DAGScheduler: Registering RDD 4096 (processCmd at CliDriver.java:376)
19/10/31 17:52:40 INFO DAGScheduler: Got job 0 (processCmd at CliDriver.java:376) with 1 output partitions
19/10/31 17:52:40 INFO DAGScheduler: Final stage: ResultStage 1 (processCmd at CliDriver.java:376)
19/10/31 17:52:40 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 0)
19/10/31 17:52:40 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 0)
19/10/31 17:52:40 INFO DAGScheduler: Submitting ShuffleMapStage 0 (MapPartitionsRDD[4096] at processCmd at CliDriver.java:376), which has no missing parents
Exception in thread "dag-scheduler-event-loop" java.lang.StackOverflowError
    at org.apache.spark.rdd.RDD.preferredLocations(RDD.scala:274)
    at org.apache.spark.rdd.UnionPartition.preferredLocations(UnionRDD.scala:49)
    at org.apache.spark.rdd.UnionRDD.getPreferredLocations(UnionRDD.scala:109)
    at org.apache.spark.rdd.RDD$$anonfun$preferredLocations$2.apply(RDD.scala:275)
    at org.apache.spark.rdd.RDD$$anonfun$preferredLocations$2.apply(RDD.scala:275)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.preferredLocations(RDD.scala:274)
    at org.apache.spark.rdd.UnionPartition.preferredLocations(UnionRDD.scala:49)
    at org.apache.spark.rdd.UnionRDD.getPreferredLocations(UnionRDD.scala:109)
    at org.apache.spark.rdd.RDD$$anonfun$preferredLocations$2.apply(RDD.scala:275)
    at org.apache.spark.rdd.RDD$$anonfun$preferredLocations$2.apply(RDD.scala:275)
    at scala.Option.getOrElse(Option.scala:121)
    ... (the same six frames repeat until the stack overflows; see the attached log for the full trace)
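For context, the repeating frames above suggest the scheduler is recursing through a deeply nested chain of UnionRDDs, roughly one level per table partition. The following is a minimal, self-contained sketch in plain Spark (an illustration only, not TiSpark's actual code) of how chaining pairwise unions over 1,024 small RDDs builds such a chain; whether it actually overflows depends on the JVM stack size.

import org.apache.spark.{SparkConf, SparkContext}

// Illustration only: pairwise unions nest UnionRDD inside UnionRDD once per input,
// and the DAG scheduler resolves preferred locations recursively over that chain,
// which can end in java.lang.StackOverflowError when the chain is deep enough.
object NestedUnionSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[*]").setAppName("nested-union-sketch"))

    // One tiny RDD per logical table partition (1024 mirrors the partition count above).
    val perPartitionRdds = (1 to 1024).map(i => sc.parallelize(Seq(i)))

    // reduce(_ union _) builds a 1023-level-deep UnionRDD chain.
    val deeplyNested = perPartitionRdds.reduce(_ union _)

    // Scheduling this job walks the chain recursively and may overflow the stack.
    println(deeplyNested.count())

    sc.stop()
  }
}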

The output of the version query is as follows:

> select ti_version();
19/10/31 19:41:11 INFO HiveMetaStore: 0: get_database: global_temp
19/10/31 19:41:11 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_database: global_temp
19/10/31 19:41:12 INFO PDClient: Switched to new leader: [leaderInfo: 10.1.0.9:2379]
19/10/31 19:41:14 INFO HiveMetaStore: 0: get_database: default
19/10/31 19:41:14 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_database: default
19/10/31 19:41:15 INFO CodeGenerator: Code generated in 181.849495 ms
19/10/31 19:41:15 INFO SparkContext: Starting job: processCmd at CliDriver.java:376
19/10/31 19:41:15 INFO DAGScheduler: Got job 0 (processCmd at CliDriver.java:376) with 1 output partitions
19/10/31 19:41:15 INFO DAGScheduler: Final stage: ResultStage 0 (processCmd at CliDriver.java:376)
19/10/31 19:41:15 INFO DAGScheduler: Parents of final stage: List()
19/10/31 19:41:15 INFO DAGScheduler: Missing parents: List()
19/10/31 19:41:15 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[3] at processCmd at CliDriver.java:376), which has no missing parents
19/10/31 19:41:15 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 7.6 KB, free 4.1 GB)
19/10/31 19:41:15 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 3.8 KB, free 4.1 GB)
19/10/31 19:41:15 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 0.0.0.0:37828 (size: 3.8 KB, free: 4.1 GB)
19/10/31 19:41:15 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1039
19/10/31 19:41:15 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[3] at processCmd at CliDriver.java:376) (first 15 tasks are for partitions Vector(0))
19/10/31 19:41:15 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
19/10/31 19:41:15 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 0.0.0.0, executor 0, partition 0, PROCESS_LOCAL, 8071 bytes)
19/10/31 19:41:15 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 0.0.0.0:45705 (size: 3.8 KB, free: 8.4 GB)
19/10/31 19:41:16 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 721 ms on 0.0.0.0 (executor 0) (1/1)
19/10/31 19:41:16 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
19/10/31 19:41:16 INFO DAGScheduler: ResultStage 0 (processCmd at CliDriver.java:376) finished in 0.852 s
19/10/31 19:41:16 INFO DAGScheduler: Job 0 finished: processCmd at CliDriver.java:376, took 0.906137 s
Release Version: 2.1.2
Supported Spark Version: spark-2.3
Git Commit Hash: b465052cb5d5d273a084cb868fcfc5546849fafd
Git Branch: release-2.1.2
UTC Build Time: 2019-07-31 07:30:26
Time taken: 5.178 seconds, Fetched 1 row(s)
19/10/31 19:41:16 INFO SparkSQLCLIDriver: Time taken: 5.178 seconds, Fetched 1 row(s)

Attachment: tispark执行分区表报错.txt (104.9 KB)

Hi, this looks like a bug in TiSpark. We will fix it as soon as possible. Sorry for the trouble.

Could you tell us how many partitions this partitioned table has?

Hi, I have fixed it. If convenient, please test with this jar package. Thanks.

Link: https://pan.baidu.com/s/1-4r_j9y-c1Uy1keYNZUWaA  Extraction code: 94w5
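In case it helps others following along, here is a rough outline of how the patched jar might be verified. The jar placement step is an assumption about a typical deployment (adjust to however TiSpark was installed), while the database and table names are taken from the logs above.

// Assumed verification steps (not official instructions):
//   1. Replace the existing TiSpark jar under $SPARK_HOME/jars (or wherever it was
//      deployed) with the downloaded patched jar.
//   2. Restart spark-sql / spark-shell so the new jar is picked up.
//   3. Re-run the failing statement, e.g. from spark-shell:
spark.sql("use ods_qz")
spark.sql("select count(1) from ods_pms_order_detail").show()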

The code for the fix: https://github.com/pingcap/tispark/pull/1179
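For readers who cannot open the PR right away: the usual way to avoid this class of StackOverflowError in Spark is to build one flat union over all per-partition RDDs instead of chaining pairwise unions. The sketch below only illustrates that general pattern; it is not the actual TiSpark change, which is in the PR above.

import org.apache.spark.{SparkConf, SparkContext}

// Illustration of the flat-union pattern: SparkContext.union(Seq[RDD[T]]) creates a
// single UnionRDD over all inputs, so the scheduler iterates over 1024 children
// instead of recursing 1024 levels deep.
object FlatUnionSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[*]").setAppName("flat-union-sketch"))

    val perPartitionRdds = (1 to 1024).map(i => sc.parallelize(Seq(i)))
    val flatUnion = sc.union(perPartitionRdds)
    println(flatUnion.count())

    sc.stop()
  }
}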

TiDB supports at most 1024 partitions, and we created 1024 partitions.

Nice, thanks for the fix! We will verify it.

Could you let us know whether the new jar package solves the problem?