TiSpark reports an error when running count on a partitioned table

To help us respond quickly, please provide detailed background information; clearly described problems get priority. Please try to cover the following points:

  • OS version & kernel version
    CentOS Linux release 7.6.1810 (Core)
  • TiDB version
    5.7.25-TiDB-v3.0.2
  • Spark version
    2.3.2
  • TiSpark version
    2.1.2 (details below)
  • How TiSpark is deployed
    Set up following the official documentation:
    https://pingcap.com/docs-cn/v3.0/reference/tispark/
  • Disk type
    SSD
  • Cluster topology
    TiDB cluster: 2 TiDB, 3 TiKV, 3 PD
    Spark cluster: 1 master, 1 slave
  • Data volume & number of regions & number of replicas
    100,717,483 rows;
    1,195 regions;
    3 replicas;
    the table is partitioned, with 1,024 partitions
  • Problem description (what I did)
    spark-sql> select count(1) from ods_pms_order_detail;
  • Keywords
    Running the statement above fails with the error below (see the attachment for the full log).
    Unless I press Ctrl+C, the session hangs and the job never finishes on its own.
    spark-sql> select count(1) from ods_pms_order_detail;
    19/10/31 17:52:30 INFO HiveMetaStore: 0: get_database: default
    19/10/31 17:52:30 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_database: default
    19/10/31 17:52:30 INFO HiveMetaStore: 0: get_database: ods_qz
    19/10/31 17:52:30 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_database: ods_qz
    19/10/31 17:52:31 INFO HiveMetaStore: 0: get_database: ods_qz
    19/10/31 17:52:31 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_database: ods_qz
    19/10/31 17:52:35 WARN Utils: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.debug.maxToStringFields' in SparkEnv.conf.
    19/10/31 17:52:36 INFO ContextCleaner: Cleaned accumulator 1
    19/10/31 17:52:38 INFO CodeGenerator: Code generated in 193.303257 ms
    19/10/31 17:52:38 INFO CodeGenerator: Code generated in 17.684594 ms
    19/10/31 17:52:38 INFO ContextCleaner: Cleaned accumulator 2
    19/10/31 17:52:40 INFO SparkContext: Starting job: processCmd at CliDriver.java:376
    19/10/31 17:52:40 INFO DAGScheduler: Registering RDD 4096 (processCmd at CliDriver.java:376)
    19/10/31 17:52:40 INFO DAGScheduler: Got job 0 (processCmd at CliDriver.java:376) with 1 output partitions
    19/10/31 17:52:40 INFO DAGScheduler: Final stage: ResultStage 1 (processCmd at CliDriver.java:376)
    19/10/31 17:52:40 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 0)
    19/10/31 17:52:40 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 0)
    19/10/31 17:52:40 INFO DAGScheduler: Submitting ShuffleMapStage 0 (MapPartitionsRDD[4096] at processCmd at CliDriver.java:376), which has no missing parents
    Exception in thread "dag-scheduler-event-loop" java.lang.StackOverflowError
    at org.apache.spark.rdd.RDD.preferredLocations(RDD.scala:274)
    at org.apache.spark.rdd.UnionPartition.preferredLocations(UnionRDD.scala:49)
    at org.apache.spark.rdd.UnionRDD.getPreferredLocations(UnionRDD.scala:109)
    at org.apache.spark.rdd.RDD$$anonfun$preferredLocations$2.apply(RDD.scala:275)
    at org.apache.spark.rdd.RDD$$anonfun$preferredLocations$2.apply(RDD.scala:275)
    at scala.Option.getOrElse(Option.scala:121)
    ... (the six frames above repeat until the driver stack overflows; see the attachment for the full trace)
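
For context on the repeating frames: they cycle through RDD.preferredLocations → UnionPartition.preferredLocations → UnionRDD.getPreferredLocations, which is the pattern produced when the DAGScheduler walks a deeply nested UnionRDD, one recursion level per nested child. With 1,024 table partitions each contributing its own RDD, pairwise unions nest deep enough to exhaust the driver stack. The spark-shell sketch below is an illustration only (it is not TiSpark code) of how pairwise RDD.union nests compared with the flat SparkContext.union:

// Illustration only -- not TiSpark code. Run in spark-shell, where `sc` is predefined.
// One tiny RDD per table partition, mimicking 1,024 per-partition reads.
val parts = (1 to 1024).map(i => sc.parallelize(Seq(i)))

// Pairwise reduce builds a UnionRDD nested roughly 1,024 levels deep. Resolving
// preferred locations recurses once per level, so the dag-scheduler-event-loop
// thread can overflow its stack and the job hangs -- matching the symptom above.
val nested = parts.reduce(_ union _)

// SparkContext.union builds one flat UnionRDD over all children instead.
val flat = sc.union(parts)

flat.count()     // completes normally
nested.count()   // may throw java.lang.StackOverflowError in dag-scheduler-event-loop
                 // (more partitions or a smaller driver -Xss makes this more likely)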

Querying the version info returns the following:
> select ti_version();
19/10/31 19:41:11 INFO HiveMetaStore: 0: get_database: global_temp
19/10/31 19:41:11 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_database: global_temp
19/10/31 19:41:12 INFO PDClient: Switched to new leader: [leaderInfo: 10.1.0.9:2379]
19/10/31 19:41:14 INFO HiveMetaStore: 0: get_database: default
19/10/31 19:41:14 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_database: default
19/10/31 19:41:15 INFO CodeGenerator: Code generated in 181.849495 ms
19/10/31 19:41:15 INFO SparkContext: Starting job: processCmd at CliDriver.java:376
19/10/31 19:41:15 INFO DAGScheduler: Got job 0 (processCmd at CliDriver.java:376) with 1 output partitions
19/10/31 19:41:15 INFO DAGScheduler: Final stage: ResultStage 0 (processCmd at CliDriver.java:376)
19/10/31 19:41:15 INFO DAGScheduler: Parents of final stage: List()
19/10/31 19:41:15 INFO DAGScheduler: Missing parents: List()
19/10/31 19:41:15 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[3] at processCmd at CliDriver.java:376), which has no missing parents
19/10/31 19:41:15 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 7.6 KB, free 4.1 GB)
19/10/31 19:41:15 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 3.8 KB, free 4.1 GB)
19/10/31 19:41:15 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 0.0.0.0:37828 (size: 3.8 KB, free: 4.1 GB)
19/10/31 19:41:15 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1039
19/10/31 19:41:15 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[3] at processCmd at CliDriver.java:376) (first 15 tasks are for partitions Vector(0))
19/10/31 19:41:15 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
19/10/31 19:41:15 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 0.0.0.0, executor 0, partition 0, PROCESS_LOCAL, 8071 bytes)
19/10/31 19:41:15 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 0.0.0.0:45705 (size: 3.8 KB, free: 8.4 GB)
19/10/31 19:41:16 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 721 ms on 0.0.0.0 (executor 0) (1/1)
19/10/31 19:41:16 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
19/10/31 19:41:16 INFO DAGScheduler: ResultStage 0 (processCmd at CliDriver.java:376) finished in 0.852 s
19/10/31 19:41:16 INFO DAGScheduler: Job 0 finished: processCmd at CliDriver.java:376, took 0.906137 s
Release Version: 2.1.2
Supported Spark Version: spark-2.3
Git Commit Hash: b465052cb5d5d273a084cb868fcfc5546849fafd
Git Branch: release-2.1.2
UTC Build Time: 2019-07-31 07:30:26
Time taken: 5.178 seconds, Fetched 1 row(s)
19/10/31 19:41:16 INFO SparkSQLCLIDriver: Time taken: 5.178 seconds, Fetched 1 row(s)

Attachment: tispark执行分区表报错.txt (104.9 KB)


Hi, this looks like a bug in TiSpark. We'll fix it as soon as possible; sorry about that.

How many partitions does this partitioned table have?

Hi, I've fixed it. If it's convenient, please test with this jar. Thanks!

Link: https://pan.baidu.com/s/1-4r_j9y-c1Uy1keYNZUWaA (extraction code: 94w5)

The fix is here: https://github.com/pingcap/tispark/pull/1179
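
A quick way to check the patched jar (a sketch, not from the thread): replace the old tispark jar under $SPARK_HOME/jars with the new one, as in the standard TiSpark deployment, then run something like the following from spark-shell. The database and table names are taken from the log above; the exact output depends on the build.

// Verification sketch (assumes the patched jar has replaced the old one under
// $SPARK_HOME/jars and the standard TiSpark setup from the docs linked above).
spark.sql("select ti_version()").show(truncate = false)               // Git Commit Hash should change
spark.sql("select count(1) from ods_qz.ods_pms_order_detail").show()  // should return instead of hanging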


TiDB supports a maximum of 1,024 partitions, and we created 1,024 partitions.

Great, thanks for the fix! We'll verify it.

Does the new jar solve this problem?

Yes, it's solved. Spark can now query TiDB partitioned tables.

Thanks for the feedback.
