TiSpark job reading Kafka data and writing to TiDB occasionally throws java.lang.NullPointerException

To help us respond efficiently, please provide as much background information as possible when asking a question; clearly described problems get prioritized. Please try to provide the following details:

  • OS version & kernel version

  • TiDB version
    Release Version: v2.1.13
    Git Commit Hash: 6b5b1a6802f9b8f5a22d8aab24ac80729331e1bc
    Git Branch: HEAD
    UTC Build Time: 2019-06-21 12:27:08
    GoVersion: go version go1.12 linux/amd64
    Race Enabled: false
    TiKV Min Version: 2.1.0-alpha.1-ff3dd160846b7d1aed9079c389fc188f7f5ea13e
    Check Table Before Drop: false

  • Disk model

  • Cluster node layout
    2 TiDB, 3 PD, 8 TiKV;
    spark-2.3.1-bin-hadoop2.7 standalone cluster: 2 masters, 10 workers;
    jars: tispark-core-2.1-SNAPSHOT-jar-with-dependencies.jar, spark-streaming-kafka-0-10_2.11-2.3.1.jar, mysql-connector-java-5.1.8.jar

  • Data volume & number of regions & number of replicas

  • Problem description (what I did)
    A Spark Streaming job reads data from Kafka and writes it into a TiDB table in 1-minute micro-batches of roughly 20,000 records each. The following error occurs occasionally (a stripped-down sketch of the job structure follows this info list):


    Full error message:
    org.apache.spark.SparkException: Job aborted due to stage failure: Task 19 in stage 13840.0 failed 4 times, most recent failure: Lost task 19.3 in stage 13840.0 (TID 1038322, 10.198.107.62, executor 1): java.lang.NullPointerException

Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1602)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1590)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1589)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1589)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:831)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1823)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1772)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1761)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:642)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2034)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2055)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2074)
at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:363)
at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38)
at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collectFromPlan(Dataset.scala:3273)
at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2484)
at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2484)
at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3254)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3253)
at org.apache.spark.sql.Dataset.head(Dataset.scala:2484)
at org.apache.spark.sql.Dataset.take(Dataset.scala:2698)
at cmcc.gz.mbh.app.ProgramDataSource.writeArr2Tb(ProgramDataSource.scala:206)
at cmcc.gz.mbh.app.ProgramDataSource.writeAllArr2Tb(ProgramDataSource.scala:220)
at cmcc.gz.mbh.app.ProgramDataSourceFromKafka$$anonfun$run$1.apply(ProgramDataSourceFromKafka.scala:65)
at cmcc.gz.mbh.app.ProgramDataSourceFromKafka$$anonfun$run$1.apply(ProgramDataSourceFromKafka.scala:58)
at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:628)
at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:628)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:51)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51)
at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:416)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:50)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50)
at scala.util.Try$.apply(Try.scala:192)
at org.apache.spark.streaming.scheduler.Job.run(Job.scala:39)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:257)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:257)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:257)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:256)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException

  • Keywords
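
For reference, here is a stripped-down sketch of how our job is wired up. The broker address and topic match the executor logs further down; the group id, the TiDB JDBC URL, the table name and the record parsing are placeholders and not the actual production code:

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.sql.{SaveMode, SparkSession}
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

object KafkaToTiDBSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("kafka-to-tidb-sketch").getOrCreate()
    // 1-minute micro-batches, matching the batch interval described above
    val ssc = new StreamingContext(spark.sparkContext, Seconds(60))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "kafka1:39091",            // broker seen in the executor logs
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "group-programinfos-new2", // guessed from the executor logs
      "auto.offset.reset"  -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Array("programinfo"), kafkaParams))

    stream.foreachRDD { rdd =>
      if (!rdd.isEmpty()) {
        import spark.implicits._
        // Placeholder parsing: the real job maps each Kafka value into typed columns
        val df = rdd.map(_.value()).toDF("raw_value")
        df.write
          .format("jdbc")
          .option("url", "jdbc:mysql://tidb-host:4000/mydb") // placeholder TiDB address
          .option("dbtable", "program_info")                 // placeholder table name
          .option("user", "user")
          .option("password", "password")
          .option("driver", "com.mysql.jdbc.Driver")         // mysql-connector-java 5.1.x
          .mode(SaveMode.Append)
          .save()
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

The real writeArr2Tb path additionally calls Dataset.take on the resulting DataFrame, which is the point where the NPE surfaces in the driver stack above.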

Could you please provide more complete logs? The logs provided so far are incomplete. It looks like the error occurred on an executor, but the logs only contain the driver's error stack.

Based on the logs, we found that the error does not involve TiSpark or TiDB source code. We have located the problematic part of our own source code (see the screenshot below). Do you have any suggestions for changes or tests on your side?
[screenshot: code-exception]

Partial executor logs for reference:
[root@iaas121290-Computer 1]# cat stdout |grep "2019-10-25 12:32"
2019-10-25 12:32:14 INFO CoarseGrainedExecutorBackend:54 - Got assigned task 118548
2019-10-25 12:32:14 INFO CoarseGrainedExecutorBackend:54 - Got assigned task 118552
2019-10-25 12:32:14 INFO CoarseGrainedExecutorBackend:54 - Got assigned task 118556
2019-10-25 12:32:14 INFO Executor:54 - Running task 13.0 in stage 1603.0 (TID 118552)
2019-10-25 12:32:14 INFO Executor:54 - Running task 15.0 in stage 1603.0 (TID 118556)
2019-10-25 12:32:14 INFO Executor:54 - Running task 6.0 in stage 1603.0 (TID 118548)
2019-10-25 12:32:14 INFO CoarseGrainedExecutorBackend:54 - Got assigned task 118560
2019-10-25 12:32:14 INFO CoarseGrainedExecutorBackend:54 - Got assigned task 118564
2019-10-25 12:32:14 INFO Executor:54 - Running task 16.0 in stage 1603.0 (TID 118560)
2019-10-25 12:32:14 INFO Executor:54 - Running task 19.0 in stage 1603.0 (TID 118564)
2019-10-25 12:32:14 INFO MapOutputTrackerWorker:54 - Updating epoch to 315 and clearing cache
2019-10-25 12:32:14 INFO CoarseGrainedExecutorBackend:54 - Got assigned task 118568
2019-10-25 12:32:14 INFO CoarseGrainedExecutorBackend:54 - Got assigned task 118572
2019-10-25 12:32:14 INFO Executor:54 - Running task 22.0 in stage 1603.0 (TID 118568)
2019-10-25 12:32:14 INFO Executor:54 - Running task 24.0 in stage 1603.0 (TID 118572)
2019-10-25 12:32:14 INFO TorrentBroadcast:54 - Started reading broadcast variable 1279
2019-10-25 12:32:14 INFO MemoryStore:54 - Block broadcast_1279_piece0 stored as bytes in memory (estimated size 16.7 KB, free 5.2 GB)
2019-10-25 12:32:14 INFO TorrentBroadcast:54 - Reading broadcast variable 1279 took 4 ms
2019-10-25 12:32:14 INFO MemoryStore:54 - Block broadcast_1279 stored as values in memory (estimated size 46.6 KB, free 5.2 GB)
2019-10-25 12:32:14 INFO KafkaRDD:54 - Computing topic programinfo, partition 6 offsets 184554353 -> 184554905
2019-10-25 12:32:14 INFO KafkaRDD:54 - Computing topic programinfo, partition 14 offsets 184563853 -> 184564405
2019-10-25 12:32:14 INFO KafkaRDD:54 - Computing topic programinfo, partition 10 offsets 183330337 -> 183330904
2019-10-25 12:32:14 INFO KafkaRDD:54 - Computing topic programinfo, partition 26 offsets 184552811 -> 184553361
2019-10-25 12:32:14 INFO KafkaRDD:54 - Computing topic programinfo, partition 18 offsets 184582918 -> 184583470
2019-10-25 12:32:14 INFO KafkaRDD:54 - Computing topic programinfo, partition 2 offsets 184543050 -> 184543621
2019-10-25 12:32:14 INFO KafkaRDD:54 - Computing topic programinfo, partition 22 offsets 183285543 -> 183286116
2019-10-25 12:32:14 INFO AbstractCoordinator:542 - Marking the coordinator kafka1:39091 (id: 2147483647 rack: null) dead for group spark-executor-group-programinfos-new2
2019-10-25 12:32:14 INFO AbstractCoordinator:542 - Marking the coordinator kafka1:39091 (id: 2147483647 rack: null) dead for group spark-executor-group-programinfos-new2
2019-10-25 12:32:14 INFO AbstractCoordinator:542 - Marking the coordinator kafka1:39091 (id: 2147483647 rack: null) dead for group spark-executor-group-programinfos-new2
2019-10-25 12:32:14 INFO AbstractCoordinator:542 - Marking the coordinator kafka1:39091 (id: 2147483647 rack: null) dead for group spark-executor-group-programinfos-new2
2019-10-25 12:32:14 INFO AbstractCoordinator:542 - Marking the coordinator kafka1:39091 (id: 2147483647 rack: null) dead for group spark-executor-group-programinfos-new2
2019-10-25 12:32:14 INFO AbstractCoordinator:505 - Discovered coordinator kafka1:39091 (id: 2147483647 rack: null) for group spark-executor-group-programinfos-new2.
2019-10-25 12:32:14 INFO AbstractCoordinator:542 - Marking the coordinator kafka1:39091 (id: 2147483647 rack: null) dead for group spark-executor-group-programinfos-new2
2019-10-25 12:32:14 INFO AbstractCoordinator:505 - Discovered coordinator kafka1:39091 (id: 2147483647 rack: null) for group spark-executor-group-programinfos-new2.
2019-10-25 12:32:14 INFO AbstractCoordinator:542 - Marking the coordinator kafka1:39091 (id: 2147483647 rack: null) dead for group spark-executor-group-programinfos-new2
2019-10-25 12:32:14 INFO AbstractCoordinator:505 - Discovered coordinator kafka1:39091 (id: 2147483647 rack: null) for group spark-executor-group-programinfos-new2.
2019-10-25 12:32:14 INFO AbstractCoordinator:542 - Marking the coordinator kafka1:39091 (id: 2147483647 rack: null) dead for group spark-executor-group-programinfos-new2
2019-10-25 12:32:14 INFO AbstractCoordinator:505 - Discovered coordinator kafka1:39091 (id: 2147483647 rack: null) for group spark-executor-group-programinfos-new2.
2019-10-25 12:32:14 INFO AbstractCoordinator:542 - Marking the coordinator kafka1:39091 (id: 2147483647 rack: null) dead for group spark-executor-group-programinfos-new2
2019-10-25 12:32:14 INFO AbstractCoordinator:505 - Discovered coordinator kafka1:39091 (id: 2147483647 rack: null) for group spark-executor-group-programinfos-new2.
2019-10-25 12:32:14 INFO AbstractCoordinator:542 - Marking the coordinator kafka1:39091 (id: 2147483647 rack: null) dead for group spark-executor-group-programinfos-new2
2019-10-25 12:32:14 INFO AbstractCoordinator:542 - Marking the coordinator kafka1:39091 (id: 2147483647 rack: null) dead for group spark-executor-group-programinfos-new2
2019-10-25 12:32:14 INFO AbstractCoordinator:505 - Discovered coordinator kafka1:39091 (id: 2147483647 rack: null) for group spark-executor-group-programinfos-new2.
2019-10-25 12:32:14 INFO AbstractCoordinator:542 - Marking the coordinator kafka1:39091 (id: 2147483647 rack: null) dead for group spark-executor-group-programinfos-new2
2019-10-25 12:32:14 INFO AbstractCoordinator:505 - Discovered coordinator kafka1:39091 (id: 2147483647 rack: null) for group spark-executor-group-programinfos-new2.
2019-10-25 12:32:14 INFO AbstractCoordinator:505 - Discovered coordinator kafka1:39091 (id: 2147483647 rack: null) for group spark-executor-group-programinfos-new2.
2019-10-25 12:32:14 INFO AbstractCoordinator:505 - Discovered coordinator kafka1:39091 (id: 2147483647 rack: null) for group spark-executor-group-programinfos-new2.
2019-10-25 12:32:14 INFO AbstractCoordinator:505 - Discovered coordinator kafka1:39091 (id: 2147483647 rack: null) for group spark-executor-group-programinfos-new2.
2019-10-25 12:32:14 INFO AbstractCoordinator:505 - Discovered coordinator kafka1:39091 (id: 2147483647 rack: null) for group spark-executor-group-programinfos-new2.
2019-10-25 12:32:15 INFO AbstractCoordinator:505 - Discovered coordinator kafka1:39091 (id: 2147483647 rack: null) for group spark-executor-group-programinfos-new2.
2019-10-25 12:32:15 INFO MemoryStore:54 - Block rdd_7813_6 stored as bytes in memory (estimated size 978.6 KB, free 5.2 GB)
2019-10-25 12:32:15 INFO MemoryStore:54 - Block rdd_7813_24 stored as bytes in memory (estimated size 956.3 KB, free 5.1 GB)
2019-10-25 12:32:15 INFO MemoryStore:54 - Block rdd_7813_16 stored as bytes in memory (estimated size 932.1 KB, free 5.1 GB)
2019-10-25 12:32:15 INFO MemoryStore:54 - Block rdd_7813_22 stored as bytes in memory (estimated size 1037.3 KB, free 5.1 GB)
2019-10-25 12:32:15 INFO MemoryStore:54 - Block rdd_7813_19 stored as bytes in memory (estimated size 994.2 KB, free 5.1 GB)
2019-10-25 12:32:15 INFO MemoryStore:54 - Block rdd_7813_13 stored as bytes in memory (estimated size 1046.8 KB, free 5.1 GB)
2019-10-25 12:32:15 INFO CodeGenerator:54 - Code generated in 15.728711 ms
2019-10-25 12:32:15 ERROR Executor:91 - Exception in task 16.0 in stage 1603.0 (TID 118560)
2019-10-25 12:32:15 INFO MemoryStore:54 - Block rdd_7813_15 stored as bytes in memory (estimated size 1013.5 KB, free 5.0 GB)
2019-10-25 12:32:15 INFO CoarseGrainedExecutorBackend:54 - Got assigned task 118575
2019-10-25 12:32:15 INFO Executor:54 - Running task 16.1 in stage 1603.0 (TID 118575)
2019-10-25 12:32:15 INFO BlockManager:54 - Found block rdd_7813_16 locally
2019-10-25 12:32:15 INFO Executor:54 - Finished task 6.0 in stage 1603.0 (TID 118548). 1693 bytes result sent to driver
2019-10-25 12:32:15 INFO Executor:54 - Finished task 22.0 in stage 1603.0 (TID 118568). 1693 bytes result sent to driver
2019-10-25 12:32:15 INFO Executor:54 - Finished task 19.0 in stage 1603.0 (TID 118564). 1693 bytes result sent to driver
2019-10-25 12:32:15 INFO Executor:54 - Finished task 13.0 in stage 1603.0 (TID 118552). 1693 bytes result sent to driver
2019-10-25 12:32:15 INFO Executor:54 - Finished task 24.0 in stage 1603.0 (TID 118572). 1693 bytes result sent to driver
2019-10-25 12:32:15 ERROR Executor:91 - Exception in task 16.1 in stage 1603.0 (TID 118575)
2019-10-25 12:32:15 INFO CoarseGrainedExecutorBackend:54 - Got assigned task 118576
2019-10-25 12:32:15 INFO Executor:54 - Running task 16.2 in stage 1603.0 (TID 118576)
2019-10-25 12:32:15 INFO Executor:54 - Finished task 15.0 in stage 1603.0 (TID 118556). 1693 bytes result sent to driver
2019-10-25 12:32:15 INFO BlockManager:54 - Found block rdd_7813_16 locally
2019-10-25 12:32:15 ERROR Executor:91 - Exception in task 16.2 in stage 1603.0 (TID 118576)
2019-10-25 12:32:15 INFO CoarseGrainedExecutorBackend:54 - Got assigned task 118577
2019-10-25 12:32:15 INFO Executor:54 - Running task 16.3 in stage 1603.0 (TID 118577)
2019-10-25 12:32:15 INFO BlockManager:54 - Found block rdd_7813_16 locally
2019-10-25 12:32:15 ERROR Executor:91 - Exception in task 16.3 in stage 1603.0 (TID 118577)

You will need to check on your side exactly where the NPE is thrown. If it comes from the highlighted part, consider checking whether one of the transformations between readArrdf and readArrDF has a bug.
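
Since the lazy plan is only executed when writeArr2Tb calls Dataset.take (ProgramDataSource.scala:206 in the driver stack), the null is most likely dereferenced inside a parsing or mapping lambda earlier in that chain. Below is a purely hypothetical sketch of a null-safe version of such a transformation; the input column, delimiter and output columns are made up and are not your actual code:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object NullSafeTransformSketch {
  // Hypothetical stand-in for the readArrdf -> readArrDF step: parse each raw
  // value into fixed columns and drop null/malformed records up front, instead
  // of letting them surface as an NPE only when take()/write triggers the plan.
  def toReadArrDF(spark: SparkSession, readArrdf: DataFrame): DataFrame = {
    import spark.implicits._

    readArrdf
      .select($"raw_value")                            // placeholder input column
      .filter($"raw_value".isNotNull)                  // guard 1: drop null payloads
      .flatMap { row =>
        val parts = row.getString(0).split("\\|", -1)  // placeholder delimiter
        if (parts.length >= 3) Seq((parts(0), parts(1), parts(2)))
        else Seq.empty                                 // guard 2: drop short records
      }
      .toDF("program_id", "user_id", "event_time")     // placeholder output columns
  }
}
```

If a guarded version like this stops throwing, the null was in the payload itself; if it still throws, check for fields that are only initialized on the driver and arrive as null on the executors after deserialization.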

OK, thank you! This issue probably has nothing to do with TiSpark or TiDB. I'm not sure whether keeping this topic is appropriate. Could you close it on your side, or should I delete it?

No problem, it can stay :blush:

This topic was automatically closed 1 minute after the last reply. New replies are no longer allowed.