TiSpark job reading Kafka data and writing to TiDB occasionally throws java.lang.NullPointerException

To help us respond efficiently, please provide as much background information as possible when asking a question; clearly described problems get a faster response. Please try to provide the following details:

  • OS version & kernel version

  • TiDB version: Release Version: v2.1.13 Git Commit Hash: 6b5b1a6802f9b8f5a22d8aab24ac80729331e1bc Git Branch: HEAD UTC Build Time: 2019-06-21 12:27:08 GoVersion: go version go1.12 linux/amd64 Race Enabled: false TiKV Min Version: 2.1.0-alpha.1-ff3dd160846b7d1aed9079c389fc188f7f5ea13e Check Table Before Drop: false

  • Disk model

  • Cluster topology: 2 tidb, 3 pd, 8 tikv; spark-2.3.1-bin-hadoop2.7 standalone cluster: 2 masters, 10 workers; jars: tispark-core-2.1-SNAPSHOT-jar-with-dependencies.jar, spark-streaming-kafka-0-10_2.11-2.3.1.jar, mysql-connector-java-5.1.8.jar

  • Data volume & number of Regions & number of replicas

  • Problem description (what I did): A Spark Streaming job reads data from Kafka and writes it into a TiDB table, one batch per minute, with roughly 20,000 records per batch. The following error occurs occasionally (a rough sketch of the pipeline is given after this list):

    Full error message: org.apache.spark.SparkException: Job aborted due to stage failure: Task 19 in stage 13840.0 failed 4 times, most recent failure: Lost task 19.3 in stage 13840.0 (TID 1038322, 10.198.107.62, executor 1): java.lang.NullPointerException

Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1602)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1590)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1589)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1589)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:831)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1823)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1772)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1761)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:642)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2034)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2055)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2074)
at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:363)
at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38)
at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collectFromPlan(Dataset.scala:3273)
at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2484)
at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2484)
at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3254)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3253)
at org.apache.spark.sql.Dataset.head(Dataset.scala:2484)
at org.apache.spark.sql.Dataset.take(Dataset.scala:2698)
at cmcc.gz.mbh.app.ProgramDataSource.writeArr2Tb(ProgramDataSource.scala:206)
at cmcc.gz.mbh.app.ProgramDataSource.writeAllArr2Tb(ProgramDataSource.scala:220)
at cmcc.gz.mbh.app.ProgramDataSourceFromKafka$$anonfun$run$1.apply(ProgramDataSourceFromKafka.scala:65)
at cmcc.gz.mbh.app.ProgramDataSourceFromKafka$$anonfun$run$1.apply(ProgramDataSourceFromKafka.scala:58)
at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:628)
at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:628)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:51)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51)
at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:416)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:50)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50)
at scala.util.Try$.apply(Try.scala:192)
at org.apache.spark.streaming.scheduler.Job.run(Job.scala:39)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:257)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:257)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:257)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:256)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException

  • Keywords
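
For context, here is a minimal sketch of the pipeline described above, assuming the batch is written to TiDB through Spark's JDBC data source with the MySQL connector listed in the jars (the TiSpark jar here is used for reading; this sketch does not use it for the write). The topic name, broker address and consumer group are taken from the executor log further down; the message parsing, table schema, JDBC URL and credentials are placeholders, not the actual code. The `take(1)` guard mirrors the `Dataset.take` call visible at `ProgramDataSource.writeArr2Tb:206` in the driver stack; because Spark evaluates lazily, an NPE surfacing there can originate in an earlier transformation.

```scala
// Minimal, hypothetical sketch of the described Kafka -> Spark Streaming -> TiDB pipeline.
// Only the overall shape matches the stack trace; names and values marked as placeholders are invented.
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.sql.{SaveMode, SparkSession}
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object ProgramDataSourceSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("kafka-to-tidb").getOrCreate()
    val ssc = new StreamingContext(spark.sparkContext, Seconds(60)) // one batch per minute

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "kafka1:39091",              // broker address seen in the executor log
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "group-programinfos-new2",            // group seen in the executor log
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("programinfo"), kafkaParams))

    stream.foreachRDD { rdd =>
      import spark.implicits._
      // Illustrative parsing only: the real record layout is not shown in the topic.
      val df = rdd.map(_.value().split(",", -1))
        .map(a => (a(0), a(1), a(2)))
        .toDF("id", "name", "ts")

      // Dataset.take appears in the driver stack inside writeArr2Tb; because the map above
      // is lazy, a NullPointerException reported here actually comes from the task that runs it.
      if (df.take(1).nonEmpty) {
        df.write.mode(SaveMode.Append)
          .format("jdbc")
          .option("url", "jdbc:mysql://tidb-host:4000/db?rewriteBatchedStatements=true") // placeholder
          .option("dbtable", "programinfo")                                              // placeholder
          .option("user", "user").option("password", "password")                         // placeholders
          .save()
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```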

Could you please provide more of the logs? The logs supplied so far are incomplete. It looks like the error happened on an executor, but the logs provided only contain the driver's error stack.

Going through the logs, we found that the error does not involve TiSpark or TiDB source code. We have located the problematic part of our own code, shown below. Do you have any suggestions for fixes or tests we could try?: code-exception

Part of the executor log for reference:

[root@iaas121290-Computer 1]# cat stdout |grep "2019-10-25 12:32"
2019-10-25 12:32:14 INFO CoarseGrainedExecutorBackend:54 - Got assigned task 118548
2019-10-25 12:32:14 INFO CoarseGrainedExecutorBackend:54 - Got assigned task 118552
2019-10-25 12:32:14 INFO CoarseGrainedExecutorBackend:54 - Got assigned task 118556
2019-10-25 12:32:14 INFO Executor:54 - Running task 13.0 in stage 1603.0 (TID 118552)
2019-10-25 12:32:14 INFO Executor:54 - Running task 15.0 in stage 1603.0 (TID 118556)
2019-10-25 12:32:14 INFO Executor:54 - Running task 6.0 in stage 1603.0 (TID 118548)
2019-10-25 12:32:14 INFO CoarseGrainedExecutorBackend:54 - Got assigned task 118560
2019-10-25 12:32:14 INFO CoarseGrainedExecutorBackend:54 - Got assigned task 118564
2019-10-25 12:32:14 INFO Executor:54 - Running task 16.0 in stage 1603.0 (TID 118560)
2019-10-25 12:32:14 INFO Executor:54 - Running task 19.0 in stage 1603.0 (TID 118564)
2019-10-25 12:32:14 INFO MapOutputTrackerWorker:54 - Updating epoch to 315 and clearing cache
2019-10-25 12:32:14 INFO CoarseGrainedExecutorBackend:54 - Got assigned task 118568
2019-10-25 12:32:14 INFO CoarseGrainedExecutorBackend:54 - Got assigned task 118572
2019-10-25 12:32:14 INFO Executor:54 - Running task 22.0 in stage 1603.0 (TID 118568)
2019-10-25 12:32:14 INFO Executor:54 - Running task 24.0 in stage 1603.0 (TID 118572)
2019-10-25 12:32:14 INFO TorrentBroadcast:54 - Started reading broadcast variable 1279
2019-10-25 12:32:14 INFO MemoryStore:54 - Block broadcast_1279_piece0 stored as bytes in memory (estimated size 16.7 KB, free 5.2 GB)
2019-10-25 12:32:14 INFO TorrentBroadcast:54 - Reading broadcast variable 1279 took 4 ms
2019-10-25 12:32:14 INFO MemoryStore:54 - Block broadcast_1279 stored as values in memory (estimated size 46.6 KB, free 5.2 GB)
2019-10-25 12:32:14 INFO KafkaRDD:54 - Computing topic programinfo, partition 6 offsets 184554353 -> 184554905
2019-10-25 12:32:14 INFO KafkaRDD:54 - Computing topic programinfo, partition 14 offsets 184563853 -> 184564405
2019-10-25 12:32:14 INFO KafkaRDD:54 - Computing topic programinfo, partition 10 offsets 183330337 -> 183330904
2019-10-25 12:32:14 INFO KafkaRDD:54 - Computing topic programinfo, partition 26 offsets 184552811 -> 184553361
2019-10-25 12:32:14 INFO KafkaRDD:54 - Computing topic programinfo, partition 18 offsets 184582918 -> 184583470
2019-10-25 12:32:14 INFO KafkaRDD:54 - Computing topic programinfo, partition 2 offsets 184543050 -> 184543621
2019-10-25 12:32:14 INFO KafkaRDD:54 - Computing topic programinfo, partition 22 offsets 183285543 -> 183286116
2019-10-25 12:32:14 INFO AbstractCoordinator:542 - Marking the coordinator kafka1:39091 (id: 2147483647 rack: null) dead for group spark-executor-group-programinfos-new2
2019-10-25 12:32:14 INFO AbstractCoordinator:542 - Marking the coordinator kafka1:39091 (id: 2147483647 rack: null) dead for group spark-executor-group-programinfos-new2
2019-10-25 12:32:14 INFO AbstractCoordinator:542 - Marking the coordinator kafka1:39091 (id: 2147483647 rack: null) dead for group spark-executor-group-programinfos-new2
2019-10-25 12:32:14 INFO AbstractCoordinator:542 - Marking the coordinator kafka1:39091 (id: 2147483647 rack: null) dead for group spark-executor-group-programinfos-new2
2019-10-25 12:32:14 INFO AbstractCoordinator:542 - Marking the coordinator kafka1:39091 (id: 2147483647 rack: null) dead for group spark-executor-group-programinfos-new2
2019-10-25 12:32:14 INFO AbstractCoordinator:505 - Discovered coordinator kafka1:39091 (id: 2147483647 rack: null) for group spark-executor-group-programinfos-new2.
2019-10-25 12:32:14 INFO AbstractCoordinator:542 - Marking the coordinator kafka1:39091 (id: 2147483647 rack: null) dead for group spark-executor-group-programinfos-new2
2019-10-25 12:32:14 INFO AbstractCoordinator:505 - Discovered coordinator kafka1:39091 (id: 2147483647 rack: null) for group spark-executor-group-programinfos-new2.
2019-10-25 12:32:14 INFO AbstractCoordinator:542 - Marking the coordinator kafka1:39091 (id: 2147483647 rack: null) dead for group spark-executor-group-programinfos-new2
2019-10-25 12:32:14 INFO AbstractCoordinator:505 - Discovered coordinator kafka1:39091 (id: 2147483647 rack: null) for group spark-executor-group-programinfos-new2.
2019-10-25 12:32:14 INFO AbstractCoordinator:542 - Marking the coordinator kafka1:39091 (id: 2147483647 rack: null) dead for group spark-executor-group-programinfos-new2
2019-10-25 12:32:14 INFO AbstractCoordinator:505 - Discovered coordinator kafka1:39091 (id: 2147483647 rack: null) for group spark-executor-group-programinfos-new2.
2019-10-25 12:32:14 INFO AbstractCoordinator:542 - Marking the coordinator kafka1:39091 (id: 2147483647 rack: null) dead for group spark-executor-group-programinfos-new2
2019-10-25 12:32:14 INFO AbstractCoordinator:505 - Discovered coordinator kafka1:39091 (id: 2147483647 rack: null) for group spark-executor-group-programinfos-new2.
2019-10-25 12:32:14 INFO AbstractCoordinator:542 - Marking the coordinator kafka1:39091 (id: 2147483647 rack: null) dead for group spark-executor-group-programinfos-new2
2019-10-25 12:32:14 INFO AbstractCoordinator:542 - Marking the coordinator kafka1:39091 (id: 2147483647 rack: null) dead for group spark-executor-group-programinfos-new2
2019-10-25 12:32:14 INFO AbstractCoordinator:505 - Discovered coordinator kafka1:39091 (id: 2147483647 rack: null) for group spark-executor-group-programinfos-new2.
2019-10-25 12:32:14 INFO AbstractCoordinator:542 - Marking the coordinator kafka1:39091 (id: 2147483647 rack: null) dead for group spark-executor-group-programinfos-new2
2019-10-25 12:32:14 INFO AbstractCoordinator:505 - Discovered coordinator kafka1:39091 (id: 2147483647 rack: null) for group spark-executor-group-programinfos-new2.
2019-10-25 12:32:14 INFO AbstractCoordinator:505 - Discovered coordinator kafka1:39091 (id: 2147483647 rack: null) for group spark-executor-group-programinfos-new2.
2019-10-25 12:32:14 INFO AbstractCoordinator:505 - Discovered coordinator kafka1:39091 (id: 2147483647 rack: null) for group spark-executor-group-programinfos-new2.
2019-10-25 12:32:14 INFO AbstractCoordinator:505 - Discovered coordinator kafka1:39091 (id: 2147483647 rack: null) for group spark-executor-group-programinfos-new2.
2019-10-25 12:32:14 INFO AbstractCoordinator:505 - Discovered coordinator kafka1:39091 (id: 2147483647 rack: null) for group spark-executor-group-programinfos-new2.
2019-10-25 12:32:15 INFO AbstractCoordinator:505 - Discovered coordinator kafka1:39091 (id: 2147483647 rack: null) for group spark-executor-group-programinfos-new2.
2019-10-25 12:32:15 INFO MemoryStore:54 - Block rdd_7813_6 stored as bytes in memory (estimated size 978.6 KB, free 5.2 GB)
2019-10-25 12:32:15 INFO MemoryStore:54 - Block rdd_7813_24 stored as bytes in memory (estimated size 956.3 KB, free 5.1 GB)
2019-10-25 12:32:15 INFO MemoryStore:54 - Block rdd_7813_16 stored as bytes in memory (estimated size 932.1 KB, free 5.1 GB)
2019-10-25 12:32:15 INFO MemoryStore:54 - Block rdd_7813_22 stored as bytes in memory (estimated size 1037.3 KB, free 5.1 GB)
2019-10-25 12:32:15 INFO MemoryStore:54 - Block rdd_7813_19 stored as bytes in memory (estimated size 994.2 KB, free 5.1 GB)
2019-10-25 12:32:15 INFO MemoryStore:54 - Block rdd_7813_13 stored as bytes in memory (estimated size 1046.8 KB, free 5.1 GB)
2019-10-25 12:32:15 INFO CodeGenerator:54 - Code generated in 15.728711 ms
2019-10-25 12:32:15 ERROR Executor:91 - Exception in task 16.0 in stage 1603.0 (TID 118560)
2019-10-25 12:32:15 INFO MemoryStore:54 - Block rdd_7813_15 stored as bytes in memory (estimated size 1013.5 KB, free 5.0 GB)
2019-10-25 12:32:15 INFO CoarseGrainedExecutorBackend:54 - Got assigned task 118575
2019-10-25 12:32:15 INFO Executor:54 - Running task 16.1 in stage 1603.0 (TID 118575)
2019-10-25 12:32:15 INFO BlockManager:54 - Found block rdd_7813_16 locally
2019-10-25 12:32:15 INFO Executor:54 - Finished task 6.0 in stage 1603.0 (TID 118548). 1693 bytes result sent to driver
2019-10-25 12:32:15 INFO Executor:54 - Finished task 22.0 in stage 1603.0 (TID 118568). 1693 bytes result sent to driver
2019-10-25 12:32:15 INFO Executor:54 - Finished task 19.0 in stage 1603.0 (TID 118564). 1693 bytes result sent to driver
2019-10-25 12:32:15 INFO Executor:54 - Finished task 13.0 in stage 1603.0 (TID 118552). 1693 bytes result sent to driver
2019-10-25 12:32:15 INFO Executor:54 - Finished task 24.0 in stage 1603.0 (TID 118572). 1693 bytes result sent to driver
2019-10-25 12:32:15 ERROR Executor:91 - Exception in task 16.1 in stage 1603.0 (TID 118575)
2019-10-25 12:32:15 INFO CoarseGrainedExecutorBackend:54 - Got assigned task 118576
2019-10-25 12:32:15 INFO Executor:54 - Running task 16.2 in stage 1603.0 (TID 118576)
2019-10-25 12:32:15 INFO Executor:54 - Finished task 15.0 in stage 1603.0 (TID 118556). 1693 bytes result sent to driver
2019-10-25 12:32:15 INFO BlockManager:54 - Found block rdd_7813_16 locally
2019-10-25 12:32:15 ERROR Executor:91 - Exception in task 16.2 in stage 1603.0 (TID 118576)
2019-10-25 12:32:15 INFO CoarseGrainedExecutorBackend:54 - Got assigned task 118577
2019-10-25 12:32:15 INFO Executor:54 - Running task 16.3 in stage 1603.0 (TID 118577)
2019-10-25 12:32:15 INFO BlockManager:54 - Found block rdd_7813_16 locally
2019-10-25 12:32:15 ERROR Executor:91 - Exception in task 16.3 in stage 1603.0 (TID 118577)

You will need to find out exactly where the NPE is thrown. If it is thrown in the highlighted part, it is worth checking whether one of the transformations between readArrdf and readArrDF has a bug.
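
To make that suggestion concrete, here is a small, purely hypothetical illustration; the names readArrdf and readArrDF come from the reply above, but their contents, the record format and the field names below are invented. If one of the intermediate transformations dereferences a value that can be null (for example a malformed or empty Kafka message), the NullPointerException is only thrown later, when an action such as take(), show() or the write runs the task on an executor, which matches the "Exception in task ..." lines in the executor log. A defensive filter before the conversion avoids it.

```scala
// Hypothetical illustration only: not taken from the user's ProgramDataSource code.
import org.apache.spark.sql.SparkSession

case class Program(id: String, name: String, ts: String)

object NpeIllustration {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("npe-illustration").getOrCreate()
    import spark.implicits._

    // Simulated Kafka payloads: one record is null.
    val raw = spark.sparkContext.parallelize(Seq("1,foo,2019-10-25", null, "3,bar,2019-10-25"))

    // Buggy chain: _.split(...) on the null element throws an NPE, but only when an
    // action (take/show/write) finally executes the task, not where the map is defined.
    val readArrdf = raw.map(_.split(",", -1)).map(a => Program(a(0), a(1), a(2))).toDF()

    // Defensive chain: drop null and malformed records before building rows.
    val readArrDF = raw.filter(r => r != null)
      .map(_.split(",", -1))
      .filter(_.length >= 3)
      .map(a => Program(a(0), a(1), a(2)))
      .toDF()

    readArrDF.show()        // succeeds
    // readArrdf.show()     // would fail in the executor with java.lang.NullPointerException
  }
}
```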

OK, thanks! This issue probably has nothing to do with TiSpark or TiDB. I am not sure whether keeping this topic around causes any problem; could you close it on your side, or should I delete it?

No problem, it can stay :blush: