To help resolve the issue faster, please provide the following information and describe the problem clearly:
【TiDB Version】
4.0
【Problem Description】
CDH: 6.3.2
Zeppelin: 0.9.0
TiSpark: 2.3.14
TiDB: 4.0
spark-submit --master yarn --deploy-mode cluster --principal jzyc/hadoop@JOIN.COM --keytab /hadoop/jzyc.keytab --class App --jars hdfs://bigdser1:8020/sparklib/* JZTanalyse-1.0-SNAPSHOT.jar 127.0.0.1 1 1 key
Running it this way works without any problem. However, in Zeppelin I added spark.jars hdfs://bigdser1:8020/sparklib/*, and when I run Spark in Zeppelin and read data through TiSpark, it fails with:
java.io.IOException: Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]; Host Details : local host is: "bigdser4/10.3.87.26"; destination host is: "bigdser1":8020;
I have already added both spark.yarn.principal and spark.yarn.keytab to the Zeppelin interpreter settings.
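For reference, the relevant Zeppelin Spark interpreter properties look roughly like this. This is a sketch: the principal and keytab values are assumed to be the same ones passed to spark-submit above, and other interpreter settings (master, deploy mode, TiSpark options) are not shown.

```properties
# Zeppelin Spark interpreter properties (assumed to mirror the spark-submit flags above)
spark.jars            hdfs://bigdser1:8020/sparklib/*
spark.yarn.principal  jzyc/hadoop@JOIN.COM
spark.yarn.keytab     /hadoop/jzyc.keytab
```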
I noticed that when I run spark-submit directly, the Spark YARN log shows:
YARN executor launch context:
resources:
tispark-assembly-2.3.14.jar -> resource { scheme: "hdfs" host: "nameservice1" port: -1 file: "/user/jzyc/.sparkStaging/application_1618988436199_0020/tispark-assembly-2.3.14.jar" } size: 24497987 timestamp: 1619593796714 type: FILE visibility: PRIVATE
app.jar -> resource { scheme: "hdfs" host: "nameservice1" port: -1 file: "/user/jzyc/.sparkStaging/application_1618988436199_0020/JZTanalyse-1.0-SNAPSHOT.jar" } size: 44009 timestamp: 1619593796342 type: FILE visibility: PRIVATE
spark_conf -> resource { scheme: "hdfs" host: "nameservice1" port: -1 file: "/user/jzyc/.sparkStaging/application_1618988436199_0020/spark_conf.zip" } size: 171568 timestamp: 1619593796927 type: ARCHIVE visibility: PRIVATE
But when the job is launched from Zeppelin, the log contains these additional entries:
spark_conf -> resource { scheme: "hdfs" host: "nameservice1" port: -1 file: "/user/jzyc/.sparkStaging/application_1618988436199_0002/spark_conf.zip" } size: 161768 timestamp: 1619149044240 type: ARCHIVE visibility: PRIVATE
log4j_yarn_cluster.properties -> resource { scheme: "hdfs" host: "nameservice1" port: -1 file: "/user/jzyc/.sparkStaging/application_1618988436199_0002/log4j_yarn_cluster.properties" } size: 1018 timestamp: 1619149044023 type: FILE visibility: PRIVATE
tispark-assembly-2.3.14.jar -> resource { scheme: "hdfs" host: "nameservice1" port: -1 file: "/user/jzyc/.sparkStaging/application_1618988436199_0002/tispark-assembly-2.3.14.jar" } size: 24497987 timestamp: 1619149043707 type: FILE visibility: PRIVATE
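I don't know whether these two extra entries are what causes the Kerberos authentication failure.
Right now I can't tell whether the problem lies with Zeppelin or with TiSpark.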