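TiSpark local development problem

To help us resolve your issue efficiently, please provide the following information. A clear description gets a faster answer:

[TiDB version]
TiDB 5.0.1
[Problem description]
When I run the job from my local development environment, the class TiCompositeSessionCatalog cannot be found. How should I fix this?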
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

object test {
  def main(args: Array[String]): Unit = {
    // System.setProperty("hadoop.home.dir", "D:\\java\\hadoop\\hadoop-2.7.1\\bin\\winutils.exe")
    val sparkConf = new SparkConf()
      .setIfMissing("spark.master", "spark://192.168.0.221:7077")
      .setIfMissing("spark.app.name", getClass.getName)
      .setIfMissing("spark.sql.runSQLOnFiles", "true")
      // Register the TiSpark extensions and point them at the PD nodes.
      .setIfMissing("spark.sql.extensions", "org.apache.spark.sql.TiExtensions")
      .setIfMissing("spark.tispark.pd.addresses", "192.168.0.222:2379,192.168.0.223:2379,192.168.0.224:2379")
    val spark = SparkSession.builder()
      .config(sparkConf)
      .getOrCreate()
    spark.sql("show databases").show()
    spark.sql("use test")
    spark.sql("select count(*) from test").show()
    spark.close()
  }
}

Error:
21/05/05 18:08:18 INFO ReflectionUtil$: tispark class url: file:/D:/java/maven/repository/com/pingcap/tispark/tispark-core-internal/2.4.0/tispark-core-internal-2.4.0.jar
21/05/05 18:08:18 INFO ReflectionUtil$: spark wrapper class url: jar:file:/D:/java/maven/repository/com/pingcap/tispark/tispark-core-internal/2.4.0/tispark-core-internal-2.4.0.jar!/resources/spark-wrapper-spark-2_4/
Exception in thread "main" java.lang.ClassNotFoundException: org.apache.spark.sql.catalyst.catalog.TiCompositeSessionCatalog
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
at com.pingcap.tispark.utils.ReflectionUtil$.newTiCompositeSessionCatalog(ReflectionUtil.scala:108)
at org.apache.spark.sql.TiContext.tiCatalog$lzycompute(TiContext.scala:49)
at org.apache.spark.sql.TiContext.tiCatalog(TiContext.scala:49)
at org.apache.spark.sql.execution.command.TiCommand.tiCatalog(TiCommand.scala:30)
at org.apache.spark.sql.execution.command.TiShowDatabasesCommand.run(databases.scala:44)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194)
at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194)
at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3370)
at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:80)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75)
at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$withAction(Dataset.scala:3369)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:194)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:79)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:643)
at com.dgh.hzf.test$.main(test.scala:25)
at com.dgh.hzf.test.main(test.scala)

Which versions of Spark and TiSpark are you using?

TiSpark 2.4.0, Spark 2.4.7

I looked into it, and the class seems to be referenced here. Which related packages have you imported? https://github.com/pingcap/tispark/blob/d076066aab0f55f703b3f0668f290fc694cfdad1/core/src/main/scala/org/apache/spark/sql/TiContext.scala
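
One way to answer the "which packages are imported" question from code is to list every classpath location that provides a given class. The following is a minimal sketch that uses only the JDK's ClassLoader.getResources; the object and method names are made up for illustration, and org.apache.spark.sql.TiContext is taken from the stack trace in the original post. If more than one jar is printed, the same TiSpark classes are on the classpath more than once.

```scala
import scala.collection.JavaConverters._

object ClasspathDuplicates {
  /** All classpath locations that provide the given class, in class-loader search order. */
  def providers(className: String, loader: ClassLoader): List[String] = {
    // A class is stored in a jar as a resource, e.g. org/apache/spark/sql/TiContext.class
    val resource = className.replace('.', '/') + ".class"
    loader.getResources(resource).asScala.map(_.toString).toList
  }

  def main(args: Array[String]): Unit = {
    val loader = ClasspathDuplicates.getClass.getClassLoader
    val found = providers("org.apache.spark.sql.TiContext", loader)
    if (found.isEmpty) println("TiContext is not on the classpath")
    else found.foreach(url => println(s"TiContext provided by: $url"))
  }
}
```

Run it with the same classpath as the failing job (for example from the same IDE module), otherwise the result says nothing about that job.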

[Screenshot attached; image not available]

Does your pom depend on this?

<dependencies>
    <dependency>
      <groupId>com.pingcap.tispark</groupId>
      <artifactId>tispark-assembly</artifactId>
      <version>2.4.0</version>
    </dependency>
</dependencies>

Yes. On the TiSpark cluster, spark-sql works:
spark-sql> select * from test limit 10;
CN204934249U NULL 13998080
CN204957586U NULL 13998081
CN204698062U NULL 13998082
CN204698133U NULL 13998083
CN204727064U NULL 13998084
CN204727065U NULL 13998085
CN104961436A NULL 13998086
CN104961437A NULL 13998087
CN104961438A NULL 13998088
CN104961479A NULL 13998089
Time taken: 1.391 seconds, Fetched 10 row(s)
spark-sql>

Could you share your pom file?

Sure: pom.xml (5.2 KB)

Please try changing the dependency to the following:

<dependency>
    <groupId>com.pingcap.tispark</groupId>
    <artifactId>tispark-assembly</artifactId>
    <version>2.4.0</version>
    <exclusions>
        <exclusion>
            <groupId>com.pingcap.tispark</groupId>
            <artifactId>tispark-core-internal</artifactId>
        </exclusion>
    </exclusions>
</dependency>

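Hello, after changing the dependency I can run
spark.sql("show databases").show(false)
and the databases in TiDB are listed.
However, when I run
spark.sql("use test")
spark.sql(
  """
    |select count(*)
    |from test.test
  """.stripMargin).show(false)

the job stays in the RUNNING state and never finishes.
[Screenshot attached; image not available]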

Does select count(*) run successfully on the TiSpark cluster?

Yes, it does:
spark-sql> select count(*) from test.test;
18851659

Can you check whether your local machine can reach TiKV over the network?
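
A quick way to run this check from the development machine is a plain TCP connect with a timeout. The sketch below uses only JDK sockets; the object and method names are made up for illustration. The TiKV store addresses are not given in this thread, so the example probes the PD addresses from the original post, and the same call can be pointed at each TiKV address.

```scala
import java.io.IOException
import java.net.{InetSocketAddress, Socket}

object TcpProbe {
  /** Parses "host1:port1,host2:port2" into (host, port) pairs. */
  def parseAddresses(s: String): List[(String, Int)] =
    s.split(",").toList.map(_.trim).filter(_.nonEmpty).map { hostPort =>
      val idx = hostPort.lastIndexOf(':')
      (hostPort.substring(0, idx), hostPort.substring(idx + 1).toInt)
    }

  /** True if a TCP connection to host:port can be opened within timeoutMs. */
  def canConnect(host: String, port: Int, timeoutMs: Int = 2000): Boolean = {
    val socket = new Socket()
    try {
      socket.connect(new InetSocketAddress(host, port), timeoutMs)
      true
    } catch {
      case _: IOException => false // refused, timed out, or unknown host
    } finally {
      socket.close()
    }
  }

  def main(args: Array[String]): Unit = {
    // PD addresses copied from spark.tispark.pd.addresses in the original post.
    val pd = "192.168.0.222:2379,192.168.0.223:2379,192.168.0.224:2379"
    parseAddresses(pd).foreach { case (host, port) =>
      println(s"$host:$port reachable = ${canConnect(host, port)}")
    }
  }
}
```

A successful connect only shows that the port is open from this machine. The Spark executors run on the cluster nodes and need their own route to TiKV and back to the driver.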

The network is reachable.

How are the tasks doing? Are there any errors? How many tasks are there in total, and how many have completed?

Thanks, it runs now. I was running the job locally, and the problem was caused by the local Spark --driver-url parameter.
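
The thread does not say which setting was changed. As background, in a standalone cluster each executor is started with a --driver-url that points back at the driver, so a driver running on a workstation has to advertise an address the workers can reach, and Spark's spark.driver.host setting controls that address. The sketch below assumes this is the kind of problem that was hit here: it finds which local IP the workstation uses to reach the Spark master, as a candidate value for spark.driver.host. The object and method names are hypothetical.

```scala
import java.io.IOException
import java.net.{InetSocketAddress, Socket}

object DriverAddress {
  /** Local IP used by a TCP connection to host:port, or None if the connection fails. */
  def localAddressTowards(host: String, port: Int, timeoutMs: Int = 2000): Option[String] = {
    val socket = new Socket()
    try {
      socket.connect(new InetSocketAddress(host, port), timeoutMs)
      // The OS picks the outgoing interface; its address is what the remote side sees
      // (unless NAT sits in between).
      Some(socket.getLocalAddress.getHostAddress)
    } catch {
      case _: IOException => None
    } finally {
      socket.close()
    }
  }

  def main(args: Array[String]): Unit = {
    // Spark master host and port from spark.master in the original post.
    localAddressTowards("192.168.0.221", 7077) match {
      case Some(ip) => println(s"""candidate setting: .set("spark.driver.host", "$ip")""")
      case None     => println("could not connect to the Spark master")
    }
  }
}
```

Whether the printed address is the right one still depends on the network: the workers must be able to open connections to it, which a firewall on the workstation can block.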

Hi, could you take a look at the problem in this topic?
