I'm trying to read a CSV file from a GCS bucket using Spark and write it as Delta Lake (to a path in GCS), but the write operation fails



I'm trying this in IntelliJ and have added the dependency in the pom.xml file.

        <dependency>
            <groupId>io.delta</groupId>
            <artifactId>delta-core_2.12</artifactId>
            <version>1.2.1</version>
        </dependency>

Below is the code used:

val df_gcs = spark.read.format("csv").csv(sourcepath)

df_gcs.write.format("delta").save(save_path)
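
For reference, a fuller, self-contained sketch of the same read-and-write. The session configuration, bucket paths, and service-account key file here are illustrative assumptions, not details from the original post:

    import org.apache.spark.sql.SparkSession

    object CsvToDelta {
      def main(args: Array[String]): Unit = {
        // Delta Lake's documented session configs, plus the GCS connector's
        // filesystem settings; the key-file path is hypothetical.
        val spark = SparkSession.builder()
          .appName("csv-to-delta")
          .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
          .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
          .config("spark.hadoop.fs.gs.impl",
            "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem")
          .config("spark.hadoop.google.cloud.auth.service.account.json.keyfile",
            "/path/to/service-account.json")
          .getOrCreate()

        // Hypothetical GCS paths standing in for sourcepath / save_path.
        val sourcepath = "gs://my-bucket/input/"
        val save_path  = "gs://my-bucket/delta/events"

        // Same two operations as in the snippet above.
        val df_gcs = spark.read.format("csv").csv(sourcepath)
        df_gcs.write.format("delta").save(save_path)

        spark.stop()
      }
    }

Note that for gs:// paths to resolve at all, the GCS connector jar (gcs-connector) also has to be on the classpath.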

Getting the error below:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/sql/execution/command/LeafRunnableCommand
at java.base/java.lang.ClassLoader.defineClass1(Native Method)
at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017)
at java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:174)
at java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:800)
at java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:698)
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:621)
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:579)
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
at org.apache.spark.sql.delta.sources.DeltaDataSource.createRelation(DeltaDataSource.scala:150)
at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:132)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:131)
at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438)
at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:409)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:293)


1 Answer

话少情深 · 2025-02-19 23:15:49


Changed the version to 1.0.0 and it fixed the issue.

    <dependency>
        <groupId>io.delta</groupId>
        <artifactId>delta-core_2.12</artifactId>
        <version>1.0.0</version>
    </dependency>
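
The likely reason this works, going by Delta Lake's published compatibility matrix: delta-core 1.2.x is built against Spark 3.2, which is where org.apache.spark.sql.execution.command.LeafRunnableCommand was introduced. On a Spark 3.1.x classpath that class does not exist, hence the NoClassDefFoundError; delta-core 1.0.0 is built for Spark 3.1 and so matches. The alternative is to keep delta-core 1.2.1 and upgrade Spark instead. A sketch of the two matched pairs in pom.xml, where the exact Spark patch versions are assumptions about the project (pick one pair, not both):

    <!-- Pair A: Spark 3.1.x with delta-core 1.0.x -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.12</artifactId>
        <version>3.1.2</version>
    </dependency>
    <dependency>
        <groupId>io.delta</groupId>
        <artifactId>delta-core_2.12</artifactId>
        <version>1.0.0</version>
    </dependency>

    <!-- Pair B: Spark 3.2.x with delta-core 1.2.x -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.12</artifactId>
        <version>3.2.1</version>
    </dependency>
    <dependency>
        <groupId>io.delta</groupId>
        <artifactId>delta-core_2.12</artifactId>
        <version>1.2.1</version>
    </dependency>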