I'm trying to read a CSV file from a GCS bucket with Spark and write it out as a Delta Lake table (to a path in GCS), but the write operation fails.
I'm trying this in IntelliJ and have added the dependency in the pom.xml file:
<dependency>
<groupId>io.delta</groupId>
<artifactId>delta-core_2.12</artifactId>
<version>1.2.1</version>
</dependency>
Below is the code used:
val df_gcs = spark.read.csv(sourcepath)
df_gcs.write.format("delta").save(save_path)
I get the error below:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/sql/execution/command/LeafRunnableCommand
at java.base/java.lang.ClassLoader.defineClass1(Native Method)
at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017)
at java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:174)
at java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:800)
at java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:698)
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:621)
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:579)
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
at org.apache.spark.sql.delta.sources.DeltaDataSource.createRelation(DeltaDataSource.scala:150)
at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:132)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:131)
at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438)
at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:409)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:293)
Answer:
Changed the delta-core version to 1.0.0 and it fixed the issue. The NoClassDefFoundError is a Spark/Delta compatibility mismatch: org.apache.spark.sql.execution.command.LeafRunnableCommand only exists in Spark 3.2+, which is what delta-core 1.2.x is built against, while delta-core 1.0.x targets Spark 3.1.x.
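For reference, a minimal sketch of the working setup, assuming Spark 3.1.x is on the classpath. The pom.xml dependency becomes:

<dependency>
<groupId>io.delta</groupId>
<artifactId>delta-core_2.12</artifactId>
<version>1.0.0</version>
</dependency>

And a self-contained Scala sketch of the same read/write, using the session configuration from the Delta Lake quickstart (the gs:// paths and object name here are placeholders, not from the original post):

import org.apache.spark.sql.SparkSession

object CsvToDelta {
  def main(args: Array[String]): Unit = {
    // Placeholder paths; replace with your own bucket locations.
    // Assumes the GCS connector is already on the classpath, since
    // the CSV read in the question succeeded.
    val sourcepath = "gs://my-bucket/input/data.csv"
    val save_path  = "gs://my-bucket/output/delta-table"

    // Session config from the Delta Lake quickstart; pair
    // delta-core 1.0.x with Spark 3.1.x.
    val spark = SparkSession.builder()
      .appName("csv-to-delta")
      .config("spark.sql.extensions",
        "io.delta.sql.DeltaSparkSessionExtension")
      .config("spark.sql.catalog.spark_catalog",
        "org.apache.spark.sql.delta.catalog.DeltaCatalog")
      .getOrCreate()

    // Same read/write as in the question.
    val df_gcs = spark.read.csv(sourcepath)
    df_gcs.write.format("delta").save(save_path)

    spark.stop()
  }
}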