Spark job fails on a Dataproc Spark cluster but runs locally
I have a JAR file built from a Maven project that runs fine when I run it locally with java -jar JARFILENAME.jar. However, when I try to run the same JAR file on Dataproc, I get the following error:
22/06/27 13:13:45 INFO org.apache.spark.SparkEnv: Registering BlockManagerMaster
22/06/27 13:13:46 INFO org.apache.spark.SparkEnv: Registering BlockManagerMasterHeartbeat
22/06/27 13:13:46 INFO org.apache.spark.SparkEnv: Registering OutputCommitCoordinator
22/06/27 13:13:49 INFO org.sparkproject.jetty.util.log: Logging initialized @7373ms to org.sparkproject.jetty.util.log.Slf4jLog
22/06/27 13:13:51 INFO com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl: Ignoring exception of type GoogleJsonResponseException; verified object already exists with desired state.
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.expressions.aggregate.ApproximatePercentile$PercentileDigest.getPercentiles([D)Lscala/collection/Seq;
at com.amazon.deequ.analyzers.ApproxQuantile.fromAggregationResult(ApproxQuantile.scala:84)
at com.amazon.deequ.analyzers.ScanShareableAnalyzer.metricFromAggregationResult(Analyzer.scala:192)
at com.amazon.deequ.analyzers.ScanShareableAnalyzer.metricFromAggregationResult$(Analyzer.scala:185)
at com.amazon.deequ.analyzers.ApproxQuantile.metricFromAggregationResult(ApproxQuantile.scala:50)
at com.amazon.deequ.analyzers.runners.AnalysisRunner$.successOrFailureMetricFrom(AnalysisRunner.scala:362)
at com.amazon.deequ.analyzers.runners.AnalysisRunner$.$anonfun$runScanningAnalyzers$5(AnalysisRunner.scala:330)
at scala.collection.immutable.List.map(List.scala:297)
at com.amazon.deequ.analyzers.runners.AnalysisRunner$.liftedTree1$1(AnalysisRunner.scala:328)
at com.amazon.deequ.analyzers.runners.AnalysisRunner$.runScanningAnalyzers(AnalysisRunner.scala:318)
at com.amazon.deequ.analyzers.runners.AnalysisRunner$.doAnalysisRun(AnalysisRunner.scala:167)
at com.amazon.deequ.VerificationSuite.doVerificationRun(VerificationSuite.scala:121)
at com.amazon.deequ.VerificationRunBuilder.run(VerificationRunBuilder.scala:173)
at com.amazon.deequ.thesis.GCTestOne$.$anonfun$main$1(GCTestOne.scala:42)
at com.amazon.deequ.thesis.GCTestOne$.$anonfun$main$1$adapted(GCTestOne.scala:11)
at com.amazon.deequ.examples.ExampleUtils$.withSpark(ExampleUtils.scala:32)
at com.amazon.deequ.thesis.GCTestOne$.main(GCTestOne.scala:11)
at com.amazon.deequ.thesis.GCTestOne.main(GCTestOne.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
I don't understand why Dataproc throws a NoSuchMethodError when everything runs fine locally.
Does anyone know why this happens?
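The descriptor in the error, getPercentiles([D)Lscala/collection/Seq;, names a method that takes a double[] and returns a scala.collection.Seq. A NoSuchMethodError means Deequ was compiled against a PercentileDigest that had exactly this method, but the class actually loaded at runtime does not have it. A minimal sketch for checking this from inside the job with plain Java reflection follows; ClasspathProbe and its two helpers are hypothetical names, not part of Spark or Deequ. It reports the return type of getPercentiles(double[]) as the JVM sees it, and which JAR supplied PercentileDigest.

```java
import java.lang.reflect.Method;
import java.security.CodeSource;

// Hypothetical diagnostic helper (not part of Spark or Deequ): reports where a
// class was loaded from and what a given method returns at runtime.
final class ClasspathProbe {

    // Returns the JAR or directory a class was loaded from, or a placeholder for
    // JDK classes, which the bootstrap/platform loader supplies without a code source.
    static String locationOf(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        if (src == null || src.getLocation() == null) {
            return "<bootstrap/platform class loader>";
        }
        return src.getLocation().toString();
    }

    // Returns the JVM name of the return type of the public method
    // className.methodName(params...), e.g. "[D" for double[] or
    // "scala.collection.Seq"; returns null if the class or method is missing.
    static String returnTypeOf(String className, String methodName, Class<?>... params) {
        try {
            // initialize=false: only load the class, do not run its static initializers
            Class<?> cls = Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            Method m = cls.getMethod(methodName, params);
            return m.getReturnType().getName();
        } catch (ClassNotFoundException | NoSuchMethodException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String digest = "org.apache.spark.sql.catalyst.expressions.aggregate."
                + "ApproximatePercentile$PercentileDigest";
        // Deequ expects "scala.collection.Seq" here; anything else (or null) is a mismatch.
        System.out.println("getPercentiles(double[]) returns: "
                + returnTypeOf(digest, "getPercentiles", double[].class));
        try {
            Class<?> cls = Class.forName(digest, false, ClasspathProbe.class.getClassLoader());
            System.out.println("PercentileDigest loaded from: " + locationOf(cls));
        } catch (ClassNotFoundException e) {
            System.out.println("PercentileDigest is not on the classpath");
        }
    }
}
```

Running the probe from the same JAR both locally and on the cluster (for example as the --class of a gcloud dataproc jobs submit spark job) should show whether the two environments load PercentileDigest from different JARs or with a different return type. A null result, or a location inside the cluster's own Spark installation, points at the runtime Spark rather than at your code.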
It was a version mismatch with GCP: my project used Spark 3.2.1, but the Dataproc clusters run Spark 3.1.
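This also explains why the JAR works with java -jar: there the Spark classes most likely come from the JAR itself (3.2.1), while on Dataproc the job runs against the cluster's own Spark (3.1). A mismatch like this can be caught up front instead of surfacing as a NoSuchMethodError deep inside Deequ. The sketch below compares the major.minor part of the Spark version the JAR was built against with the one seen at runtime. SparkVersionGuard and BUILT_SPARK_VERSION are hypothetical names; the constant is assumed to be kept in sync with the Spark version in the pom by hand, and in a real job the runtime string would come from SparkSession.version(). Only the pure comparison is shown so the snippet runs without Spark on the classpath.

```java
// Hypothetical fail-fast check: compare the Spark version the JAR was compiled
// against with the Spark version available at runtime (major.minor only).
final class SparkVersionGuard {

    // Assumption: kept in sync by hand with the Spark version declared in the pom.
    static final String BUILT_SPARK_VERSION = "3.2.1";

    // Extracts "major.minor" from versions like "3.1", "3.2.1" or "3.2-SNAPSHOT".
    static String majorMinor(String version) {
        String[] parts = version.trim().split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException("Unexpected Spark version: " + version);
        }
        // Keep only the leading digits of the minor component ("2-SNAPSHOT" -> "2").
        String minor = parts[1].replaceAll("\\D.*$", "");
        return parts[0] + "." + minor;
    }

    // True when both versions belong to the same major.minor line.
    static boolean compatible(String built, String runtime) {
        return majorMinor(built).equals(majorMinor(runtime));
    }

    // Throws a descriptive error instead of letting a later NoSuchMethodError occur.
    static void requireCompatible(String runtime) {
        if (!compatible(BUILT_SPARK_VERSION, runtime)) {
            throw new IllegalStateException("JAR was built for Spark " + BUILT_SPARK_VERSION
                    + " but the runtime provides Spark " + runtime);
        }
    }

    public static void main(String[] args) {
        // In a real job this string would come from spark.version().
        String runtime = args.length > 0 ? args[0] : "3.1";
        System.out.println(BUILT_SPARK_VERSION + " vs " + runtime + " compatible: "
                + compatible(BUILT_SPARK_VERSION, runtime));
    }
}
```

Calling requireCompatible(spark.version()) right after the session is created would have turned this job's NoSuchMethodError into an explicit message. Comparing only major.minor is a design choice that matches this case (3.2 vs 3.1); the actual fix is still to make the versions agree, either by compiling against the Spark version the Dataproc image ships (together with a Deequ release built for that Spark version) or by running on a cluster image whose Spark matches the one in the pom.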