How do I do this in Java? I've used df.write().parquet() but I get the following error

Asked 2025-02-09 21:51:05


package com.evampsaanga.imran.testing;

import org.apache.spark.SparkConf;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ReadingCsv {
    public static void main(String[] args) {

        SparkConf conf = new SparkConf()
                .setAppName("Text File Data Load")
                .setMaster("local")
                .set("spark.driver.host", "localhost")
                .set("spark.testing.memory", "2147480000");
        SparkSession spark = SparkSession.builder().config(conf).getOrCreate();
        Dataset<Row> df = spark.read()
                .format("org.apache.spark.sql.execution.datasources.csv.CSVFileFormat")
                .option("sep", ",")
                .option("inferSchema", true)
                .option("header", true)
                .load("E:/CarsData.csv");
        df.write().parquet("test.parquet");
    }
}

I have used df.write().parquet(), but I am getting this error:

Exception in thread "main" org.apache.spark.sql.AnalysisException: Multiple sources found for parquet (org.apache.spark.sql.execution.datasources.v2.parquet.ParquetDataSourceV2, org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat), please specify the fully qualified class name.;
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:702)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSourceV2(DataSource.scala:728)
at org.apache.spark.sql.DataFrameWriter.lookupV2Provider(DataFrameWriter.scala:948)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:285)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:269)
at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:829)
at com.evampsaanga.imran.testing.ReadingCsv.main(ReadingCsv.java:23)
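
For anyone hitting the same exception: it usually means two different spark-sql builds ended up on the classpath, so both the DataSource V1 and V2 Parquet implementations are registered under the short name "parquet"; deduplicating the Spark dependencies in the build is the proper fix. As a stop-gap, the error message itself says to pass a fully qualified class name. Below is a minimal sketch of that workaround, replacing the last line of the posted main method; the class name is copied verbatim from the AnalysisException above, and this is an assumption, not a confirmed fix.

        df.write()
                // Fully qualified V1 Parquet source named in the exception text;
                // this sidesteps the ambiguous short name "parquet".
                .format("org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat")
                .save("test.parquet");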
