Spark 3: RemoteFileChangedException when writing to S3 on LocalStack - "Change reported by S3 during open. ETag was unavailable"

Posted 2025-02-07 05:51:00

I am trying to write Parquet into S3 in my Testcontainers LocalStack and get this error:

org.apache.hadoop.fs.s3a.RemoteFileChangedException: open `s3a://***.snappy.parquet': Change reported by S3 during open at position ***. ETag *** was unavailable

It works with real S3, and it worked with Spark 2.4 and Hadoop 2.7.

I am using: Scala 2.12.15, Spark 3.2.1, hadoop-aws 3.3.1, testcontainers-scala-localstack 0.40.8

The code is very simple; it just writes a dataframe to an S3 location:

val path = "s3a://***"
import spark.implicits._

val df = Seq(UserRow("1", List("10", "20"))).toDF()
df.write.parquet(path)
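As background on the error itself (not from the original post): the exception comes from S3A's change-detection logic, which in Hadoop 3.x compares ETags or version IDs during reads, and LocalStack does not always return them. Besides disabling bucket versioning (see the answer below), a test-only workaround is to relax those checks through Spark's Hadoop configuration. A hedged sketch — the option names are Hadoop S3A settings from hadoop-aws 3.3.x; the chosen values are an assumption for a LocalStack test setup, not something to use against real S3:

```scala
import org.apache.spark.sql.SparkSession

// Sketch: relax S3A change detection for a LocalStack-backed test.
val spark = SparkSession.builder()
  .appName("localstack-parquet-test")
  .master("local[*]")
  // Log a warning instead of failing when the ETag changes or is missing
  .config("spark.hadoop.fs.s3a.change.detection.mode", "warn")
  // Do not require the store to supply an ETag/version ID at all
  .config("spark.hadoop.fs.s3a.change.detection.version.required", "false")
  .getOrCreate()
```

With these settings the write proceeds even when LocalStack omits the ETag; the trade-off is that genuine concurrent-modification problems would also go undetected, which is why this is only appropriate in tests.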


1 Answer

甚是思念 2025-02-14 05:51:00

You could disable bucket versioning when you create the bucket.
Here is an example:

    import software.amazon.awssdk.auth.credentials.AwsBasicCredentials;
    import software.amazon.awssdk.auth.credentials.StaticCredentialsProvider;
    import software.amazon.awssdk.regions.Region;
    import software.amazon.awssdk.services.s3.S3Client;
    import software.amazon.awssdk.services.s3.model.BucketVersioningStatus;
    import org.testcontainers.containers.localstack.LocalStackContainer;

    // create an S3 client pointing at the LocalStack container
    S3Client s3Client = S3Client.builder()
        .endpointOverride(localStackContainer.getEndpointOverride(LocalStackContainer.Service.S3))
        .credentialsProvider(StaticCredentialsProvider.create(AwsBasicCredentials
            .create(localStackContainer.getAccessKey(), localStackContainer.getSecretKey())))
        .region(Region.of(localStackContainer.getRegion()))
        .build();

    // create the desired bucket
    s3Client.createBucket(builder -> builder.bucket(<your-bucket-name>));

    // disable (suspend) versioning on the bucket
    s3Client.putBucketVersioning(builder -> builder
        .bucket(<your-bucket-name>)
        .versioningConfiguration(builder1 -> builder1
            .status(BucketVersioningStatus.SUSPENDED)));
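A small follow-up check (an assumption, not part of the original answer): with the same `s3Client` built above, you can confirm that versioning really is suspended before running the Spark write. The bucket name here is hypothetical:

```scala
import software.amazon.awssdk.services.s3.model.{
  BucketVersioningStatus, GetBucketVersioningRequest
}

// Assumes the S3Client named s3Client from the answer above,
// and a hypothetical bucket "my-test-bucket".
val status = s3Client.getBucketVersioning(
  GetBucketVersioningRequest.builder().bucket("my-test-bucket").build()
).status()

// After the PutBucketVersioning call, the status should be SUSPENDED
assert(status == BucketVersioningStatus.SUSPENDED)
```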