Spark 3: RemoteFileChangedException writing to S3 on LocalStack - Change reported by S3 during open at position. ETag was unavailable
I am trying to write Parquet to S3 in my Testcontainers LocalStack setup and am getting this error:
org.apache.hadoop.fs.s3a.RemoteFileChangedException: open `s3a://***.snappy.parquet': Change reported by S3 during open at position ***. ETag *** was unavailable
It works against real S3, and it also worked with Spark 2.4 and Hadoop 2.7.
I am using: Scala 2.12.15, Spark 3.2.1, hadoop-aws 3.3.1, testcontainers-scala-localstack 0.40.8
The code is very simple; it just writes a DataFrame to an S3 location:
val path = "s3a://***"
import spark.implicits._
val df = Seq(UserRow("1", List("10", "20"))).toDF()
df.write.parquet(path)
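For reference, the SparkSession is pointed at LocalStack through the usual S3A settings, roughly like this (the endpoint, region, and credentials below are placeholders, not the actual test values):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("localstack-parquet-test")
  // Send S3A requests to the LocalStack endpoint instead of AWS.
  .config("spark.hadoop.fs.s3a.endpoint", "http://localhost:4566")
  // LocalStack serves buckets under the root path, so path-style access is required.
  .config("spark.hadoop.fs.s3a.path.style.access", "true")
  .config("spark.hadoop.fs.s3a.connection.ssl.enabled", "false")
  // LocalStack accepts any static credentials.
  .config("spark.hadoop.fs.s3a.access.key", "test")
  .config("spark.hadoop.fs.s3a.secret.key", "test")
  .getOrCreate()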
Comments (1)
You could disable bucket versioning at the moment you create the bucket. Here is an example, a minimal sketch using the AWS SDK for Java v1 (the bucket name, endpoint, region, and credentials are placeholders):
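import com.amazonaws.auth.{AWSStaticCredentialsProvider, BasicAWSCredentials}
import com.amazonaws.client.builder.AwsClientBuilder.EndpointConfiguration
import com.amazonaws.services.s3.AmazonS3ClientBuilder
import com.amazonaws.services.s3.model.{BucketVersioningConfiguration, SetBucketVersioningConfigurationRequest}

// Build an S3 client against the LocalStack endpoint (placeholder values).
val s3 = AmazonS3ClientBuilder.standard()
  .withEndpointConfiguration(new EndpointConfiguration("http://localhost:4566", "us-east-1"))
  .withCredentials(new AWSStaticCredentialsProvider(new BasicAWSCredentials("test", "test")))
  .withPathStyleAccessEnabled(true)
  .build()

// Create the bucket and explicitly suspend versioning on it, so S3A's change
// detection has no version IDs or changing ETags to compare during reads.
s3.createBucket("test-bucket")
s3.setBucketVersioningConfiguration(
  new SetBucketVersioningConfigurationRequest(
    "test-bucket",
    new BucketVersioningConfiguration(BucketVersioningConfiguration.SUSPENDED)))

On real S3 a newly created bucket is unversioned by default, so the explicit SUSPENDED status mainly guards against a test setup (or a LocalStack default) that turns versioning on.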