Record larger than the Split size in AWS Glue?


I'm a newbie in AWS Glue and Spark.
I built my ETL with them.
When I connect to my S3 files of approximately 200 MB, Glue does not read them.
The error is:

An error was encountered:
An error occurred while calling o99.toDF.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 10.0 failed 1 times, most recent failure: Lost task 1.0 in stage 10.0 (TID 16) (91ec547edca7 executor driver): com.amazonaws.services.glue.util.NonFatalException: Record larger than the Split size: 67108864
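
For context, here is a minimal sketch of the kind of read that raises this error; the job boilerplate and S3 path are hypothetical placeholders, not the actual script:

# Minimal AWS Glue job sketch (PySpark); all names and paths are placeholders.
from awsglue.context import GlueContext
from pyspark.context import SparkContext

sc = SparkContext()
glue_context = GlueContext(sc)

# Read JSON from S3 as a DynamicFrame. The 67108864 in the error is
# 64 MiB, the split size Glue used when planning the read; per the
# message, the exception fires when a single record does not fit in
# one split.
dyf = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://my-bucket/input/"]},  # placeholder path
    format="json",
)

# The stack trace points at the DynamicFrame-to-DataFrame conversion (o99.toDF).
df = dyf.toDF()
df.show(5)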

Update 1:
When I split my JSON file (200 MB) into two parts with jq, AWS Glue reads both parts normally.

My workaround is a Lambda function that splits the file, but I want to know how the AWS Glue split works.
Thanks and regards
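
For reference, a hypothetical sketch of the Lambda split mentioned above, assuming the input is JSON Lines (one record per line) and the function has enough memory to buffer the whole object; the bucket and key names are placeholders, and a file that is one top-level JSON array would need to be parsed and re-serialized instead:

# Hypothetical Lambda handler that splits a large S3 JSON Lines file
# into two objects, mirroring the jq split from Update 1.
import boto3

s3 = boto3.client("s3")

def handler(event, context):
    bucket = "my-bucket"         # placeholder
    key = "input/big-file.json"  # placeholder

    # Read the whole object into memory (the Lambda memory setting
    # must be comfortably above the ~200 MB file size).
    lines = s3.get_object(Bucket=bucket, Key=key)["Body"].read().splitlines()

    # Write the two halves back as separate objects for Glue to read.
    mid = len(lines) // 2
    for i, part in enumerate((lines[:mid], lines[mid:])):
        s3.put_object(
            Bucket=bucket,
            Key=f"split/part-{i}.json",
            Body=b"\n".join(part),
        )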
