AWS Glue Studio not creating table

Posted 2025-01-14 14:25:47

I've been using AWS Glue Studio to create ETL jobs. The target node of my current job is an S3 bucket, but I also want the job to create a table in the Data Catalog. When I run the job, it reports no error and correctly saves the output as Parquet files in my S3 bucket, but it does not create the table in the Data Catalog.

This is my code:

# node hem-horarios-bpi
hemhorariosbpi_node3 = glueContext.getSink(
    path="s3://hem-data-datalake-staging/staging_general/staging_horarioHP/",
    connection_type="s3",
    updateBehavior="LOG",
    partitionKeys=[],
    enableUpdateCatalog=True,
    transformation_ctx="hemhorariosbpi_node3",
)
hemhorariosbpi_node3.setCatalogInfo(catalogDatabase="hem-db-staging-tables", catalogTableName="hem-horarios-pbi")
hemhorariosbpi_node3.setFormat("glueparquet")
hemhorariosbpi_node3.writeFrame(S3bucket_node1)
job.commit()

I have tried changing the role, but that didn't help.

Comments (3)

陈甜 2025-01-21 14:25:47

Try changing the updateBehavior property from LOG to UPDATE_IN_DATABASE.
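
A minimal sketch of that change, applied to the sink from the question. The helper names build_sink_options and write_with_catalog_update are hypothetical. The option names, bucket path, database and table names are copied from the question's code, and only updateBehavior differs. Nothing in this thread confirms that this change alone makes the table appear, so treat it as something to try rather than a known fix.

```python
def build_sink_options(path, update_behavior="UPDATE_IN_DATABASE"):
    """Return keyword arguments for glueContext.getSink (hypothetical helper)."""
    allowed = {"UPDATE_IN_DATABASE", "LOG"}
    if update_behavior not in allowed:
        raise ValueError(f"updateBehavior must be one of {sorted(allowed)}")
    return {
        "path": path,
        "connection_type": "s3",
        "updateBehavior": update_behavior,  # the question used "LOG" here
        "partitionKeys": [],
        "enableUpdateCatalog": True,
        "transformation_ctx": "hemhorariosbpi_node3",
    }


def write_with_catalog_update(glue_context, frame):
    """Write `frame` to S3 and ask Glue to update the Data Catalog.

    `glue_context` is the GlueContext of the job and `frame` is the
    DynamicFrame to write (S3bucket_node1 in the question).
    """
    sink = glue_context.getSink(
        **build_sink_options(
            "s3://hem-data-datalake-staging/staging_general/staging_horarioHP/"
        )
    )
    sink.setCatalogInfo(
        catalogDatabase="hem-db-staging-tables",
        catalogTableName="hem-horarios-pbi",
    )
    sink.setFormat("glueparquet")
    sink.writeFrame(frame)
```

Inside the job script this would be called as write_with_catalog_update(glueContext, S3bucket_node1), followed by job.commit().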

云醉月微眠 2025-01-21 14:25:47

I was able to reproduce the situation by creating a Glue job from the sample code above and running it successfully. The data was written to S3, but the Glue table was not created, and the CloudWatch logs contained no message explaining why. I then granted the Glue IAM role Lake Formation permissions on the database and re-ran the job. This time the Glue table was created.

The Lake Formation permissions on the database (create/alter/describe) were missing.
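
For reference, a sketch of how such a grant could be issued with boto3 instead of the console. The helper names and the role ARN in the usage comment are hypothetical placeholders. The database name comes from the question, and mapping "create/alter/describe" to the Lake Formation permissions CREATE_TABLE, ALTER and DESCRIBE is my reading of the answer above. The caller needs to be allowed to grant these permissions (for example, a data lake administrator).

```python
def build_grant_request(role_arn, database):
    """Build the arguments for lakeformation.grant_permissions (hypothetical helper)."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": role_arn},
        "Resource": {"Database": {"Name": database}},
        # Assumed mapping of "create/alter/describe" to database permissions.
        "Permissions": ["CREATE_TABLE", "ALTER", "DESCRIBE"],
    }


def grant_database_permissions(role_arn, database):
    """Grant the Glue job's IAM role the database permissions in Lake Formation."""
    import boto3  # imported lazily so the request builder works without boto3

    client = boto3.client("lakeformation")
    client.grant_permissions(**build_grant_request(role_arn, database))


# Example call (the ARN is a placeholder, not taken from the question):
# grant_database_permissions(
#     "arn:aws:iam::123456789012:role/my-glue-job-role",
#     "hem-db-staging-tables",
# )
```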

人间☆小暴躁 2025-01-21 14:25:47

Not sure whether you have already figured out the reason. I encountered the same issue, and after checking the CloudWatch logs it turned out to be related to Lake Formation.

If you are using Lake Formation for access control, you need to grant the necessary permissions in Lake Formation to the IAM role you pass to the Glue job.
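
One way to check what the role currently has is to list its Lake Formation permissions on the database and compare them with what the job needs. This is only a sketch: the helper names are hypothetical, the required set (CREATE_TABLE, ALTER, DESCRIBE) is an assumption based on this thread, and only the first page of results is inspected.

```python
REQUIRED_PERMISSIONS = frozenset({"CREATE_TABLE", "ALTER", "DESCRIBE"})  # assumed


def missing_permissions(response, required=REQUIRED_PERMISSIONS):
    """Return the required permissions absent from a list_permissions response."""
    granted = set()
    for entry in response.get("PrincipalResourcePermissions", []):
        granted.update(entry.get("Permissions", []))
    if "ALL" in granted:  # ALL covers every database permission
        return set()
    return set(required) - granted


def fetch_database_permissions(role_arn, database):
    """Ask Lake Formation what `role_arn` is granted on `database` (first page only)."""
    import boto3  # imported lazily so missing_permissions works without boto3

    client = boto3.client("lakeformation")
    return client.list_permissions(
        Principal={"DataLakePrincipalIdentifier": role_arn},
        Resource={"Database": {"Name": database}},
    )
```

Calling missing_permissions(fetch_database_permissions(role_arn, "hem-db-staging-tables")) with the job's role ARN returns an empty set if nothing from the assumed list is missing.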

You can take a look at this post.
