AWS Glue Studio not creating table
I've been using AWS Glue Studio to create ETL jobs. The target node is currently an S3 bucket, but I also want the job to create a table in the Data Catalog. When I run the ETL job, it reports no error and correctly saves the output as Parquet files to my S3 bucket, but it does not create a table in the Data Catalog.
This is my code:
# node hem-horarios-bpi
hemhorariosbpi_node3 = glueContext.getSink(
    path="s3://hem-data-datalake-staging/staging_general/staging_horarioHP/",
    connection_type="s3",
    updateBehavior="LOG",
    partitionKeys=[],
    enableUpdateCatalog=True,
    transformation_ctx="hemhorariosbpi_node3",
)
hemhorariosbpi_node3.setCatalogInfo(
    catalogDatabase="hem-db-staging-tables", catalogTableName="hem-horarios-pbi"
)
hemhorariosbpi_node3.setFormat("glueparquet")
hemhorariosbpi_node3.writeFrame(S3bucket_node1)
job.commit()
I have tried changing the role, but that didn't help.
Comments (3)
Try changing the updateBehavior property from LOG to UPDATE_IN_DATABASE.
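For the suggestion to switch updateBehavior from LOG to UPDATE_IN_DATABASE, here is a minimal sketch of how the sink from the question might be rewritten. The wrapper function write_with_catalog_update and its parameters are hypothetical names introduced for illustration; the Glue calls themselves are the ones already used in the question, and whether this change alone is enough depends on the permissions discussed in the other answers.

```python
def write_with_catalog_update(glue_context, frame, path, database, table):
    """Write `frame` to S3 as Parquet and ask Glue to update the Data Catalog.

    Hypothetical helper: `glue_context` is the job's GlueContext and
    `frame` is the DynamicFrame to write (S3bucket_node1 in the question).
    """
    sink = glue_context.getSink(
        path=path,
        connection_type="s3",
        updateBehavior="UPDATE_IN_DATABASE",  # the question used "LOG"
        partitionKeys=[],
        enableUpdateCatalog=True,
        transformation_ctx="hemhorariosbpi_node3",
    )
    sink.setCatalogInfo(catalogDatabase=database, catalogTableName=table)
    sink.setFormat("glueparquet")
    sink.writeFrame(frame)
    return sink
```

Inside the job script this would be called as write_with_catalog_update(glueContext, S3bucket_node1, "s3://hem-data-datalake-staging/staging_general/staging_horarioHP/", "hem-db-staging-tables", "hem-horarios-pbi"), followed by job.commit().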
I was able to reproduce the situation by creating a Glue job from the sample code above. The job ran successfully and the data was written to S3, but the Glue table was not created. The CloudWatch logs also didn't have any message about why the table was not created. I granted Lake Formation permissions on the database to the Glue job's IAM role and re-ran the job. This time the Glue table was created.
The Lake Formation permissions on the database to create/alter/describe were missing.
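The grant described above can also be done programmatically. The following is a sketch using the boto3 Lake Formation client's grant_permissions call; the helper name grant_database_permissions is hypothetical, and the role ARN is a placeholder you would replace with the role passed to the Glue job. It assumes the caller is allowed to grant permissions in Lake Formation.

```python
def grant_database_permissions(lakeformation_client, role_arn, database_name):
    """Grant create/alter/describe on a database to an IAM role.

    Hypothetical helper: `lakeformation_client` is expected to be
    boto3.client("lakeformation").
    """
    return lakeformation_client.grant_permissions(
        Principal={"DataLakePrincipalIdentifier": role_arn},
        Resource={"Database": {"Name": database_name}},
        Permissions=["CREATE_TABLE", "ALTER", "DESCRIBE"],
    )

# Example call (not executed here; the ARN is a placeholder):
# import boto3
# grant_database_permissions(
#     boto3.client("lakeformation"),
#     "<ARN of the Glue job's IAM role>",
#     "hem-db-staging-tables",
# )
```

Passing the client in as an argument keeps the helper easy to exercise with a stub before pointing it at a real account.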
Not sure whether you have already figured out the reason. I encountered the same issue, and after checking the CloudWatch logs it turned out to be related to Lake Formation.
If you are using Lake Formation for access control, you need to grant the IAM role you pass to the Glue job the necessary permissions in Lake Formation.
You can take a look at this post.
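To see whether the job's role already holds any Lake Formation permissions on the database, something like the sketch below could help. It uses the boto3 Lake Formation client's list_permissions call; the helper name database_permissions_for_role is hypothetical, and for brevity it reads only the first page of results (no NextToken handling).

```python
def database_permissions_for_role(lakeformation_client, role_arn, database_name):
    """Return the sorted permission names a role holds on a database.

    Hypothetical helper: `lakeformation_client` is expected to be
    boto3.client("lakeformation"). Only the first page of results is read.
    """
    response = lakeformation_client.list_permissions(
        Principal={"DataLakePrincipalIdentifier": role_arn},
        Resource={"Database": {"Name": database_name}},
    )
    granted = set()
    for entry in response.get("PrincipalResourcePermissions", []):
        granted.update(entry.get("Permissions", []))
    return sorted(granted)
```

If the returned list lacks CREATE_TABLE, ALTER, or DESCRIBE, that would be consistent with the missing permissions reported in the other answer.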