AWS Glue Studio not creating table

Posted on 2025-01-14 14:25:47


I've been using AWS Glue Studio to create ETL jobs. I set up the job so that the target node is an S3 bucket, but I also want it to create a table in the Data Catalog. When I run the ETL job, it reports no error and correctly writes the output to my S3 bucket as Parquet files, but it does not create a table in the Data Catalog.

This is my code:

# Script generated for node hem-horarios-bpi
# (glueContext, job and the source frame S3bucket_node1 are defined earlier in the generated script)
hemhorariosbpi_node3 = glueContext.getSink(
    path="s3://hem-data-datalake-staging/staging_general/staging_horarioHP/",
    connection_type="s3", updateBehavior="LOG", partitionKeys=[],
    enableUpdateCatalog=True, transformation_ctx="hemhorariosbpi_node3",
)
hemhorariosbpi_node3.setCatalogInfo(catalogDatabase="hem-db-staging-tables", catalogTableName="hem-horarios-pbi")
hemhorariosbpi_node3.setFormat("glueparquet")
hemhorariosbpi_node3.writeFrame(S3bucket_node1)
job.commit()

I have tried changing the IAM role, but it didn't help.
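To confirm from outside the job whether the table was actually created after a run, one option is to query the Data Catalog directly. The sketch below is a hypothetical helper (`table_exists` is not part of Glue or boto3); it assumes a boto3 Glue client and relies on `get_table` raising `EntityNotFoundException` when the table is not found.

```python
"""Hypothetical helper: check whether a table exists in the Glue Data Catalog."""


def table_exists(glue_client, database: str, table: str) -> bool:
    """Return True if `table` exists in `database` in the Data Catalog.

    `glue_client` is assumed to be a boto3 Glue client, e.g. boto3.client("glue").
    """
    try:
        glue_client.get_table(DatabaseName=database, Name=table)
        return True
    except glue_client.exceptions.EntityNotFoundException:
        # GetTable raises this when the requested table is not found.
        return False
```

Usage, assuming AWS credentials with `glue:GetTable` permission: `table_exists(boto3.client("glue"), "hem-db-staging-tables", "hem-horarios-pbi")`.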
