AWS Glue ETL job

Posted 2025-01-26 10:21:28


Using AWS Glue ETL jobs, I would like to create a Data Catalog table and load the objects that are stored in S3 (partitioned by date), as shown below. The table would be 'data1':

s3://test/data1/2022/03/22/1.csv
s3://test/data1/2022/03/23/2.csv
s3://test/data1/2022/04/08/1.csv
s3://test/data1/2022/04/09/2.csv
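These folders are not Hive-style (`year=2022/...`), so a Glue crawler pointed at `s3://test/data1` would typically name the three folder levels `partition_0`, `partition_1` and `partition_2`, which matches the `partitionKeys` used in the script. The small helper below is a hypothetical illustration (not a Glue API) of how each object path lines up with those keys; for a Glue write, the partition keys must exist as columns in the frame being written.

```python
from typing import Dict


def partition_values(object_url: str, table_root: str) -> Dict[str, str]:
    """Hypothetical helper: map the folders between `table_root` and the
    file name to crawler-style keys partition_0, partition_1, ..."""
    prefix = table_root.rstrip("/") + "/"
    if not object_url.startswith(prefix):
        raise ValueError(f"{object_url!r} is not under {table_root!r}")
    # "2022/03/22/1.csv" -> ["2022", "03", "22"]; the file name is dropped.
    folders = object_url[len(prefix):].split("/")[:-1]
    return {f"partition_{i}": value for i, value in enumerate(folders)}


if __name__ == "__main__":
    for url in [
        "s3://test/data1/2022/03/22/1.csv",
        "s3://test/data1/2022/03/23/2.csv",
        "s3://test/data1/2022/04/08/1.csv",
        "s3://test/data1/2022/04/09/2.csv",
    ]:
        print(url, partition_values(url, "s3://test/data1"))
```

Each of the four objects maps to year, month and day values in `partition_0` to `partition_2`.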

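```python
from awsglue.context import GlueContext
from awsglue.transforms import ApplyMapping
from pyspark.context import SparkContext

glueContext = GlueContext(SparkContext.getOrCreate())

# Read every CSV object under s3://test/data1, including the YYYY/MM/DD sub-folders.
S3bucket_node1 = glueContext.create_dynamic_frame.from_options(
    format_options={"quoteChar": '"', "withHeader": False, "separator": ","},
    connection_type="s3",
    format="csv",
    connection_options={
        "paths": ["s3://test/data1"],
        "recurse": True,
    },
    transformation_ctx="S3bucket_node1",
)

# The column mapping list was left empty in the original post.
ApplyMapping_node2 = ApplyMapping.apply(
    frame=S3bucket_node1, mappings=[], transformation_ctx="ApplyMapping_node2"
)

# Write through the existing catalog table "default.data1"; this lookup fails
# if the table is not already in the Data Catalog.
DataCatalogtable_node3 = glueContext.write_dynamic_frame.from_catalog(
    frame=ApplyMapping_node2,
    database="default",
    table_name="data1",
    additional_options={
        "enableUpdateCatalog": True,
        "updateBehavior": "UPDATE_IN_DATABASE",
        "partitionKeys": ["partition_0", "partition_1", "partition_2"],
    },
    transformation_ctx="DataCatalogtable_node3",
)
```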

```
py4j.protocol.Py4JJavaError: An error occurred while calling o81.getCatalogSink.
: com.amazonaws.services.glue.model.EntityNotFoundException: Table pk_datacdr_new not found. (Service: AWSGlue; Status Code: 400; Error Code: EntityNotFoundException; Request ID: 07a1ec53-2ade-4b9f-a23f-36564dde19d8; Proxy: null)
```
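The `EntityNotFoundException` in this trace is raised when the Data Catalog lookup for the target table fails. One option, sketched here assuming boto3 is available to the job, is to check for the table before deciding how to write. `table_exists` is a hypothetical helper; `get_table(DatabaseName=..., Name=...)` and `exceptions.EntityNotFoundException` are part of boto3's Glue client.

```python
def table_exists(glue_client, database: str, table: str) -> bool:
    """Return True if `database.table` exists in the Glue Data Catalog.

    `glue_client` is assumed to be a boto3 Glue client, e.g.
    boto3.client("glue"); it is passed in so the helper stays testable.
    """
    try:
        glue_client.get_table(DatabaseName=database, Name=table)
        return True
    except glue_client.exceptions.EntityNotFoundException:
        # Same error code as in the trace: the table is not in the catalog.
        return False
```

Other errors (permissions, throttling) are deliberately not caught, so they still surface instead of being mistaken for a missing table.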

The Glue ETL job fails because the script expects the table to already exist. Is there a way to create the table if it does not exist?
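The AWS Glue documentation describes a way for an ETL job to create the catalog table itself: build the sink with `glueContext.getSink(...)` and `enableUpdateCatalog=True`, call `setFormat` and `setCatalogInfo` with the target database and table, then `writeFrame`. The wrapper below is a sketch of that pattern; the function name and its parameters are hypothetical, and only S3 targets and a limited set of formats support this, so check the Glue documentation for your format.

```python
def write_creating_table(glue_context, frame, path, database, table,
                         partition_keys, fmt="csv"):
    """Hypothetical wrapper: write `frame` to S3 at `path` and create
    (or update) the Data Catalog table `database.table` as part of the write.

    `glue_context` is assumed to be an awsglue GlueContext and `frame` a
    DynamicFrame that already contains the `partition_keys` columns.
    """
    sink = glue_context.getSink(
        connection_type="s3",
        path=path,
        enableUpdateCatalog=True,
        updateBehavior="UPDATE_IN_DATABASE",
        partitionKeys=partition_keys,
    )
    sink.setFormat(fmt)
    # Unlike write_dynamic_frame.from_catalog, this does not require the
    # table to exist beforehand.
    sink.setCatalogInfo(catalogDatabase=database, catalogTableName=table)
    return sink.writeFrame(frame)
```

In the job above this might be called as `write_creating_table(glueContext, ApplyMapping_node2, "s3://test/data1-output/", "default", "data1", ["partition_0", "partition_1", "partition_2"])`, where the output path is a hypothetical placeholder chosen so the job does not write back into the folder it reads from.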
