AWS Glue ETL job
Using Glue ETL jobs, I would like to create a Data Catalog table and load the objects below, which are stored in S3 under a partitioned layout. The table would be 'data1'.
s3://test/data1/2022/03/22/1.csv
s3://test/data1/2022/03/23/2.csv
s3://test/data1/2022/04/08/1.csv
s3://test/data1/2022/04/09/2.csv
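For reference, here is a small sketch of how the folder layout above could map to partition values. It assumes the three path segments after the table folder are the partition values, and it names them partition_0, partition_1, partition_2 only to mirror the partitionKeys used in the script in this question. The helper `partition_values` and its `table_prefix` parameter are hypothetical names, not part of any Glue API.

```python
from urllib.parse import urlparse


def partition_values(s3_uri, table_prefix="data1"):
    """Return the path segments between the table folder and the file name.

    Hypothetical helper for illustration only; not part of AWS Glue.
    """
    key = urlparse(s3_uri).path.lstrip("/")   # e.g. "data1/2022/03/22/1.csv"
    parts = key.split("/")
    start = parts.index(table_prefix) + 1     # first segment after the table folder
    segments = parts[start:-1]                # drop the trailing file name
    return {f"partition_{i}": value for i, value in enumerate(segments)}


print(partition_values("s3://test/data1/2022/03/22/1.csv"))
# {'partition_0': '2022', 'partition_1': '03', 'partition_2': '22'}
```

Note that these columns have to exist in the frame before they can be used as partition keys on write. Reading plain (non key=value) paths with from_options may not add them automatically.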
S3bucket_node1 = glueContext.create_dynamic_frame.from_options(
    format_options={"quoteChar": '"', "withHeader": False, "separator": ","},
    connection_type="s3",
    format="csv",
    connection_options={
        "paths": ["s3://test/data1"],
        "recurse": True,
    },
    transformation_ctx="S3bucket_node1",
)
ApplyMapping_node2 = ApplyMapping.apply(
    frame=S3bucket_node1, mappings=[], transformation_ctx="ApplyMapping_node2"
)
DataCatalogtable_node3 = glueContext.write_dynamic_frame.from_catalog(
    frame=ApplyMapping_node2,
    database="default",
    table_name="data1",
    additional_options={
        "enableUpdateCatalog": True,
        "updateBehavior": "UPDATE_IN_DATABASE",
        "partitionKeys": ["partition_0", "partition_1", "partition_2"],
    },
    transformation_ctx="DataCatalogtable_node3",
)
py4j.protocol.Py4JJavaError: An error occurred while calling o81.getCatalogSink.
: com.amazonaws.services.glue.model.EntityNotFoundException: Table pk_datacdr_new not found. (Service: AWSGlue; Status Code: 400; Error Code: EntityNotFoundException; Request ID: 07a1ec53-2ade-4b9f-a23f-36564dde19d8; Proxy: null)
The Glue ETL job fails because the script expects the table to already exist. Is there a way to create the table if it does not exist? Please let me know.
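One approach that may help: the AWS Glue documentation describes a sink-based write (`getSink` with `enableUpdateCatalog`, followed by `setCatalogInfo`) that can create the catalog table during the job run, whereas `write_dynamic_frame.from_catalog` looks up an existing table. Below is a minimal sketch under stated assumptions: the wrapper function `write_and_create_table` and the target path in the usage comment are hypothetical, the output format is assumed to be CSV, and the frame is assumed to already contain the partition columns. It has not been run against this job.

```python
def write_and_create_table(glue_context, frame, target_path,
                           database, table, partition_keys):
    """Write `frame` to S3 and let Glue create or update the catalog table.

    `glue_context` is expected to be an awsglue GlueContext; it is passed in
    so this sketch carries no import of its own. The function name is
    hypothetical.
    """
    sink = glue_context.getSink(
        connection_type="s3",
        path=target_path,                     # S3 location the table will point to
        enableUpdateCatalog=True,             # allow the job to modify the catalog
        updateBehavior="UPDATE_IN_DATABASE",
        partitionKeys=partition_keys,         # columns must exist in `frame`
    )
    sink.setFormat("csv")                     # assumption: keep the data as CSV
    sink.setCatalogInfo(catalogDatabase=database, catalogTableName=table)
    sink.writeFrame(frame)
    return sink


# Usage inside the Glue job (the target path is a hypothetical example):
# write_and_create_table(
#     glueContext, ApplyMapping_node2, "s3://test/output/data1/",
#     "default", "data1", ["partition_0", "partition_1", "partition_2"],
# )
```

The design choice here is to write the data and register the table in one step, so no separate crawler run or manual table creation is needed before the job starts.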