AWS胶水预反应:无效操作:关系“ stage_table”不存在
我正在尝试利用AWS胶水动态框架作者的预反应和后actions来执行红移UPSERT合并,如下所述: https://aws.amazon.com/premiumsupport/knowledge-center/sql-commands-redshift-shift-shift-glue-job/
我几乎只是复制了/将代码从该页面粘贴到我自己的胶水作业中,除了我使用glueContext.write_dynamic_frame.from_options()
而不是gluecontext.write_dynamic_frame.from_jdbc_conf()
。
它几乎有效,但是每次我运行工作时,都会在动态框架的编写过程中出现错误:
亚马逊无效的操作:关系“ stage_table”不存在
是导致此错误的代码(为简单起见DB表/列名已更改):
pre_query = 'begin; \
drop table if exists stage_table; \
create table stage_table as select * from target_table where 1=2; \
end;'
post_query = 'begin; \
delete from target_table using stage_table \
where target_table.identifier = stage_table.identifier; \
insert into target_table select * from stage_table; \
drop table stage_table; \
end;'
glueContext.write_dynamic_frame.from_options(
frame=dy_out,
connection_type='redshift',
connection_options={
'url': dburl,
'preactions': pre_query,
'dbtable': 'stage_table',
'database': 'public',
'user': dbuser,
'password': dbpassword,
'redshiftTmpDir': args['TempDir'],
'postactions': post_query
}
)
看来,错误消息来自call to Write_dynamic_frame.from_options()。实际上,如果我在数据库中手动添加一个stage_table,然后运行该作业后,就不再存在stage_table。因此,似乎滴度表正在工作,但是创建表不可能,也许是由于某种胶水正时问题所致。
请注意,目前,我已经通过以下方式解决了这个问题:
1. creating a permanent version of the stage_table in my redshift db,
2. changing my preaction to just truncate the stage_table instead of dropping and creating it, and
3. changing my postaction to not drop the stage_table.
换句话说,以下代码确实有效,证明了前动作和后行动的功能。但是我宁愿不会在我的DB中永久保留舞台桌。
pre_query = 'truncate table stage_table;'
post_query = 'begin; \
delete from target_table using stage_table \
where stage_table.identifier = target_table.identifier; \
insert into target_table select * from stage_table; \
end;'
glueContext.write_dynamic_frame.from_options(
frame=dy_out,
connection_type='redshift',
connection_options={
'url': dburl,
'preactions': pre_query,
'dbtable': 'stage_table',
'database': 'public',
'user': dbuser,
'password': dbpassword,
'redshiftTmpDir': args['TempDir'],
'postactions': post_query
}
)
I'm trying to utilize AWS Glue dynamic frame writer preactions and postactions to perform a redshift upsert merge as described here: https://aws.amazon.com/premiumsupport/knowledge-center/sql-commands-redshift-glue-job/
I've pretty much just copy/pasted the code from that page into my own Glue job, except I'm using glueContext.write_dynamic_frame.from_options()
instead of glueContext.write_dynamic_frame.from_jdbc_conf()
.
It almost works, but everytime I run the job, it errors out during the writing of the dynamic frame with this error:
Amazon Invalid operation: relation "stage_table" does not exist
Here is the code that causes this error (the db table/column names have been changed for simplicity):
pre_query = 'begin; \
drop table if exists stage_table; \
create table stage_table as select * from target_table where 1=2; \
end;'
post_query = 'begin; \
delete from target_table using stage_table \
where target_table.identifier = stage_table.identifier; \
insert into target_table select * from stage_table; \
drop table stage_table; \
end;'
glueContext.write_dynamic_frame.from_options(
frame=dy_out,
connection_type='redshift',
connection_options={
'url': dburl,
'preactions': pre_query,
'dbtable': 'stage_table',
'database': 'public',
'user': dbuser,
'password': dbpassword,
'redshiftTmpDir': args['TempDir'],
'postactions': post_query
}
)
It appears that the error message comes from the call to write_dynamic_frame.from_options(). In fact, if I manually add a stage_table to my database and then run the job, when it's finished, the stage_table no longer exists. So it seems like the preactions drop table is working, but the create table is not, perhaps due to some sort of glue timing issue.
Note that I have, for the moment, hacked my way around this problem by:
1. creating a permanent version of the stage_table in my redshift db,
2. changing my preaction to just truncate the stage_table instead of dropping and creating it, and
3. changing my postaction to not drop the stage_table.
In other words, the following code DOES WORK, proving the functionality of the pre and post actions. But I would rather not have the stage tables residing permanently in my db.
pre_query = 'truncate table stage_table;'
post_query = 'begin; \
delete from target_table using stage_table \
where stage_table.identifier = target_table.identifier; \
insert into target_table select * from stage_table; \
end;'
glueContext.write_dynamic_frame.from_options(
frame=dy_out,
connection_type='redshift',
connection_options={
'url': dburl,
'preactions': pre_query,
'dbtable': 'stage_table',
'database': 'public',
'user': dbuser,
'password': dbpassword,
'redshiftTmpDir': args['TempDir'],
'postactions': post_query
}
)
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论