AWS胶水预反应:无效操作:关系“ stage_table”不存在

发布于 2025-01-31 17:19:08 字数 2483 浏览 3 评论 0原文

我正在尝试利用AWS胶水动态框架作者的预反应和后actions来执行红移UPSERT合并,如下所述: https://aws.amazon.com/premiumsupport/knowledge-center/sql-commands-redshift-shift-shift-glue-job/

我几乎只是复制了/将代码从该页面粘贴到我自己的胶水作业中,除了我使用glueContext.write_dynamic_frame.from_options()而不是gluecontext.write_dynamic_frame.from_jdbc_conf()

它几乎有效,但是每次我运行工作时,都会在动态框架的编写过程中出现错误:

亚马逊无效的操作:关系“ stage_table”不存在

是导致此错误的代码(为简单起见DB表/列名已更改):

pre_query = 'begin; \
    drop table if exists stage_table; \
    create table stage_table as select * from target_table where 1=2; \
    end;'
post_query = 'begin; \
    delete from target_table using stage_table \
        where target_table.identifier = stage_table.identifier; \
    insert into target_table select * from stage_table; \
    drop table stage_table; \
    end;'
glueContext.write_dynamic_frame.from_options(
    frame=dy_out,
    connection_type='redshift',
    connection_options={
        'url': dburl,
        'preactions': pre_query,
        'dbtable': 'stage_table',
        'database': 'public',
        'user': dbuser,
        'password': dbpassword,
        'redshiftTmpDir': args['TempDir'],
        'postactions': post_query
    }
)

看来,错误消息来自call to Write_dynamic_frame.from_options()。实际上,如果我在数据库中手动添加一个stage_table,然后运行该作业后,就不再存在stage_table。因此,似乎滴度表正在工作,但是创建表不可能,也许是由于某种胶水正时问题所致。

请注意,目前,我已经通过以下方式解决了这个问题:

1. creating a permanent version of the stage_table in my redshift db,
2. changing my preaction to just truncate the stage_table instead of dropping and creating it, and
3. changing my postaction to not drop the stage_table.

换句话说,以下代码确实有效,证明了前动作和后行动的功能。但是我宁愿不会在我的DB中永久保留舞台桌。

pre_query = 'truncate table stage_table;'
post_query = 'begin; \
    delete from target_table using stage_table \
        where stage_table.identifier = target_table.identifier; \
    insert into target_table select * from stage_table; \
    end;'
glueContext.write_dynamic_frame.from_options(
    frame=dy_out,
    connection_type='redshift',
    connection_options={
        'url': dburl,
        'preactions': pre_query,
        'dbtable': 'stage_table',
        'database': 'public',
        'user': dbuser,
        'password': dbpassword,
        'redshiftTmpDir': args['TempDir'],
        'postactions': post_query
    }
)

I'm trying to utilize AWS Glue dynamic frame writer preactions and postactions to perform a redshift upsert merge as described here: https://aws.amazon.com/premiumsupport/knowledge-center/sql-commands-redshift-glue-job/

I've pretty much just copy/pasted the code from that page into my own Glue job, except I'm using glueContext.write_dynamic_frame.from_options() instead of glueContext.write_dynamic_frame.from_jdbc_conf().

It almost works, but everytime I run the job, it errors out during the writing of the dynamic frame with this error:

Amazon Invalid operation: relation "stage_table" does not exist

Here is the code that causes this error (the db table/column names have been changed for simplicity):

pre_query = 'begin; \
    drop table if exists stage_table; \
    create table stage_table as select * from target_table where 1=2; \
    end;'
post_query = 'begin; \
    delete from target_table using stage_table \
        where target_table.identifier = stage_table.identifier; \
    insert into target_table select * from stage_table; \
    drop table stage_table; \
    end;'
glueContext.write_dynamic_frame.from_options(
    frame=dy_out,
    connection_type='redshift',
    connection_options={
        'url': dburl,
        'preactions': pre_query,
        'dbtable': 'stage_table',
        'database': 'public',
        'user': dbuser,
        'password': dbpassword,
        'redshiftTmpDir': args['TempDir'],
        'postactions': post_query
    }
)

It appears that the error message comes from the call to write_dynamic_frame.from_options(). In fact, if I manually add a stage_table to my database and then run the job, when it's finished, the stage_table no longer exists. So it seems like the preactions drop table is working, but the create table is not, perhaps due to some sort of glue timing issue.

Note that I have, for the moment, hacked my way around this problem by:

1. creating a permanent version of the stage_table in my redshift db,
2. changing my preaction to just truncate the stage_table instead of dropping and creating it, and
3. changing my postaction to not drop the stage_table.

In other words, the following code DOES WORK, proving the functionality of the pre and post actions. But I would rather not have the stage tables residing permanently in my db.

pre_query = 'truncate table stage_table;'
post_query = 'begin; \
    delete from target_table using stage_table \
        where stage_table.identifier = target_table.identifier; \
    insert into target_table select * from stage_table; \
    end;'
glueContext.write_dynamic_frame.from_options(
    frame=dy_out,
    connection_type='redshift',
    connection_options={
        'url': dburl,
        'preactions': pre_query,
        'dbtable': 'stage_table',
        'database': 'public',
        'user': dbuser,
        'password': dbpassword,
        'redshiftTmpDir': args['TempDir'],
        'postactions': post_query
    }
)

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。
列表为空,暂无数据
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文