dbt 快照失败(错误:100090 (42P18):DML 操作期间检测到重复行)

发布于 2025-01-15 05:33:08 字数 1552 浏览 3 评论 0原文

因此,我们有一个名为 dim_merchant.sql 的表和该表的一个名为 dim_merchant_snapshot 的快照。

{% snapshot dim_merchant_snapshot %}

    {{
        config
        (
          target_schema='snapshots',
          unique_key='id',
          strategy='check',
          check_cols='all'
        )
    }}

select * from {{ ref('dim_merchant') }}

{% endsnapshot %}

我们从来没有遇到过麻烦,但从昨天开始,我们在运行快照时遇到了失败,并显示以下错误消息:

Database Error in snapshot dim_merchant_snapshot (snapshots/dim_merchant_snapshot.sql)
100090 (42P18): Duplicate row detected during DML action

在快照的这一步中发生错误:

On snapshot.analytics.dim_merchant_snapshot: merge into "X"."SNAPSHOTS"."DIM_MERCHANT_SNAPSHOT" as DBT_INTERNAL_DEST
    using "X"."SNAPSHOTS"."DIM_MERCHANT_SNAPSHOT__dbt_tmp" as DBT_INTERNAL_SOURCE
    on DBT_INTERNAL_SOURCE.dbt_scd_id = DBT_INTERNAL_DEST.dbt_scd_id

    when matched
     and DBT_INTERNAL_DEST.dbt_valid_to is null
     and DBT_INTERNAL_SOURCE.dbt_change_type in ('update', 'delete')
        then update
        set dbt_valid_to = DBT_INTERNAL_SOURCE.dbt_valid_to

    when not matched
     and DBT_INTERNAL_SOURCE.dbt_change_type = 'insert'

我们意识到某些值在快照中被插入和更新了两次(从昨天开始)这导致了我们的快照失败,但我们不确定为什么。

请注意,dim_merchant 上的 id 键经过了唯一性测试,并且没有重复。同时,在我们的第一次快照运行后,快照表包含重复项(这不会导致任何失败),但在受重复项感染的快照表上的后续运行会失败。

我们最近刚刚将 dbt 从 0.20.0 更新到 1.0.3,但我们没有发现这些版本之间的快照定义有任何变化。

设置:

dbt-core==1.0.3,
dbt-snowflake==1.0.0,
dbt-extractor==0.4.0,
Snowflake version: 6.7.1 

谢谢!

So we have a table called dim_merchant.sql and a snapshot of this table called dim_merchant_snapshot.

{% snapshot dim_merchant_snapshot %}

    {{
        config
        (
          target_schema='snapshots',
          unique_key='id',
          strategy='check',
          check_cols='all'
        )
    }}

select * from {{ ref('dim_merchant') }}

{% endsnapshot %}

We never had a trouble with it but since yesterday, we ran into failure in running the snapshot with the following error message:

Database Error in snapshot dim_merchant_snapshot (snapshots/dim_merchant_snapshot.sql)
100090 (42P18): Duplicate row detected during DML action

The error is happening during this step of the snapshot:

On snapshot.analytics.dim_merchant_snapshot: merge into "X"."SNAPSHOTS"."DIM_MERCHANT_SNAPSHOT" as DBT_INTERNAL_DEST
    using "X"."SNAPSHOTS"."DIM_MERCHANT_SNAPSHOT__dbt_tmp" as DBT_INTERNAL_SOURCE
    on DBT_INTERNAL_SOURCE.dbt_scd_id = DBT_INTERNAL_DEST.dbt_scd_id

    when matched
     and DBT_INTERNAL_DEST.dbt_valid_to is null
     and DBT_INTERNAL_SOURCE.dbt_change_type in ('update', 'delete')
        then update
        set dbt_valid_to = DBT_INTERNAL_SOURCE.dbt_valid_to

    when not matched
     and DBT_INTERNAL_SOURCE.dbt_change_type = 'insert'

We realized that some values were being inserted and updated twice in the snapshot (since yesterday) and that caused the failure of our snapshot but we are not sure as to why.

Note that the id key on dim_merchant is tested for its uniqueness and there are no duplicated of it. Meanwhile, the snapshot table contains duplicate after our first snapshot run (that doesn't cause any failure), but the subsequent runs on the snapshot table infected with duplicates are failing.

We just recently updated dbt from 0.20.0 to 1.0.3, but we didnt find any change in the snapshot definition between these versions.

SETUP:

dbt-core==1.0.3,
dbt-snowflake==1.0.0,
dbt-extractor==0.4.0,
Snowflake version: 6.7.1 

Thanks !

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

万人眼中万个我 2025-01-22 05:33:08

我知道这篇文章发布已经有一段时间了。我想报告说我也看到了这种奇怪的行为。我有一个使用 timestamp 策略创建快照的表。 unique_key 由几列组成。目的是这是每次模型运行时的完整快照。正在快照的表具有所有唯一行,这意味着 dbt_scd_id 是唯一键。我通过将 updated_at 列添加到 unique_key 配置中解决了此问题。理论上,这应该不重要,因为 dbt_scd_id 已经是 unique_keyupdated_at 的串联。无论如何,它已经解决了问题。

I know it's been a while since this was posted. I wanted to report that I'm seeing this weird behavior as well. I have a table that I'm snapshotting with the timestamp strategy. The unique_key is made up of several columns. The intention is that this is a full snapshot each time the model is run. The table that is being snapshotted has all unique rows meaning dbt_scd_id is a unique key. I resolved this issue by adding the updated_at column to the unique_key config. In theory, this shouldn't matter since the dbt_scd_id is already a concatenation of unique_key and updated_at. Regardless, it has resolved the issue.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文