dbt 快照失败(错误:100090 (42P18):DML 操作期间检测到重复行)
因此,我们有一个名为 dim_merchant.sql 的表和该表的一个名为 dim_merchant_snapshot 的快照。
{% snapshot dim_merchant_snapshot %}
{{
config
(
target_schema='snapshots',
unique_key='id',
strategy='check',
check_cols='all'
)
}}
select * from {{ ref('dim_merchant') }}
{% endsnapshot %}
我们从来没有遇到过麻烦,但从昨天开始,我们在运行快照时遇到了失败,并显示以下错误消息:
Database Error in snapshot dim_merchant_snapshot (snapshots/dim_merchant_snapshot.sql)
100090 (42P18): Duplicate row detected during DML action
在快照的这一步中发生错误:
On snapshot.analytics.dim_merchant_snapshot: merge into "X"."SNAPSHOTS"."DIM_MERCHANT_SNAPSHOT" as DBT_INTERNAL_DEST
using "X"."SNAPSHOTS"."DIM_MERCHANT_SNAPSHOT__dbt_tmp" as DBT_INTERNAL_SOURCE
on DBT_INTERNAL_SOURCE.dbt_scd_id = DBT_INTERNAL_DEST.dbt_scd_id
when matched
and DBT_INTERNAL_DEST.dbt_valid_to is null
and DBT_INTERNAL_SOURCE.dbt_change_type in ('update', 'delete')
then update
set dbt_valid_to = DBT_INTERNAL_SOURCE.dbt_valid_to
when not matched
and DBT_INTERNAL_SOURCE.dbt_change_type = 'insert'
我们意识到某些值在快照中被插入和更新了两次(从昨天开始)这导致了我们的快照失败,但我们不确定为什么。
请注意,dim_merchant 上的 id 键经过了唯一性测试,并且没有重复。同时,在我们的第一次快照运行后,快照表包含重复项(这不会导致任何失败),但在受重复项感染的快照表上的后续运行会失败。
我们最近刚刚将 dbt 从 0.20.0 更新到 1.0.3,但我们没有发现这些版本之间的快照定义有任何变化。
设置:
dbt-core==1.0.3,
dbt-snowflake==1.0.0,
dbt-extractor==0.4.0,
Snowflake version: 6.7.1
谢谢!
So we have a table called dim_merchant.sql and a snapshot of this table called dim_merchant_snapshot.
{% snapshot dim_merchant_snapshot %}
{{
config
(
target_schema='snapshots',
unique_key='id',
strategy='check',
check_cols='all'
)
}}
select * from {{ ref('dim_merchant') }}
{% endsnapshot %}
We never had a trouble with it but since yesterday, we ran into failure in running the snapshot with the following error message:
Database Error in snapshot dim_merchant_snapshot (snapshots/dim_merchant_snapshot.sql)
100090 (42P18): Duplicate row detected during DML action
The error is happening during this step of the snapshot:
On snapshot.analytics.dim_merchant_snapshot: merge into "X"."SNAPSHOTS"."DIM_MERCHANT_SNAPSHOT" as DBT_INTERNAL_DEST
using "X"."SNAPSHOTS"."DIM_MERCHANT_SNAPSHOT__dbt_tmp" as DBT_INTERNAL_SOURCE
on DBT_INTERNAL_SOURCE.dbt_scd_id = DBT_INTERNAL_DEST.dbt_scd_id
when matched
and DBT_INTERNAL_DEST.dbt_valid_to is null
and DBT_INTERNAL_SOURCE.dbt_change_type in ('update', 'delete')
then update
set dbt_valid_to = DBT_INTERNAL_SOURCE.dbt_valid_to
when not matched
and DBT_INTERNAL_SOURCE.dbt_change_type = 'insert'
We realized that some values were being inserted and updated twice in the snapshot (since yesterday) and that caused the failure of our snapshot but we are not sure as to why.
Note that the id key on dim_merchant is tested for its uniqueness and there are no duplicated of it. Meanwhile, the snapshot table contains duplicate after our first snapshot run (that doesn't cause any failure), but the subsequent runs on the snapshot table infected with duplicates are failing.
We just recently updated dbt from 0.20.0 to 1.0.3, but we didnt find any change in the snapshot definition between these versions.
SETUP:
dbt-core==1.0.3,
dbt-snowflake==1.0.0,
dbt-extractor==0.4.0,
Snowflake version: 6.7.1
Thanks !
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
data:image/s3,"s3://crabby-images/d5906/d59060df4059a6cc364216c4d63ceec29ef7fe66" alt="扫码二维码加入Web技术交流群"
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
我知道这篇文章发布已经有一段时间了。我想报告说我也看到了这种奇怪的行为。我有一个使用
timestamp
策略创建快照的表。unique_key
由几列组成。目的是这是每次模型运行时的完整快照。正在快照的表具有所有唯一行,这意味着 dbt_scd_id 是唯一键。我通过将updated_at
列添加到unique_key
配置中解决了此问题。理论上,这应该不重要,因为dbt_scd_id
已经是unique_key
和updated_at
的串联。无论如何,它已经解决了问题。I know it's been a while since this was posted. I wanted to report that I'm seeing this weird behavior as well. I have a table that I'm snapshotting with the
timestamp
strategy. Theunique_key
is made up of several columns. The intention is that this is a full snapshot each time the model is run. The table that is being snapshotted has all unique rows meaningdbt_scd_id
is a unique key. I resolved this issue by adding theupdated_at
column to theunique_key
config. In theory, this shouldn't matter since thedbt_scd_id
is already a concatenation ofunique_key
andupdated_at
. Regardless, it has resolved the issue.