How can I prevent dbt from rebuilding incremental tables when a column in the source view changes?


We have the following structure in dbt:

[events (view), with column1, column2, column3]--->[incremental1, using column1 from events]
                                             |
                                             |
                                             └---->[incremental2, using column2 from events]

In our continuous integration setup, we run dbt run --models state:modified+ to avoid running models that were not changed in a PR. However, the problem is that if we change column3 in events (e.g. rename it), both incremental tables, incremental1 and incremental2, will be fully rebuilt, which may take an excessively long time, sometimes more than a whole day. Note that neither incremental table is actually affected by the change in its source, as neither uses the changed column in any way.

Is there a way to avoid rebuilding these models while also not having to run all the other models (which is what a simple dbt run would do)?

Some additional details:

  • our dbt version is 1.0
  • our data warehouse engine is Snowflake
  • our incremental strategy is 'delete+insert'
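
For reference, the state-comparison setup described above boils down to something like the following, assuming the production manifest has been downloaded into a prod-run-artifacts/ directory (the path is illustrative):

# build only models modified in this PR, plus everything downstream of them,
# compared against the manifest from the last production run
dbt run --models state:modified+ --state prod-run-artifacts/

# optionally, --defer lets unchanged upstream models resolve to their
# production relations instead of being rebuilt in the CI schema
dbt run --models state:modified+ --defer --state prod-run-artifacts/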


1 Answer

蛮可爱 · 2025-02-03 15:38:21


dbt run takes an --exclude option; you could exclude those long-running tables by name, or tag them. This obviously carries its own risks and maintenance burden.
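
A sketch of both variants; the tag name here is made up:

# exclude the long-running incrementals by name...
dbt run --models state:modified+ --exclude incremental1 incremental2

# ...or tag the heavy models once, e.g. {{ config(tags=['slow']) }},
# and exclude the tag instead of maintaining a list of names
dbt run --models state:modified+ --exclude tag:slow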

Another option would be to filter the records in these large tables on a dev or CI run so that they build much more quickly. You could add something like this to the incremental model:

{% if target.name != 'prod' %}
-- dev/CI targets only: limit the build to the last 30 days of data
where updated_at > current_timestamp() - interval '30 days'
{% endif %}
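
Put together, a minimal sketch of how that filter might sit inside one of the incremental models; the unique_key and column names are illustrative, not from the question:

{{ config(
    materialized='incremental',
    incremental_strategy='delete+insert',
    unique_key='event_id'
) }}

select
    column1,
    updated_at
from {{ ref('events') }}
where 1 = 1

{% if is_incremental() %}
-- normal incremental run: only pull rows newer than what is already loaded
and updated_at > (select max(updated_at) from {{ this }})
{% endif %}

{% if target.name != 'prod' %}
-- dev/CI targets: cap even a full rebuild at 30 days of history
and updated_at > current_timestamp() - interval '30 days'
{% endif %}

The where 1 = 1 line is just a convenience so both optional predicates can be appended with and; production runs remain unfiltered.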