在数据进入暂存表之前是否应该处理维度表键查找?

发布于 2024-11-01 08:50:32 字数 117 浏览 6 评论 0原文

我读过,临时表应该是 dw 中目标表的精确逐列匹配。如果是这种情况,在填充临时表后,最佳做法是不进行后续查找以将键与维度表中的键进行匹配吗?

我想我的问题是,在数据进入临时表之前是否应该处理维度表键查找?

I have read that a staging table should be an exact column by column match of its target table in the dw. If that is the case, after populating the staging table is it best practice to not do subsequent lookups to match up keys to those in dimension tables?

My question I guess is this, should dimension table key lookups be processed before data goes into a staging table?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

つ低調成傷 2024-11-08 08:50:32

最佳实践是不受影响地填充暂存数据,但这并不意味着您不能添加其他元数据列。只要暂存数据完全可追溯到未转换的源,您就可以添加任何代理键或其他特定于 ETL 的数据,例如提取时间(如果您愿意)。

数据仓库的正常做法是按原样填充暂存数据(提取),然后在更新插入到 SCD 时查找维度键。暂存数据不应真正包含特定于仓库的信息,因为它可用于多种目的。

It is best practice to populate staging data untouched, but that doesn't mean you cannot add additional metadata columns. As long as the staging data is fully traceable back to source untransformed, you can add any surrogate keys or other ETL-specific data, such as extract time as an example, if you wish.

The normal practice in data warehousing is to populate staging data as-is (extract), then lookup the dimensional keys when upserting to a SCD. The staging data shouldn't really contain warehouse-specific information as it could be used for many purposes.

数理化全能战士 2024-11-08 08:50:32

我很想知道你在哪里读到的。没有硬性规定,但大多数临时表都是所需源数据的原始转储,有时会执行一些基本的清理。此时我会避免对您的数据仓库进行任何查找。当您将数据从暂存移动到仓库时,应在流程中进行查找。

I'd be interested to know where you read that. There is no hard and fast rule, but most staging tables are a raw dump of the required source data, sometimes with some basic cleansing performed. I'd steer well clear of doing any lookups to your data warehouse at this point. The lookups should happen in the process when you move the data from Staging to your warehouse.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文