DW表中的主要和外键
我已经阅读该尺寸表包含主键,事实表包含外键,该键引用了尺寸表的主要键。
现在,我的混乱就是这样 - 假设我有一个ETL管道,该管道从源中填充了尺寸表(例如客户)(例如另一个DB)。假设这是一个经常更换的表,并具有200多列。如何将这些更改纳入尺寸表?我只想在DWH中拥有每个客户(1型SCD)的最新记录。
我能做的一件事是在维度表中删除该行并重新插入新的更新的行。但是,由于主要键 - 外键约束(这不允许我删除记录),因此这种方法无法使用。
我应该在ETL脚本中编写所有200列的更新语句吗?还是还有其他方法?
I've read that dimension tables hold the primary key and and fact tables contain the foreign key which references the primary key of Dimension tables.
Now the confusion I am having is this - suppose I have an ETL pipeline which populates the dimension table (let's say customer) from a source (say another DB). Let's assume this is a frequently changing table and has over 200 columns. How do I incorporate these changes in the dimension tables? I want to have only the latest record for each customer (type 1 SCD) in the DWH.
One thing what I could do is delete the row in the dimension table and re-insert the new updated row. But this approach won't work because of the primary key - foreign key constraint (which will not allow me to delete the record).
Should I write an update statement with all 200 columns in the ETL script? Or is there any other approach?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
严格来说,您只需要更新更改的字段即可。但是,在单个记录中更新所有内容的成本可能相似(假设其基于行的存储),并且可以更容易编写。
您无法删除并重新插入,因为新行将有一个新的PK,并且旧事实将不再链接。
Strictly speaking you just need to update the fields that changed. But the cost of updating all in a single record is probably similar (assuming it’s row based storage), and it’s probably easier to write.
You can’t delete and re-insert, as the new row will have a new PK and old facts will no longer be linked.