Is there any benefit to including relationships in a star schema table design?
I'm designing the Fact and Dimension tables for a data warehouse currently using SQL Server, SSIS, and SSAS. Will I get any real benefit from programming the relationships between the dimensions and the fact tables into SQL? Or am I better off just defining the relationships manually when it comes time to create the cubes?
It seems easier to load and transform the data if there are no constraints on inserts into the tables, which would mean leaving out the relationships.
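For readers unsure what "programming the relationships" looks like in practice, here is a minimal sketch of a declared foreign key and the insert it rejects. It uses Python's built-in `sqlite3` module as a stand-in for SQL Server, and the table names `dim_product` and `fact_sales` are hypothetical, not taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE dim_product (
        product_key  INTEGER PRIMARY KEY,
        product_name TEXT NOT NULL
    );
    -- Hypothetical fact table with the relationship declared in SQL.
    CREATE TABLE fact_sales (
        product_key INTEGER NOT NULL REFERENCES dim_product (product_key),
        quantity    INTEGER NOT NULL
    );
    INSERT INTO dim_product VALUES (1, 'Widget');
""")

# A fact row whose dimension key exists is accepted.
conn.execute("INSERT INTO fact_sales VALUES (1, 5)")

# A fact row pointing at a missing dimension key is rejected by the constraint.
try:
    conn.execute("INSERT INTO fact_sales VALUES (99, 2)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
```

This rejection is the behaviour that makes a load "harder": with the constraint in place, the dimension row has to exist before the fact row that refers to it.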
Comments (2)
I am interpreting "programming the relationships" as meaning to put foreign key constraints on the tables.
No, in a data warehouse you should not impose primary key or foreign key constraints on the fact tables.
You've mentioned some issues already. Another is that these constraints add performance overhead when rows are inserted, which makes the ETL process more expensive.
To someone only experienced with transactional database design, this might go against everything they've learnt and experienced. Foreign key constraints are vital for databases where you have multiple processes modifying data at the same time. There's a definite risk of two processes screwing up the data somehow, in spite of the best efforts of developers. The constraints are a fundamentally vital safety net.
In a dimensional model, the database is only ever populated by the one ETL process, and in a highly controlled way. This decreases the risk of the data getting corrupted to the point where the extra cost of constraints just isn't worth it.
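If the constraints are left off, the ETL process itself can still verify the relationships after a load. The following is a minimal sketch of such a check, assuming hypothetical tables `dim_customer` and `fact_orders` and using Python's `sqlite3` module as a stand-in for SQL Server. The anti-join query is plain SQL and should carry over with little change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, customer_name TEXT);
    -- No foreign key is declared on the fact table.
    CREATE TABLE fact_orders (customer_key INTEGER, amount REAL);
    INSERT INTO dim_customer VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO fact_orders VALUES (1, 10.0), (2, 25.5), (7, 3.0);
""")

def find_orphan_keys(conn):
    """Return fact-table keys that have no matching dimension row."""
    rows = conn.execute("""
        SELECT DISTINCT f.customer_key
        FROM fact_orders AS f
        LEFT JOIN dim_customer AS d ON d.customer_key = f.customer_key
        WHERE d.customer_key IS NULL
        ORDER BY f.customer_key
    """).fetchall()
    return [r[0] for r in rows]

orphans = find_orphan_keys(conn)
```

Here `orphans` contains the key 7, the one fact row without a dimension entry. A check like this could run as the last step of a load and fail the job when the list is not empty.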
I think we need to have FK constraints, since updates to a DW are 'mostly' controlled but not always. For example, manual data fixes happen when there are data issues and the like. [Ideally this must not happen, but....:)]
To ensure the keys don't impact performance, we can disable them before the load and enable them again afterwards. This gives us confidence that the data is right and also removes any performance issues during the load. Another thing to remember is that processing time is not a major constraint for most data warehouses.
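The disable, load, re-enable pattern can be sketched as follows. This is an illustration under assumptions, not the exact SQL Server procedure: it uses Python's `sqlite3` module, where `PRAGMA foreign_keys` toggles enforcement and `PRAGMA foreign_key_check` reports violations, and the tables `dim_date` and `fact_visits` are hypothetical. On SQL Server the corresponding statements would be `ALTER TABLE ... NOCHECK CONSTRAINT` before the load and `ALTER TABLE ... WITH CHECK CHECK CONSTRAINT` after it.

```python
import sqlite3

# isolation_level=None keeps the connection in autocommit mode, because
# PRAGMA foreign_keys has no effect inside an open transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY);
    CREATE TABLE fact_visits (
        date_key INTEGER NOT NULL REFERENCES dim_date (date_key),
        visits   INTEGER NOT NULL
    );
    INSERT INTO dim_date VALUES (20240101), (20240102);
""")

def load_with_deferred_check(conn, rows):
    """Load rows with FK enforcement off, then report any violations."""
    conn.execute("PRAGMA foreign_keys = OFF")   # skip per-row checks during the load
    conn.executemany("INSERT INTO fact_visits VALUES (?, ?)", rows)
    # Validate the whole table once, after the load has finished.
    violations = conn.execute("PRAGMA foreign_key_check(fact_visits)").fetchall()
    conn.execute("PRAGMA foreign_keys = ON")    # enforce again for later manual fixes
    return violations

violations = load_with_deferred_check(
    conn, [(20240101, 12), (20240102, 7), (20240103, 4)]
)
```

Each entry in `violations` names the child table, the offending row, and the parent table, so the row for the missing date 20240103 is reported. Note that in SQLite switching the pragma back on does not re-validate existing rows, which is why the explicit check is run; SQL Server's `WITH CHECK` option performs that validation as part of re-enabling the constraint.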
If you consider the time needed to fix potential data integrity issues, having FKs is well worth it.