在星型模式中,事实和维度之间的外键约束是否必要?
我第一次接触数据仓库,我想知道事实和维度之间是否有必要有外键约束。没有它们有什么主要缺点吗?我目前正在使用关系星型模式。在传统应用程序中,我习惯了它们,但我开始怀疑在这种情况下是否需要它们。我目前在 SQL Server 2005 环境中工作。
更新:对于那些感兴趣的人,我遇到了一个询问同样问题的民意调查。
I'm getting my first exposure to data warehousing, and I’m wondering is it necessary to have foreign key constraints between facts and dimensions. Are there any major downsides for not having them? I’m currently working with a relational star schema. In traditional applications I’m used to having them, but I started to wonder if they were needed in this case. I’m currently working in a SQL Server 2005 environment.
UPDATE: For those interested I came across a poll asking the same question.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(7)
大多数数据仓库 (DW) 没有将外键实现为约束,因为:
一般来说,外键约束会在以下情况下触发:向事实表中插入、任何键更新以及从维度表中删除。
在加载期间,删除索引和约束以加快加载过程,数据完整性由 ETL 应用程序强制执行。
在加载期间,删除索引和约束
一旦加载表,DW 本质上是只读的 - 约束不会在读取时触发。
加载后重新构建任何所需的索引。
在 DW 中删除是一个受控过程。在从维度中删除行之前,会在事实表中查询要删除的行的键 - 仅当这些键不存在于任何事实表中时才允许删除。
为了以防万一,定期运行查询来检测事实表中的孤立记录是很常见的。
Most data-warehouses (DW) do not have foreign keys implemented as constraints, because:
In general, foreign key constraint would trigger on: an insert into a fact table, any key-updates, and a delete from a dimension table.
During loading, indexes and constraints are dropped to speed-up the loading process, data integrity is enforced by the ETL application.
Once tables are loaded, DW is essentially read-only -- the constraint does not trigger on reads.
Any required indexes are re-built after the loading.
Deleting in a DW is a controlled process. Before deleting rows from dimensions, fact tables are queried for keys of rows to be deleted -- deleting is allowed only if those keys do not exists in any of fact tables.
Just in case, it is common to periodically run queries to detect orphan records in fact tables.
我们使用它们,并且对此感到满意。
这是好的做法吗在数据仓库(关系)中拥有外键?
存在开销,但您始终可以在加载期间禁用约束,然后重新启用它。
制定适当的约束可以捕获 ETL 错误和建模缺陷。
We use them, and we're happy with it.
Is it good practice to have foreign keys in a datawarehouse (relationships)?
There is overhead, but you can always disable the constraint during load and then re-enable it.
Having the constraint in place can catch ETL bugs and modelling defects.
我认为理论上,你需要这个。但这取决于您如何通过数据库分离数据。如果它们都在同一个数据库中,外键可以帮助您,因为设置外键将帮助数据库根据索引更快地进行选择。如果您在多个数据库上共享表,则需要在应用程序级别进行检查。
您可以让数据库为您检查,但速度可能很慢。一般来说,在数据仓库中,我们不关心冗余或完整性。我们已经有很多数据了,少量的完整性和冗余不会影响一般聚合数据
I think in theory, you need that. But it depends on how you separate your data over database. If all of them in the same database, foreign key can help you because setting foreign key will help the database do selecting faster based on the indexing. If you share tables over many database, you need to check it on your application level
You can have your database check it for you but it can be slow. And generally, in data warehouse, we don't care about redundancy or integrity. We already have a lot of data and a few integrity and redundancy will not affect the general aggregate data
我不知道有必要,但我觉得它们对于数据完整性很有好处。您希望确保事实表始终指向维度表中的有效记录。即使您确定会发生这种情况,为什么不让数据库为您验证需求呢?
I don't know about necessary, but I feel they are good for data integrity reasons. You want to make sure that your fact table is always pointing to a valid record in the dimension table. Even if you are sure this will happen, why not have the database validate the requirement for you?
在数据仓库中使用完整性约束的原因与在任何其他数据库中完全相同:保证数据的完整性。假设您和您的用户关心数据的准确性,那么您需要某种方法来确保数据保持准确并且正确应用业务规则。
The reasons for using integrity constraints in a data warehouse are exactly the same as in any other database: to guarantee the integrity of the data. Assuming you and your users care about the data being accurate then you need some way of ensuring that it remains so and that business rules are being correctly applied.
据我所知FKs,加快查询速度。此外,许多 BI 解决方案在其集成层中利用它们。所以对我来说它们是 DW 中的必备品。
As far as I know FKs, speed up queries. Also, many BI solutions exploit them in their integration layer. So for me they are a must in DWs.
希望这个帖子仍然活跃。
我的想法是:对于具有许多维度和记录的大型事实表,外键会减慢插入和更新速度,从而导致事实表加载速度太慢,尤其是当它的大小增加时。索引用于在加载表后进行查询,因此可以在插入/更新期间禁用索引,然后重建索引。外键 RELATION 很重要,而不是外键本身:这实际上隐含在 ETL 过程中。我发现外键使现实世界的数据仓库速度太慢。您需要使用虚拟外键:关系是它们的关系,但不是约束。如果您损坏了数据仓库中的外键关系,那么您就做错了。
如果您在插入期间禁用它们并且存在不匹配或孤立的情况,您将无法重新启用它们,那么有什么意义呢?
DW 的重点是快速访问和查询。外键使这成为不可能。
有趣的辩论:在网上不容易找到这个问题
凯夫
Hope this thread is still active.
My thinking is: for large fact tables with many dimensions and records, foreign keys will slow inserts and updates so that a fact table becomes too slow to load especially as it increases in size. Indexes are used for querying AFTER the table is loaded, so they can be disabled during inserts/updates and then rebuilt. The foreign key RELATION is important NOT the foreign key itself: this is really implicit in the ETL process. I have found that foreign keys make things TOO slow in the real world Datawarehouse. You need to use a VIRTUAL foreign key: the relation is their but not the constraint. If you damage the foreign key relations in a Datawarehouse you are doing something wrong.
If you disable them during inserts and there is an mismatch or orphan, you won't be able to reenable them, so what's the point.
The whole point of the DW is fast access and querying. Foreign keys make that impossible.
Interesting debate: not easy to find this question on the Net
Kev