Is it good practice to use foreign keys in a data warehouse (relationships)?
I think the question is clear enough. Some of the columns in my data warehouse table could have a relationship to a primary key. But is it good practice? The data is denormalized, so it should never be deleted again (data in the data warehouse). I hope the question is clear enough.
Comments (8)
Using FK constraints in a DW is like wearing a bicycle helmet. If the ETL is designed correctly, you technically don't need them. That said, if I had a million dollars for every time I've seen bug-free ETL, I'd have zero dollars.
Until you're at a point where FK constraints are causing performance issues, I say leave them in. Cleaning up referential integrity problems can be much harder than adding the constraints from the get-go ;-)
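To make the "helmet" point concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for a real warehouse engine. The table names (`dim_customer`, `fact_sales`), the columns, and the `load_fact` helper are hypothetical, chosen only for illustration. It shows an enforced FK rejecting a fact row whose dimension key does not exist, which is the kind of ETL bug the constraint is there to catch.

```python
import sqlite3

# Autocommit mode keeps the example free of transaction bookkeeping.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when this is on

# Hypothetical one-dimension star schema.
conn.executescript("""
    CREATE TABLE dim_customer (
        customer_sk   INTEGER PRIMARY KEY,
        customer_code TEXT NOT NULL UNIQUE
    );
    CREATE TABLE fact_sales (
        customer_sk INTEGER NOT NULL REFERENCES dim_customer (customer_sk),
        amount      REAL NOT NULL
    );
    INSERT INTO dim_customer (customer_sk, customer_code) VALUES (1, 'C-001');
""")


def load_fact(rows):
    """Insert fact rows one by one; collect the rows the FK rejects."""
    loaded, rejected = 0, []
    for row in rows:
        try:
            conn.execute(
                "INSERT INTO fact_sales (customer_sk, amount) VALUES (?, ?)", row
            )
            loaded += 1
        except sqlite3.IntegrityError:
            rejected.append(row)
    return loaded, rejected


# The second row points at customer_sk 42, which is not in the dimension.
loaded, rejected = load_fact([(1, 9.99), (42, 5.00)])
print(loaded, rejected)
```

Without the constraint, the second row would load silently and only surface later as a fact that joins to nothing.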

I presume that you are referring to FKs in fact tables. During DW loading, indexes and any foreign keys are dropped to speed up the load; the ETL process takes care of the keys.
A foreign key constraint "activates" during inserts and updates (when it needs to check that the key value exists in the parent table) and during deletes of primary keys in parent tables. It plays no part during reads. Deleting records in a DW is (or should be) a controlled process that scans for any existing relationships before deleting from dimension tables.
So, most DWs do not have foreign keys implemented as constraints.
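If the ETL process is the one taking care of the keys, it usually needs some check of its own after the load. The following is only a sketch of that idea, again with Python's sqlite3 module standing in for the warehouse and with hypothetical table names (`dim_product`, `fact_orders`): rows are loaded with enforcement switched off, and an anti-join then reports facts whose dimension key is missing.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
    CREATE TABLE dim_product (
        product_sk INTEGER PRIMARY KEY,
        name       TEXT NOT NULL
    );
    CREATE TABLE fact_orders (
        order_id   INTEGER PRIMARY KEY,
        product_sk INTEGER NOT NULL REFERENCES dim_product (product_sk),
        qty        INTEGER NOT NULL
    );
""")

# Enforcement off for the bulk load, mirroring "constraints are dropped
# during loading" (an SQLite analogue, not a real drop/recreate).
conn.execute("PRAGMA foreign_keys = OFF")
conn.executemany(
    "INSERT INTO dim_product (product_sk, name) VALUES (?, ?)",
    [(1, "widget"), (2, "gadget")],
)
conn.executemany(
    "INSERT INTO fact_orders (order_id, product_sk, qty) VALUES (?, ?, ?)",
    [(100, 1, 3), (101, 2, 1), (102, 7, 5)],  # product_sk 7 does not exist
)

# Post-load audit: facts with no matching dimension row.
orphans = conn.execute("""
    SELECT f.order_id, f.product_sk
    FROM fact_orders AS f
    LEFT JOIN dim_product AS d ON d.product_sk = f.product_sk
    WHERE d.product_sk IS NULL
    ORDER BY f.order_id
""").fetchall()
print(orphans)
```

The anti-join is plain SQL, so the same audit query should carry over to other engines with little change.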

FK constraints work well in Kimball dimensional models on SQL Server.
Typically, your ETL needs to look up the dimension table (usually on the business key, to handle slowly changing dimensions) to determine the dimension surrogate IDs. The dimension surrogate ID is usually an identity column, and the PK on the dimension is usually that surrogate ID, so it is already indexed (probably by the clustered index).
Having RI at this point is not a huge write overhead, and it can also help catch ETL defects during development. Making the PK of the fact table a combination of all its FKs can also help trap potential data modeling problems and double-loading.
It can actually reduce overhead on selects if you like to make general-use flattened views or table-valued functions of your star models. Because the extra inner joins to dimensions are guaranteed to produce one and only one row, the optimizer can use these constraints very effectively to eliminate the need to look up into the dimension table. Without FK constraints, these lookups may have to be done to eliminate facts where the dimension does not exist.
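The lookup-then-insert pattern and the composite fact PK described above can be sketched as follows. This uses Python's sqlite3 module rather than SQL Server, and the schema (`dim_store`, `dim_date`, `fact_daily_sales`) and the `load_row` helper are hypothetical. As a simplification, the business key is declared unique, so slowly changing dimension versioning is not modeled.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE dim_store (
        store_sk   INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        store_code TEXT NOT NULL UNIQUE                -- business key
    );
    CREATE TABLE dim_date (
        date_sk INTEGER PRIMARY KEY
    );
    CREATE TABLE fact_daily_sales (
        store_sk INTEGER NOT NULL REFERENCES dim_store (store_sk),
        date_sk  INTEGER NOT NULL REFERENCES dim_date (date_sk),
        amount   REAL NOT NULL,
        PRIMARY KEY (store_sk, date_sk)  -- fact PK is the combination of its FKs
    );
    INSERT INTO dim_store (store_code) VALUES ('NYC-01'), ('BOS-02');
    INSERT INTO dim_date (date_sk) VALUES (20240101);
""")


def load_row(store_code, date_sk, amount):
    """Resolve the business key to its surrogate key, then insert the fact."""
    row = conn.execute(
        "SELECT store_sk FROM dim_store WHERE store_code = ?", (store_code,)
    ).fetchone()
    if row is None:
        raise LookupError(f"unknown store code: {store_code}")
    conn.execute(
        "INSERT INTO fact_daily_sales (store_sk, date_sk, amount) VALUES (?, ?, ?)",
        (row[0], date_sk, amount),
    )


load_row("NYC-01", 20240101, 120.0)

# Loading the same grain a second time violates the composite PK.
try:
    load_row("NYC-01", 20240101, 120.0)
    double_load_blocked = False
except sqlite3.IntegrityError:
    double_load_blocked = True

fact_count = conn.execute("SELECT COUNT(*) FROM fact_daily_sales").fetchone()[0]
print(double_load_blocked, fact_count)
```

The lookup on `store_code` is the step the answer says the ETL has to do anyway, which is why the FK check on the already indexed surrogate key adds comparatively little.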

The question is clear, but "good practice" seems to be the wrong question.
"Could have FKs"?
Foreign keys are a mechanism to preserve integrity constraints during database modifications.
If your DW is read-only (accumulating data sources without writing back), there is no need for FKs.
If your DW supports writes, integrity constraints typically need to be coordinated across the participating data sources by the ETL (or rather, its Store equivalent). This process may or may not rely on FKs in the database.
So the right question would be: do you need them?
(The only other reason I can think of would be documentation of the relationships. However, this can be done on paper or in a separate document, too.)

I have no idea. But nobody was answering, so I googled and found a best practices paper that seems to say the very helpful "it depends" :-)

Yes, as a best practice, implement the FK constraints on your fact tables. In SQL Server, use NOCHECK. In Oracle, always use RELY DISABLE NOVALIDATE. This allows the warehouse or mart to know about the relationship without checking it on INSERT, UPDATE, or DELETE operations. Star transformations, optimizations, etc. may not rely on the FK constraints to improve queries like they used to, but one never knows what BI or OLAP tools will be used on the front end of your warehouse or mart. Some of these tools can make use of knowing that the relationships are defined. Plus, how many ugly-looking warehouses have you seen with little or no external documentation that you had to try to reverse engineer? Defining the FKs always helps with that.
As designers we NEVER seem to make our data warehouses or marts as self-documenting as we should. Defining FKs certainly helps with that. Having said this, if star schemas are properly designed, they are easy to read and understand even without FKs being defined.
And for Oracle fact tables, always define a LOCAL BITMAP index on every FK to a dimension. Just do it. The indexing is actually more important than the FK being defined.
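The idea of a constraint that is declared but not checked can be illustrated with a rough analogue in SQLite, via Python's sqlite3 module. This is not the SQL Server NOCHECK or Oracle RELY DISABLE NOVALIDATE syntax, and the table names are hypothetical. With enforcement switched off, an orphan row is accepted, yet the relationship stays in the catalog where a tool could read it.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = OFF")  # declared, but not enforced
conn.executescript("""
    CREATE TABLE dim_customer (
        customer_sk INTEGER PRIMARY KEY
    );
    CREATE TABLE fact_sales (
        customer_sk INTEGER NOT NULL REFERENCES dim_customer (customer_sk),
        amount      REAL NOT NULL
    );
""")

# No check on write: this orphan row loads without error.
conn.execute("INSERT INTO fact_sales (customer_sk, amount) VALUES (999, 1.0)")

# The relationship is still discoverable as metadata.
# PRAGMA foreign_key_list columns: id, seq, table, from, to, on_update, ...
relationships = [
    (row[3], row[2], row[4])  # (child column, parent table, parent column)
    for row in conn.execute("PRAGMA foreign_key_list(fact_sales)")
]
print(relationships)
```

A modeling or reporting tool that reads this metadata gets the join path for free, which is the self-documenting benefit the answer argues for.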

The reason for using a foreign key constraint in a data warehouse is the same as for any other database: to ensure data integrity.
Query performance may also benefit, because foreign keys permit certain types of query rewrite that are not normally possible without them. Data integrity is still the main reason to use foreign keys, however.

There is a very good reason to create FK constraints even in a read-only DW/DM.
Yes, they are not really required from the point of view of the read-only DW itself, if your ETL is bulletproof, and so on. But guess what: life doesn't stop at loading data into the DW. Most BI analytical and reporting tools use information about your DW relationships to build their models automatically (for example, the SSAS Tabular model).
In my humble opinion, this alone outweighs the small overhead of dropping and recreating FK constraints during the ETL process.