Data redundancy

Posted 2024-11-01 04:05:41

Can referential integrity constraints help address data redundancy problems?

巴黎夜雨 2024-11-08 04:05:41

Referential integrity constraints are only a subset of "database constraints in general".

Normalization and database constraints are distinct-but-intertwined concepts.

Say you have a table CUSTOMERORDER (custID, custName, orderID), where each row states that "the customer identified by #custID# and named #custName# has placed the order identified by #orderID#".

This table is unlikely to be in 3NF because of the FD custID->custName that probably applies. But say we keep this one-table design nonetheless. What do we then have to do to enforce consistency of the data? We have to enforce the mentioned FD: we have to see to it that if the same customer places a second order, the custName in the two rows is identical. We have to prohibit rows such as (1, Smith, 2) and (1, Jones, 7) from both appearing in the table. That is a kind of database constraint that must be enforced in order to make our design match all the stated business rules.
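
SQL has no declarative syntax for an arbitrary FD, so in a SQL DBMS this constraint has to be enforced procedurally. Below is a minimal sketch that assumes SQLite, driven through Python's standard sqlite3 module. It also assumes (custID, orderID) as the table's key, which the answer does not specify. The trigger name fd_custid_custname and the helper try_insert are hypothetical, and the trigger guards INSERT only, not UPDATE:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customerorder (
        custID   INTEGER NOT NULL,
        custName TEXT    NOT NULL,
        orderID  INTEGER NOT NULL,
        PRIMARY KEY (custID, orderID)  -- assumed key, not stated in the answer
    );
    -- Emulate the FD custID -> custName: abort an insert whose name differs
    -- from a name already recorded for the same custID.
    -- (Hypothetical trigger; it does not guard UPDATE statements.)
    CREATE TRIGGER fd_custid_custname
    BEFORE INSERT ON customerorder
    BEGIN
        SELECT RAISE(ABORT, 'FD custID -> custName violated')
        WHERE EXISTS (
            SELECT 1 FROM customerorder
            WHERE custID = NEW.custID AND custName <> NEW.custName
        );
    END;
""")

def try_insert(row):
    """Insert one (custID, custName, orderID) row; return True if accepted."""
    try:
        conn.execute("INSERT INTO customerorder VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:  # RAISE(ABORT, ...) is reported as a constraint error
        return False

print(try_insert((1, "Smith", 2)))  # True
print(try_insert((1, "Jones", 7)))  # False: same custID, different name
print(try_insert((1, "Smith", 7)))  # True: consistent second order
```

Note that every one of customer 1's rows repeats "Smith". That repetition is exactly the redundancy the FD has to police.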

But note that we do not have any "referential" constraint here, obviously so, since there is no second table to reference.

Also note in passing that this one-table design "automatically" enforces some other constraints that might not be immediately obvious. For example, it makes it impossible for an orderID to exist without a corresponding custID AND custName also existing. (If you are thinking about nulls, stop doing so: in relational theory there is no such thing as "null".) The "rule" that if an orderID is registered, a corresponding custID PLUS custName must also exist is enforced "implicitly", simply because our design is a one-table one.

But now we decompose our design into a two-table one, as traditional normalization theory prescribes:

CUSTOMER(custID, custName) KEY custID;
ORDER(custID, orderID) KEY custID, orderID;
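
Here is a minimal sketch of how this schema might be written as SQL DDL, again assuming SQLite via Python's standard sqlite3 module. ORDER is a reserved word in SQL, so the table is named orders here. No foreign key is declared yet, and the sample rows are made up. The sketch shows the redundancy the decomposition removes: customer 1's name is stored once, however many orders that customer has.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        custID   INTEGER NOT NULL PRIMARY KEY,  -- KEY custID
        custName TEXT    NOT NULL
    );
    -- ORDER is a reserved word in SQL, hence the name 'orders'.
    CREATE TABLE orders (
        custID  INTEGER NOT NULL,
        orderID INTEGER NOT NULL,
        PRIMARY KEY (custID, orderID)           -- KEY custID, orderID
    );
    INSERT INTO customer VALUES (1, 'Smith');
    INSERT INTO orders   VALUES (1, 2), (1, 7);
""")

# 'Smith' is recorded once, no matter how many orders customer 1 places.
name_copies = conn.execute(
    "SELECT COUNT(*) FROM customer WHERE custID = 1").fetchone()[0]
order_count = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE custID = 1").fetchone()[0]
print(name_copies, order_count)  # 1 2
```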

The business rules we have to enforce are still the same, namely: (a) there cannot be two customers with the same custID but different names (that's our FD), and (b) there cannot be any order without a corresponding custID PLUS custName for that order.

Let's see how our two-table design handles these business rules. (a) is obviously enforced by declaring custID as the key of CUSTOMER. As for (b), it is obviously impossible to record an orderID in ORDER without also recording a custID. But is that sufficient to guarantee that every ORDER row also has a corresponding custName? Obviously not: nothing yet prevents ORDER from containing a custID that does not appear in CUSTOMER at all. That's why we need to introduce the obvious referential integrity constraint from ORDER to CUSTOMER: every custID in ORDER must also appear as a custID in CUSTOMER.
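
In SQL, that referential constraint is typically declared as a foreign key. The hypothetical SQLite sketch below builds the two-table design with and without one and checks both business rules. SQLite enforces foreign keys only when PRAGMA foreign_keys = ON is set on the connection. The helpers build and accepts, the table name orders, and the sample rows are illustrative assumptions:

```python
import sqlite3

def build(with_fk):
    """Create the two-table design, optionally with the RI constraint."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    fk = "REFERENCES customer (custID)" if with_fk else ""
    conn.executescript(f"""
        CREATE TABLE customer (
            custID   INTEGER NOT NULL PRIMARY KEY,
            custName TEXT    NOT NULL
        );
        CREATE TABLE orders (
            custID  INTEGER NOT NULL {fk},
            orderID INTEGER NOT NULL,
            PRIMARY KEY (custID, orderID)
        );
        INSERT INTO customer VALUES (1, 'Smith');
    """)
    return conn

def accepts(conn, sql):
    """Run one statement; return True if the DBMS accepted it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

for with_fk in (False, True):
    conn = build(with_fk)
    # Rule (a): a second customer 1 with another name violates the key.
    rule_a = not accepts(conn, "INSERT INTO customer VALUES (1, 'Jones')")
    # Rule (b): an order for customer 99, who has no custName anywhere.
    rule_b = not accepts(conn, "INSERT INTO orders VALUES (99, 5)")
    print(f"with_fk={with_fk}: rule (a) enforced={rule_a}, rule (b) enforced={rule_b}")
# with_fk=False: rule (a) enforced=True, rule (b) enforced=False
# with_fk=True: rule (a) enforced=True, rule (b) enforced=True
```

The key alone takes care of (a). Only the foreign key closes the gap in (b).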

Thus, RI constraints do indeed "help address data redundancy problems", in the sense that, when a table is decomposed and an RI constraint is added to the overall design, they make it possible to eliminate certain kinds of redundancy while preserving certain guarantees of data integrity. Without the ability to introduce RI constraints into a design, we would only be eliminating redundancy at the expense of data consistency.
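
Another way to see what the RI constraint preserves is to rebuild the original one-table CUSTOMERORDER as a join of the two tables. The sketch below does this in the same hypothetical SQLite setting; the view name customerorder and the helper reconstructed_rows are assumptions for illustration. With the foreign key in place, every order row finds its custName in the join. Without it, an orphan order silently drops out of the reconstructed relation, which is the loss of consistency described above:

```python
import sqlite3

def reconstructed_rows(with_fk):
    """Return (rows in orders, rows in the rebuilt CUSTOMERORDER view)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    fk = "REFERENCES customer (custID)" if with_fk else ""
    conn.executescript(f"""
        CREATE TABLE customer (custID   INTEGER NOT NULL PRIMARY KEY,
                               custName TEXT    NOT NULL);
        CREATE TABLE orders (custID  INTEGER NOT NULL {fk},
                             orderID INTEGER NOT NULL,
                             PRIMARY KEY (custID, orderID));
        -- Rebuild the one-table design as a view over the decomposition.
        CREATE VIEW customerorder AS
            SELECT o.custID, c.custName, o.orderID
            FROM orders AS o JOIN customer AS c ON c.custID = o.custID;
        INSERT INTO customer VALUES (1, 'Smith');
        INSERT INTO orders   VALUES (1, 2);
    """)
    try:
        conn.execute("INSERT INTO orders VALUES (99, 5)")  # unknown customer
    except sqlite3.IntegrityError:
        pass  # rejected by the foreign key
    orders = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
    joined = conn.execute("SELECT COUNT(*) FROM customerorder").fetchone()[0]
    return orders, joined

print(reconstructed_rows(with_fk=False))  # (2, 1): order 5 has no custName
print(reconstructed_rows(with_fk=True))   # (1, 1): every order keeps its custName
```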
