当前位置：文江博客话题详情

两个关系的（De）标准化

发布于 2024-10-05 21:41:49 字数 1771 浏览 2 评论 0原文

读过CJDate的Introduction to Database System或类似水平书籍的人应该不会对规范化和非规范化的定义有任何问题。

然而，记忆已经不再是以前的样子了，我发现自己经常看着一些设计，并说它没有标准化，尽管我找不到它打破了哪些正常形式。

说明这一点的实际例子是：

如果我们有关系

r1 (A, B, C) 和 r2 (A, D)

与 FD：AB->C 和A->D

和 r1 表示详细数据，而 r2 是该数据的摘要（换句话说，D 的每个实例都是 r1 中值的函数。在此例如，让它成为根据 r1) 中的 A 的值 C 的小计。

因此

，尽管我不能说它破坏了 2NF 或 3NF，但我似乎坚持这样的想法，即设计在以下意义上仍然是非规范化的（来自 Codd，EF“数据库关系的进一步规范化”） Model”，第 34 页，评论了 1NF 之外标准化的原因）：

为了使关系集合免于不必要的插入，更新和删除依赖项；
减少重组收集的需要关系作为新的数据类型引入，从而延长寿命应用程序的跨度；
使关系模型为用户提供更多信息；
使关系集合对查询保持中立统计数据，这些统计数据是随着时间的推移可能会发生变化。

我可以说，如果我们将 D 定义为 r1 中所有 C 的总和，其中 r1 中的 A 等于 r2 中的 A，那么，如果我们更新 r1 中的 C 并且不更新 r2 中的 D，我们最终会得到由于存在不需要的更新依赖性，并且数据最终处于不一致的状态，我发现这个原因将 r1 和 r2 称为非规范化并将它们视为非规范化。（事实上，整个 r2 是 r1 的函数，并将零个新事实带入模型；r2 = f(r1)）

所以问题是

我们可以将 r1 和 r2 称为非规范化吗？
如果是，为什么？如果没有，为什么？（根据哪个规则？或根据哪个定义？）

注意：
对于那些发现问题足够有趣而可以给出答案的人，我恳请您提供一些可引用的内容，或者以具体假设和结论的形式提出（或者换句话说，如果您要提出答案）您的意见，请遵循一定的推理）。

编辑我接受了 dportas 的回答。我将尝试在这里添加一些内容： CJDate可以做出明确且严格的区分：

很多设计理论都与减少冗余；正常化减少相关变量内的冗余，正交性减少了它的跨度相关变量。

引用自深入数据库：面向从业者的关系理论

以及下一页

就像未能正常化所有方式意味着冗余并可能导致某些异常现象，也可以未能遵守正交性。

原文

People who read C.J.Date's Introduction to Database System or books of similar level should not have problems with definition of normalization and denormalization.

However, memory is not what it used to be and I find myself often looking at some design and saying that it is not normalized even though I can not find which of the normal forms it is breaking.

The actual example that illustrate it is:

If we have relations

r1 (A, B, C) and r2 (A, D)

with FDs: AB->C and A->D

and r1 represent detailed data, while r2 is summary of that data (in another words each instance of D is a function of values in r1. in this example let it be subtotal of values C according to A from r1).

Example instance

So, even though I can not say that it breaks for example 2NF or 3NF, I seem to be stuck on the idea that the design is still denormalised in the following sense (from Codd, E.F. "Further Normalization of the Data Base Relational Model", p. 34, commenting on the reasons to normalize beyond 1NF):

To free the collection of relations from undesirable insertion,
update and deletion dependencies;
To reduce the need for restructuring the collection of
relations as new types of data are
introduced, and thus increase the life
span of application programs;
To make the relational model more informative to users;
To make the collection of relations neutral to the query
statistics, where these statistics are
liable to change as time goes by.

As I can say, that if we define D as a sum of all Cs from r1 where A from r1 is equal to A from r2 then, if we update C in r1 and we don't update D in r2, we can end up with undesirable update dependency and the data ends up in inconsistent state I find this reason to call r1 and r2 denormalized and to think of them as denormalized. (In fact whole r2 is a function of r1 and bring zero new facts into the model; r2 = f(r1))

So the questions are

can we call r1 and r2 denormalized?
if yes, why? if not, why? (according to which rule? or according to which definition?)

NOTE:
To those who find the question(s) interesting enough to put in an answer, I kindly ask to provide either something quotable or to put it in a form of specific assumptions and conclusions (or in another words, if you are going to put in your opinion, please follow it with some reasoning).

EDIT
I accepted dportas answer. I'll try to add a bit to it here:
C.J.Date can makes a clear and strict distinction:

Much of design theory has to do with
reducing redundancy; normalization
reduces redundancy within relvars,
orthogonality reduces it across
relvars.

quoted from Database in depth: relational theory for practitioners

and on the next page

just as a failure to normalize all the
way implies redundancy and can lead to
certain anomalies, so too can a
failure to adhere to orthogonality.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

柒七 2024-10-12 21:41:49

假设 AB 是 r1 中的键，A 是 r2 中的键，那么该模式似乎属于 6NF。关系数据库字典（日期）将非规范化定义为：

替换一组相关变量 R1、R2、. 。
., Rn 通过它们连接 R，这样对于
所有 i 是 R 在
Ri 的属性保证为
等于 Ri (i = 1, 2, ..., n)。

从根本上说，规范化/反规范化是关于使用投影和连接运算符的组合和无损分解。在此示例中，您有由不同运算符引起的冗余：求和。我预计原则上很有可能为投影和连接之外的运算符形成一种“规范化”理论，甚至可能对于求和等非关系函数也是如此。这不是规范化的传统定义方式，但是在没有任何合理基础的情况下，我认为我们应该应用上面引用中 Date 定义的技术含义非规范化。