两个关系的(De)标准化
读过CJDate的Introduction to Database System或类似水平书籍的人应该不会对规范化和非规范化的定义有任何问题。
然而,记忆已经不再是以前的样子了,我发现自己经常看着一些设计,并说它没有标准化,尽管我找不到它打破了哪些正常形式。
说明这一点的实际例子是:
如果我们有关系
r1 (A, B, C)
和 r2 (A, D)
与 FD:AB->C 和A->D
和 r1 表示详细数据,而 r2 是该数据的摘要(换句话说,D 的每个实例都是 r1 中值的函数。在此例如,让它成为根据 r1) 中的 A 的值 C 的小计。
因此
r1 =
A B C
1 1 10
1 2 20
2 1 10
2 2 25
r2 =
A D
1 30
2 35
,尽管我不能说它破坏了 2NF 或 3NF,但我似乎坚持这样的想法,即设计在以下意义上仍然是非规范化的(来自 Codd,EF“数据库关系的进一步规范化”) Model”,第 34 页,评论了 1NF 之外标准化的原因):
- 为了使关系集合免于不必要的插入, 更新和删除依赖项;
- 减少重组收集的需要 关系作为新的数据类型 引入,从而延长寿命 应用程序的跨度;
- 使关系模型为用户提供更多信息;
- 使关系集合对查询保持中立 统计数据,这些统计数据是 随着时间的推移可能会发生变化。
我可以说,如果我们将 D 定义为 r1 中所有 C 的总和,其中 r1 中的 A 等于 r2 中的 A,那么,如果我们更新 r1 中的 C 并且不更新 r2 中的 D,我们最终会得到由于存在不需要的更新依赖性,并且数据最终处于不一致的状态,我发现这个原因将 r1 和 r2 称为非规范化并将它们视为非规范化。 (事实上,整个 r2 是 r1 的函数,并将零个新事实带入模型;r2 = f(r1))
所以问题是
- 我们可以将 r1 和 r2 称为非规范化吗?
- 如果是,为什么?如果没有,为什么? (根据哪个规则?或根据哪个定义?)
注意:
对于那些发现问题足够有趣而可以给出答案的人,我恳请您提供一些可引用的内容,或者以具体假设和结论的形式提出(或者换句话说,如果您要提出答案)您的意见,请遵循一定的推理)。
编辑 我接受了 dportas 的回答。我将尝试在这里添加一些内容: CJDate可以做出明确且严格的区分:
很多设计理论都与 减少冗余;正常化 减少相关变量内的冗余, 正交性减少了它的跨度 相关变量。
引用自 深入数据库:面向从业者的关系理论
以及下一页
就像未能正常化所有 方式意味着冗余并可能导致 某些异常现象,也可以 未能遵守正交性。
People who read C.J.Date's Introduction to Database System or books of similar level should not have problems with definition of normalization and denormalization.
However, memory is not what it used to be and I find myself often looking at some design and saying that it is not normalized even though I can not find which of the normal forms it is breaking.
The actual example that illustrate it is:
If we have relations
r1 (A, B, C)
and r2 (A, D)
with FDs: AB->C and A->D
and r1
represent detailed data, while r2
is summary of that data (in another words each instance of D is a function of values in r1. in this example let it be subtotal of values C according to A from r1).
Example instance
r1 =
A B C
1 1 10
1 2 20
2 1 10
2 2 25
r2 =
A D
1 30
2 35
So, even though I can not say that it breaks for example 2NF or 3NF, I seem to be stuck on the idea that the design is still denormalised in the following sense (from Codd, E.F. "Further Normalization of the Data Base Relational Model", p. 34, commenting on the reasons to normalize beyond 1NF):
- To free the collection of relations from undesirable insertion,
update and deletion dependencies;- To reduce the need for restructuring the collection of
relations as new types of data are
introduced, and thus increase the life
span of application programs;- To make the relational model more informative to users;
- To make the collection of relations neutral to the query
statistics, where these statistics are
liable to change as time goes by.
As I can say, that if we define D as a sum of all Cs from r1 where A from r1 is equal to A from r2 then, if we update C in r1 and we don't update D in r2, we can end up with undesirable update dependency and the data ends up in inconsistent state I find this reason to call r1 and r2 denormalized and to think of them as denormalized. (In fact whole r2 is a function of r1 and bring zero new facts into the model; r2 = f(r1))
So the questions are
- can we call r1 and r2 denormalized?
- if yes, why? if not, why? (according to which rule? or according to which definition?)
NOTE:
To those who find the question(s) interesting enough to put in an answer, I kindly ask to provide either something quotable or to put it in a form of specific assumptions and conclusions (or in another words, if you are going to put in your opinion, please follow it with some reasoning).
EDIT
I accepted dportas answer. I'll try to add a bit to it here:
C.J.Date can makes a clear and strict distinction:
Much of design theory has to do with
reducing redundancy; normalization
reduces redundancy within relvars,
orthogonality reduces it across
relvars.
quoted from Database in depth: relational theory for practitioners
and on the next page
just as a failure to normalize all the
way implies redundancy and can lead to
certain anomalies, so too can a
failure to adhere to orthogonality.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
假设 AB 是 r1 中的键,A 是 r2 中的键,那么该模式似乎属于 6NF。关系数据库字典(日期)将非规范化定义为:
从根本上说,规范化/反规范化是关于使用投影和连接运算符的组合和无损分解。在此示例中,您有由不同运算符引起的冗余:求和。我预计原则上很有可能为投影和连接之外的运算符形成一种“规范化”理论,甚至可能对于求和等非关系函数也是如此。这不是规范化的传统定义方式,但是在没有任何合理基础的情况下,我认为我们应该应用上面引用中 Date 定义的技术含义非规范化。
Assuming AB is a key in r1 and A is a key in r2 then it seems that the schema is in 6NF. The Relational Database Dictionary (Date) defines denormalization as:
Fundamentally, normalization/denormalization is about composition and nonloss decomposition using projection and join operators. In this example you have redundancy caused by a different operator: summation. I expect it would be quite possible in principle to form a theory of "normalization" for operators other than projection and join, perhaps even for non-relational functions like summation. That's not how normalization is conventionally defined however and in the absence of any sound basis for doing otherwise I think we ought to apply the technical meaning denormalization as defined by Date in the above quotation.
r2 中 D 列的定义是“r1 中所有 C 的总和,其中 r1 中的 A 等于 r2 中的 A”,是对 D 的约束。更正式地说,其中 Σ 是求和,π 是投影,σ 是选择,
<代码>
(a,d) ∈ r2 ⇔ (a, d) = (a, Σ c), a ∈ πA(r1), c ∈ πC(σ<子>A=a(r1))
由于此约束既不是域约束也不是键约束,因此
r2
不在 域/密钥范式 (DKNF)。DKNF 是我所知道的唯一一种不是根据单个关系定义的范式,主要是因为它是根据约束而不是依赖关系定义的。
Your definition for column D in r2, "a sum of all Cs from r1 where A from r1 is equal to A from r2", is a constraint on D. More formally, where Σ is summation, π is projection and σ is selection,
(a,d) ∈ r2 ⇔ (a, d) = (a, Σ c), a ∈ πA(r1), c ∈ πC(σA=a(r1))
Since this constraint is neither a domain constraint nor a key constraint,
r2
is not in Domain/Key Normal Form (DKNF).DKNF is the only normal form of which I'm aware that isn't defined in terms of a single relation, chiefly because it's defined in terms of constraints rather than dependencies.
所以 r2 是 r1 的函数,这意味着 r2 是 r1 函数的物化视图
,在该示例中,它将是来自 r1 组的 select A, sum(C) 的视图by A
视图并未在 codd 关于规范化的工作中涉及,但我认为他确实写过关于它们的
具体化视图通常是出于缓存原因而完成的,有些人可能认为这是一种非规范化形式,因此有一些关于自动决定的论文哪个视图具体化,从而使其成为数据库可以对视图执行的操作,有时可以使它们更快,
但通常不允许更新视图,尽管我想我读到 codd 说过类似所有可以更新的视图都应该是和有一些论文是关于让它在一些复杂的情况下发挥作用的
so r2 is a function of r1 which mean r2 is a materialized view of that function of r1
and in that example the it would be a view of
select A, sum(C) from r1 group by A
views are not covered in codd's work on normalization, but i think he did write about them
materializing a view is normally done for caching reasons which some might think of as a form of denormalization so there were papers on automatically deciding which view to materialize thus making it just something the database could do with view to make them faster sometimes
but as updates to views are normally not permitted although i think i read that codd said something like all views that can be update-able should be and there were papers on getting that to work in some complex cases
我认为这对关系违反了第五范式。
R2 是 R1 的投影。有些人认为 SUM 超出了关系模型的范围。在本例中,SUM 是 COUNT 的简单扩展,属于关系模型的范围。
I think the pair of relations violates fifth normal form.
R2 is a projection of R1. Some argue that SUM is outside the scope of the relational model. In this case, SUM is a trivial extension of COUNT, which is within the scope of the relational model.