您如何知道 SQL 数据库何时需要更多规范化?
是否是当您尝试获取数据但没有明显的简单方法时?
当你发现某样东西应该是一个独立的桌子时?
法律是什么?
Is it when you're trying to get data and there is no apparent easy way of doing it?
When you find something should be a table on it's own?
What are the laws?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(13)
查看维基百科。本文讨论了数据库规范化和不同的形式(第一、第二、第三等)。大多数时候,您应该至少以第三范式为目标。有时您想稍微放宽规则(将多个表连接在一起可能成本太高,因此可能需要稍微反规范化),但在大多数情况下,第三范式是好的。
Check out Wikipedia. The article talks about database normalization and the different forms (first, second, third, etc.). Most times you should be aiming for at least third normal form. There are times when you want to relax the rules a bit (it may be too expensive to join multiple tables together so might want to de-normalize a bit) but for the most part third normal form is good.
当您发现必须重复相同的数据时,或者当您开始使用单个字段作为数组时。
When you notice you have to repeat the same data, or when you start using single fields as arrays.
虽然这是一个有点尖锐的答案,当您发现数据没有充分标准化时。网络上有许多关于规范化级别(或更准确地说,“形式”)的资源,它们比我在这里更完整地描述了这些形式。第一范式和第二范式应该是非常需要的。如果你没有达到第三(或者实际上是第四)范式,你需要有一个强有力的理由来解释为什么。
查看有关数据库规范化的维基百科文章。
While this is a somewhat snarky answer, when you discover that the data isn't sufficiently normalized. There are many resources on the web about the levels (or, more properly, "forms") of normalization, and they more completely describe the forms than I could here. First and second normal forms should be pretty much required. If you aren't at third (or, really, fourth) normal form, you need to have a strong justification as to why.
Check out the Wikipedia article on database normalization.
当您开始质疑 SQL 数据库是否需要更多规范化时。
When you're starting to question whether an SQL database needs more normalization.
每当你有一个关系数据库时......
不,实际上有法律,请查看这个维基百科 链接。
它们被称为五范式或类似的东西。最初来自 20 世纪 50 年代/60 年代发明关系数据库的人 EF Codd。
“钥匙是整个钥匙,除了钥匙什么都没有,所以帮我科德”
这是一个概要:
忠实地代表一种关系并且
没有重复组
表中的非素数属性是
功能上依赖于零件
候选键的(真子集)
非素数属性是
非传递依赖于每一个
表的键 表中的每个非平凡函数依赖项都是对超键的依赖项
非平凡的多值依赖
表中是对 a 的依赖
超级键
根本不平凡的连接依赖关系
(参考广义连接
操作员)
Whenever you have a relational database....
<grin/>
No, actually there are laws, check out this Wikipedia link.
they are called the five normal forms or something like that. Originally from the guy who invented relational databases in the 50s/60s, E. F. Codd.
"The key the whole key and nothing but the Key, so help me Codd"
This is a synopsis:
faithfully represents a relation and
has no repeating groups
non-prime attribute in the table is
functionally dependent on a part
(proper subset) of a candidate key
non-prime attribute is
non-transitively dependent on every
key of the table Every non-trivial functional dependency in the table is a dependency on a superkey
non-trivial multivalued dependency
in the table is a dependency on a
superkey
non-trivial join dependencies at all
(with reference to generalized join
operator)
其他人已经向您指出了正常化的正式规则。以下是我使用的一些非正式准则:
如果表中有列,其名称仅因数字不同(例如 Phone1 和 PHone2)。
如果表格中有任何列,只有当表格中的另一列被填写时才应填写。
如果表格中有任何列,只有
如果更新数据库中的“事实”(例如街道地址)需要多次更新。
如果同一问题可能会得到两个不同的答案,具体取决于您从哪个表获取信息。
如果可以从数据库中获得任何重要问题的答案,而无需连接至少两个表。
如果
如果数据库中有任何基于数量的限制,而不是“只允许其中 1 个”(即“只允许一个地址”是可以的,但“只允许两个地址”则表明存在规范化问题) )。
Other people have pointed you to the formal rules for normalization. Here are some informal guidelines I use:
If you have columns in a table the names of which differ only by a number (eg Phone1 and PHone2).
If you have any columns in a table that should be filled in only when another column in the table is filled in.
If updating a "fact" in the database (such as a street address) requires more than one UPDATE.
If the same question could ever get two different answers depending on which table you get your information from.
If the answer to any non-trivial question can be gotten from the database without JOINing at least two tables.
If you have any quantity-based restrictions in the database other than "only 1 of something is allowed" (that is, "only one address is allowed" is okay, but "only two addresses are allowed" indicates a normalization problem).
3NF 通常是您所需要的,它遵循三个规则:
表中的每一列都应该依赖于:
出于性能原因,您通常可以“降级”到 2NF,只要您了解其中的含义,并且仅在遇到问题时,但 3NF 应该是您所有设计的初始目标。
3NF is generally all you need and it follows three rules:
Every column in the table should be dependent on:
You can often "downgrade" to 2NF for performance reasons, provided you understand the implications and only when you strike problems, but 3NF should be the initial goal for all your designs..
正如其他人所说,您知道何时开始在多个表中出现(太多)重复列。
话虽这么说,有时跨多个表拥有冗余列很有用。这可以减少复杂查询中必须执行的 JOIN 数量。请小心保持所有表同步,否则您只是自找麻烦。
As everyone else has said, you know when you start having (too many) duplicate columns in multiple tables.
That being said, it is sometimes useful to have redundant columns across multiple tables. This can reduce the number of JOINs you have to do in complicated queries. Just be careful to keep all the tables in sync, or you're just asking for trouble.
这是一篇相当不错的文章。变得正常是一门科学,而不是一门艺术。现在知道何时去规范化......这是一门艺术。
http://www.alvechurchdata.co.uk/hints-and-提示/softnorm.html
This is a pretty good article. Getting normal is a science, not an art. Now knowing when to DEnormalize... that's an art.
http://www.alvechurchdata.co.uk/hints-and-tips/softnorm.html
请参阅数据库规范化基础知识说明
See Description of the database normalization basics
您目前处于什么正常化水平?如果你不能回答这个问题,我认为你的数据库一团糟。我总是在初始设计时达到第三常态,并在需要时进一步反规范化或规范化。
What level of normalization are you currently at? If you can't answer that I assume your database is a nasty mess. I always hit 3rd normal on initial design and de-normalize or normalize further if and when needed.
我假设您正在谈论支持交互式应用程序的事务数据库,但就其价值而言...
专门用于报告且仅由 ETL 流程更新的 OLAP 数据库可能会受益于标准化程度较低的结构。在这些应用程序中,您接受冗余数据存储和复制的成本,以获得更少的连接的性能优势以及(有时技术性较低的)数据分析师和业务分析师的更高易用性。
事务数据库应始终在实用范围内规范化(至少 3NF),然后仅根据需要选择性地非规范化。理想情况下,非规范化的需要应该基于实际的性能测试结果。
I assume you're talking about a transactional database supporting an interactive application, but for what it's worth...
OLAP databases used exclusively for reporting and only updated by ETL processes may benefit from a less normalized structure. In these applications you accept the cost of redundant data storage and duplication for the performance benefit of fewer joins and the increased ease of use for (sometimes less technical) data analysts and business analysts.
Transactional databases should always be normalized to the extent practical (at least 3NF) and then selectively denormalized only as needed. And the need to denormalize should ideally be based on actual performance testing results.
当您必须搜索大量数据只是为了提取一些基本信息时 - 即有什么样的产品类别或类似的信息。
When you have to search trough huge amounts of data just to extract some basic info - i.e. what kind of Product categories are there or something like that.