什么是数据库规范化/规范化?

发布于 2024-07-09 03:24:22 字数 1729 浏览 15 评论 0原文

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(6

原来是傀儡 2024-07-16 03:24:22

规范化基本上是设计数据库模式,以避免重复和冗余数据。 如果相同的信息在数据库中的多个位置重复,则存在在一处更新而另一处不更新的风险,从而导致数据损坏。

有许多标准化级别,从 1. 范式到 5. 范式。 每个范式都描述了如何解决一些特定的问题。

第一范式(1NF)很特殊,因为它与冗余无关。 1NF 不允许嵌套表,更具体地说,不允许使用表作为值的列。 SQL本来就不支持嵌套表,所以大多数普通关系数据库默认都是1NF。 所以我们可以在剩下的讨论中忽略 1NF。

范式 2NF 到 5NF 都涉及相同信息在同一个表中多次表示的情况。

例如,考虑卫星和行星的数据库:

Moon(PK) | Planet  | Planet kind
------------------------------
Phobos   | Mars    | Rock
Daimos   | Mars    | Rock
Io       | Jupiter | Gas
Europa   | Jupiter | Gas
Ganymede | Jupiter | Gas

冗余是显而易见的:木星是一颗气态行星的事实重复了三次,每个卫星一次。 这是浪费空间,但更严重的是,这种模式使信息变得不一致:

Moon(PK) | Planet  | Planet kind
------------------------------
Phobos   | Mars    | Rock
Deimos   | Mars    | Rock
Io       | Jupiter | Gas
Europa   | Jupiter | Rock <-- Oh no!
Ganymede | Jupiter | Gas

查询现在可能会给出不一致的结果,从而可能产生灾难性的后果。

(当然,数据库无法防止输入错误信息。但它可以防止不一致信息,这也是一个严重的问题。)

规范化设计将拆分数据表分成两个表:

Moon(PK) | Planet(FK)     Planet(PK) | Planet kind
---------------------     ------------------------
Phobos   | Mars           Mars       | Rock
Deimos   | Mars           Jupiter    | Gas
Io       | Jupiter 
Europa   | Jupiter 
Ganymede | Jupiter 

现在没有事实被重复多次,因此不存在数据不一致的可能性。 (由于行星名称重复,因此看起来可能仍然存在一些重复,但重复主键值作为外键并不违反规范化,因为它不会引入不一致数据的风险。)

经验法则
如果相同的信息可以用更少的单个单元格值表示(不包括外键),则应通过将表拆分为更多表来规范化表。 例如,第一个表有 12 个单独的值,而两个表只有 9 个单独的(非 FK)值。 这意味着我们消除了 3 个冗余值。

我们知道相同的信息仍然存在,因为我们可以编写一个 join 查询,它返回与原始非标准化表相同的数据。

如何避免此类问题?
通过对概念模型进行一些思考,例如通过绘制实体关系图,可以轻松避免规范化问题。 行星和卫星具有一对多的关系,这意味着它们应该用外键关联表示在两个不同的表中。 当具有一对多或多对多关系的多个实体在同一表行中表示时,就会出现规范化问题。

标准化重要吗?是的,它非常重要。 如果数据库存在规范化错误,就会面临将无效或损坏的数据存入数据库的风险。 由于数据“永远存在”,因此在损坏的数据首次进入数据库时​​很难将其删除。

但我真的认为区分 2NF 到 5NF 的不同范式并不重要。 当模式包含冗余时,这一点通常非常明显 - 只要问题得到解决,无论是违反 3NF 还是 5NF,都不那么重要。

(还有一些额外的范式,如 DKNF 和 6NF,它们仅与数据仓库等特殊用途系统相关。)

不要害怕规范化。 标准化级别的官方技术定义相当晦涩。 这听起来像是标准化是一个复杂的数学过程。 然而,规范化基本上只是常识,您会发现,如果您使用常识设计数据库模式,它通常会完全规范化。

关于规范化存在许多误解:

  • 一些人认为规范化的数据库速度较慢,而非规范化可以提高性能。 然而,这只在非常特殊的情况下才是正确的。 通常,标准化数据库也是最快的。

  • 有时,规范化被描述为一个渐进的设计过程,您必须决定“何时停止”。 但实际上标准化水平只是描述不同的具体问题。 首先,由高于 3rd NF 的范式解决的问题是非常罕见的问题,因此您的模式很可能已经处于 5NF 中。

它适用于数据库之外的任何东西吗?不直接,不。 规范化的原则对于关系数据库来说是非常具体的。 然而,一般的基本主题 - 如果不同的实例可能不同步,则不应有重复的数据 - 可以广泛应用。 这基本上就是DRY 原则

Normalization is basically to design a database schema such that duplicate and redundant data is avoided. If the same information is repeated in multiple places in the database, there is the risk that it is updated in one place but not the other, leading to data corruption.

There is a number of normalization levels from 1. normal form through 5. normal form. Each normal form describes how to get rid of some specific problem.

First normal form (1NF) is special because it is not about redundancy. 1NF disallows nested tables, more specifically columns which allows tables as values. Nested tables are not supported by SQL in the first place, so most normal relational databases will be in 1NF by default. So we can ignore 1NF for the rest of the discussions.

The normal forms 2NF to 5NF all concerns scenarios where the same information is represented multiple times in the same table.

For example consider a database of moons and planets:

Moon(PK) | Planet  | Planet kind
------------------------------
Phobos   | Mars    | Rock
Daimos   | Mars    | Rock
Io       | Jupiter | Gas
Europa   | Jupiter | Gas
Ganymede | Jupiter | Gas

The redundancy is obvious: The fact that Jupiter is a gas planet is repeated three times, one for each moon. This is a waste of space, but much more seriously this schema makes inconsistent information possible:

Moon(PK) | Planet  | Planet kind
------------------------------
Phobos   | Mars    | Rock
Deimos   | Mars    | Rock
Io       | Jupiter | Gas
Europa   | Jupiter | Rock <-- Oh no!
Ganymede | Jupiter | Gas

A query can now give inconsistent results which can have disastrous consequences.

(Of course a database cannot protect against wrong information being entered. But it can protect against inconsistent information, which is just as serious a problem.)

The normalized design would split the table into two tables:

Moon(PK) | Planet(FK)     Planet(PK) | Planet kind
---------------------     ------------------------
Phobos   | Mars           Mars       | Rock
Deimos   | Mars           Jupiter    | Gas
Io       | Jupiter 
Europa   | Jupiter 
Ganymede | Jupiter 

Now no fact is repeated multiple times, so there is no possibility of inconsistent data. (It may look like there still is some repetition since the planet names are repeated, but repeating primary key values as foreign keys does not violate normalization since it does not introduce a risk of inconsistent data.)

Rule of thumb
If the same information can be represented with fewer individual cell values, not counting foreign keys, then the table should be normalized by splitting it into more tables. For example the first table has 12 individual values, while the two tables only have 9 individual (non-FK) values. This means we eliminate 3 redundant values.

We know the same information is still there, since we can write a join query which return the same data as the original un-normalized table.

How do I avoid such problems?
Normalization problems are easily avoided by giving a bit of though to the conceptual model, for example by drawing an entity-relationship diagram. Planets and moons have a one-to-many relationship which means they should be represented in two different tables with a foreign key-association. Normalization issues happen when multiple entities with a one-to-many or many-to-many relationship are represented in the same table row.

Is normalization it important? Yes, it is very important. By having a database with normalization errors, you open the risk of getting invalid or corrupt data into the database. Since data "lives forever" it is very hard to get rid of corrupt data when first it has entered the database.

But I don't really think it is important to distinguish between the different normal forms from 2NF to 5NF. It is typically pretty obvious when a schema contains redundancies - whether it is 3NF or 5NF which is violated is less important as long as the problem is fixed.

(There are also some additional normal forms like DKNF and 6NF which are only relevant for special purpose systems like data-warehouses.)

Don't be scared of normalization. The official technical definitions of the normalization levels are quite obtuse. It makes it sound like normalization is a complicated mathematical process. However, normalization is basically just the common sense, and you will find that if you design a database schema using common sense it will typically be fully normalized.

There are a number of misconceptions around normalization:

  • some believe that normalized databases are slower, and the denormalization improves performance. This is only true in very special cases however. Typically a normalized database is also the fastest.

  • sometimes normalization is described as a gradual design process and you have to decide "when to stop". But actually the normalization levels just describe different specific problems. The problem solved by normal forms above 3rd NF are pretty rare problems in the first place, so chances are that your schema is already in 5NF.

Does it apply to anything outside of databases? Not directly, no. The principles of normalization is quite specific for relational databases. However the general underlying theme - that you shouldn't have duplicate data if the different instances can get out of sync - can be applied broadly. This is basically the DRY principle.

转身泪倾城 2024-07-16 03:24:22

最重要的是,它可以消除数据库记录中的重复项。
例如,如果您有多个地方(表)可以出现某个人的姓名,则可以将该姓名移动到一张单独的表中,并在其他地方引用它。 这样,如果您以后需要更改人名,您只需在一处更改即可。

它对于正确的数据库设计至关重要,理论上您应该尽可能多地使用它来保持数据完整性。 然而,当从许多表中检索信息时,您会损失一些性能,这就是为什么有时您会看到性能关键应用程序中使用非规范化数据库表(也称为扁平化)。

我的建议是从良好程度的规范化开始,仅在真正需要时才进行反规范化

PS 还请查看这篇文章: http://en.wikipedia.org/wiki/Database_normalization 阅读有关该主题和所谓的正常形式的更多信息

Most importantly it serves to remove duplication from the database records.
For example if you have more than one place (tables) where the name of a person could come up you move the name to a separate table and reference it everywhere else. This way if you need to change the person name later you only have to change it in one place.

It is crucial for proper database design and in theory you should use it as much as possible to keep your data integrity. However when retrieving information from many tables you're losing some performance and that's why sometimes you could see denormalised database tables (also called flattened) used in performance critical applications.

My advise is to start with good degree of normalisation and only do de-normalisation when really needed

P.S. also check this article: http://en.wikipedia.org/wiki/Database_normalization to read more on the subject and about so-called normal forms

没︽人懂的悲伤 2024-07-16 03:24:22

规范化用于消除表中列之间的冗余和功能依赖性的过程。

存在多种范式,通常用数字表示。 数字越大意味着冗余和依赖性越少。 任何 SQL 表都是 1NF(第一范式,几乎按照定义)规范化意味着以可逆的方式更改模式(通常对表进行分区),从而提供功能相同的模型,但冗余和依赖性较少。

数据的冗余和依赖性是不可取的,因为它可能导致修改数据时的不一致。

Normalization a procedure used to eliminate redundancy and functional dependencies between columns in a table.

There exist several normal forms, generally indicated by a number. A higher number means fewer redundancies and dependencies. Any SQL table is in 1NF (first normal form, pretty much by definition) Normalizing means changing the schema (often partitioning the tables) in a reversible way, giving a model which is functionally identical, except with less redundancy and dependencies.

Redundancy and dependency of data is undesirable because it can lead to inconsisencies when modifying the data.

合久必婚 2024-07-16 03:24:22

其目的是减少数据的冗余。

有关更正式的讨论,请参阅维基百科 http://en.wikipedia.org/wiki/Database_normalization

我将举一个稍微简单的例子。

假设通常包含家庭成员的组织数据库

id, name, address
214 Mr. Chris  123 Main St.
317 Mrs. Chris 123 Main St.

可以被规范化,

id name familyID
214 Mr. Chris 27
317 Mrs. Chris 27

并且家庭表

ID, address
27 123 Main St.

近乎完全规范化 (BCNF) 通常不在生产中使用,而是一个中间步骤。 将数据库放入 BCNF 后,下一步通常是以逻辑方式对其进行反规范化,以加快查询速度并降低某些常见插入的复杂性。 但是,如果不先对其进行适当的标准化,就无法做好这项工作。

这个想法是将冗余信息减少为单个条目。 这在地址等字段中特别有用,其中 Chris 先生提交的地址为 Unit-7 123 Main St.,Chris 女士列出了 Suite-7 123 Main Street,这将在原始表中显示为两个不同的地址。

通常,使用的技术是查找重复元素,并将这些字段隔离到另一个具有唯一 ID 的表中,并使用引用新表的主键替换重复元素。

It is intended to reduce redundancy of data.

For a more formal discussion, see the Wikipedia http://en.wikipedia.org/wiki/Database_normalization

I'll give a somewhat simplistic example.

Assume an organization's database that usually contains family members

id, name, address
214 Mr. Chris  123 Main St.
317 Mrs. Chris 123 Main St.

could be normalized as

id name familyID
214 Mr. Chris 27
317 Mrs. Chris 27

and a family table

ID, address
27 123 Main St.

Near-Complete normalization (BCNF) is usually not used in production, but is an intermediate step. Once you've put the database in BCNF, the next step is usually to De-normalize it in a logical way to speed up queries and reduce the complexity of certain common inserts. However, you can't do this well without properly normalizing it first.

The idea being that the redundant information is reduced to a single entry. This is particularly useful in fields like addresses, where Mr. Chris submits his address as Unit-7 123 Main St. and Mrs. Chris lists Suite-7 123 Main Street, which would show up in the original table as two distinct addresses.

Typically, the technique used is to find repeated elements, and isolate those fields into another table with unique ids and to replace the repeated elements with a primary key referencing the new table.

总攻大人 2024-07-16 03:24:22

引用CJ Date:理论即实用。

偏离规范化将导致数据库中出现某些异常情况。

偏离第一范式将导致访问异常,这意味着您必须分解和扫描各个值才能找到您要查找的内容。 例如,如果其中一个值是早期响应给出的字符串“Ford, Cadillac”,并且您正在查找“Ford”的所有出现,则必须打开该字符串并查看子串。 这在某种程度上违背了将数据存储在关系数据库中的目的。

自 1970 年以来,第一范式的定义发生了变化,但您现在不必担心这些差异。 如果您使用关系数据模型设计 SQL 表,您的表将自动采用 1NF。

偏离第二范式及以上将导致更新异常,因为相同的事实存储在多个位置。 这些问题使得不可能在不存储其他可能不存在的事实的情况下存储某些事实,因此必须发明这些事实。 或者当事实发生变化时,您可能必须找到存储事实的所有位置并更新所有这些位置,以免最终得到一个自相矛盾的数据库。 而且,当您从数据库中删除一行时,您可能会发现,如果这样做,您将删除唯一存储仍然需要的事实的位置。

这些是逻辑问题,而不是性能问题或空间问题。 有时,您可以通过仔细编程来解决这些更新异常问题。 有时(通常)最好通过遵循正常形式从一开始就防止出现问题。

尽管已经说过的很有价值,但应该提到的是,正常化是一种自下而上的方法,而不是自上而下的方法。 如果您在数据分析和初始设计中遵循一定的方法,则可以保证设计至少符合 3NF。 在许多情况下,设计将完全标准化。

您可能真正想要应用规范化下教授的概念的地方是,当您从遗留数据库或由记录组成的文件中获得遗留数据时,并且数据的设计完全不了解正常形式以及离开的后果从他们。 在这些情况下,您可能需要发现与标准化的偏差,并纠正设计。

警告:正常化常常带有宗教色彩,好像每一次偏离完全正常化都是一种罪恶,是对科德的冒犯。 (那里有一个小双关语)。 别买那个。 当您真正真正学习数据库设计时,您不仅会知道如何遵循规则,还会知道何时可以安全地打破规则。

Quoting CJ Date: Theory IS practical.

Departures from normalization will result in certain anomalies in your database.

Departures from First Normal Form will cause access anomalies, meaning that you have to decompose and scan individual values in order to find what you are looking for. For example, if one of the values is the string "Ford, Cadillac" as given by an earlier response, and you are looking for all the ocurrences of "Ford", you are going to have to break open the string and look at the substrings. This, to some extent, defeats the purpose of storing the data in a relational database.

The definition of First Normal Form has changed since 1970, but those differences need not concern you for now. If you design your SQL tables using the relational data model, your tables will automatically be in 1NF.

Departures from Second Normal Form and beyond will cause update anomalies, because the same fact is stored in more than one place. These problems make it impossible to store some facts without storing other facts that may not exist, and therefore have to be invented. Or when the facts change, you may have to locate all the plces where a fact is stored and update all those places, lest you end up with a database that contradicts itself. And, when you go to delete a row from the database, you may find that if you do, you are deleting the only place where a fact that is still needed is stored.

These are logical problems, not performance problems or space problems. Sometimes you can get around these update anomalies by careful programming. Sometimes (often) it's better to prevent the problems in the first place by adhering to normal forms.

Notwithstanding the value in what's already been said, it should be mentioned that normalization is a bottom up approach, not a top down approach. If you follow certain methodologies in your analysis of the data, and in your intial design, you can be guaranteed that the design will conform to 3NF at the very least. In many cases, the design will be fully normalized.

Where you may really want to apply the concepts taught under normalization is when you are given legacy data, out of a legacy database or out of files made up of records, and the data was designed in complete ignorance of normal forms and the consequences of departing from them. In these cases you may need to discover the departures from normalization, and correct the design.

Warning: normalization is often taught with religious overtones, as if every departure from full normalization is a sin, an offense against Codd. (little pun there). Don't buy that. When you really, really learn database design, you'll not only know how to follow the rules, but also know when it's safe to break them.

春庭雪 2024-07-16 03:24:22

规范化是基本概念之一。 这意味着两件事不会互相影响。

在数据库中,具体意味着两个(或更多)表不包含相同的数据,即没有任何冗余。

乍一看,这确实很好,因为您出现一些同步问题的机会接近于零,您总是知道您的数据在哪里,等等。但是,您的表数量可能会增加,并且您将在交叉数据时遇到问题并得到一些总结结果。

因此,最后您将完成非纯粹规范化的数据库设计,并具有一些冗余(它将处于某些可能的规范化级别)。

Normalization is one of the basic concepts. It means that two things do not influence on each other.

In databases specifically means that two (or more) tables do not contain the same data, i.e. do not have any redundancy.

On the first sight that is really good because your chances to make some synchronization problems are close to zero, you always knows where your data is, etc. But, probably, your number of tables will grow and you will have problems to cross the data and to get some summary results.

So, at the end you will finish with database design that is not pure normalized, with some redundancy (it will be in some of the possible levels of normalization).

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文