
Normalization in plain English

Posted 2024-08-23 14:50:27

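I understand the concept of database normalization, but I always have a hard time explaining it in plain English, especially in a job interview. I have read the Wikipedia article, but I still find it hard to explain the concept to non-developers. "Design a database in a way that avoids duplicated data" is the first thing that comes to mind.

Does anyone have a nice way to explain the concept of database normalization in plain English? And what are some good examples that show the differences between first, second, and third normal form?

Say you go to a job interview and the interviewer asks: "Explain the concept of normalization and how you would go about designing a normalized database."

What key points are the interviewers looking for?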

美男兮 2024-08-30 14:50:27

Well, if I had to explain it to my wife, it would be something like this:

The main idea is to avoid duplication of large data.

Let's take a look at a list of people and the countries they come from. Instead of storing the name of the country, which can be as long as "Bosnia & Herzegovina", for every person, we simply store a number that references a table of countries. So instead of holding 100 copies of "Bosnia & Herzegovina", we hold 100 copies of #45. If in the future, as often happens with Balkan countries, it splits into two countries, Bosnia and Herzegovina, I will have to change it in only one place. Well, sort of.

Now, to explain 2NF, I would change the example and assume that we hold the list of countries every person visited.
Instead of holding a table like:

Person   CountryVisited   AnotherInformation   D.O.B.
Faruz    USA              Blah Blah            1/1/2000
Faruz    Canada           Blah Blah            1/1/2000

I would create three tables: one with the list of countries, one with the list of persons, and another to connect them. That gives me the most freedom to change a person's information or a country's information. It also lets me "remove duplicate rows", as normalization expects.
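The three-table layout described above could be sketched roughly as follows, using Python's built-in sqlite3 module. The table and column names (persons, countries, visits) and the ISO date format are assumptions made for illustration; the answer does not name them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
    CREATE TABLE persons   (id INTEGER PRIMARY KEY, name TEXT NOT NULL, dob TEXT);
    CREATE TABLE countries (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    -- Connecting table: one row per (person, country visited) pair.
    CREATE TABLE visits (
        person_id  INTEGER NOT NULL REFERENCES persons(id),
        country_id INTEGER NOT NULL REFERENCES countries(id),
        PRIMARY KEY (person_id, country_id)
    );
    INSERT INTO persons   VALUES (1, 'Faruz', '2000-01-01');
    INSERT INTO countries VALUES (1, 'USA'), (2, 'Canada');
    INSERT INTO visits    VALUES (1, 1), (1, 2);
""")

# Renaming a country touches exactly one row, however many people visited it.
conn.execute("UPDATE countries SET name = 'United States' WHERE id = 1")

visited = [row[0] for row in conn.execute("""
    SELECT c.name
    FROM visits v
    JOIN countries c ON c.id = v.country_id
    WHERE v.person_id = 1
    ORDER BY c.name
""")]
print(visited)
```

The person's date of birth now lives in one row of persons instead of being repeated for every country visited.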


老娘不死你永远是小三 2024-08-30 14:50:27

One-to-many relationships should be represented as two separate tables connected by a foreign key. If you try to shove a logical one-to-many relationship into a single table, you are violating normalization, which leads to dangerous problems.

Say you have a database of your friends and their cats. Since a person may have more than one cat, we have a one-to-many relationship between persons and cats. This calls for two tables:

Friends
Id | Name | Address
-------------------------
1  | John | The Road 1
2  | Bob  | The Belltower


Cats
Id | Name   | OwnerId 
---------------------
1  | Kitty  | 1
2  | Edgar  | 2
3  | Howard | 2

(Cats.OwnerId is a foreign key to Friends.Id)

The above design is fully normalized and conforms to all known normalization levels.
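As a rough sketch, the two tables above might be declared like this with Python's sqlite3 module. The snake_case names and the NOT NULL constraints are assumptions; the sample rows are the ones from the tables above, and the cat named 'Stray' is a made-up row used only to show the foreign key at work.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
    CREATE TABLE friends (
        id      INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        address TEXT NOT NULL
    );
    CREATE TABLE cats (
        id       INTEGER PRIMARY KEY,
        name     TEXT NOT NULL,
        owner_id INTEGER NOT NULL REFERENCES friends(id)  -- the "many" side points at the "one" side
    );
    INSERT INTO friends VALUES (1, 'John', 'The Road 1'), (2, 'Bob', 'The Belltower');
    INSERT INTO cats    VALUES (1, 'Kitty', 1), (2, 'Edgar', 2), (3, 'Howard', 2);
""")

# All of Bob's cats, found through the foreign key.
bobs_cats = [row[0] for row in conn.execute("""
    SELECT c.name
    FROM cats c
    JOIN friends f ON f.id = c.owner_id
    WHERE f.name = 'Bob'
    ORDER BY c.name
""")]

# A cat whose owner does not exist is rejected by the database itself.
try:
    conn.execute("INSERT INTO cats VALUES (4, 'Stray', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

Bob's address is stored once, no matter how many cats he has.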

But say I had tried to represent the above information in a single table like this:

Friends and cats
Id | Name | Address       | CatName
-----------------------------------
1  | John | The Road 1    | Kitty     
2  | Bob  | The Belltower | Edgar  
3  | Bob  | The Belltower | Howard

(This is the kind of design I might have made if I were used to Excel sheets but not relational databases.)
A single-table approach forces me to repeat some information if I want the data to be consistent. The problem with this design is that some facts, like the information that Bob's address is "The Belltower", are stored twice. That is redundant, makes the data harder to query and change, and (worst of all) makes it possible to introduce logical inconsistencies.

E.g., if Bob moves, I have to make sure I change the address in both rows. If Bob gets another cat, I have to be sure to repeat the name and address exactly as typed in the other two rows. If I make a typo in Bob's address in one of the rows, the database suddenly has inconsistent information about where Bob lives. The un-normalized database cannot prevent the introduction of inconsistent and self-contradictory data, and hence the database is not reliable. This is clearly not acceptable.
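The update problem can be reproduced in a few lines. This is only an illustrative sketch with Python's sqlite3 module; the table name friends_and_cats and the new address 'New Street 5' are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE friends_and_cats (
        id       INTEGER PRIMARY KEY,
        name     TEXT,
        address  TEXT,
        cat_name TEXT
    );
    INSERT INTO friends_and_cats VALUES
        (1, 'John', 'The Road 1',    'Kitty'),
        (2, 'Bob',  'The Belltower', 'Edgar'),
        (3, 'Bob',  'The Belltower', 'Howard');
""")

# Bob moves, but only one of his two rows gets updated.
conn.execute("UPDATE friends_and_cats SET address = 'New Street 5' WHERE id = 2")

# The database now holds two contradictory answers to "where does Bob live?"
bob_addresses = sorted(row[0] for row in conn.execute(
    "SELECT DISTINCT address FROM friends_and_cats WHERE name = 'Bob'"
))
print(bob_addresses)
```

Nothing in the single-table schema stops this partial update. With a separate friends table the address exists in one row, so the same mistake cannot leave two versions behind.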

Normalization cannot prevent you from entering wrong data. What normalization prevents is the possibility of inconsistent data.

It is important to note that normalization depends on business decisions. If you have a customer database and you decide to record only a single address per customer, then the table design (#CustomerID, CustomerName, CustomerAddress) is fine. If, however, you decide to allow each customer to register more than one address, then the same table design is not normalized, because you now have a one-to-many relationship between customer and address. Therefore you cannot just look at a database to determine whether it is normalized; you have to understand the business model behind it.


你是年少的欢喜 2024-08-30 14:50:27

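This is what I ask interviewees:

Why don't we use a single table for an application instead of multiple tables?

The answer, of course, is normalization. As already said, it is there to avoid redundancy and, as a result, update anomalies.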


倦话 2024-08-30 14:50:27

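This is not a thorough explanation, but one goal of normalization is to allow for growth without awkwardness.

For example, if you have a user table and every user is going to have one and only one phone number, it is fine to have a phonenumber column in that table.

However, if each user is going to have a variable number of phone numbers, it would be awkward to have columns like phonenumber1, phonenumber2, and so on. This is for two reasons:

If your columns go up to phonenumber3 and someone needs to add a fourth number, you have to add a column to the table.
For all the users with fewer than 3 phone numbers, there are empty columns on their rows.

Instead, you would want a phonenumber table, where each row contains one phone number and a foreign key reference to the row in the user table it belongs to. No blank columns are needed, and each user can have as few or as many phone numbers as necessary.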
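A minimal sketch of that phonenumber table, using Python's sqlite3 module. The user and phonenumber table names come from the answer; the column names, the sample users, and the helper add_number are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE phonenumber (
        id      INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES user(id),
        number  TEXT NOT NULL
    );
    INSERT INTO user VALUES (1, 'Ann'), (2, 'Ben');
""")

def add_number(user_id, number):
    """Hypothetical helper: a new phone number is a new row, never a new column."""
    conn.execute(
        "INSERT INTO phonenumber (user_id, number) VALUES (?, ?)",
        (user_id, number),
    )

# Ann gets a fourth number without any schema change; Ben has none and wastes no columns.
for n in ("555-0100", "555-0101", "555-0102", "555-0103"):
    add_number(1, n)

counts = dict(conn.execute("""
    SELECT u.name, COUNT(p.id)
    FROM user u
    LEFT JOIN phonenumber p ON p.user_id = u.id
    GROUP BY u.id
    ORDER BY u.id
"""))
print(counts)
```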


凉城已无爱 2024-08-30 14:50:27

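One side point to note about normalization: a fully normalized database is space-efficient, but, depending on usage patterns, it is not necessarily the most time-efficient arrangement of the data.

Jumping between multiple tables to collect pieces of information that normalization has spread apart takes time. In high-load situations (millions of rows per second flying around, thousands of concurrent clients, as in credit card transaction processing), where time is more valuable than storage space, appropriately denormalized tables can give better response times than fully normalized ones.

For more on this, look for the SQL books written by Ken Henderson.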


一梦等七年七年为一梦 2024-08-30 14:50:27

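I would say that normalization is like keeping notes so you can do things efficiently, so to speak:

If you had a note saying you had to go shopping for ice cream, then without normalization you would have another note saying the same thing, one in each pocket. In real life you would never do this, so why do it in a database?

For the designing and implementing part, that is when you can move back to "the lingo" and away from layman's terms, but I suppose you could still simplify. You would first say what you need, and then, when normalization comes into it, say that you will make sure of the following:

There must be no repeating groups of information within a table.
No table should contain data that is not functionally dependent on that table's primary key.
For 3NF I like Bill Kent's take on it: every non-key attribute must provide a fact about the key, the whole key, and nothing but the key.

I think it may be more impressive if you also speak of denormalization, and of the fact that you cannot always have the best structure and be in normal form at the same time.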


怀里藏娇 2024-08-30 14:50:27

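Normalization is a set of rules used to design tables that are connected through relationships.

It helps avoid repetitive entries, reduces the storage space required, removes the need to restructure existing tables to accommodate new data, and can speed up some queries.

First normal form: Data should be broken up into the smallest units. Tables should not contain repeating groups of columns. Each row is identified by a primary key made up of one or more columns.
For example, if a "Customer" table has a column named "Name", it should be split into "First Name" and "Last Name". The "Customer" table should also have a column named "CustomerID" to identify a particular customer.

Second normal form: Each non-key column should be directly related to the entire primary key.
For example, if the "Customer" table has a column named "City", the city should get a separate table with a primary key and the city name. In the "Customer" table, replace the "City" column with "CityID" and make "CityID" a foreign key in that table.

Third normal form: No non-key column should depend on other non-key columns.
For example, in an order table, the "Total" column depends on "Unit price" and "Quantity", so the "Total" column should be removed.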
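The "Total" example could look roughly like this with Python's sqlite3 module. The table name order_lines, the column names, and the use of integer cents are assumptions for illustration. The total is computed when queried rather than stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- No stored total column: it would depend on two other non-key columns.
    CREATE TABLE order_lines (
        id               INTEGER PRIMARY KEY,
        unit_price_cents INTEGER NOT NULL,
        quantity         INTEGER NOT NULL
    );
    INSERT INTO order_lines VALUES (1, 250, 4), (2, 999, 1);
""")

# Changing the quantity cannot leave a stale total behind, because none is stored.
conn.execute("UPDATE order_lines SET quantity = 5 WHERE id = 1")

totals = [row[0] for row in conn.execute(
    "SELECT unit_price_cents * quantity FROM order_lines ORDER BY id"
)]
print(totals)
```

If a stored total is ever needed for speed, that is a deliberate denormalization, and something then has to keep it in sync with price and quantity.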


最佳男配角 2024-08-30 14:50:27

I teach normalization in my Access courses and break it down a few ways.

After discussing the precursors to storyboarding or planning out the database, I then delve into normalization. I explain the rules like this:

Each field should contain the smallest meaningful value:

I write a name field on the board and then place a first name and last name in it, like Bill Lumbergh. I then ask the students what problems we will have when the first name and last name are both in one field. I use my own name, Jim Richards, as an example. If the students do not lead me down the road, then I yank their hand and take them with me. :) I tell them that my name is a tough one for some people, because I have what some would consider two first names, and some people call me Richard. If you were trying to search for my last name, it would be harder for a normal person (without wildcards), because my last name is buried at the end of the field. I also tell them that they will have trouble sorting the field by last name, because again my last name is buried at the end.

I then let them know that "meaningful" also depends on the audience who is going to use the database. At our job, we will not need a separate field for apartment or suite number if we are storing people's addresses, but shipping companies like UPS or FedEx might need it separated out so drivers can easily pull up the apartment or suite they need to go to while they are on the road, running from delivery to delivery. So it is not meaningful to us, but it is definitely meaningful to them.

Avoiding Blanks:

I use an analogy to explain to them why they should avoid blanks. I tell them that Access and most databases do not store blanks the way Excel does. Excel does not care if nothing is typed in a cell and will not increase the file size, but Access will reserve that space until the point in time when you actually use the field. So even if the field is blank, it still uses up space, and I explain to them that it slows their searches down as well.
The analogy I use is empty shoe boxes in the closet. If you have shoe boxes in the closet and you are looking for a pair of shoes, you need to open each box and look inside. If some of the boxes are empty, you are wasting space in the closet and also wasting time when you look through them for that certain pair of shoes.

Avoiding redundancy in data:

I show them a table that has lots of repeated values for customer information and then tell them that we want to avoid duplicates, because I have sausage fingers and will mistype values if I have to type the same thing over and over again. This “fat-fingering” of data will lead to my queries not finding the correct data. Instead, we break the data out into a separate table and create a relationship using a primary key field and a foreign key field. This way we save space, because we are not typing the customer's name, address, etc. multiple times and are instead just using the customer's ID number in a field for the customer. We then discuss drop-down lists/combo boxes/lookup lists, or whatever else Microsoft wants to name them later on. :) As a user, you will not want to look up and type out the customer's number each time in that customer field, so we set up a drop-down list that gives you a list of customers, where you can select a name and it will fill in the customer’s ID for you. This is a 1-to-many relationship, where 1 customer will have many different orders.

Avoiding repeated groups of fields:

I demonstrate this when talking about many-to-many relationships. First, I draw 2 tables, 1 that will hold employee information and 1 that will hold project information. The tables are laid out similar to this:

(Table1)
tblEmployees
* EmployeeID
First
Last
(Other Fields)….
Project1
Project2
Project3
Etc.
**********************************
(Table2)
tblProjects
* ProjectNum
ProjectName
StartDate
EndDate
…..

I explain to them that this would not be a good way of establishing a relationship between an employee and all of the projects that they work on. First, a new employee will not have any projects, so we will be wasting all of those fields. Second, an employee who has been here a long time might have worked on 300 projects, so we would have to include 300 project fields, and people who are new and have only 1 project will have 299 wasted project fields. This design is also flawed because I will have to search in each of the project fields to find all of the people who have worked on a certain project, since that project number could be in any of the project fields.
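One common alternative to the Project1, Project2, Project3 fields is a third table that links employees to projects. The sketch below uses Python's sqlite3 module rather than Access. The table name tblAssignments, the FirstName/LastName column names, and the sample rows are assumptions for illustration, not part of the course material described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblEmployees (
        EmployeeID INTEGER PRIMARY KEY,
        FirstName  TEXT,
        LastName   TEXT
    );
    CREATE TABLE tblProjects (
        ProjectNum  INTEGER PRIMARY KEY,
        ProjectName TEXT
    );
    -- Hypothetical linking table: one row per employee/project pairing.
    CREATE TABLE tblAssignments (
        EmployeeID INTEGER NOT NULL REFERENCES tblEmployees(EmployeeID),
        ProjectNum INTEGER NOT NULL REFERENCES tblProjects(ProjectNum),
        PRIMARY KEY (EmployeeID, ProjectNum)
    );
    INSERT INTO tblEmployees   VALUES (1, 'Jim', 'Richards'), (2, 'Bill', 'Lumbergh'), (3, 'New', 'Hire');
    INSERT INTO tblProjects    VALUES (10, 'Alpha'), (20, 'Beta');
    INSERT INTO tblAssignments VALUES (1, 10), (1, 20), (2, 10);
""")

# Everyone on project 10: one condition on one column, not a search across 300 fields.
on_project_10 = [row[0] for row in conn.execute("""
    SELECT e.LastName
    FROM tblAssignments a
    JOIN tblEmployees e ON e.EmployeeID = a.EmployeeID
    WHERE a.ProjectNum = 10
    ORDER BY e.LastName
""")]

# The new hire has no projects and therefore no rows here, so nothing is wasted.
new_hire_rows = conn.execute(
    "SELECT COUNT(*) FROM tblAssignments WHERE EmployeeID = 3"
).fetchone()[0]
```

A long-serving employee with 300 projects simply has 300 rows in the linking table, and neither of the other two tables changes shape.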

I covered a fair amount of the basic concepts. Let me know if you have other questions or need help with clarification or with breaking it down in plain English. The wiki page did not read as plain English and might be daunting for some.
