当前位置：文江博客话题详情

如何使用 php 在 mysql 上应用规范化

发布于 2024-08-19 03:55:55 字数 202 浏览 4 评论 0原文

拜托我没有任何想法。尽管我已经阅读了一些有关该主题的内容。我只知道它是用来让数据库中的数据更高效、更容易处理的。并且它还可以用于节省磁盘空间。最后，如果您使用标准化。您将必须生成更多表。

现在我有很多问题要问。首先，规范化如何帮助节省磁盘空间或数据库占用的任何空间。其次，是否可以仅使用 1 个查询在多个表上添加数据。请大家帮忙，我只是一个新手，想向您学习。谢谢。

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

悲欢浪云 2024-08-26 03:55:55

好吧，有几件事：

php 与此无关。规范化是关于建模数据
规范化不是为了节省磁盘空间。它是关于组织数据以使其易于维护，这反过来又是维护数据完整性的一种方法。
标准化通常分为几个阶段或“正常形式”。在实践中，设计关系数据库的人大多数时候通常凭直觉“做对了”。但了解范式及其特征仍然是有好处的。互联网上有很多相关文档（fe http://en.wikipedia.org/wiki /Database_normalization），你当然应该自己研究，但最重要的阶段是：

非规范化数据：在这个阶段，数据不是真正的表格（“关系”）。关于表格的真正含义有很多讨论，专家们也意见不一。但大多数人都同意，如果存在多值属性（=可以为一行包含列表作为值的列），或者存在重复组（=用于存储相同的多列或多组列），则数据是非标准化的数据类型）

多值列示例：person（名字、姓氏、电话号码）
这里，电话号码意味着可以有更多电话号码，存储在一列中

重复组的示例：person(first_name, last_name, child1_first_name, child1_birth_date, child2_first_name, child2_birth_date..., childN_first_name, childN_birth_date)
这里，person 表有许多列对（child_first_name、child_birth_date）来存储该人的孩子。

请注意，像 order (shipping_address, billing_address) 这样的内容不是重复组：账单地址和送货地址可能是类似的数据，但每个地址对于订单都有自己不同的角色，两者都只代表订单的不同方面。 child1 到 child10 没有 - 子级没有特定的角色，并且子级列表是可变的（您永远不知道应该提前保留多少组）

在这两种情况下，多值列和重复组，您基本上都有“嵌套”表”结构——表中的表。如果这两种情况都不发生，则称数据处于 1NF（第一范式）。

1NF 是关于结构特征：数据的表格形式。所有后续的范式都与消除冗余有关。当相同的信息被独立存储多次时，就会出现冗余。冗余是不好的：如果你想改变一些事实，你必须在多个地方改变它。如果你忘记了其中之一，你就会得到不一致的数据——数据本身是矛盾的。

有很多过程可以消除冗余，每个过程都会导致更高的范式，从 1nf 一直到 6nf。然而，通常大多数数据库都在 3nf（或称为 boyce-codd 范式，BCNF 的 lsight 变体）下充分标准化。您应该研究 2nf 和 3nf，但原则非常简单：表已充分标准化，如果：

该表在 1nf 中，
表有一个键（需要其值的列或列组合，并且唯一标识一行 - 即只能有一行在键列中具有该值组合），
之间没有功能依赖关系非键列
非键列在功能上不依赖于键的一部分（但在功能上完全依赖于整个键）。

函数依赖意味着一列的值可以从另一列派生。简单的例子：

order_item (order_id, item_number, customer_id, Product_code, Product_description, amount)

让我们假设 (order_id, item_number) 是关键。产品代码和产品描述在功能上相互依赖：对于一个特定的产品代码，您将始终找到相同的产品描述（就好像产品描述是产品代码的函数一样）。现在的问题是：假设特定产品代码的产品描述发生变化，您必须更改使用该产品代码的所有订单。如果只忘记一个，那么数据库就会不一致。

解决这个问题的方法是用（product_code，product_description）创建一个新的产品表，以（product_code）为键，然后不按顺序存储所有产品字段，而是只存储对产品表中一行的引用order_item 记录（在这种情况下，order_item 应该只保留product_code，这足以在产品表中查找一行并找到product_description）

因此，正如您所看到的，使用此解决方案实际上可以节省空间（通过不存储所有每个 order_item 中恰好订购了产品的这些产品描述）并且您确实获得了更多表（从 order_item 中分离出产品），但请记住，这不是因为节省磁盘空间：而是因为您消除了冗余，从而更容易维护数据。因为现在您只需更改产品表中的一行即可更改描述

Ok, couple of things:

php has got nothing to do with this. normalization is about modelling data
normalization is not about saving disk space. It is about organizing data so that it is easily maintainable, which in turn is a way to maintain data-integrity.
normalization is typically described in a few stages or 'normal forms'. In practice, people that design relational databases often intuitively 'get it right' most of the time. But it is still good to be aware of the normal forms and what their characteristics are. There is a lot of documentation on that on the internet (fe http://en.wikipedia.org/wiki/Database_normalization), and you should certainly do you own research, but the most important stages are:

unormalized data: in this stage, data is not truly tabular ('relational'). There is a lot of discussion of what tabular really means, and experts disagree with one another. but most people agree that data is unnormalized in case there are multi-valued attributes (=columns that can for one row contain lists as value), or in case there are repeating groups (=multiple columns or multiple groups of columns for storing the same type of data)

Example of multi-valued column: person (first_name, last_name, phonenumbers)
Here, phonenumbers implies there could be more phonenumbers, stored in one column

Example of repeating group: person(first_name, last_name, child1_first_name, child1_birth_date, child2_first_name, child2_birth_date..., childN_first_name, childN_birth_date)
Here, the person table has a number of column pairs (child_first_name, child_birth_date) to store the person's children.

Note that something like order (shipping_address, billing_address) is not a repeating group: the addresses for billing and shipping may be similar pieces of data, but each has its own distinct role for an order, both just represent a different aspect of an order. child1 thru child10 do not - children do not have specific roles, and the list of children is variable (you never know how many groups you should reserve in advance)

In both cases, multi-valued columns and repeating groups, you basically have "nested table" structure - a table within a table. Data is said to be in 1NF (first normal form) if neither of these occur.

The 1NF is about structural characeristics: the tabular form of the data. All subsequenct normal forms have to do with eliminating redundancy. Redundancy occurs when the same information is independently stored multiple times. Redundancy is bad: if you want to change some fact, you have to change it in multiple places. If you forget to chance one of them, you have inconsistent data - the data is contradicting itself.

There are a lot of processes that can eliminate redundancy, each leading to a higher normal form, all the way from 1nf up to 6nf. However, typically most databases are adequately normalized at 3nf (or a lsight variation of that called boyce-codd normal form, BCNF) You should study 2nf and 3nf, but the principle is very simple: a table is adequately normalized, if:

the table is in 1nf
the table has a key (a column or column combination whose values are required, and which uniquely identifies a row - ie. there can be only one row having that combination of values in the key columns)
there are no functional dependencies between the non-key columns
non-key columns are not functionally dependent upon part of the key (but are completely functionally dependent upon the entire key).

functional dependency means that a column's value can be derived from another column. simple example:

order_item (order_id, item_number, customer_id, product_code, product_description, amount)

let's assume (order_id, item_number) is key. product_code and product description are functionally dependent upon each other: for one particular product_code, you will always find the same product description (as if product description is a function of product_code). The problem is now: suppose a product description changes for a particualr product code, you have to change all orders that us that product_code. forget only one and you have an inconsistent database.

The way to solve it is to create a new product table with (product_code, product_description), having (product_code) as key, and then instead of storing all product fields in order, only store a reference to a row in the product table in the order_item records (in this case, order_item should only keep product_code, which is sufficient to look up a row in the product table and find the product_description)

So as you u can see, with this solution you do actually save space (by not storing all these product descriptions in each order_item that happens to order the product) and you do get more tables (split off product from order_item) But just remember that it is not because of saving diskspace: it is because you eliminate redundancy, thus making it easier to maintain the data. because now you only have to change one row in the product table to change the description

回复收藏 0 原文

小情绪 2024-08-26 03:55:55

StackOverflow 上已经有很多类似的问题，例如有人可以用简单的英语给出 1NF、2NF 和 3NF 的示例吗？

在右侧的相关侧边栏中查找其中的一堆。这会让你开始。

至于您的具体问题：

标准化通过减少冗余数据存储来节省磁盘空间。这还有另一个好处：如果您的数据库中有给定实体属性的多个副本，它们可能会不同步，而如果您有标准化数据库并使用引用完整性，则不会发生这种情况。
INSERT 语句仅引用一个表。插入上的 TRIGGER语句可以向其他表添加行，但是除了生成它的表中的那些列之外，无法向触发器提供数据。
当您在向父表插入行后需要插入相关行时，请使用LAST_INSERT_ID() 函数，用于检索最后一个 INSERT 语句的自动生成的主键值你的会议。

回复收藏 0 原文

薄荷梦 2024-08-26 03:55:55

我想当您开始为数据库创建架构时您就会学到这一点。
当您添加数据库中其他位置存在的字段时，请反向思考。

我的意思是，问自己：如果我必须修改该字段，我必须运行多少个查询？
也许您最终得到的答案是，您必须运行 2 或 X 次查询才能修改列的内容。
保持简单，这意味着为数据库中复制的每个内容分配一个 ID。

例如，采用 address 列，

这不是一个好的

update clients set address = 'new address' where clientid=500;
update orders set address = 'new address' where orderid=300;

方法，

create a addresses table
//and run a single query
update addresses set address = 'new address' where addressid=100;

并且在数据库表中的任何位置使用地址 id 100 作为外键引用（客户+订单），这样您就可以实现 id 100未更改，但如果您更新地址的内容，所有链接的表都将拾取更改。

这次正常化级别 3 对您来说已经足够了。

I think you will learn this when you start creating the schema for your database.
Please think reverse when you add a field that exists somewhere else in your database.

By reverse I mean, ask yourself: if I have to modify the field, how many queries do I have to run?
Probably you end up, with the answer, that you will have to run 2 or X times the query to modify the content of your column.
Keep it simple, that means assign an ID to each content you have duplicated in your database.

For example taking column address

this is not good

update clients set address = 'new address' where clientid=500;
update orders set address = 'new address' where orderid=300;

good approach would be

create a addresses table
//and run a single query
update addresses set address = 'new address' where addressid=100;

And use the address id 100 everywhere in your database table as a foreign key reference (clients+orders), this way you achieve that the id 100 is not changed, but if you update the content of the address all linked tables will pick up the change.

Level 3 of normalization is enough this time for you.

回复收藏 0 原文