数据库规范化设计——单表或多表
这应该在数据库中表示为 1 个表还是 3 个表?我和我的朋友对此有不同的看法,所以我想看看对此的普遍看法。 (也许应该投票支持任一解决方案?)
Create Table Order
// Basic fields of the table
- ID (Primary key)
- CustomerID (integer, with a FK)
- Quantity
- ProductID (integer, with a FK)
// Then depending on user selection, either these fields need to be specified
// (could be factored out to a separate table):
{
- InternalAccountID (integer, with a FK)
- InternalCompanyID (integer, with a FK)
}
// Or these (could be factored out to a separate table):
{
- ExternalAccountNumber (free text string)
- ExternalCompanyName (free text string)
- ExtraInformation (free text string)
}
1 表方法:
优点:
- 性能(一次插入而不是两次插入、FK 检查、无连接)
- 可能占用更少的空间(额外的表有开销 + 索引 + 额外的 ID 字段)
- 一个表而不是三个表
- 几乎没有理由将 2+3 个字段拆分为新表(或者什么?)
缺点:
- 可为空的字段
- 可能有额外的“类型”列(可以跳过)
- 打破 3NF(?)
优点和缺点请求以及个人意见。 :)
编辑: 我尝试通过使用与实际使用不同的实体来简化示例,因此任何更改模型的建议都不会真正帮助我。即,请更多地关注技术方面而不是领域模型。
Should this be represented in the database as 1 table or 3 tables? I and my friend have different opinions about this so I'd like to see the general views on this. (Maybe it should be a vote for either solution?)
Create Table Order
// Basic fields of the table
- ID (Primary key)
- CustomerID (integer, with a FK)
- Quantity
- ProductID (integer, with a FK)
// Then depending on user selection, either these fields need to be specified
// (could be factored out to a separate table):
{
- InternalAccountID (integer, with a FK)
- InternalCompanyID (integer, with a FK)
}
// Or these (could be factored out to a separate table):
{
- ExternalAccountNumber (free text string)
- ExternalCompanyName (free text string)
- ExtraInformation (free text string)
}
1 table approach:
Pros:
- performance (one insert as opposed to two, FK check, no joins)
- probably takes less space (the extra tables have overhead + indexes + extra ID field)
- one table as opposed to three
- hardly justifiable to have split out to new tables just for 2+3 fields (or what?)
Cons:
- Nullable fields
- Potentially extra "type" column (can be skipped)
- Breaks 3NF (?)
Pros and cons kindly requested as well as personal opinions. :)
EDIT: I tried simplifying the example by using different entities than I am actually using so any suggestions to altering the model wouldn't really help me. I.e. focus on the technical aspects more than the domain model please.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(7)
希望这是不言自明的。
Hope this is self-explanatory.
我的观点是,如果
订单总是 1:1(即,您不能有 3 个 accountID),那么将其保留为一张表。要解决 null 问题,您可以再添加一个名为 InternalCustomer (布尔值)或 CustomerType (varChar) 的列,您可以使用它们来定义内部或外部客户,以了解您应该查看两组字段中的哪一组来查找特定客户。
由于我们不知道这些数据的完整使用或整个数据库的模式,因此对此的任何响应都不能真正完全合格。
My opinion would be that if
are always 1:1 with an order (i.e., you can't have 3 accountIDs), then leave it as one table. To take care of your null issue, you could add one more column called InternalCustomer (boolean) or CustomerType (varChar) that you could use to define an internal or external customer to know which of the two sets of fields you should look at for a specific customer.
Since we don't know the full use of this data or the schema for the entire DB, any response on this can't really be fully qualified.
如果您想避免数据重复,您应该采用 2 表或 3 表解决方案。例如,如果订单表中有
External
列,则值可能会存在多次。如果数据如下所示:现在,如果 ACME 将名称更改为 ACME, Inc.,您必须更新许多行。如果表已标准化,其中外部公司位于单独的表中,则您将更新一行。请注意,可能有一个参数将帐号放入其自己的表中,但我们将保留它以进行极端标准化。
订单和公司/账户之间看起来并不是一对一的关系,除非每个公司/账户只能有一个订单。听起来更像是一对多的关系。
现在,如果在单表环境中更新ExternalCompanyName 时发生错误,并且仅更新部分行,会发生什么情况。您有一些行属于 ACME,有些行属于 ACME, Inc。您最终会遇到不良数据情况。
另外,如果这确实是一对多关系,那么您实际上并没有节省空间。您按顺序复制数据,而不是将其存储在另一个表中。
If you want to avoid data duplication, you should go with a 2- or 3-table solution. For example, if you have the
External
columns in the Order table, value could exist multiple times. If the data looks like this:Now, if ACME changes names to ACME, Inc. you must update many rows. If the tables are normalized, where external companies are in a separate table, you would update one row. Note, there may be an argument to put Account Number in it's own table, but we'll leave that for extreme normalization.
It doesn't appear to be a 1-to-1 relationship between an order and a company/account, unless each company/account can only have one order. it sounds more like a 1-to-many relationship.
Now, what happens if a mistake is made when updating the ExternalCompanyName in a single-table environment, and only some of the rows get updated. You have some rows with ACME and some rows with ACME, Inc. You end up with a bad-data situation.
Also, if this is really a 1-to-many relationship, you really aren't saving space. You are duplicating data in the order, rather than storing it once in another table.
随着数量的增加,从两个表中进行选择可能会比从一个表中进行选择快得多。有时,这种重构(分区)是在成熟的数据库上进行的,以提高性能。
想象一下将其用于多表联接,其中某些条件位于该表上,但其他条件位于不同的表中。
它最终可能会从磁盘中获取日期的所有
order
行,然后丢弃其中的许多行,因为它们与连接不匹配。从磁盘获取数据肯定会很慢,并且可能会破坏 RAM 缓存。在这种情况下,当它从磁盘加载所有
order
行时,它不会像以前那样受到伤害,因为表更窄更小。它不需要加载所有与过滤无关的冗长字段。最终,在加入customer
后,它只会获取那些匹配所有条件的order_detail
行。如果您预计该值会很大,则应考虑拆分表,以便对搜索最关键的字段位于一个表中,而“数据”字段位于其他一对一表中。
底线是:正常形式和域是一回事,但性能通常需要权衡。您可以隐藏其中一些(用视图覆盖拆分),但不是全部(为了更快的选择而重复/聚合字段)。
As the volume increases selection from two tables may be a lot faster than one. Sometimes this kind of refactoring (partition) is done on mature databases to increase performance.
Imagine using this for a multi table join, where some criteria are on this table, but others are in different tables.
It may end up fetching all
order
rows for the dates from disk, then throwing many of them away because they don't match the join. This fetch from disk is bound to be slow and may to spoil your RAM cache.In this case when it loads all
order
rows from disk it's not going to hurt as bad as previously, because the table is narrower and smaller. It doesn't need to load all the lengthy fields which are irrelevant for filtering. Eventually, after join tocustomer
, it will only fetch thoseorder_detail
rows which match all criteria.If you expect this to be large, you should consider splitting the table so that the fields which are most critical for searches are in one table, and "data" fields in other one-to-one table(s).
The bottom line is: Normal form and domain is one thing, but performance often requires tradeoffs. You can hide some of them (cover the split with a view), but not all (duplicate/aggregate fields for the sake of faster selects).
我绝对不会采用3桌解决方案。通过将此数据分成 3 个表,您实际上无法让任何查询在不连接外键的情况下返回完整的订单标头,并且新订单的每次插入都会更新多个表和索引,这就是并发问题。我建议使用 2 个表,一张用于内部订单,一张用于外部订单。对于需要对两组订单中的数据进行合并查询的情况,请定义一个视图,该视图是两个表的并集。
我惊讶地看到产品 ID 和数量作为订单标题的一部分。我见过的每个订单跟踪数据库都将订单项目分解为一个单独的表,使用订单 ID 作为外键,以便单个订单可以包含多个产品(或具有不同数量、交货时间等的同一产品)。 )。
I would absolutely not go with the 3-table solution. By breaking this data into 3 tables, you really can't have any queries return a full order header without joining with the foreign key, and every insert of a new order updates multiple tables and indexes, which is a problem for concurrency. I would suggest using 2 tables, one for InternalOrders and one for ExternalOrders. For those cases where you need a consolidated query of data from both sets of orders, define a view that is the union of both tables.
I am surprised to see product id and quantity as part of the order header. Every order tracking database I've ever seen breaks out order items as a separate table, using the order id as a foreign key, so that a single order can include multiple products (or the same product with various quantities, delivery times, etc.).
我不是纯粹主义者,所以 3nf 在有意义的时候是好的……但你不必想当然地认为它总是如此。
从务实的角度来看,您的目标是什么?你的优点/缺点列表是一个好的开始。我会在列表中添加更多的想法——只要你认为合适。
1) 数据库中的任何其他表是否需要与该数据关联(例如,联接)?这就是 RDB 的意义所在。
2)您的数据库会增长吗?即使 1 个表现在有意义,它总是有意义吗?如果您发现自己想要添加更多表,并且非规范化表迫使您“解决”它,处理返回的额外行、较慢的执行时间等,您会后悔的。
3)当发生什么时您的客户获得一个新的外部帐户,或者您有什么。你会创造一个全新的记录吗?您将如何回答诸如“客户某某的帐号是什么?”之类的问题。
...
我认为一般来说,我追求可扩展性,在这种情况下可能意味着 3nf。 1 个表在非常狭窄的范围内更容易处理,但如果有任何变化,您将处理“如何将此表拆分为正确相关的 3nf 表,而不弄乱已创建的所有依赖项它?”。那个一点也不好玩。
I'm no purist, so 3nf is good when it makes sense...but you don't have to take it for granted that it always will.
From a pragmatic standpoint, what is your goal? Your pros/cons list is a good start. I'd add a few more ideas to the list -- as you see fit.
1) Will any other table in your database need to relate (e.g., join) to this data? That's kind of the point of an RDB.
2) Will your database grow? Even if 1 table makes sense now, will it always make sense? You'll regret it, if you find yourself wanting to add more tables, and your non-normalized table is forcing you to 'work-around' it, dealing with extra rows returned, slower execution times, etc.
3) What happens when your customer gets a new external account, or what have you. Will you create a brand-new record? How will you answer questions like "What is customer so-and-so's account number?".
...
I think in general, i go for scalable, which in this case may mean 3nf. 1 table is easier to deal with in a very narrow scope, but if anything ever changes, you'll be dealing with "How do I split this table into properly related 3nf tables, without messing up all of the dependencies that have been created on it?". That one is no fun.
在客户订购之前,帐户信息是否与客户相关联(即,您有另一个表,您可以在其中跟踪给定的 CustomerID 可以使用哪个帐户 ID)?您能否将所有帐户抽象为一个相当统一的模式(可以有一些空值),以便您有一个通用 AccountId(代理键),然后帐户表有 3 个 varchar 字段和一个跟踪帐户类型的字段(使用用于计费等)?
如果您可以做到这一点,那么您的订单只会跟踪一个 AccountID,因为订单(作为一个实体)实际上并不关心使用了哪种付款方式 - 它只关心它是否是该用户的合法/现有/批准的 AccountId。可以说,其他一切都是别人的事(计费或检查资金等),并且该实体及其处理无论如何都需要更多数据。
这可以使您的订单保持干净且无空值,并且也有利于关注点分离。
从概念上讲,您的订单实际上是所谓的事实表 - 仅包含数字和 FK-s,项目大小很小,但数量巨大。
所以:
Is account information associated with the customer before he can order (i.e. you have enother table where you track which account ID-s given CustomerID can use)? Can you abstract all accounts to a reasonably uniform schema (that one can have a few nulls) fo that you have one universal AccountId (surrogate key) and then Account's table has say 3 varchar fields and one that tracks the kind of the account (used for billing etc.) ?
If you can do that then your order tracks just one AccountID since the order (as an entity) really doesn't care which payment method was used - it only cares that it's a legit/existing/approved AccountId for that user. Everything else is someone else's business so to speak (billing or checking funds etc.) and that enity and it's processing will need more data anyway.
This keeps your Order clean and null-free and facilitates separation of concerns as well.
Conceptually, your Order is really so called fact table - carrying only numbers and FK-s, small in item size but with a huge number of them.
So: