数据库规范化设计——单表或多表

发布于 2024-09-09 05:56:45 字数 1053 浏览 21 评论 0原文

这应该在数据库中表示为 1 个表还是 3 个表？我和我的朋友对此有不同的看法，所以我想看看对此的普遍看法。（也许应该投票支持任一解决方案？）

Create Table Order
// Basic fields of the table
 - ID (Primary key)
 - CustomerID  (integer, with a FK)
 - Quantity
 - ProductID  (integer, with a FK)

 // Then depending on user selection, either these fields need to be specified 
 // (could be factored out to a separate table):
 {
 - InternalAccountID (integer, with a FK)
 - InternalCompanyID (integer, with a FK)
 }

 // Or these (could be factored out to a separate table):
 {
 - ExternalAccountNumber (free text string)
 - ExternalCompanyName (free text string)
 - ExtraInformation (free text string)
 }

1 表方法：

优点：

性能（一次插入而不是两次插入、FK 检查、无连接）
可能占用更少的空间（额外的表有开销 + 索引 + 额外的 ID 字段）
一个表而不是三个表
几乎没有理由将 2+3 个字段拆分为新表（或者什么？）

缺点：

可为空的字段
可能有额外的“类型”列（可以跳过）
打破 3NF（？）

优点和缺点请求以及个人意见。 :)

编辑： 我尝试通过使用与实际使用不同的实体来简化示例，因此任何更改模型的建议都不会真正帮助我。即，请更多地关注技术方面而不是领域模型。

原文

Should this be represented in the database as 1 table or 3 tables? I and my friend have different opinions about this so I'd like to see the general views on this. (Maybe it should be a vote for either solution?)

Create Table Order
// Basic fields of the table
 - ID (Primary key)
 - CustomerID  (integer, with a FK)
 - Quantity
 - ProductID  (integer, with a FK)

 // Then depending on user selection, either these fields need to be specified 
 // (could be factored out to a separate table):
 {
 - InternalAccountID (integer, with a FK)
 - InternalCompanyID (integer, with a FK)
 }

 // Or these (could be factored out to a separate table):
 {
 - ExternalAccountNumber (free text string)
 - ExternalCompanyName (free text string)
 - ExtraInformation (free text string)
 }

1 table approach:

Pros:

performance (one insert as opposed to two, FK check, no joins)
probably takes less space (the extra tables have overhead + indexes + extra ID field)
one table as opposed to three
hardly justifiable to have split out to new tables just for 2+3 fields (or what?)

Cons:

Nullable fields
Potentially extra "type" column (can be skipped)
Breaks 3NF (?)

Pros and cons kindly requested as well as personal opinions. :)

EDIT: I tried simplifying the example by using different entities than I am actually using so any suggestions to altering the model wouldn't really help me. I.e. focus on the technical aspects more than the domain model please.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

一身软味 2024-09-16 05:56:45

希望这是不言自明的。

回复收藏 0 原文

凶凌 2024-09-16 05:56:45

我的观点是，如果

 // Then depending on user selection, either these fields need to be specified 
 // (could be factored out to a separate table):
 {
 - InternalAccountID (integer, with a FK)
 - InternalCompanyID (integer, with a FK)
 }

 // Or these (could be factored out to a separate table):
 {
 - ExternalAccountNumber (free text string)
 - ExternalCompanyName (free text string)
 - ExtraInformation (free text string)
 }

订单总是 1:1（即，您不能有 3 个 accountID），那么将其保留为一张表。要解决 null 问题，您可以再添加一个名为 InternalCustomer （布尔值）或 CustomerType (varChar) 的列，您可以使用它们来定义内部或外部客户，以了解您应该查看两组字段中的哪一组来查找特定客户。

由于我们不知道这些数据的完整使用或整个数据库的模式，因此对此的任何响应都不能真正完全合格。

My opinion would be that if

 // Then depending on user selection, either these fields need to be specified 
 // (could be factored out to a separate table):
 {
 - InternalAccountID (integer, with a FK)
 - InternalCompanyID (integer, with a FK)
 }

 // Or these (could be factored out to a separate table):
 {
 - ExternalAccountNumber (free text string)
 - ExternalCompanyName (free text string)
 - ExtraInformation (free text string)
 }

are always 1:1 with an order (i.e., you can't have 3 accountIDs), then leave it as one table. To take care of your null issue, you could add one more column called InternalCustomer (boolean) or CustomerType (varChar) that you could use to define an internal or external customer to know which of the two sets of fields you should look at for a specific customer.

Since we don't know the full use of this data or the schema for the entire DB, any response on this can't really be fully qualified.

回复收藏 0 原文

违心° 2024-09-16 05:56:45

如果您想避免数据重复，您应该采用 2 表或 3 表解决方案。例如，如果订单表中有 External 列，则值可能会存在多次。如果数据如下所示：

ID   ExternalCompanyName
1    ACME
2    ACME
3    My Company
4    ACME

现在，如果 ACME 将名称更改为 ACME, Inc.，您必须更新许多行。如果表已标准化，其中外部公司位于单独的表中，则您将更新一行。请注意，可能有一个参数将帐号放入其自己的表中，但我们将保留它以进行极端标准化。

订单和公司/账户之间看起来并不是一对一的关系，除非每个公司/账户只能有一个订单。听起来更像是一对多的关系。

现在，如果在单表环境中更新ExternalCompanyName 时发生错误，并且仅更新部分行，会发生什么情况。您有一些行属于 ACME，有些行属于 ACME, Inc。您最终会遇到不良数据情况。

另外，如果这确实是一对多关系，那么您实际上并没有节省空间。您按顺序复制数据，而不是将其存储在另一个表中。

If you want to avoid data duplication, you should go with a 2- or 3-table solution. For example, if you have the External columns in the Order table, value could exist multiple times. If the data looks like this:

ID   ExternalCompanyName
1    ACME
2    ACME
3    My Company
4    ACME

Now, if ACME changes names to ACME, Inc. you must update many rows. If the tables are normalized, where external companies are in a separate table, you would update one row. Note, there may be an argument to put Account Number in it's own table, but we'll leave that for extreme normalization.

It doesn't appear to be a 1-to-1 relationship between an order and a company/account, unless each company/account can only have one order. it sounds more like a 1-to-many relationship.

Now, what happens if a mistake is made when updating the ExternalCompanyName in a single-table environment, and only some of the rows get updated. You have some rows with ACME and some rows with ACME, Inc. You end up with a bad-data situation.

Also, if this is really a 1-to-many relationship, you really aren't saving space. You are duplicating data in the order, rather than storing it once in another table.

回复收藏 0 原文

花想c 2024-09-16 05:56:45

随着数量的增加，从两个表中进行选择可能会比从一个表中进行选择快得多。有时，这种重构（分区）是在成熟的数据库上进行的，以提高性能。

想象一下将其用于多表联接，其中某些条件位于该表上，但其他条件位于不同的表中。

select from order join customer using (customer_id)
where
    order.order_date between ? and ?
    and customer.name = ?

它最终可能会从磁盘中获取日期的所有 order 行，然后丢弃其中的许多行，因为它们与连接不匹配。从磁盘获取数据肯定会很慢，并且可能会破坏 RAM 缓存。

select from order join order_detail using (order_id) join customer using (customer_id)
where
    order.order_date between ? and ?
    and customer.name = ?

在这种情况下，当它从磁盘加载所有 order 行时，它不会像以前那样受到伤害，因为表更窄更小。它不需要加载所有与过滤无关的冗长字段。最终，在加入 customer 后，它只会获取那些匹配所有条件的 order_detail 行。

如果您预计该值会很大，则应考虑拆分表，以便对搜索最关键的字段位于一个表中，而“数据”字段位于其他一对一表中。

底线是：正常形式和域是一回事，但性能通常需要权衡。您可以隐藏其中一些（用视图覆盖拆分），但不是全部（为了更快的选择而重复/聚合字段）。

As the volume increases selection from two tables may be a lot faster than one. Sometimes this kind of refactoring (partition) is done on mature databases to increase performance.

Imagine using this for a multi table join, where some criteria are on this table, but others are in different tables.

select from order join customer using (customer_id)
where
    order.order_date between ? and ?
    and customer.name = ?

It may end up fetching all order rows for the dates from disk, then throwing many of them away because they don't match the join. This fetch from disk is bound to be slow and may to spoil your RAM cache.

select from order join order_detail using (order_id) join customer using (customer_id)
where
    order.order_date between ? and ?
    and customer.name = ?

In this case when it loads all order rows from disk it's not going to hurt as bad as previously, because the table is narrower and smaller. It doesn't need to load all the lengthy fields which are irrelevant for filtering. Eventually, after join to customer, it will only fetch those order_detail rows which match all criteria.

If you expect this to be large, you should consider splitting the table so that the fields which are most critical for searches are in one table, and "data" fields in other one-to-one table(s).

The bottom line is: Normal form and domain is one thing, but performance often requires tradeoffs. You can hide some of them (cover the split with a view), but not all (duplicate/aggregate fields for the sake of faster selects).

回复收藏 0 原文

凑诗 2024-09-16 05:56:45

我绝对不会采用3桌解决方案。通过将此数据分成 3 个表，您实际上无法让任何查询在不连接外键的情况下返回完整的订单标头，并且新订单的每次插入都会更新多个表和索引，这就是并发问题。我建议使用 2 个表，一张用于内部订单，一张用于外部订单。对于需要对两组订单中的数据进行合并查询的情况，请定义一个视图，该视图是两个表的并集。

我惊讶地看到产品 ID 和数量作为订单标题的一部分。我见过的每个订单跟踪数据库都将订单项目分解为一个单独的表，使用订单 ID 作为外键，以便单个订单可以包含多个产品（或具有不同数量、交货时间等的同一产品）。）。

回复收藏 0 原文

感悟人生的甜 2024-09-16 05:56:45

我不是纯粹主义者，所以 3nf 在有意义的时候是好的……但你不必想当然地认为它总是如此。

从务实的角度来看，您的目标是什么？你的优点/缺点列表是一个好的开始。我会在列表中添加更多的想法——只要你认为合适。

1) 数据库中的任何其他表是否需要与该数据关联（例如，联接）？这就是 RDB 的意义所在。

2）您的数据库会增长吗？即使 1 个表现在有意义，它总是有意义吗？如果您发现自己想要添加更多表，并且非规范化表迫使您“解决”它，处理返回的额外行、较慢的执行时间等，您会后悔的。

3）当发生什么时您的客户获得一个新的外部帐户，或者您有什么。你会创造一个全新的记录吗？您将如何回答诸如“客户某某的帐号是什么？”之类的问题。

...

我认为一般来说，我追求可扩展性，在这种情况下可能意味着 3nf。 1 个表在非常狭窄的范围内更容易处理，但如果有任何变化，您将处理“如何将此表拆分为正确相关的 3nf 表，而不弄乱已创建的所有依赖项它？”。那个一点也不好玩。

回复收藏 0 原文

她比我温柔 2024-09-16 05:56:45

在客户订购之前，帐户信息是否与客户相关联（即，您有另一个表，您可以在其中跟踪给定的 CustomerID 可以使用哪个帐户 ID）？您能否将所有帐户抽象为一个相当统一的模式（可以有一些空值），以便您有一个通用 AccountId（代理键），然后帐户表有 3 个 varchar 字段和一个跟踪帐户类型的字段（使用用于计费等）？

如果您可以做到这一点，那么您的订单只会跟踪一个 AccountID，因为订单（作为一个实体）实际上并不关心使用了哪种付款方式 - 它只关心它是否是该用户的合法/现有/批准的 AccountId。可以说，其他一切都是别人的事（计费或检查资金等），并且该实体及其处理无论如何都需要更多数据。

这可以使您的订单保持干净且无空值，并且也有利于关注点分离。

从概念上讲，您的订单实际上是所谓的事实表 - 仅包含数字和 FK-s，项目大小很小，但数量巨大。

所以：

 Table Order (
     - OrderId
     - Quantity
     - ProductId
     - DiscountId -- sonner or latter :-)
     - AccountId
     - PaymentStatus -- probaly FK as well or predefined constant
 )

 Table Account (
     - AccountId
     - BillingInfo  -- akka ext acct number as text
     - PrincialName -- akka ext company name, some equivalent for internal acct-s
     - AdditionalData
 )

Is account information associated with the customer before he can order (i.e. you have enother table where you track which account ID-s given CustomerID can use)? Can you abstract all accounts to a reasonably uniform schema (that one can have a few nulls) fo that you have one universal AccountId (surrogate key) and then Account's table has say 3 varchar fields and one that tracks the kind of the account (used for billing etc.) ?

If you can do that then your order tracks just one AccountID since the order (as an entity) really doesn't care which payment method was used - it only cares that it's a legit/existing/approved AccountId for that user. Everything else is someone else's business so to speak (billing or checking funds etc.) and that enity and it's processing will need more data anyway.

This keeps your Order clean and null-free and facilitates separation of concerns as well.

Conceptually, your Order is really so called fact table - carrying only numbers and FK-s, small in item size but with a huge number of them.

So:

 Table Order (
     - OrderId
     - Quantity
     - ProductId
     - DiscountId -- sonner or latter :-)
     - AccountId
     - PaymentStatus -- probaly FK as well or predefined constant
 )

 Table Account (
     - AccountId
     - BillingInfo  -- akka ext acct number as text
     - PrincialName -- akka ext company name, some equivalent for internal acct-s
     - AdditionalData
 )

回复收藏 0 原文

~没有更多了~