Which is better database design: more tables or more columns?

Posted 2024-07-06 05:06:36

A former coworker insisted that a database with more tables with fewer columns each is better than one with fewer tables with more columns each. For example, rather than a customer table with name, address, city, state, zip, etc. columns, you would have a name table, an address table, a city table, etc.

He argued this design was more efficient and flexible. Perhaps it is more flexible, but I am not qualified to comment on its efficiency. Even if it is more efficient, I think those gains may be outweighed by the added complexity.

So, are there any significant benefits to more tables with fewer columns over fewer tables with more columns?

Comments (18)

骷髅 2024-07-13 05:06:36

The multi-table database is a lot more flexible if any of these one-to-one relationships may become one-to-many or many-to-many in the future. For example, if you need to store multiple addresses for some customers, it's a lot easier if you have a customer table and an address table. I can't really see a situation where you might need to duplicate some parts of an address but not others, so separate address, city, state, and zip tables may be a bit over the top.
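A minimal sketch of the customer/address split described above, using Python's built-in `sqlite3` module. The table and column names (`customer`, `address`, `customer_id`, and so on) and the sample rows are hypothetical, chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE customer (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    -- One row per address, so a customer can have any number of them.
    CREATE TABLE address (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        street      TEXT NOT NULL,
        city        TEXT NOT NULL,
        state       TEXT NOT NULL,
        zip         TEXT NOT NULL
    );
""")
conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Ada')")
conn.executemany(
    "INSERT INTO address (customer_id, street, city, state, zip) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        (1, "1 Main St", "Springfield", "IL", "62701"),
        (1, "9 Oak Ave", "Chicago", "IL", "60601"),
    ],
)
# Adding a second address is just another row; no schema change is needed.
address_count = conn.execute(
    "SELECT COUNT(*) FROM address WHERE customer_id = 1"
).fetchone()[0]
print(address_count)
```

Note that city, state, and zip stay as plain columns of `address` here, in line with the point that splitting them further is probably over the top.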

话少心凉 2024-07-13 05:06:36

When you design your database, stay as close as possible to the meaning of the data, NOT to what your application needs!

A good database design should last 20 years without a change.

A customer can have multiple addresses; that's the reality. If you decide that your application is limited to one address for the first release, that is a concern for the design of your application, not the data!

It's better to have multiple tables instead of multiple columns, and to use views if you want to simplify your queries.

Most of the time, performance issues with a database come from network performance (chained queries with one-row results, fetching columns you don't need, etc.), not from the complexity of your queries.
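As a rough illustration of the view suggestion, here is a small sketch with Python's built-in `sqlite3` module. The schema, the view name `customer_address`, and the sample data are assumptions made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE address (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        city        TEXT NOT NULL
    );
    -- The view hides the join, so callers can query it like one wide table.
    CREATE VIEW customer_address AS
        SELECT c.id AS customer_id, c.name, a.city
        FROM customer AS c
        JOIN address AS a ON a.customer_id = c.id;
    INSERT INTO customer (id, name) VALUES (1, 'Ada'), (2, 'Bob');
    INSERT INTO address (customer_id, city) VALUES (1, 'Paris'), (2, 'Lyon');
""")
# One query, only the columns needed, instead of a chain of one-row lookups.
rows = conn.execute(
    "SELECT name, city FROM customer_address ORDER BY customer_id"
).fetchall()
print(rows)
```

The underlying tables stay normalized; only the read path looks flat.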

简美 2024-07-13 05:06:36

There are advantages to having tables with fewer columns, but you also need to look at your scenario above and answer these questions:

Will the customer be allowed to have more than 1 address? If not, then a separate table for address is not necessary. If so, then a separate table becomes helpful because you can easily add more addresses as needed down the road, where it becomes more difficult to add more columns to the table.

南薇 2024-07-13 05:06:36

I would consider normalizing as the first step, so cities, counties, states, and countries would be better as separate columns... The power of the SQL language, together with today's DBMSes, allows you to group your data later if you need to view it in some other, non-normalized way.

While the system is being developed, you might consider denormalizing some part if you see that as an improvement.

末蓝 2024-07-13 05:06:36

I think balance is in order in this case. If it makes sense to put a column in a table, then put it in the table; if it doesn't, then don't. Your coworker's approach would definitely help to normalize the database, but that might not be very useful if you have to join 50 tables together to get the information you need.

I guess my answer is: use your best judgment.

如日中天 2024-07-13 05:06:36

Hmm.

I think it's a wash and depends on your particular design model. Definitely factor out entities that have more than a few fields into their own table, as well as entities whose makeup will likely change as your application's requirements change. For instance, I'd factor out address anyway, since it has so many fields, but I'd especially do it if you thought there was any chance you'd need to handle foreign addresses, which can take a different form. The same goes for phone numbers.

That said, once you've got it working, keep an eye on performance. If you've spun out an entity that requires large, expensive joins, it may be a better design decision to fold that table back into the original.

岛歌少女 2024-07-13 05:06:36

There are many sides to this, but from an application efficiency perspective, more tables can be more efficient at times. If you have a few tables with a bunch of columns, then every time the database has to do an operation it may take a lock, and more data is made unavailable for the duration of that lock. If locks get escalated to pages and tables (well, hopefully not tables :) ), you can see how this can slow down the system.

轻许诺言 2024-07-13 05:06:36

There are huge benefits to queries using as few columns as possible. But the table itself can have a large number of columns. Jeff says something on this as well.

Basically, make sure that you don't ask for more than you need when doing a query - performance of queries is directly related to the number of columns you ask for.

笨死的猪 2024-07-13 05:06:36

I think you have to look at the kind of data you're storing before you make that decision. Having an address table is great but only if the likelihood of multiple people sharing the same address is high. If every person had different addresses, keeping that data in a different table just introduces unnecessary joins.

I don't see the benefit of having a city table unless cities in and of themselves are entities you care about in your application, or unless you want to limit the number of cities available to your users.

Bottom line: decisions like this have to take the application itself into consideration before you start shooting for efficiency. IMO.

恰似旧人归 2024-07-13 05:06:36

First, normalize your tables. This ensures you avoid redundant data, giving you fewer rows of data to scan, which improves your queries. Then, if you reach a point where the normalized tables you are joining cause the query to take too long to process (an expensive join clause), denormalize where appropriate.

天邊彩虹 2024-07-13 05:06:36

Good to see so many inspiring and well-founded answers.

My answer would be (unfortunately): it depends.

Two cases:
* If you create a data model that is to be used for many years and thus possibly has to adapt to many future changes: go for more tables and fewer rows and pretty strict normalization.
* In other cases you can choose between more tables with fewer rows or fewer tables with more rows. Especially for people relatively new to the subject, this last approach can be more intuitive and easier to comprehend.

The same holds for choosing between the object-oriented approach and other options.

简美 2024-07-13 05:06:36

I have a few fairly simple rules of thumb I follow when designing databases, which I think can be used to help make decisions like this....

  1. Favor normalization. Denormalization is a form of optimization, with all the requisite tradeoffs, and as such it should be approached with a YAGNI attitude.
  2. Make sure that client code referencing the database is decoupled enough from the schema that reworking it doesn't necessitate a major redesign of the client(s).
  3. Don't be afraid to denormalize when it provides a clear benefit to performance or query complexity.
  4. Use views or downstream tables to implement denormalization rather than denormalizing the core of the schema, when data volume and usage scenarios allow for it.

The usual result of these rules is that the initial design will favor tables over columns, with a focus on eliminating redundancy. As the project progresses and denormalization points are identified, the overall structure will evolve toward a balance that compromises with limited redundancy and column proliferation in exchange for other valuable benefits.

如痴如狂 2024-07-13 05:06:36

I would argue in favor of more tables, but only up to a certain point. Using your example, if you separated your user's information into two tables, say USERS and ADDRESS, this gives you the flexibility to have multiple addresses per user. One obvious application of this is a user who has separate billing and shipping addresses.

The argument in favor of having a separate CITY table would be that you only have to store each city's name once, then refer to it when you need it. That does reduce duplication, but in this example I think it's overkill. It may be more space efficient, but you'll pay the price in joins when you select data from your database.
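To make the CITY trade-off concrete, here is a hedged sketch using Python's built-in `sqlite3` module. The `city` and `address` tables and their columns are hypothetical names for illustration, not a recommended schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Each city name is stored exactly once...
    CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE address (
        id      INTEGER PRIMARY KEY,
        street  TEXT NOT NULL,
        city_id INTEGER NOT NULL REFERENCES city(id)
    );
    INSERT INTO city (id, name) VALUES (1, 'Springfield');
    INSERT INTO address (street, city_id)
        VALUES ('1 Main St', 1), ('9 Oak Ave', 1);
""")
stored_city_names = conn.execute("SELECT COUNT(*) FROM city").fetchone()[0]

# ...but every read that needs the city name now has to join.
rows = conn.execute("""
    SELECT a.street, c.name
    FROM address AS a
    JOIN city AS c ON c.id = a.city_id
    ORDER BY a.id
""").fetchall()
print(stored_city_names, rows)
```

Two addresses share one stored city name, which is the space saving; the `JOIN` in the read query is the price mentioned above.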

独闯女儿国 2024-07-13 05:06:36

It doesn't sound so much like a question about tables/columns as one about normalization. In some situations, having a high degree of normalization ("more tables" in this case) is good and clean, but it typically takes a high number of JOINs to get relevant results. With a large enough dataset, this can bog down performance.

Jeff wrote a little about it regarding the design of StackOverflow. See also the post Jeff links to by Dare Obasanjo.

っ左 2024-07-13 05:06:36

Each table should only include columns that pertain to the entity that's uniquely identified by the primary key. If all the columns in the database are all attributes of the same entity, then you'd only need one table with all the columns.

If any of the columns may be null, though, you would need to put each nullable column into its own table with a foreign key to the main table in order to normalize it. This is a common scenario, so for a cleaner design, you're likely to be adding more tables than columns to existing tables. Also, by moving these optional attributes into their own tables, they no longer need to allow nulls, and you avoid a slew of NULL-related issues.
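One way to read the suggestion above, sketched with Python's built-in `sqlite3` module. The optional attribute (`fax`) and the table name `customer_fax` are hypothetical, picked only to show the shape of the design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- The optional attribute lives in its own table; a row exists only
    -- when the value exists, so the column itself can be NOT NULL.
    CREATE TABLE customer_fax (
        customer_id INTEGER PRIMARY KEY REFERENCES customer(id),
        fax         TEXT NOT NULL
    );
    INSERT INTO customer (id, name) VALUES (1, 'Ada'), (2, 'Bob');
    INSERT INTO customer_fax (customer_id, fax) VALUES (1, '555-0100');
""")
# A LEFT JOIN brings the optional value back; customers without one
# show up with None in the result set.
rows = conn.execute("""
    SELECT c.name, f.fax
    FROM customer AS c
    LEFT JOIN customer_fax AS f ON f.customer_id = c.id
    ORDER BY c.id
""").fetchall()
print(rows)
```

No NULL is ever stored in this layout, though the outer join still produces one in query results, so reading code has to handle the missing case either way.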

山川志 2024-07-13 05:06:36

A fully normalized design (i.e., "More Tables") is more flexible, easier to maintain, and avoids duplication of data, which means your data integrity is going to be a lot easier to enforce.

Those are powerful reasons to normalize. I would choose to normalize first, and then only denormalize specific tables after you see that performance is becoming an issue.

My experience is that in the real world, you won't reach the point where denormalization is necessary, even with very large data sets.

温柔戏命师 2024-07-13 05:06:36

It depends on your database flavor. MS SQL Server, for example, tends to prefer narrower tables. That's also the more 'normalized' approach. Other engines might prefer it the other way around. Mainframes tend to fall in that category.

浅浅 2024-07-13 05:06:36

Like everything else: it depends.

There is no hard and fast rule regarding column count vs table count.

If your customers need to have multiple addresses, then a separate table for that makes sense. If you have a really good reason to normalize the City column into its own table, then that can go, too, but I haven't seen that before because it's a free form field (usually).

A table-heavy, normalized design is efficient in terms of space and looks "textbook-good" but can get extremely complex. It looks nice until you have to do 12 joins to get a customer's name and address. These designs are not automatically fantastic in terms of the performance that matters most: queries.

Avoid complexity if possible. For example, if a customer can have only two addresses (not arbitrarily many), then it might make sense to just keep them all in a single table (CustomerID, Name, ShipToAddress, BillingAddress, ShipToCity, BillingCity, etc.).
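For comparison, a small sketch of the single-table layout just described, using Python's built-in `sqlite3` module. The column names follow the ones listed above; the column types and sample values are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer (
        CustomerID     INTEGER PRIMARY KEY,
        Name           TEXT NOT NULL,
        ShipToAddress  TEXT,
        ShipToCity     TEXT,
        BillingAddress TEXT,
        BillingCity    TEXT
    )
""")
conn.execute(
    "INSERT INTO customer VALUES (?, ?, ?, ?, ?, ?)",
    (1, "Ada", "1 Main St", "Springfield", "9 Oak Ave", "Chicago"),
)
# Both addresses come back from one row, with no join at all.
row = conn.execute(
    "SELECT Name, ShipToCity, BillingCity FROM customer WHERE CustomerID = 1"
).fetchone()
print(row)
```

This only works while the number of addresses per customer is fixed; a third kind of address would mean new columns rather than new rows.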

Here's Jeff's post on the topic.
