每个表中都有地址列,或者有一个被其他表引用的地址表?

发布于 2024-09-10 04:38:39 字数 60 浏览 7 评论 0原文

假设我有三张表:住宿、火车站和机场。每个表中是否都有地址列或其他表引用的地址表?是否存在过度正常化的情况?

Say I had three tables: Accommodation, Train Stations and Airports. Would I have address columns in each table or an address table that is referenced by the other tables? Is there such a thing as over-normalization?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(13

白首有我共你 2024-09-17 04:38:40

如果您使用的是 Oracle 9i,则可以在表中存储地址对象。这将消除对地址格式的(合理的)担忧。

If you are using Oracle 9i, you could store address objects in your tables. That would remove the (justified) concerns about address formats.

简单爱 2024-09-17 04:38:40

我同意 S.Lott 的观点,并想补充一点:

  1. 一个好的答案取决于您已经知道的内容。然而,关系数据库理论的基本“数学”定义了非常明确的、不同的标准化级别。当你达到终极正常形态时,你就无法再正常化了。

  2. 根据您想要用三个实体建模的内容以及识别它们的方式,您可以想出非常不同的概念数据模型,所有这些模型都可以用规范形式的混合来表示 - 或者根本不规范化(就像 1 个表,用于存储所有带有描述符的数据,并且到处都是 NULL 孔...)。
    考虑将三个实体规范化为最终规范形式。我现在可以引入一个新的需求、用例或扩展,如果您查看其内容,它会为当前的描述性属性提供某种有序的、引用的或结构化的性质。然后,模型应该表示这种行为,而以前的属性也许最好是由其他实体引用的单独实体。

  3. 过度标准化?只有在某种意义上,您才能标准化给定的模型,从而使其在给定的数据库平台上存储或处理效率低下。根据在那里可以有效处理的内容,您可能希望对某些方面进行非规范化,以冗余换取速度(数据仓库数据库一直这样做)和洞察力,反之亦然。 ?

到目前为止,我见过的所有(工作)数据库设计要么具有相当规范化的概念数据模型,要么在逻辑和/或物理数据模型级别(用 Sybase PowerDesigner 术语来说)完成了相当多的非规范化,以使模型“易于管理” ——要么是这样,要么它们无法工作,即失败,因为维护问题很快就变得非常严重。

I agree with S.Lott, and would like to add:

  1. A good answer depends on what you know already. The basic "math" of relational database theory, however, defines very well-defined, distinct levels of normalization. You cannot normalize anymore when you've reached the ultimate normal form.

  2. Depending on what you want to model with your three entities, and how you identify them, you can come up with very different conceptual data models, all of which can be represented in a mix of normal forms -- or unnormalized at all (like 1 table for all data with descriptors and NULL holes all over the place...).
    Consider you normalize your three entities to the ultimate normal form. I can now introduce a new requirement, or use case, or extension, which gives an upto-now descriptive attribute a somehow ordered, or referencing, or structured nature if you look at its content. Then, the model should represent this behavior, and what used to be an attribute perhaps will better be a separate entity referenced by other entities.

  3. Over-normalization? Only in the sense that can you normalize a given model so it gets inefficient to store, or process, on a given DB platform. Depending on what can be handled efficiently there, you might want to de-normalize certain aspects, trading off redundancy for speed (data warehouse dbs do this all the time), and insight, or vice versa.

All (working) db designs I've seen so far either have a rather normalized conceptual data model, with quite some denormalization done at the logical and/or physical data model level (speaking in Sybase PowerDesigner terms) to make the model "manageable" -- either that, or they were not working, i.e. failed because the maintenance problems became kingsize real quick.

唐婉 2024-09-17 04:38:40

当您说“地址”时,我认为您指的是完整的地址,例如街道、城市、州/省,也许是国家/地区和邮政编码。这是 4 或 5 个字段,如果您允许“地址行 1”和“地址行 2”、转交等,可能会更多。这肯定应该在一个单独的表中,并有一个“addressid”链接到站,等表。否则,您将创建同一组字段定义的 3 个单独副本。这是个坏消息,因为它需要额外的努力来保持它们的一致性。例如,如果最初您只处理美国地址(我是美国人,所以我假设是美国),但后来您发现您还需要允许加拿大人,该怎么办?您需要扩大邮政编码字段的大小并添加国家/地区代码。如果有一个公用表,那么您只需执行一次。如果没有,那么你必须这样做三遍。而且“三次”很可能不仅仅是更改数据库架构,而是更改程序中处理地址的每个位置。

标准化的好处之一是最大限度地减少变化的影响。

When you say "address", I presume you mean a complete address, like street, city, state/province, maybe country, and zip/postal code. That's 4 or 5 fields, maybe more if you allow for "address line 1" and "address line 2", care-of's, etc. That should definately be in a separate table, with an "addressid" to link to the Station, etc tables. Otherwise, you are creating 3 separate copies of the same set of field definitions. That's bad news because it creates extra effort to keep them consistent. Like, what if initially you are only dealing with U.S. addresses (I'm an American so I'll assume U.S.), but later you find you also need to allow for Canadians. You'll need to expand the size of the postal code field and add a country code. If there's a common table, then you only have to do this once. If there isn't, then you have to do this three times. And it's likely that the "three times" is not just changing the database schema, but changing every place in your programs that processes an address.

One of the benefits of normalization is to minimize the impact of changes.

墟烟 2024-09-17 04:38:40

有时您想要非规范化以使查询更高效。但只有在您有充分的理由相信完全标准化的模型会造成严重的低效率问题之后,才应该非常谨慎地进行此操作。根据我的卑微经验,大多数程序员都太快地进行非规范化,通常会快速“哦,将其分解到一个单独的表中太麻烦了”。

There are times when you want to denormalize to make queries more efficient. But this should be done very cautiously, only after you have good reason to believe that the fully normalized model creates serious inefficiency problems. In my humble experience, most programmers are far to quick to denormalize, usually with a quick "oh, breaking that out into a separate table is too much trouble".

演多会厌 2024-09-17 04:38:40

我认为在这种情况下,每个表中都有地址列是可以的。您几乎不会有一个地址会被使用两次以上。大多数地址每个实体仅使用一个。

但是额外的表中可能包含街道、城市、国家的名称……

最重要的是,每个火车站、住宿和机场可能只有一个地址,因此这是一种 n:1 关系。

I think in this situation it is OK to have address columns in each table. You'll hardly have an address which will be used more than two times. Most of the adresses will be used just one per entity.

But what could be in an extra table are names of streets, cities, countries...

And most important every train station, accomodoation and airport will probably have just one address so it's an n:1 relation.

熊抱啵儿 2024-09-17 04:38:40

我只能对此处已发布的答案添加一个更具建设性的注释。无论您选择如何标准化数据库,当地址标准化(看起来相同)时,这个过程就变得几乎微不足道。这是因为当您努力防止重复时,所有实际上相同的地址看起来确实相同

现在,标准化地址并非易事。有一些 CASS 服务可以为您执行此操作(针对美国地址),并且这些服务已获得 USPS 的认证。实际上,我在 SmartyStreets 工作,这是我们的专业知识,所以我建议您从那里开始搜索。您可以执行批处理或使用 API 在收到地址时对其进行标准化。

如果没有这样的东西,您的数据库可能会标准化,但重复的地址数据(无论是正确的还是不完整的和无效的等)仍然会渗透进来,因为它们可以采用很多很多的形式。如果您对此还有任何疑问,我将亲自为您提供帮助。

I can only add one more constructive note to the answers already posted here. However you choose to normalize your database, that very process becomes almost trivial when the addresses are standardized (look the same). This is because as you endeavor to prevent duplicates, all the addresses that are actually the same do look the same.

Now, standardizing addresses is not trivial. There are CASS services which do this for you (for US addresses) which have been certified by the USPS. I actually work for SmartyStreets where this is our expertise, so I'd suggest you start your search there. You can either perform batch processing or use the API to standardize the addresses as you receive them.

Without something like this, your database may be normalized, but duplicate address data (whether correct or incomplete and invalid, etc) will still seep in because of the many, many forms they can take. If you have any further questions about this, I'll personally assisty you.

薆情海 2024-09-17 04:38:39

数据库规范化就是构建维护某些功能的关系(表)
关系(表)内的事实(列)之间以及各种关系(表)之间的依赖关系
组成架构(数据库)。有点满嘴,但这就是它的全部内容。

五个简单指南关系数据库理论中的范式
是范式的经典参考。本文简单地定义了每种范式的本质是什么
及其对于数据库表设计的意义。这是一个非常好的“试金石”参考。

要正确回答您的具体问题,需要额外的信息。您必须提出的一些关键问题
是:

  • 地址是简单事实(例如文本块)还是复合事实(例如
    由多个属性组成:地址行、城市名称、邮政编码等)
  • 与“住宿”相关的其他“事实”是什么,
    “机场”和“火车站”?
  • 哪些“事实”可以唯一且最低限度地识别“机场”、“住宿”
    和“火车站”(这些事实通常称为密钥或候选密钥)?
  • 地址事实和事实之间存在哪些功能依赖性
    组成每个关系键?

综上所述,您问题的答案并不像人们希望的那样直接!

是否存在“过度正常化”的情况?或许。这取决于是否
您已识别并用于构建表的功能依赖性是
对您的应用领域具有重要意义。

例如,假设确定地址
由多个属性组成;其中之一是邮政编码。从技术上讲是邮政
代码也是一个复合项目(至少加拿大邮政编码是)。进一步规范化你的
数据库来识别这些事实可能会过度标准化。这是因为
邮政编码的组成部分与您的申请无关,因此需要考虑
将它们纳入数据库设计将是过度规范化。

Database Normalization is all about constructing relations (tables) that maintain certain functional
dependencies among the facts (columns) within the relation (table) and among the various relations (tables)
making up the schema (database). Bit of a mouth-full, but that is what it is all about.

A Simple Guide to Five Normal Forms in Relational Database Theory
is the classic reference for normal forms. This paper defines in simple terms what the essence of each normal form is
and its significance with respect to database table design. This is a very good "touch-stone" reference.

To answer your specific question properly requires additional information. Some critical questions you have to ask
are:

  • Is an Address a simple fact (e.g. blob of text) or a composite fact (e.g.
    composed of multiple attributes: Address line, City Name, Postal Code etc.)
  • What are the other "facts" relating to "Accommodation",
    "Airport" and "Train Station"?
  • What sets of "facts" uniquely and minimally identify an "Airport", an "Accommodation"
    and a "Train Station" (these facts are typically called a key or candidate key)?
  • What functional dependencies exist among Address facts and the facts
    composing each relations key?

All this to say, the answer to your question is not as straight forward as one might hope for!

Is there such a thing as "over normalization"? Maybe. This depends on whether the
functional dependencies you have identified and used to build your tables are
of significance to your application domain.

For example, suppose it was determined that an address
was composed of multiple attributes; one of which is postal code. Technically a postal
code is a composite item too (at least Canadian Postal Codes are). Further normalizing your
database to recognize these facts would probably be an over-normalization. This is because
the components of a postal code are irrelevant to your application and therefore factoring
them into the database design would be an over-normalization.

淡淡的优雅 2024-09-17 04:38:39

对于地址,我几乎总是创建一个单独的地址表。不仅为了标准化,而且为了存储字段的一致性。

至于过度正常化这种事情,绝对是有的!很难给你关于什么是过度正常化、什么不是过度正常化的指导,因为我认为这主要来自经验。然而,如果按照规范化各个级别的书籍进行操作,那么一旦开始变得难以看出事情的具体情况,您可能就走得太远了。

也可以查看所有示例/示例数据库。它们会给您一个很好的指示,告诉您何时应该分割数据,何时不应该分割数据。

另外,请充分了解您存储的数据的类型和数量,以及访问速度等。出于许多性能和可扩展性的原因,许多现代 Web 软件正在完全非规范化。值得研究一下为什么以及何时应该和不应该去标准化的原因。

For addresses, I would almost always create a separate address table. Not only for normalization but also for consistency in fields stored.

As for such a thing as over-normalization, absolutely there is! It's hard to give you guidance on what is and isn't over-normalization as I think it mostly comes from experience. However, follow the books on each level of normalization and then once it starts to get difficult to see where things are you've probably gone too far.

Look at all the sample/example databases you can as well. They will give you a good indication on when you should be splitting out data and when you shouldn't.

Also, be well aware of the type and amount of data you're storing, along with the speed of access, etc. A lot of modern web software is going fully de-normalized for many performance and scalability reason. It's worth looking into those for reason why and when you should and shouldn't de-normalize.

无语# 2024-09-17 04:38:39

每个表中是否都有地址列或其他表引用的地址表?

机场、火车站和住宿可以有不同的地址格式吗?

单个地址表最大限度地减少了处理地址所需的工作 - 套件、RR、邮政编码、州/省...

是否存在过度标准化这样的事情?

标准化有不同的级别。我只遇到过我认为糟糕的设计而不是规范化。

Would I have address columns in each table or an address table that is referenced by the other tables?

Can airports, train stations and accommodation each have a different address format?

A single ADDRESS table minimizes the work necessary dealing with addresses - suite, RR, postal/zip code, state/province...

Is there such a thing as over-normalization?

There are different levels of normalization. I've only encountered what I'd consider poor design rather than normalization.

_蜘蛛 2024-09-17 04:38:39

就我个人而言,我会去另一张桌子。

我认为它使设计更加清晰,使地址报告更加简单,并使您需要对地址架构进行的任何更改变得更加容易。

如果您稍后需要对其进行非规范化,您始终可以创建两个视图,其中包含火车站和机场信息以及您需要的任何地址信息。

Personally I'd go for another table.

I think it makes the design cleaner, makes reporting on addresses much simpler and will make any changes you need to make to the address schema easier.

If you need to have it denormalized later on you can always create two views that contain the Train station and airport information along with any address information you need.

终止放荡 2024-09-17 04:38:39

这并不是我所理解的标准化。您似乎没有谈论删除冗余,只是谈论如何对存储或数据模型进行分区。我假设住宿、火车站和机场的地址示例都是不相交的?

据我所知,只有当你开始按照思路思考时,这才会成为常态。邮政编码在功能上取决于街道地址,因此应将其分解到其自己的表中。

在这种情况下,根据上下文,这可能是理想的或不理想的。如果您管理记录并可以确保正确性,则可能是理想的选择;如果用户可以更新自己的记录,则不太理想。

一个相关的问题是标准化一个人的名字是否太过分了< /a>

This isn't really what I understand by normalisation. You don't seem to be talking about removing redundancy, just how to partition the storage or data model. I'm assuming that the example of addresses for Accommodation, Train Stations and Airports will all be disjoint?

As far as I know it would only be normalisation if you started thinking along the lines. Postcode is functionally dependent upon street address so should be factored out into its own table.

In which case this could be ever desirable or undesirable dependent upon context. Perhaps desirable if you administer the records and can ensure correctness, and less desirable if users can update their own records.

A related question is Is normalizing a person’s name going too far?

小…红帽 2024-09-17 04:38:39

如果您有一个对性能非常敏感的项目/功能,那么在某些情况下对数据库进行非规范化可能是明智之举。然而,这可能会因各种原因导致维护问题。您可能想使用缓存表复制数据,但这也有缺点。这确实是根据具体情况而定,但在正常实践中,数据库规范化是一件好事。我见过的 99% 的非标准化数据库都不是设计使然,而是开发人员的误解/错误。

If you have a project/piece of functionality that is very performance sensitive, it may be smart to denormalize the database in some cases. However, this can lead to maintenance issues for various reasons. You may instead want to duplicate the data with cache tables but there are drawbacks to this as well. It's really a case by case basis but in normal practice, database normalization is a good thing. 99% of the non-normalized databases I've seen are not by design, but rather by a misunderstanding/mistake by the developer.

请止步禁区 2024-09-17 04:38:39

每个表中是否都有地址列或其他表引用的地址表?

正如其他人所提到的,这实际上并不是规范化的问题,因为您并没有尝试减少冗余或组织依赖关系。无论哪种方式都是完全可以接受的。如果您要进行集中验证或特定于地址的业务逻辑,则将地址移至单独的表可能有意义。

是否存在过度标准化这样的事情?

是的。如前所述,在大型系统(大量数据、大量事务或两者兼而有之)中,您可以标准化到性能成为问题的程度。这就是许多系统使用非规范化数据库进行报告和查询的原因。

除了性能之外,还存在数据查询是否容易的问题。在存在大量最终用户数据查询的系统中(可能很危险!),非规范化结构对于大多数非技术人员或非数据库人员来说更容易理解。

与我们处理的大多数事情一样,这是理解、性能和未来可维护性之间的权衡,并且对于在任何给定系统中划清界限的问题很少有明确的答案。

凭借经验,您将了解最适合您编写的系统的界限在哪里。

话虽如此,我更倾向于选择更多的标准化,而不是更少的标准化。

Would I have address columns in each table or an address table that is referenced by the other tables?

As others have alluded to, this is not really a question of normalization because you're not attempting to reduce redundancy or organize dependencies. Either way is perfectly acceptable. Moving the addresses to a separate table might make sense if you are going to have centralized validation or business logic specific to addresses.

Is there such a thing as over-normalization?

Yes. As has been mentioned, in large systems (lots of data, lots of transactions, or both) you can normalize to the point where performance becomes an issue. This is why lots of systems use denormalized database for reporting and querying.

In addition to performance though, there is also the issue of how easy the data is to query. In systems where there will be a lot of end-user querying of the data (can be dangerous!), a denormalized structure is easier for most non-technical or non-database people to understand.

Like most things we deal with, it's a trade-off between understanding, performance, and future maintainability and there is rarely a clear-cut answer to where you draw the line in any given system.

With experience, you will learn where the line is best drawn for the systems you write.

With that said, my preference is to err on the side of more vs less normalization.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文