在一个非常简单的数据库中,数据库规范化有多重要?

发布于 2024-09-19 20:37:20 字数 396 浏览 6 评论 0原文

我正在制作一个非常简单的数据库 (mysql),本质上包含两种类型的数据,始终具有 1 对 1 的关系:

活动

  • 赞助商
  • 时间(可选)
  • 地点(城市、州)
  • 地点(可选)
  • 详细信息 URL

赞助商

  • 名称
  • URL


城市会经常重复,但是对于这样一个简单的数据库模式来说,拥有一个城市表真的有很大的价值吗?

该数据库是通过屏幕抓取网站来填充的。在此站点上,城市字段是通过从下拉列表中选择来填充的,因此不会出现错误输入等情况,并且很容易将记录与城市表进行匹配。我只是不确定即使我的数据库的用户经常按城市搜索,这是否有多大意义。

I am making a very simple database (mysql) with essentially two types of data, always with a 1 to 1 relationship:

Events

  • Sponsor
  • Time (Optional)
  • Location (City, State)
  • Venue (Optional)
  • Details URL

Sponsors

  • Name
  • URL

Cities will be duplicated often, but is there really much value in having a cities table for such a simple database schema?

The database is populated by screen-scraping a website. On this site the city field is populated via selecting from a dropdown, so there will not be mistypes, etc and it would be easy to match the records up with a city table. I'm just not sure there would be much of a point even if the users of my database will be searching by city frequently.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(7

沫离伤花 2024-09-26 20:38:36

如果您有兴趣了解标准化,您应该了解不标准化时会发生什么。对于每种范式(超出 1NF),都会因有害冗余而发生更新异常。

通常,可以围绕更新异常进行编程,有时这比总是归一化到最终程度更实用。

有时,由于无法规范化以及无法对应用程序进行补偿,数据库可能会进入不一致的状态。

在你的例子中,我能想到的最好的就是一种蹩脚的假设。如果某个城市的名称在一行中拼写错误,但在其他所有行中拼写正确,该怎么办?如果按城市和赞助商进行汇总会怎样?您的输出将反映错误,并将一组分为两组。如果这座城市只在数据库中拼写一次,也许会更好,无论是好还是坏。即使名称拼写错误,至少摘要的分组是正确的。

这值得标准化吗?嘿,这是你的项目,不是我的。你决定

If you are interested in learning about normalization, you should learn what happens when you don't normalize. For each normal form (beyond 1NF) there is an update anomaly that will occur as a consequence of harmful redundancy.

Often it's possible to program around the update anomalies, and sometimes that's more practical than always normalizing to the ultimate degree.

Sometimes, it's possible for a database to get into an inconsistent state due to failure to normalize, and failure to program the application to compensate.

In your example, the best I can come up with is a sort of lame hypotheical. What if the name of a city got mispelled in one row, but spelled correctly in all the others. What if you summarized by city and sponsor? Your output would reflect the error, and diovide one group into two groups. Maybe it would be better if the city were only spelled out once in the database, for better or for worse. At least the grouping for the summary would be correct, even if the name were mispelled.

Is this worth nromalizing for? Hey, it's your project, not mine. You decide

-残月青衣踏尘吟 2024-09-26 20:38:28

在我看来,答案取决于您是否想在数据输入过程中防止错误。如果这样做,您将需要一个 VENUES 表:

VENUES
City
State
VenueName

以及一个 CITIES 和 STATES 表。 (注意:我见过同一个城市在同一个州多次出现的情况,通常是较小的城镇,因此城市/州不包含唯一的二元组。通常有一个邮政编码可以消除歧义。)

为了防止数据出现的情况 -条目操作员输入 NY NY 的场地,该场地实际上位于 SF CA,您需要验证场地条目以查看记录中提供的城市/州中是否存在此类场地。

然后,您需要强制使用 CITY/STATE,并且必须编写代码来回滚事务并处理错误。

如果您不关心强制执行这种准确性,那么您实际上也不需要 CITY 和 STATES 表。

The answer hinges, IMO, on whether you want to prevent errors during data-entry. If you do, you will need a VENUES table:

VENUES
City
State
VenueName

as well as a CITIES and STATES table. (Note: I've seen situations where the same city occurs multiple times in the same state, usually smaller towns, so CITY/STATE do not comprise a unique dyad. Normally there's a zipcode to disambiguate.)

To prevent situations where the data-entry operator enters a venue for NY NY which is actually in SF CA, you'd need to validate the venue entry to see if such a venue exists in the city/state supplied on the record.

Then you'd need to make CITY/STATE mandatory, and have to write code to rollback the transaction and handle the error.

If you are not concerned about enforcing this sort of accuracy, then you don't really need to have CITY and STATES tables either.

烧了回忆取暖 2024-09-26 20:38:21

为什么继续实现正常化?你写得好像标准化的巨大成本超过了好处。在填充之前将其设置为正常形式比稍后尝试对其进行标准化更容易。

另外,我想知道你们的一对一关系。天真地,我会想象一项活动可能有多个赞助商,或者一个赞助商可能参与多个活动。但我不知道你的业务逻辑...

预计到达时间:
我不知道为什么我之前没有注意到这一点,但如果您确实不愿意标准化数据库,并且您知道事件之间始终存在一对一的关系和赞助商,那么为什么要将赞助商放在单独的表中呢?

听起来您可能对标准化是什么以及为什么要这样做有点困惑。

Why not go ahead and normalize? You write as if there are significant costs of normalizing that outweigh the benefits. It's easier to set it up in a normal form before you populate it than to try and normalize it later.

Also, I wonder about your 1-to-1 relationship. Naively, I would imagine that an event might have multiple sponsors, or that a sponsor might be involved in more than one event. But I don't know your business logic...

ETA:
I don't know why I didn't notice this before, but if you are really averse to normalizing your database, and you know that you will always have a 1-to-1 relationship between the events and sponsors, then why would you have the sponsors in a separate table?

It sounds like you may be a little confused about what normalization is and why you would do it.

不爱素颜 2024-09-26 20:38:14

直接回答:仅仅因为问题相对简单,没有理由不做一些事情来保持简单。用脚走路比用手走路容易得多。我不记得曾经说过:“哦,我只需走半英里,那是很短的距离,所以我还不如用手走路。”

更长的答案:如果您除了名称之外不保留有关城市的任何信息,并且您没有预设的城市列表(例如构建下拉列表),那么您的架构已经标准化。除了城市名称之外,城市表中还会有什么? (我认为州不能依赖于城市,因为在不同的州可能有两个同名的城市,例如俄亥俄州代顿和田纳西州代顿。)规范化的相关规则是“没有非关键依赖关系”,也就是说,你不能拥有依赖于非键数据的数据。例如,如果您有每个城市的纬度和经度,那么该数据将在引用同一城市的每条记录中重复。在这种情况下,您肯定需要创建一个单独的城市表来保存纬度和经度。当然,您可以创建一个“城市代码”,它是链接到城市表的整数或缩写。但如果没有关于一个城市的其他数据,我不知道这有什么好处。

从技术上讲,我假设城市取决于场地。如果地点是“洛克菲勒中心”,则意味着该城市一定是纽约。但如果地点是可选的,就会产生问题。一种可能是使用一个列出场地名称、城市和州的场地表,如果您没有指定场地,则为每个城市设置“未指定”。这在教科书上更正确,但在实践中,如果在大多数情况下你不指定 venu,它会获得很少的收益。如果大多数时候你确实指定了一个地点,这可能是一个好主意。

哦,还有,活动和赞助商之间真的是1:1的关系吗?我相信一项活动不能有多个赞助商。 (在现实生活中,有很多活动有多个赞助商,但也许出于您的目的,您只关心“主要赞助商”或类似的东西。)但是赞助商从来不会举办超过一场活动吗?这似乎不太可能。

Direct answer: Just because a problem is relatively simple is no reason to not do things to keep it simple. It's a lot easier to walk on my feet than on my hands. I don't recall ever saying, "Oh, I only have to go half a mile, that's a short distance so I might as well walk on my hands."

Longer answer: If you don't keep any information about a city other than it's name, and you don't have a pre-set list of cities (e.g. to build a drop-down), then your schema is already normalized. What would be in a City table other than the city name? (I presume State cannot be dependent on City because you could have two cities with the same name in different states, e.g. Dayton OH and Dayton TN.) The relevant rule of normalization is "no non-key dependencies", that is, you cannot have data that depends on data that is not a key. If you had, say, latitude and longitude of each city, then this data would be repeated in every record that referenced the same city. In that case you would certainly want to break out a separate city table to hold the latitude and longitude. You could, of course, create a "city code" that is an integer or abbreviation that links to a city table. But if there's no other data about a city, I don't see how this gains anything.

Technically, I would assume that City depends on Venue. If the venue is "Rockefeller Center", that implies that the city must be New York. But if venue is optional, this creates problems. One possibility is to have a Venue table that lists venue name, city, and state, and for cases where you don't specify a venue, have an "unspecified" for each city. This would be more textbook correct, but in practice if in most case you do not specify a venu, it would gain little. If most of the time you DO specify a venu, it would probably be a good idea.

Oh, and, is there really a 1:1 relation between event and sponsor? I can believe that an event cannot have more than one sponsor. (In real life, there are plenty of events with multiple sponsors, but maybe for your purposes you only care about a "primary sponsor" or some such.) But does a sponsor never hold more than one event? That seems unlikely.

夜雨飘雪 2024-09-26 20:38:05

为用户填充下拉框的城市数据来自哪里?你不想要一张桌子吗?

看起来您将位置视为包括城市和州在内的一项属性。假设您想仅按州而不是城市和州对事件进行排序或分析?如果您没有状态属性,这可能很难做到。从逻辑上讲,我希望州属于城市表 - 尽管这可能取决于您想要如何识别城市。

Where will the city data come from that populates your dropdown box for the user? Wouldn't you want a table for that?

It looks like you are treating Location as one attribute including city and state. Suppose you want to sort or analyse events by state alone rather than city and state? That could be hard to do if you don't have an attribute for state. Logically I would expect state to belong in a city table - although that may depend on exactly how you want to identify cities.

对不⑦ 2024-09-26 20:37:58

我认为你看待事物的方式是错误的——你应该始终正常化,除非你有充分的理由不这样做。

相信您的应用程序能够维护数据完整性是一种不必要的风险。您说数据是统一的,因为它是从下拉列表中选择的。如果有人侵入表单并修改数据,或者您的代码无意中允许具有相同名称的查询字符串参数怎么办?

I think you are looking at things the wrong way - you should always normalize unless you have a good reason not to.

Trusting your application to maintain data integrity is a needless risk. You say the data is made uniform because it is selected from a dropdown. What if someone hacks on the form and modifies the data, or if your code inadvertently allows a querystring param with the same name?

趁年轻赶紧闹 2024-09-26 20:37:50

现在规范化数据库。

优化规范化数据的查询比规范化一堆数据要容易得多。

你说现在很简单——这些东西有增长的趋势。正确设计,您将获得正确设计的经验和一些面向未来的经验。

Normalize the database now.

It's a lot easier to optimize queries on normalized data than it is to normalize a pile of data.

You say it's simple now - these things have a tendency to grow. Design it right and you'll get the experience of proper design and some future proofing.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文