When every column in a table is a foreign key, is that a good thing or overkill?
I am building a database of possible vehicles, where each entry has a Make, Model, Year, and Engine.
I split that into tables of Makes (Ford, Chevrolet, BMW, etc.), Models (Impala, Camaro, F-150, etc.), Years (1920, ... 2012), and Engines (327, 350, etc.).
Since I now have a table for each of Make, Model, Year, and Engine, and each can have a unique primary key, every row in the main "MakesModelsAndYears" table is just made up of four foreign keys.
Is this overkill, or is it really stored more efficiently than one big table on which I create unique indexes? My concern with the "one big table" approach is that years, like 1970, would be repeated many times (1970 Chevrolet Impala, 1970 Chevrolet Camaro, etc.), as would models and even engines.
Thanks for any guidance!
Followup:
For those following along, I incorporated the feedback from the answers and arrived at this schema. The image doesn't show the FKs in detail, but they're effectively what the answer suggested:
Comments (4)
There is no issue at all having a table with 2, 3, 4 or more Foreign Keys, and the Primary Key being the combination of those FKs, if that fits with your model.
The only problem I see with this design is that it allows a "BMW Escort" or a "Ford Z4". Perhaps you can change the design to:
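The diagram that originally followed is not reproduced here. As a rough sketch of one way to rule out a "BMW Escort", assuming surrogate integer keys like the ones the question describes, the model row can carry its make and the vehicle row can reference only the model. All table names, column names, and sample rows below are illustrative assumptions, not the answerer's actual design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default

conn.executescript("""
CREATE TABLE makes (
    make_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE
);
CREATE TABLE models (
    model_id INTEGER PRIMARY KEY,
    make_id  INTEGER NOT NULL REFERENCES makes (make_id),
    name     TEXT NOT NULL,
    UNIQUE (make_id, name)
);
-- A vehicle names only its model; the make is implied by the model row,
-- so a mismatched make/model pair cannot be expressed at all.
CREATE TABLE vehicles (
    model_id INTEGER NOT NULL REFERENCES models (model_id),
    yr       INTEGER NOT NULL,
    engine   TEXT NOT NULL,
    PRIMARY KEY (model_id, yr, engine)
);
-- Made-up sample rows, for illustration only.
INSERT INTO makes  VALUES (1, 'Ford'), (2, 'BMW');
INSERT INTO models VALUES (10, 1, 'Escort'), (20, 2, 'Z4');
INSERT INTO vehicles VALUES (10, 1995, '1.6');
""")


def describe_vehicles(connection):
    """Return (make, model, yr, engine) tuples by joining back to the names."""
    return connection.execute("""
        SELECT mk.name, md.name, v.yr, v.engine
        FROM vehicles v
        JOIN models md ON md.model_id = v.model_id
        JOIN makes  mk ON mk.make_id  = md.make_id
        ORDER BY mk.name, md.name, v.yr
    """).fetchall()


rows = describe_vehicles(conn)
print(rows)
```

The trade-off of this variant is that reading a vehicle back in human terms takes two joins, which is exactly what one of the later answers objects to.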
Chevrolet doesn't make a Mustang. Ford didn't make a Mustang in 1960. Your structure will allow a lot of nonsense.
The issue isn't that every column is a foreign key; there's nothing necessarily wrong with that. The issue is that the foreign keys are wrong.
And that's why they're wrong. When you normalize a relation, you start with the relation, identify the candidate keys, and work out the functional dependencies. Just making single-column "lookup" tables for every column isn't normalization, and it doesn't constrain your data in the way that's needed. (And in this particular case, the constraints are the missing piece, not normalization to 5NF.)
It should be clear that the only candidate key is {Make, Model, Yr, Engine}. So this table is all key, and it has no non-prime attributes.
To add "lookup" tables as constraints on data, it's not good enough to say that in the first column you have to choose from {Ford, Chevrolet, Cadillac, Vauxhall, Opel}, and in the second column you have to choose from {F-150, Camaro, CTS, Astra}. The right "lookup" table for make and model includes both make and model; you choose from {Ford F-150, Chevrolet Camaro, Cadillac CTS, Vauxhall Astra, Opel Astra}. (In this case, it goes even further. See the table model_years below.)
No engine can go in this table unless its row first exists in model_years. No year can go in model_years unless its row first exists in models. And no row can go in models unless its row first exists in makes.
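The table definitions this answer refers to did not survive on this page. The following is only a sketch of the chain described above, using natural keys and composite foreign keys, run through Python's sqlite3 module so it can be tried directly. The table name model_year_engines, the column types, and the sample rows are assumptions; makes, models, and model_years are the names the answer mentions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default

conn.executescript("""
CREATE TABLE makes (
    make TEXT PRIMARY KEY
);
CREATE TABLE models (
    make  TEXT NOT NULL REFERENCES makes (make),
    model TEXT NOT NULL,
    PRIMARY KEY (make, model)
);
CREATE TABLE model_years (
    make  TEXT NOT NULL,
    model TEXT NOT NULL,
    yr    INTEGER NOT NULL,
    PRIMARY KEY (make, model, yr),
    FOREIGN KEY (make, model) REFERENCES models (make, model)
);
-- Hypothetical name for the "all key" table from the question.
CREATE TABLE model_year_engines (
    make   TEXT NOT NULL,
    model  TEXT NOT NULL,
    yr     INTEGER NOT NULL,
    engine TEXT NOT NULL,
    PRIMARY KEY (make, model, yr, engine),
    FOREIGN KEY (make, model, yr) REFERENCES model_years (make, model, yr)
);
-- Made-up sample rows, for illustration only.
INSERT INTO makes VALUES ('Ford'), ('Chevrolet');
INSERT INTO models VALUES ('Ford', 'Mustang'), ('Chevrolet', 'Camaro');
INSERT INTO model_years VALUES ('Ford', 'Mustang', 1965),
                               ('Chevrolet', 'Camaro', 1969);
""")


def try_insert(make, model, yr, engine):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO model_year_engines VALUES (?, ?, ?, ?)",
                (make, model, yr, engine),
            )
        return True
    except sqlite3.IntegrityError:
        return False


results = {
    "1965 Ford Mustang": try_insert("Ford", "Mustang", 1965, "289"),
    "1960 Ford Mustang": try_insert("Ford", "Mustang", 1960, "289"),
    "1969 Chevrolet Mustang": try_insert("Chevrolet", "Mustang", 1969, "350"),
}
print(results)
```

In this sketch only the first insert is accepted. The 1960 Mustang and the Chevrolet Mustang are rejected because no matching row exists in model_years, which is the kind of constraint that single-column lookup tables cannot express.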
You can make a good case for using ON UPDATE CASCADE in a schema like this. You can also make a good case for not using it. Oracle doesn't support ON UPDATE CASCADE, which is one reason you see ID numbers peppering Oracle tables, and why you sometimes see people say "Primary key values must never change."
These are the kinds of tables you need to implement your known requirements.
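As a small illustration of what ON UPDATE CASCADE buys you with natural keys, the sketch below renames a make and lets the change propagate to the referencing table. It uses SQLite through Python's sqlite3 module, which does support the clause; the two-table schema and the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for SQLite to act on FK clauses

conn.executescript("""
CREATE TABLE makes (
    make TEXT PRIMARY KEY
);
CREATE TABLE models (
    make  TEXT NOT NULL REFERENCES makes (make) ON UPDATE CASCADE,
    model TEXT NOT NULL,
    PRIMARY KEY (make, model)
);
-- Made-up sample rows, for illustration only.
INSERT INTO makes VALUES ('Chevy');
INSERT INTO models VALUES ('Chevy', 'Camaro'), ('Chevy', 'Impala');
""")

# Change the key value in the parent table only.
with conn:
    conn.execute("UPDATE makes SET make = 'Chevrolet' WHERE make = 'Chevy'")

# The referencing rows follow the parent key automatically.
models_after = conn.execute(
    "SELECT make, model FROM models ORDER BY model"
).fetchall()
print(models_after)
```

Without the cascade clause, the same UPDATE would be rejected as long as rows in models still referenced 'Chevy'.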
The PK/FK relationship is a really powerful one that opens up opportunities for you in your database. With that being said, it isn't always appropriate. A lot of that depends on the database and how you will use it. A single table for the above data will help with faster data access and easier reporting but it sacrifices scalability and single-instance data storage.
For your particular case, I would suggest putting the years back in the main table. Since the year is an int, duplicating isn't a big deal. I would also say that you should probably link your models table to your makes table since the makes are connected to the models.
At first blush I see all those tables as pointless. Perhaps MakesModelsAndYears is all you need. And then I'd reconsider its name. At the very least I'd drop the "And". At best I'd rename it "Cars".
Table Keys and Data Relationships
Not necessarily the same thing. A primary key uniquely identifies the rows of a given table. That's all. A foreign key is a "guarantee" that a given value exists on some other table. Data can be related by other than formally defined keys. We sometimes call these candidate keys. Oh, there is no law that says you must have a primary key in any given table.
I've worked with databases where I often joined tables on columns other than primary and/or foreign keys. That's just how the real data relationships worked out.
Data Normalization
As a general rule you want to avoid data redundancy across tables. However, if your Years table has only one column, "year", then what's the point of it (ditto for those other tables as well)? You essentially duplicate that data in the MakesModelsAndYears row pointing to it.
And if you do keep the Model, Engine, Year, and Makes tables, don't make the stupid, boneheaded mistake of creating an "ID" column in your Makes table (ditto for all the tables) just so you don't have to store "Chevrolet" in the MakesModelsAndYears table. Imagine looking at that table and all you see are numbers in row after row! To display meaningful information you'll have to make lots of joins, just to say "1960 Chevy 454 Hemi Impala". Now that is inefficient!
Indexes
What indexes you make is driven by how you'll be looking up data and the database's performance when doing so.
Make indexes in particular for those columns that you may be joining or searching on but are not formal keys.
Also multi-column indexes are great for those columns that you frequently search for together. And the index column order matters. I've seen a very significant difference when all I did was reverse the column order in a certain index. Your mileage may vary (pun intended). And yes, differently ordered indexes on the same columns may make sense depending on table sizes, join column order, lookup frequency. You'll just have to look at these things if/when you have performance issues.
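One way to see why index column order matters is to ask the database for its query plan. The sketch below uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module; the cars table and the index name idx_make_model are made up for the example, and other database engines report plans differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cars (make TEXT, model TEXT, yr INTEGER, engine TEXT);
-- Composite index with make as the leading column.
CREATE INDEX idx_make_model ON cars (make, model);
""")


def plan(sql):
    """Return the human-readable detail column of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[-1] for row in rows]


# Filters on the leading column of the index: the index can be searched.
plan_leading = plan(
    "SELECT * FROM cars WHERE make = 'Ford' AND model = 'F-150'"
)

# Filters only on the second column: the index's leading column is not
# constrained, so without statistics SQLite falls back to scanning the table.
plan_trailing = plan("SELECT * FROM cars WHERE model = 'F-150'")

print(plan_leading)
print(plan_trailing)
```

The first plan should report a SEARCH using idx_make_model, while the second should report a SCAN of the table. An index on (model, make) would reverse that, which is why the same columns in a different order can be worth a separate index when lookups warrant it.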