SQL:数据库规范化,同时保留约束
假设我有下表:
____________________ ____________________
| Organisms | | Species |
|--------------------| |--------------------|
|OrganismId (int, PK)| |SpeciesId (int, PK) |
|SpeciesId (int, FK) |∞---------1|Name (varchar) |
|Name (varchar) | |____________________|
|____________________| 1
1 |
| |
| |
∞ ∞
______________________ ____________________ _______________
| OrganismPropsValues | | SpeciesProps | | Props |
|----------------------| |--------------------| |---------------|
|OrganismId (int, FK) | |PropId (int,PK,FK) | ∞-----1|PropId (int,PK)|
|PropId (int, FK) | |SpeciesId(int,PK,FK)| |Name (varchar) |
|Value (varchar) | |____________________| |_______________|
|______________________| 1
∞ |
| |
-----------------------------------------------------------
对我在这里试图表示的内容的快速解释:假设我们有一个物种列表,例如猫、狗、人类等。我们还有一组属性(缩写为 Props,因此我可以更容易在图中拟合),这适用于某些但不一定是所有物种 - 例如,这可能是尾巴长度(对于有尾巴的物种)、眼睛颜色(对于有眼睛的物种)等。SpeciesProps
是一个链接器表定义哪些属性适用于哪些物种 - 所以这里我们会有 {人类,眼睛颜色},{狗,眼睛颜色},{猫,眼睛颜色},{狗,尾巴长度},{猫,尾巴长度}。我们没有 {Human, Tail Length},因为 Tail Length 显然不是适用于人类的有效属性。
有机体表保存了物种的实际“实现”——所以这里我们可能有{人类,鲍勃},{狗,鲁弗斯}和{猫,菲利克斯}。
现在我的问题是:在 OrganismPropsValues 表中,我想存储每个生物体的属性“值” - 例如,对于鲍勃,我想存储 {鲍勃,眼睛颜色,蓝色}。对于 Rufus,我想存储 {Rufus,Eye Color,Brown} 和 {Rufus,Tail Length,20}(与 Felix 类似)。然而,我的问题是,在我详细说明的模式中,完全可以存储 {Bob, Tail Length, 10},即使 SpeciesProps 中不存在 {Human, Tail Length} 元组。如何修改此模式,以便可以强制执行 OrganismPropsValues 中 SpeciesProps 中定义的约束,同时保持足够的标准化?
Suppose I have the following tables:
____________________ ____________________
| Organisms | | Species |
|--------------------| |--------------------|
|OrganismId (int, PK)| |SpeciesId (int, PK) |
|SpeciesId (int, FK) |∞---------1|Name (varchar) |
|Name (varchar) | |____________________|
|____________________| 1
1 |
| |
| |
∞ ∞
______________________ ____________________ _______________
| OrganismPropsValues | | SpeciesProps | | Props |
|----------------------| |--------------------| |---------------|
|OrganismId (int, FK) | |PropId (int,PK,FK) | ∞-----1|PropId (int,PK)|
|PropId (int, FK) | |SpeciesId(int,PK,FK)| |Name (varchar) |
|Value (varchar) | |____________________| |_______________|
|______________________| 1
∞ |
| |
-----------------------------------------------------------
A quick explanation of what I am trying to represent here: suppose we have a list of species, such as cat, dog, human, etc. We also have a set of properties (abbreviated Props so I could fit it more easily in the diagram) which apply to some but not necessarily all species--for example, this may be tail length (for species with tails), eye color (for those with eyes), etc.
SpeciesProps is a linker table that defines which properties apply to which species-- so here we would have {Human, Eye Color}, {Dog, Eye Color}, {Cat, Eye Color}, {Dog, Tail Length}, {Cat, Tail Length}. We do not have {Human, Tail Length} because Tail Length is obviously not a valid property to apply to a human.
The Organisms table holds actual "implementations" of the species-- So here we might have {Human, Bob}, {Dog, Rufus}, and {Cat, Felix}.
Here is now my issue: in the OrganismPropsValues table, I want to store the 'values' of the properties for each organism--so for example, for Bob I want to store {Bob, Eye Color, Blue}. For Rufus, I would want to store {Rufus, Eye Color, Brown} and {Rufus, Tail Length, 20} (similar for Felix). My problem however, is that in the schema that I have detailed, it is perfectly possible to store {Bob, Tail Length, 10}, even though the {Human, Tail Length} tuple does not exist in SpeciesProps. How can I modify this schema so I can enforce the constraints defined in SpeciesProps in OrganismPropsValues, while maintaining adequate normalization?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
您正在实现 Entity-Attribute-Value 反模式。这不可能是标准化的数据库设计,因为它不是关系型的。
我建议采用类表继承设计模式:
为以下对象创建一个表 :生物体,包含所有物种共有的特性。
为每个物种创建一个表,其中包含特定于该物种的属性。这些表中的每一个都与有机体具有一对一的关系,但每个属性都属于自己的列。
<前><代码> ____________________ ____________________
|生物| |物种 |
|--------------------| |--------------------|
|OrganismId(int,PK)| |SpeciesId (int, PK) |
|SpeciesId (int, FK) |∞---------1|名称 (varchar) |
|名称(varchar)| |____________________|
|____________________|
1
|
|
1
______________________
|人类有机体 |
|----------------------|
|OrganismId (int, FK) |
|性别(枚举)|
|种族(int,FK)|
|EyeColor(int,FK)|
|.... |
|______________________|
这确实意味着您将创建许多表,但请将此视为以正确的关系方式存储属性的许多实际好处的权衡:
有关此设计的更多信息,请参阅 Martin Fowler 的书企业应用程序架构模式,或我的演示< a href="http://www.slideshare.net/billkarwin/practical-object-orient-driven-models-in-sql" rel="nofollow noreferrer">实用的面向对象模型SQL,或者我的书,SQL 反模式卷 1:避免数据库编程的陷阱。
You're implementing the Entity-Attribute-Value antipattern. This can't be a normalized database design, because it's not relational.
What I would suggest instead is the Class Table Inheritance design pattern:
Create one table for Organisms, containing properties common to all species.
Create one table per species, containing properties specific to that species. Each of these tables has a 1-to-1 relationship with Organisms, but each property belongs in its own column.
This does mean you will create many tables, but consider this as a tradeoff with the many practical benefits to storing properties in a relationally correct way:
For more on this design, see Martin Fowler's book Patterns of Enterprise Application Architecture, or my presentation Practical Object-Oriented Models in SQL, or my book, SQL Antipatterns Volume 1: Avoiding the Pitfalls of Database Programming.
嗯...
这是一种方法:
将 SpeciesPropsId 添加到 SpeciesProps 表中。
将 OrganismPropsValues 表中的 PropId 替换为 SpeciesPropsId。
您需要稍微更改一下约束。
需要将 SpeciesProps 添加到 OrganismPropsValues 约束中。
需要删除 OrganismPropsValues 以进行 Props 约束。
从技术上讲,您不必从 OrganismPropsValues 中删除 PropId,但如果保留它,则会导致数据冗余。
Hmm...
Here is one way to do it:
Add SpeciesPropsId into SpeciesProps table.
Replace PropId with SpeciesPropsId in the OrganismPropsValues table.
You will need to change constrains a bit.
Need to add SpeciesProps to OrganismPropsValues constrain.
Need to remove OrganismPropsValues to Props constrain.
Technically you do not have to remove PropId from OrganismPropsValues, but if you keep it it will make data redundat.
每当您有这样的菱形依赖项时,请考虑更加重视复合主键。
具体来说,不仅通过
OrganismId
来识别生物体,还可以通过SpeciesId
和OrganismSubId
的组合来识别生物体(您仍然可以使用OrganismId
code>,但将其保留为备用键 - 为简洁起见,此处未显示)。一旦你这样做了,你的模型就可以看起来像这样:
这里要注意的关键是 < code>SpeciesId 沿着该菱形图的两条边“传播”。这就是为您提供了所需的限制,即无法为未针对给定物种“声明”的属性“分配值”。
顺便说一句,在命名表时使用单数。另外,考虑使用自然主键(例如
SpeciesName
而不是SpeciesId
作为 PK) - 如果做得正确,它可以显着提高 JOIN 的速度(尤其是与集群结合使用) .Whenever you have a diamond-shaped dependency like this, consider putting more emphasis on composite PRIMARY KEYS.
Specifically, identify the Organism not just by
OrganismId
, but by the combination ofSpeciesId
andOrganismSubId
(you can still haveOrganismId
, but keep it as an alternate key - not show here for brevity).Once you do that, your model can be made to look like this:
The key thing to note here is that
SpeciesId
is "propagated" down both edges of this diamond-shaped graph. This is what gives you the desired restriction of not being able "assign a value" to a property that was not "declared" for the given species.BTW, use singular when naming your tables. Also, consider using natural primary keys (e.g.
SpeciesName
instead ofSpeciesId
as PK) - if done right it can significantly increase the speed of your JOINs (especially in conjunction with clustering).实现这些约束的另一种方法是通过删除
OrganismId
并添加No
来更改Organism
表的 PK。然后对化合物(SpeciesId, No)
进行 PK。因此,“Bob”
将是(Human, 1)
,“Rufus”
将是(Dog, 1)
> 等。然后,在
OrganismPropsValues
表中添加SpeciesId
和No
(删除OrganismId
)。 )这将允许将 FK 从
OrganismPropsValues
更改为Props
以引用SpeciesProps
:Another way to achieve these constraints would be to change the PK of
Organism
table by droppingOrganismId
and adding aNo
. Then make PK the compound(SpeciesId, No)
. So,"Bob"
would be(Human, 1)
,"Rufus"
would be(Dog, 1)
, etc.Then, add in the
OrganismPropsValues
table, theSpeciesId
and theNo
(removing theOrganismId
.)This will allow to change the FK from
OrganismPropsValues
toProps
to referenceSpeciesProps
instead: