数据库模式相关问题
我有一个关于数据库的理论问题。为了使它更具体,我想出了一个例子。
假设我有一家有产品的商店。我有很多不同的产品。并非每种产品都具有相同的适用属性。例如,我可以定义硬盘的大小(以 GB 为单位),但不能在 CPU 上使用相同的属性,因为它不适用。我想要的是一个可以动态向产品添加属性的数据库。我唯一能想到的是:
一个带有 ID、名称和描述的产品表。
一个属性表,包含 ID、Product_ID、属性和值。
这样我可能会得到一个巨大的、我认为效率不高的属性表。这已经困扰我很长时间了。有谁知道更好的解决方案来解决我的问题?
I have a kind of theoretical question about databases. To make it more concrete I have thought up an example case.
Suppose I have a store with products. I have lots of different products. Not every product has the same applicable properties. For instance, I could define the size of a harddisk in gigabytes, but cannot use that same property on a CPU, simply because it does not apply. What I want is a database where I can add properties to products dynamically. The only thing I can come up with is the following:
One product table with an ID, a Name and a Description.
One properties table with an ID, Product_ID, Property and a Value.
This way I would potentially get a gigantic, I-suppose-not-so-efficient, table of properties. This has been bugging me for a long time now. Does anyone know a better solution to my problem?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
这实际上正在走向第六范式,只是像你这样没有学术或经验背景的人不知道(a)它的名称和(b)规则和注意事项。这些人实现了通常所说的实体-属性-值或 EAV。如果做得正确,那就没问题,并且有成千上万的医疗系统在此类表格中包含诊断和剂量信息。如果不是,那么它的使用和维护就只是一只狗的早餐。
首先确保您拥有真正且完整的 5NF 中的
产品
。始终使用完整的声明性引用完整性;
CHECK
约束和RULES
。切勿将所有内容放入一个带有
VARCHAR()
值的表中。始终使用正确的(适用的)数据类型。这意味着您将拥有多个表,每种数据类型各一个,并且不会失去控制或完整性。同样,任何关联表(其中存在对另一个表的多重引用[例如供应商])必须是单独的。
CHECK
约束和RULE
以确保数据和引用完整性不会丢失。这意味着,例如:ProductDecimal
中的CPUSpeed
列,检查
它是否在正确的值范围内Product
表检查
ProductType-ColumnNo
组合的DataType是否正确。
保留
Product
中的所有必填列;仅将子产品
表用于可选列。对于每个这样的(例如,
Product
)表,您需要创建一个视图(虚线),它将从 EAV/6NF 表构造 5NF 行。您可能有多个视图:Product_CPU
、Product_Disk
。不要通过视图更新。将所有更新保存在存储过程中,并插入或更新每个列(即适用于每个特定的
Product
和sub-Product
表)ProductType
) 一起。巨大?商业数据库(不是免费软件)对于大型表或连接没有问题。这实际上是一个非常有效的结构,并且允许非常快速的搜索,因为表实际上是面向列的(而不是面向行的)。如果人口是巨大的,那么它就是巨大的,你自己算一下。
您还需要一张表,即
Property
(或Attribute)的查找表。这是目录的一部分,并且基于ProductType
更好的解决方案是采用完整、正式的第六范式。如果您只有一个或几个需要可选列的表,则没有必要。
需要明确的是:
第六范式是行由主键和最多一个属性组成。
这是 6NF(至少对于 Product 表簇),然后通过 DataType 再次规范化(不是范式意义上的),以减少表的数量(否则每个属性都会有一个表)。
这保留了完整的 Rdb 控制(FK、约束等);而常见的 EAV 类型不关心 DRI 和控制。
这也有目录的雏形。
产品集群数据模型链接
IDEF1X 符号链接
更新
您可能对此感兴趣▶5NF 6NF讨论◀。我会在某个时候把它写下来。
This is actually moving towards Sixth Normal Form, it is just that people like you who do not have the academic or experiential background do not know the (a) name for it and (b) the rules and the caveats. Such people have implemented what is commonly know as Entity-Attribute-Value or EAV. If it is done properly, it is fine, and there are many thousands of medical system out there carrying diagnostic and dosage info in such tables. If it is not, then it is one dog's breakfast to use and maintain.
First make sure you have
Product
in true and full 5NF.Always use full Declarative Referential Integrity;
CHECK
constraints andRULES
.Never put all that into one table with a
VARCHAR()
for Value. Always use the correct (applicable) DataTypes. That means you will have several tables, one each per DataType, and there is no loss of control or integrity.Likewise any Associative tables (where there is a multiple reference to another table [eg. Vendor] ) must be separate.
CHECK
Constraint andRULE
to ensure that the data and referential Integrity is not lost. That means, eg:CPUSpeed
column, which is stored inProductDecimal
,CHECK
that it is in the proper range of valuesProduct
tableCHECK
that the DataType is correct for theProductType-ColumnNo
combination.
Keep all the mandatory columns in
Product
; use thesub-Product
tables for optional columns only.For each such (eg
Product
) table, you need to create a View (dotted line), which will construct the 5NF rows from the EAV/6NF tables. You may have several Views:Product_CPU
,Product_Disk
.Do not update via the View. Keep all your updates transactional, in a stored proc, and insert or update each of the columns (ie. the
Product
andsub-Product
tables which are applicable, for each particularProductType
) together.Gigantic ? Commercial databases (not the freeware) have no problems with large tables or joins. This is actually a very efficient structure, and allows very fast searches, because the tables are in fact column-oriented (not row-oriented). If the population is gigantic, then it is gigantic, do your own arithmetic.
You need one more table, a Lookup table for
Property
(or Attribute). That is part of the catalogue, and based onProductType
The better solution is to go for full, formal Sixth Normal Form. Not necessary if you have only one or a few tables that require optional columns.
To be clear:
Sixth Normal Form is The Row consists of the Primary Key and, at most, one Attribute.
This is 6NF (for at least the Product table cluster), then Normalised again (Not in the Normal Form sense) by DataType, to reduce the no of tables (otherwise you would have one table per Attribute).
This retains full Rdb control (FKs, constraints, etc); whereas the common EAV types don't bother with the DRI and control.
This also has the rudiments of a catalogue.
Link to Product Cluster Data Model
Link to IDEF1X Notation for those who are unfamiliar with the Relational Modelling Standard.
Update
You might be interested in this ▶5NF 6NF Discussion◀. I will write it up at some point.
最初,我建议您使用一个
productproperty
表来对产品和属性之间的关系进行建模。这将允许您将许多产品与特定属性相关联。然而,我并不热衷于将值以 1:1 的方式与每个属性一起存储。如果您有一个将属性与值关联起来的
propertyvalue
表,这可能是最好的。然后,您将放弃productproperty
表,转而使用更丰富的productpropertyvalue
表,该表可以完整描述产品、其属性及其值之间的关系。也许您可以得到以下结果:
当然,属性值可以是复杂的而不是简单的字符串或整数,但希望这能带您走向正确的方向。
Initially I'd have suggested you have a
productproperty
table to model the relationship between products and properties. This would allow you to associate many products with a particular property.However, I'm not keen on the idea of having a value stored alongside each property as a 1:1. It might be best if you have a
propertyvalue
table that associates a property with a value. Then, you'd ditch theproductproperty
table in favour of having a richerproductpropertyvalue
table which could fully describe the relationship between a product, it's properties and their values.Perhaps then you could have the following:
Of course, property values could be complex rather than simple strings or integers, but hopefully this takes you in the right direction.