如何在通用方案上对数据变量方差进行建模? SQL

发布于 2024-07-15 03:27:02 字数 2207 浏览 7 评论 0原文

我最近在思考一些事情,我想知道什么是正确的方法来做类似以下场景的事情(我确信对于数据库人员来说做类似的事情是很常见的事情)。

假设您有一个产品表,如下所示(MySQL):

CREATE TABLE `products` (
  `id` int(11) NOT NULL auto_increment,
  `product_name` varchar(255) default NULL,
  `product_description` text,
  KEY `id` (`id`),
  KEY `product_name` (`product_name`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

这里没有什么异常。 现在假设在不同的表中有一个类别层次结构,并且有一个单独的表将多对多关系与产品表绑定在一起 - 这样每个产品都属于某种类别(我将省略这些,因为这不是这里的问题)。

现在是有趣的部分 - 如果每个类别都要求产品项附加一组变量,会发生什么情况。 例如,计算机显示器类别中的产品必须具有 LCD/CRT 枚举字段、屏幕尺寸枚举等 - 以及其他一些类别,例如冰淇淋具有其他一些变量,如风味 varchar、货架存储时间 int 等。

这里的问题在于,所有产品都有一组公共变量(id、名称、描述等),但还有一些附加变量在不同类别之间不一致 - 但所有产品都应该共享公共变量集,因为最后它们都属于产品组,因此可以查询例如 SELECT * FROM products ORDER BY company_id (简单的例子,也许不具有代表性,但你明白了)。

现在,我看到了几个潜在的解决方案:
- 为每个产品类别生成单独的表,并使用适当的附加变量将产品存储在那里 - 愚蠢且不适合查询

- 产品表与公共变量保持相同,并为每个类别创建一个单独的表,其中包含通过 JOIN 绑定两个表的附加变量 - 标准化,但查询性能和清晰度问题 - 如何从类别中过滤产品(第一个表 - 产品)和额外变量的附加过滤器(例如 17" LCD 显示器) - 这将需要 SQL JOIN 技巧

- 产品表保持不变,并添加另一个变量类型文本,其中包含例如包含其他变量的 JSON 数据 - 紧凑而整洁,但无法使用 SQL 过滤变量

我知道我在这里遗漏了一些非常明显和简单的东西 - 我对标准化技术有点生疏:)


edit: I've been searching around stackoverflow before asking this question without success. However, after I've posted the question I have clicked on one of my tags 'normalization' and found several similar questions which resulted in to look up 'generalization specialization relational design'. Point of the story is that this must be the first occurrence in my internet life that tags are actually useful in search. However, I would still like to hear from you guys and your opinions.
edit2: The problem with approach no.2 is that I expect somewhere around ~1000 specializations. There is a hierarchy (1-4 level deep) of categories and end nodes add specialized variables - they accumulate in the order of ~1000, so it would be a bit unpractical to add specialized tables to join with.
edit3: Due to the vast number of attribute volatility in my case "entity attribute value" that was suggested looks like the way to go. Here comes query nightmares! Thanks guys.

I was thinking about some stuff lately and I was wondering what would be the RIGHT way to do something like the following scenario (I'm sure it is a quite common thing for DB guys to do something like it).

Let's say you have a products table, something like this (MySQL):

CREATE TABLE `products` (
  `id` int(11) NOT NULL auto_increment,
  `product_name` varchar(255) default NULL,
  `product_description` text,
  KEY `id` (`id`),
  KEY `product_name` (`product_name`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

Nothing out of the ordinary here. Now lets say that there are a hierarchy of categories in a different table, and there is a separate table which binds many-to-many relationships with products table - so that each product belongs to some kind of a category (I'll omit those, because thats not the issue here).


Now comes the interesting part - what IF each of the categories mandates additional set of variables to the product items. For example products in the computer monitors category must have LCD/CRT enum field, screen size enum etc. - and some other category, lets say ice creams have some other variables like flavor varchar, shelf storage time int etc.

The problem herein lies in that all products have a common set of variables (id, name, description and sort of like that), but there are additional variables which are not consistent from category to category - but all products should share common set, because in the end they all belong to the products group, so one can query for example SELECT * FROM products ORDER BY company_id (trivial example, maybe not representative, but you get the picture).

Now, I see severa potential resolutions:

- generate separate table for each product category and store products there with appropriate additional variables - stupid and not query friendly

- product table stays the same with common variables, and for each category create a separate table with additional variables binding two tables with a JOIN - normalized, but query performance and clarity issues - how would one filter down products from category (1st table - products) and additional filter for extra variable (17" LCD monitors ie.) - it would require SQL JOIN trickery

- products table stays the same and add another variable type text that holds for example JSON data that hold additional variables - compact and neat, but can't filter through variables with SQL

I know I'm missing something quite obvious and simple here - I'm a bit rusty on the normalization techniques :)


edit: I've been searching around stackoverflow before asking this question without success. However, after I've posted the question I have clicked on one of my tags 'normalization' and found several similar questions which resulted in to look up 'generalization specialization relational design'. Point of the story is that this must be the first occurrence in my internet life that tags are actually useful in search. However, I would still like to hear from you guys and your opinions.


edit2: The problem with approach no.2 is that I expect somewhere around ~1000 specializations. There is a hierarchy (1-4 level deep) of categories and end nodes add specialized variables - they accumulate in the order of ~1000, so it would be a bit unpractical to add specialized tables to join with.


edit3: Due to the vast number of attribute volatility in my case "entity attribute value" that was suggested looks like the way to go. Here comes query nightmares! Thanks guys.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

左耳近心 2024-07-22 03:27:02

您期望有多少种产品类型? 它们各自有自己的应用逻辑吗?

您可以创建一个称为“实体属性值”模型的通用模型,但是当您尝试处理产品的特定属性时,它有很多陷阱。 简单的搜索查询有时会变成真正的噩梦。 基本思想是您有一个表来保存产品 ID、属性名称(或属性表中的 ID)和值。 您还可以添加表格来保存每种产品类型的模板。 因此,一组表格会告诉您对于任何给定产品,它可以具有哪些属性(可能以及有效值范围),另一组表格会告诉您对于任何单个产品,其值是什么。

不过,我强烈警告不要使用此模型,因为在您必须实际实现它之前,它似乎是一个非常巧妙的想法。

如果您的产品类型数量相当有限,我会采用您的第二种解决方案 - 一个具有基本属性的主产品表,然后是每种特定类型产品的附加表。

How many product types do you expect? Do they each have their own application logic?

You can do a generalized model called the "entity attribute value" model, but it has a LOT of pitfalls when you're trying to deal with specific properties of a product. Simple search queries turn into real nightmares at times. The basic idea is that you have a table that holds the product ID, property name (or ID into a properties table), and the value. You can also add in tables to hold templates for each product type. So one set of tables would tell you for any given product what properties it can have (possibly along with valid value ranges) and another set of tables would tell you for any individual product what the values are.

I would caution strongly against using this model though, since it seems like a really slick idea until you have to actually implement it.

If you number of product types is reasonably limited, I'd go with your second solution - one main product table with base attributes and then additional tables for each specific type of product.

兮颜 2024-07-22 03:27:02

我一直在 Oracle 中这样做。

我有以下表格:

t_class (id RAW(16), parent RAW(16)) -- holds class hierachy.
t_property (class RAW(16), property VARCHAR) -- holds class members.
t_declaration (id RAW(16), class RAW(16)) -- hold GUIDs and types of all class instances
t_instance (id RAW(16), class RAW(16), property VARCHAR2(100), textvalue VARCHAR2(200), intvalue INT, doublevalue DOUBLE, datevalue DATE) -- holds 'common' properties

t_class1 (id RAW(16), amount DOUBLE, source RAW(16), destination RAW(16)) -- holds 'fast' properties for class1.
t_class2 (id RAW(16), comment VARCHAR2(200)) -- holds 'fast' properties for class2
--- etc.

RAW(16)Oracle 保存 GUID 的位置

如果您想选择对象的所有属性,您可以发出:

SELECT  i.*
FROM    (
        SELECT  id 
        FROM    t_class
        START WITH
                id = (SELECT class FROM t_declaration WHERE id = :object_id)
        CONNECT BY
                parent = PRIOR id
        ) c
JOIN    property p
ON      p.class = c.id
LEFT JOIN
        t_instance i
ON      i.id = :object_id
        AND i.class = p.class
        AND i.property = p.property

t_property 保存您通常不搜索的内容(例如文本描述等)。

快速属性实际上是数据库中的普通表,以提高查询效率。 它们仅保存某个类或其后代的实例的值。 这是为了避免额外的连接。

您不必使用快速表并将所有数据限制在这四个表中。

对于您的任务,它将如下所示(为了简洁起见,我将使用方括号中的字符串而不是 GUID):

t_class

id             parent

[ClassItem]    [ClassUnknown]
[ClassMonitor] [ClassItem]
[ClassLCD]     [ClassMonitor]

t_property

class          property

[ClassItem]    price
[ClassItem]    vendor
[ClassItem]    model
[ClassMonitor] size
[ClassLCD]     matrixType

t_declaration

id             class
[1]            [ClassLCD] -- Iiyama ProLite E1700

t_instance  -- I'll put all values into one column, disregarding type (INT, VARCHAR etc)

id             class           property         value

[1]            [ClassItem]     price            $300
[1]            [ClassItem]     vendor           Iiyama
[1]            [ClassItem]     model            ProLite E1700s
[1]            [ClassMonitor]  size             17
[1]            [ClassLCD]      matrixType       TFT

如果您需要一些复杂的查询来搜索,例如 size AND matrixType,您可以从 propertyinstance 中删除它们并创建另一个表:

t_lcd (id RAW(16), size INT, matrixType VARCHAR2(200))

id             size            matrixType

[1]            17              TFT

并使用它来连接其他属性而不是 t_declaration 在上面的查询中。

但即使没有快速表,这个模型也是可行的。

I've been doing this in Oracle.

I had the following tables:

t_class (id RAW(16), parent RAW(16)) -- holds class hierachy.
t_property (class RAW(16), property VARCHAR) -- holds class members.
t_declaration (id RAW(16), class RAW(16)) -- hold GUIDs and types of all class instances
t_instance (id RAW(16), class RAW(16), property VARCHAR2(100), textvalue VARCHAR2(200), intvalue INT, doublevalue DOUBLE, datevalue DATE) -- holds 'common' properties

t_class1 (id RAW(16), amount DOUBLE, source RAW(16), destination RAW(16)) -- holds 'fast' properties for class1.
t_class2 (id RAW(16), comment VARCHAR2(200)) -- holds 'fast' properties for class2
--- etc.

RAW(16) is where Oracle holds GUIDs

If you want to select all properties for an object, you issue:

SELECT  i.*
FROM    (
        SELECT  id 
        FROM    t_class
        START WITH
                id = (SELECT class FROM t_declaration WHERE id = :object_id)
        CONNECT BY
                parent = PRIOR id
        ) c
JOIN    property p
ON      p.class = c.id
LEFT JOIN
        t_instance i
ON      i.id = :object_id
        AND i.class = p.class
        AND i.property = p.property

t_property hold stuff you normally don't search on (like, text descriptions etc.)

Fast properties are in fact normal tables you have in the database, to make the queries efficient. They hold values only for the instances of a certain class or its descendants. This is to avoid extra joins.

You don't have to use fast tables and limit all your data to these four tables.

For you task it will look like this (I'll use strings in square brackets instead of GUID's for the sake of brevity):

t_class

id             parent

[ClassItem]    [ClassUnknown]
[ClassMonitor] [ClassItem]
[ClassLCD]     [ClassMonitor]

t_property

class          property

[ClassItem]    price
[ClassItem]    vendor
[ClassItem]    model
[ClassMonitor] size
[ClassLCD]     matrixType

t_declaration

id             class
[1]            [ClassLCD] -- Iiyama ProLite E1700

t_instance  -- I'll put all values into one column, disregarding type (INT, VARCHAR etc)

id             class           property         value

[1]            [ClassItem]     price            $300
[1]            [ClassItem]     vendor           Iiyama
[1]            [ClassItem]     model            ProLite E1700s
[1]            [ClassMonitor]  size             17
[1]            [ClassLCD]      matrixType       TFT

If you need some complex query that searches on, say, size AND matrixType, you may remove them from property and instance and create another table:

t_lcd (id RAW(16), size INT, matrixType VARCHAR2(200))

id             size            matrixType

[1]            17              TFT

and use it to join with other properties instead of t_declaration in the query above.

But this model is viable even without the fast tables.

白云悠悠 2024-07-22 03:27:02

这种模式有一个名字。 这就是所谓的“泛化专业化”。

如果您搜索“泛化专业化建模”,您将获得一些有关如何执行此操作的文章。 其中一些文章倾向于关系建模和 SQL,而另一些则倾向于对象建模。

There is a name for this pattern. It's called "generalization specialization".

If you search on "generalization specialization modeling" you'll get some articles on how to do this. Some of these articles lean towards relational modeling and SQL, while others lean towards object modeling.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文