星型架构设计 - 一列维度
我是数据仓库的新手,但我认为我的问题可以相对容易回答。 我构建了一个星型模式,其中包含维度表“产品”。该表有一列“PropertyName”和一列“PropertyValue”。 因此,尺寸看起来有点像这样:
surrogate_key | natural_key (productID) | PropertyName | PropertyValue | ...
1 5 Size 20 ...
2 5 Color red
3 6 Size 20
4 6 Material wood
等等。
在我的事实表中,我总是使用维度的代理键。由于 PropertyName 和 PropertyValue 列,我的自然键不再唯一/无法识别,因此我的事实表中的行太多。
我现在的问题是,我应该如何处理属性列?将每个属性放入单独的维度(例如维度大小、维度颜色等)是否最好?我得到了大约 30 个不同的属性。 或者我应该为事实表中的每个属性创建列吗? 或者用所有属性创建一维?
预先感谢您的任何帮助。
I`m new to data warehousing, but I think my question can be relatively easy answered.
I built a star schema, with a dimension table 'product'. This table has a column 'PropertyName' and a column 'PropertyValue'.
The dimension therefore looks a little like this:
surrogate_key | natural_key (productID) | PropertyName | PropertyValue | ...
1 5 Size 20 ...
2 5 Color red
3 6 Size 20
4 6 Material wood
and so on.
In my fact table I always use the surrogate keys of the dimensions. Cause of the PropertyName and PropertyValue columns my natural key isn`t unique / identifying anymore, so I get way too much rows in my fact table.
My question now is, what should I do with the property columns? Would it be best, to put each property into separate dimensions, like dimension size, dimension color and so on? I got about 30 different properties.
Or shall I create columns for each property in the fact table?
Or make one dimension with all properties?
Thanks in advance for any help.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
您的维度表“产品”应如下所示:
如果您有许多属性,请尝试将它们分组到另一个维度中。例如,如果您可以拥有具有相同 ID 和相同价格的其他颜色或材料的相同产品,则颜色和材料可以是另一个维度的属性。你的事实表可以用两个键来识别产品:product_id 和 colormaterial_id...
阅读推荐:
数据仓库工具包,Ralph Kimball
Your dimension table 'product' should look like this:
If you have to many properties, try to group them in another dimension. For example Color and Material can be attributes of another dimension if you can have the same product with same id and same price in another color or material. Your fact table can identify product with two keys: product_id and colormaterial_id...
Reading recommendation:
The Data Warehouse Toolkit, Ralph Kimball
您的设计称为
EAV
(实体属性值)表。对于稀疏矩阵(大量属性,但同时填充的属性很少)来说,这是一个很好的设计。
然而,它有几个缺点。
它无法同时在两个或多个属性上建立索引(因此可以有效地搜索)。像这样的查询:“获取所有由木材制成且尺寸为 20 的产品”的效率会较低。
同时实现涉及多个属性的约束更加复杂
等等
如果这对您来说不是问题,您可以使用<代码>EAV设计。
Your design is called
EAV
(entity-attribute-value) table.It's a nice design for the sparse matrices (large number of properties with only few of them filled at the same time).
However, it has several drawbacks.
It cannot be indexed (and hence efficiently searched) on two or more properties at once. A query like this: "get all products made of wood and having size or 20" will be less efficient.
Implementing constraints involving several attributes at once is more complex
etc.
If it's not a problem for you, you can use
EAV
design.