星型模式[事实 1:n 维度]...如何?
我是数据仓库的新手,我希望有一个关于构建星型模式的简单问题:
如果我有一个事实表,其中事实记录自然与单个维度具有一对多关系,那么星型模式如何模式可以建模来支持这一点吗?例如:
- 事实表:销售点条目( 衡量标准是 DollarAmount)
- 维度表:促销(这些 当以下情况时,促销是否有效 销售已完成)
情况是我希望单个销售点条目与多个不同的促销活动相关联。这些促销活动不能是它们自己的维度,因为有很多很多促销活动。
我该怎么做?
I am a newcomer to data warehouses and have what I hope is an easy question about building a star schema:
If I have a fact table where a fact record naturally has a one-to-many relationship with a single dimension, how can a star schema be modeled to support this? For example:
- Fact Table: Point of Sale entry (the
measurement is DollarAmount) - Dimension Table: Promotions (these
are sales promotions in effect when a
sale was made)
The situation is that I want a single Point Of Sale entry to be associated with multiple different Promotions. These Promotions cannot be their own dimensions as there are many many many promotions.
How do I do this?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
对于真正具有“多值”维度的情况,桥接表通常是 Kimball 推荐的解决方案。
您的“促销”维度只是每次促销的记录及其属性(开始日期、结束日期、优惠券代码、POS 促销代码、广告名称等)。此处未对促销与产品的关系进行建模,因为它将反映在事实表中。
促销/折扣维度将如下所示(每个唯一计划的促销 1 行)
您的销售事实将如下所示:
您的“促销组”桥接表将是一组组合:
如果发生的促销有 3 项促销,您只需创建与每个促销相关的组 ID,然后将组 ID 放入事实表中。这与医疗报告系统处理多种诊断的方式非常相似。
请注意,通过使用 Bridge 表,您可以轻松地重复计算销售额,因此我建议由了解该模型的人员来开发使用此方法的报告。
For cases when you truly have a "multi-valued" dimension, a Bridge Table is usually the solution that Kimball recommends.
Your "Promotion" dimension simply is a record of each promotion, with its attributes (start date, end date, coupon code, POS promo code, Ad Name, etc). The relationship from promo to product isn't modeled here, since it will be reflected in the fact table.
Promotion/Discount Dimension would look like (1 row per unique planned promotion)
Your Sales Fact would look like:
Your "Promotion Group" bridge table would then be the set of combinations:
If a sale occurs that has 3 promotions on it, you simply create group ID that relates to each promo, then put the group ID on the fact table. It's very similar to the way that medical reporting systems deal with multiple diagnoses.
Note that by using a Bridge table, you can easily double count sales, so I advise that reports using this method be developed by folks that understand the model.
时间几乎总是星型模式中的一个维度。
“实际上”表示促销活动有开始日期和结束日期。
因此,促销本身可能是一个事实,具有对时间维度的开始和结束日期引用。
也许使用这样的模型,您可以拥有一个 JOIN 表,在事实之间以多对多的方式将销售与促销关联起来。
“很多很多”促销——是的,但是有多大呢?每天一条意味着每年 365 条记录。我假设促销以某种方式与产品或类别相关联。一次销售将有一个时间戳和多个产品。
您必须将它们存储在某个地方、某个时间,否则您的模型就会崩溃。为什么不愿意以这种方式进行促销模式?
我的建议是不要担心数据的大小,而专注于尽可能地对问题进行建模。首先建立正确的逻辑模型,然后再考虑物理模型和数据大小。
Time is almost always a dimension in a star schema.
"In effect" suggests that there is a start and end date for a Promotion.
So a Promotion might itself be a fact that has a start and end date reference to the Time dimension.
Maybe with a model like this you could have a JOIN table to relate Sale to Promotion in a many-to-many fashion between facts.
"Many, many" Promotions - yes, but how large is that? One per day means 365 records per year. I'll assume that Promotions are associated somehow with Products or Categories. A Sale would have a timestamp and multiple Products.
You have to store them somewhere, sometime or your model falls apart. Why the reluctance to model Promotion that way?
My advice would be to not worry about the size of the data and concentrate on modeling the problem as best you can. Get the logical model right first, then worry about the physical model and the data sizes.
您应该为每次促销加载事实记录,即使金额相同。事实上,如果示例中的每种促销类型都真正由该特定美元金额表示,则应使用促销类型的键加载事实记录,还包含返回其他相关维度(包括日期)的键。
这里的要点是不用担心数据重复。考虑一个面向销售的数据仓库,例如一家快餐公司。人们可以假设不会只有一个 4.13 美元的事实记录,该记录用于代表“超值餐#3”的一百万份不同销售量。相反,“交易”维度中的每条记录都与该假设的销售事实表中的至少一个特定事实记录有关系。
You should load a fact record for each promotion, even if the dollar amount is the same. If in fact, each type of promotion in your example is truly represented by this specific dollar amount, then a fact record should be loaded with the key of the promotion type, also containing keys back to other related dimensions (including Date).
The main point here is don't worry about data duplication. Think about a sales-oriented Data Warehouse, for say, a fast food company. One can assume there won't be just one fact record for $4.13, which is used to represent a million distinct sales of "value meal #3". Instead, each record in the "Transaction" dimension would have a relationship with at least one specific fact record in this hypothetical Sales fact table.