Ralph Kimball's Data Warehouse Toolkit: designing an order lifecycle mart
I'm reading Ralph Kimball's book on data warehousing and dimensional modeling. One of the case studies covers dimensional modeling for an order system, where the requirement is to capture the order lifecycle, from order to fulfillment to shipment.
I expected the book to suggest multiple rows per order, each with a transaction type foreign key to a transaction dimension. Instead, it suggests 'role-playing' dimensions: multiple date dimension tables (one for order date, one for fulfillment date, and one for ship date). The fact table would then hold a foreign key to each of them, so it would have three date columns.
Isn't this rather restrictive? Wouldn't one row per transaction be a better choice?
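For concreteness, here is a minimal sketch of the role-playing layout described above, using Python's built-in `sqlite3` module. The table and column names (`date_dim`, `order_fact`, `order_date_key`, and so on) are made up for illustration and are not taken from the book. It also assumes a single physical date table joined under three aliases, which is a common way to implement role-playing dimensions, rather than three separate copies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One physical date dimension (hypothetical columns).
CREATE TABLE date_dim (
    date_key   INTEGER PRIMARY KEY,
    full_date  TEXT,
    month_name TEXT
);

-- One fact row per order, with one date key per lifecycle milestone.
CREATE TABLE order_fact (
    order_id             INTEGER PRIMARY KEY,
    order_date_key       INTEGER REFERENCES date_dim(date_key),
    fulfillment_date_key INTEGER REFERENCES date_dim(date_key),
    ship_date_key        INTEGER REFERENCES date_dim(date_key),
    amount               REAL
);

INSERT INTO date_dim VALUES
    (20240105, '2024-01-05', 'January'),
    (20240130, '2024-01-30', 'January'),
    (20240202, '2024-02-02', 'February');

INSERT INTO order_fact VALUES (1, 20240105, 20240130, 20240202, 250.0);
""")

# The same date table "plays" three roles through three aliases.
rows = conn.execute("""
SELECT f.order_id, od.month_name, fd.month_name, sd.month_name
FROM order_fact f
JOIN date_dim od ON od.date_key = f.order_date_key
JOIN date_dim fd ON fd.date_key = f.fulfillment_date_key
JOIN date_dim sd ON sd.date_key = f.ship_date_key
""").fetchall()

print(rows)
```

Each alias could also be wrapped in a view (for example an "order date" view and a "ship date" view) so that report column names stay distinct.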
Comments (1)
Design often involves trade-offs, and it's hard to know which design is best without a lot of detail about the entire system.
But my take on this: the table from the book, with three separate columns, would likely speed up queries. Data warehouses are often denormalized like this to improve query performance, at the expense of simplicity and versatility of input.
That seems like a good answer to me: your line-per-transaction design sounds better for the data capture tables that store the day-to-day transactional data, but not as good for analysis.
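To make the trade-off concrete, here is a rough sketch comparing the two designs on a typical analytical question: how many days passed between order and shipment? The tables `order_wide` and `order_event` are hypothetical, and dates are stored directly on the rows instead of as dimension keys to keep the example short. It illustrates query shape only and does not measure performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Design A: one row per order, one column per milestone.
CREATE TABLE order_wide (
    order_id         INTEGER PRIMARY KEY,
    order_date       TEXT,
    fulfillment_date TEXT,
    ship_date        TEXT
);
INSERT INTO order_wide VALUES (1, '2024-01-05', '2024-01-30', '2024-02-02');

-- Design B: one row per lifecycle event (line per transaction).
CREATE TABLE order_event (
    order_id   INTEGER,
    event_type TEXT,
    event_date TEXT
);
INSERT INTO order_event VALUES
    (1, 'ordered',   '2024-01-05'),
    (1, 'fulfilled', '2024-01-30'),
    (1, 'shipped',   '2024-02-02');
""")

# Design A: both dates sit on the same row, so the lag is a plain subtraction.
wide_lag = conn.execute("""
SELECT order_id,
       CAST(julianday(ship_date) - julianday(order_date) AS INTEGER)
FROM order_wide
""").fetchall()

# Design B: the events must first be pivoted back onto one row per order.
event_lag = conn.execute("""
SELECT order_id,
       CAST(julianday(MAX(CASE WHEN event_type = 'shipped' THEN event_date END))
          - julianday(MAX(CASE WHEN event_type = 'ordered' THEN event_date END))
            AS INTEGER)
FROM order_event
GROUP BY order_id
""").fetchall()

print(wide_lag, event_lag)
```

Both queries return the same lag, but the event-per-row version needs a pivot (or a self-join) for every question that spans two milestones. On the other hand, adding a new milestone there only means a new `event_type` value instead of a new column.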