Cube Design - ROLAP considerations vs MOLAP
Does anyone have resources that list the things to consider when designing a ROLAP cube, as opposed to a MOLAP cube? (I'm doing this in Pentaho, but I assume the principles are similar for other implementations.) For example, I'm thinking of things like:
Should extra transformation work be done at the ETL stage to reduce computational work when querying the cube?
Should all my dimension tables be in the same database as my cube?
Comments (3)
I'm a Pentaho implementer in Indonesia. First, you should of course try to aggregate all your measures, grouped by the surrogate keys involved.
In Mondrian, you can also "cache" some computations using additional aggregate tables. You can design these in Pentaho Aggregation Designer, but afterwards you will need additional work in your data warehouse / ETL stage to populate them.
Regards,
Feris
http://pentaho-en.phi-integration.com
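The aggregate-table idea from the answer above can be sketched roughly as follows. This is only an illustration using Python's built-in `sqlite3` module, not Feris's actual setup: the table names `sales_fact` and `agg_sales_by_product`, the columns, and the sample rows are all hypothetical. The `fact_count` column follows the convention Mondrian aggregate tables use for recording how many fact rows each aggregate row summarizes.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with a tiny, hypothetical fact table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE sales_fact (
            date_key    INTEGER NOT NULL,  -- surrogate key of the date dimension
            product_key INTEGER NOT NULL,  -- surrogate key of the product dimension
            amount      REAL    NOT NULL   -- the measure
        );
        INSERT INTO sales_fact VALUES
            (20240101, 1, 10.0),
            (20240101, 1, 5.0),
            (20240102, 1, 2.5),
            (20240101, 2, 7.0);
    """)
    return conn


def build_product_aggregate(conn):
    """Roll the fact table up to product grain (the date dimension is dropped)."""
    conn.executescript("""
        DROP TABLE IF EXISTS agg_sales_by_product;
        CREATE TABLE agg_sales_by_product AS
        SELECT product_key,
               SUM(amount) AS amount,
               COUNT(*)    AS fact_count  -- number of fact rows summarized
        FROM sales_fact
        GROUP BY product_key;
    """)


def aggregate_rows(conn):
    """Return the aggregate table contents, ordered by product key."""
    return conn.execute(
        "SELECT product_key, amount, fact_count "
        "FROM agg_sales_by_product ORDER BY product_key"
    ).fetchall()
```

A query that only needs totals per product can then read the much smaller `agg_sales_by_product` table instead of scanning `sales_fact`. In a real Mondrian deployment the aggregate table would also have to be declared in (or recognized by) the schema, which this sketch does not cover.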
First off, the designs are similar, but they are driven by different performance and scalability strategies.
Second, the ETL process is pretty much the same, except that you'll typically see a lot more data in a ROLAP cube than in a MOLAP cube, because of the scalability features of relational databases. You'll also often see a ROLAP cube inside a database (a warehouse, or even a transactional database) that does more than just support ROLAP.
Lastly, you'll typically generate aggregate tables if you have a large data volume. That aggregation can be done in many different ways, but I'd say it is not typically driven by your ETL process unless you lack the ability to manage a separate asynchronous process, or have data volumes that make it impractical to run periodic summary jobs.
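As one possible reading of the "periodic summary job" mentioned above, here is a minimal sketch of a refresh step that runs separately from the ETL load and can safely be re-run for the same period. It uses Python's built-in `sqlite3` module; the tables `sales_fact` and `agg_sales_daily`, their columns, and the function names are assumptions made for illustration, not something described in the answer.

```python
import sqlite3


def create_schema(conn):
    """Create the hypothetical fact table and its daily summary table."""
    conn.executescript("""
        CREATE TABLE sales_fact (
            date_key    INTEGER NOT NULL,
            product_key INTEGER NOT NULL,
            amount      REAL    NOT NULL
        );
        CREATE TABLE agg_sales_daily (
            date_key   INTEGER PRIMARY KEY,
            amount     REAL    NOT NULL,
            fact_count INTEGER NOT NULL
        );
    """)


def refresh_daily_summary(conn, date_key):
    """Recompute the summary row for one day; re-running replaces the old row."""
    with conn:  # single transaction: commits on success, rolls back on error
        conn.execute(
            "DELETE FROM agg_sales_daily WHERE date_key = ?", (date_key,)
        )
        conn.execute(
            """
            INSERT INTO agg_sales_daily (date_key, amount, fact_count)
            SELECT date_key, SUM(amount), COUNT(*)
            FROM sales_fact
            WHERE date_key = ?
            GROUP BY date_key
            """,
            (date_key,),
        )


def daily_summary(conn):
    """Return all summary rows, ordered by day."""
    return conn.execute(
        "SELECT date_key, amount, fact_count "
        "FROM agg_sales_daily ORDER BY date_key"
    ).fetchall()
```

A scheduler (cron, for example) could call `refresh_daily_summary` for recently loaded days after the ETL run has finished. Because the delete and insert happen in one transaction, late-arriving fact rows are picked up on the next run without leaving duplicate summary rows.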
Thanks to Feris for the link and input, but in the end I went for this book:
http://www.amazon.com/Pentaho-Solutions-Business-Intelligence-Warehousing/dp/0470484322/ref=sr_1_1?ie=UTF8&s=books&qid=1258408259&sr=8-1
I had a good long look at the Mondrian site + docs, but the book seems more comprehensive.