使用 ActiveRecord 处理数据仓库样式的表?
随着我的 Rails 应用程序的成熟,越来越明显的是它具有强大的数据仓库风格,只缺少一个事实表来使一切变得明确。
最重要的是,我刚刚阅读了 Ruby 最佳实践。
现在我正在尝试找出如何最好地设计事实检索部分...
假设我有以下维度(应用程序中的现有模型):
- 产品(包含资金)
- 资金
- 衡量标准(例如总持有量、平均持有量、平均持有量)曝光)
...以及一个很好的旧通用事实:
- 事实(日期,值,加上我的每个维度的外键可为空列)
我很乐意得到一些建议的某些方面:
- 可能构成什么灵活的检索界面?
- 如果某个维度的事实同时具有 NULL(即全部或不关心)值和 NOT NULL(特定)值,会发生什么情况? 像
:all
这样的伪值? 或者应该应用一些约定? - 如何仅选择维度值的子集? 或者排除一个子集? :仅和:排除?
- 有没有人有创建
named_scope
来处理这个问题的经验? 能够为每个感兴趣的维度链接一个维度是很明显的吸引力,但是如果我们达到 7 或 8 维度,它会变得太笨重吗?
(我知道有一个 acts_as_fact
插件据说以某种形式存在(至少,在 RailsConf 2006 上有一些小讨论),但我找不到任何代码或说明它如何实现已工作。)
版本:Rails、ActiveRecord 2.1.2、Oracle 增强适配器 1.2.0
编辑:我查看了 ActiveWarehouse 并有一些保留: - 主分支自 11 月 8 日以来没有提交,自 1 月 9 日以来根本没有任何活动; - 该教程可以追溯到 2006 年,被承认已经过时,并且对我进行了 404 错误; - 它似乎想要摆脱 ActiveRecord - 我的应用程序的大部分将保留在 AR 中,我认为目前我想要一个 AR 解决方案。
所以我会避开这个,谢谢!
As my Rails app matures, it's becoming increasingly apparent that it has a strong data warehouse flavour, lacking only a facts table to make everything explicit.
On top of that, I just read Chapters 2 (Designing Beautiful APIs) and 3 (Mastering the Dynamic Toolkit) of Ruby Best Practices.
Now I'm trying to figure out how best to design the fact-retrieving part...
Say I have the following dimensions (existing Models in the app):
- Product (contains funds)
- Fund
- Measure (e.g. total holding, average holding, average exposure)
... and a good old general-purpose Fact:
- Fact (date, value, plus a foreign-key NULLable column for each of my dimensions)
Some aspects on which I'd be grateful to get some advice:
- What might constitute a flexible retrieval interface?
- What happens if I have Facts with both NULL (i.e. all, or don't care) and NOT NULL (specific) values for a dimension? A pseudo-value like
:all
? Or should some convention apply? - How to select only a subset of dimension values? Or exclude a subset? :only and :exclude?
- Has anyone had experience with creating
named_scope
s to deal with this? There's the obvious attraction of being able to chain one for each dimension of interest, but does it get too clunky if we get to 7 or 8 dimensions?
(I'm aware that an acts_as_fact
plugin is reputed to exist in some form (at least, there was some small buzz at RailsConf 2006) but I couldn't find any code or description of how it might have worked.)
Versions: Rails, ActiveRecord 2.1.2, Oracle Enhanced Adapter 1.2.0
EDIT: I looked at ActiveWarehouse and have some reservations:
- the main branch hasn't had a commit since Nov-08 and no there's activity at all since Jan-09;
- the tutorial dates to 2006, is admitted to be out of date, and 404s on me;
- it seems to be wanting to get away from ActiveRecord - much of my app will stay in AR and I think at present that I want an AR solution.
So I'll steer clear of that one, thanks!
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
NULL 会产生误导,因为它代表没有关联。 我会使用像
-1
这样的值(如果它是一个整数foreign_key,且只有值> 0)。你也可以覆盖 find 函数
我们有一个 4 维的 Olap 数据库,它运行得很好。 我认为如果您为 active_record 实现一些自定义方法,您的应用程序将会很有趣。
我还发现了这个: http://github.com/aeden/activewarehouse/tree/master
NULL would be a bid misleading because it stands for no association. I would use a value like
-1
(if it is an integer foreign_key with only values > 0).you could also overwrite the find function
We have an olap database with 4 dimension and it works nice. I think if you implement some custom methods for active_record, you will have fun with your app.
I also found this: http://github.com/aeden/activewarehouse/tree/master
还有另一个我没有使用过,但看起来不错:
http://github.com/wvanbergen /active_olap/tree/master
http://techblog .floorplanner.com/2008/07/29/active-olap-released/
这是我在谷歌中找到的 SOLR
http://code.google.com/p/kettle-solr-plugin/
there's another which I haven't used, but looks good:
http://github.com/wvanbergen/active_olap/tree/master
http://techblog.floorplanner.com/2008/07/29/active-olap-released/
and this for SOLR which i found in google
http://code.google.com/p/kettle-solr-plugin/
不久前,在我开始关注其他事情之前,我正在考虑使用 ActiveWarehouse ,所以我不能告诉您它的效果如何,但需要将其添加到您的列表中进行检查。 它具有事实、维度和多维数据集生成器以及 ETL 工具包。
I was looking into using ActiveWarehouse a while ago before I got pull of onto other things, so I can't tell you how well it works, but it's something to add to your list to check out. It has generators for facts, dimensions, and cubes as well as an ETL toolkit.
因此,我们有许多复杂程度不同的宝石或插件,但似乎没有一个正在积极开发(或者有一些事情正在发生,但在雷达之下)。
无论如何,我不想构建一个数据仓库,只是为了以星型模式样式实现一两个事实表。 我想要的是关于如何访问这样一个表的想法。
我倾向于放弃多级别的想法,其中跨维度的聚合将在该维度的外键中具有 NULL。 虽然它会减少查询中涉及的行数,但好处很小而且不是免费的:我将拥有更大的事实表和更复杂的代码。
检索看起来可能会使用一组
named_scopes
进行处理,每个维度一个,用于过滤。 或者使用适当构造的哈希的自定义查找器可能会更好。当我构建它时,我会回来提供一些更好的信息......
So we have a number of gems or plugins of varying degrees of complexity, none of which seem to be very actively under development (or there's something going on but it's under the radar).
In any case, I'm not looking to build a data warehouse, just to implement a fact table or two in star schema style. What I was asking for was ideas on how to access such a table.
I'm leaning towards dropping the idea of multiple-levels, where an aggregate across a dimension would have a NULL in the foreign key for that dimension. While it would reduce the number of rows involved in a query the benefit would be small and not free: I'd have a larger fact table and more complex code.
Retrieval looks like it might then be handled with a set of
named_scopes
, one for each dimension, for filtering. Or a custom finder, fed with a suitably-constructed Hash, might be better.When I've built it, I'll come back with some better information...