使用 ActiveRecord 处理数据仓库样式的表?

发布于 2024-07-30 10:20:30 字数 1139 浏览 2 评论 0原文

随着我的 Rails 应用程序的成熟,越来越明显的是它具有强大的数据仓库风格,只缺少一个事实表来使一切变得明确。

最重要的是,我刚刚阅读了 Ruby 最佳实践。

现在我正在尝试找出如何最好地设计事实检索部分...

假设我有以下维度(应用程序中的现有模型):

  • 产品(包含资金)
  • 资金
  • 衡量标准(例如总持有量、平均持有量、平均持有量)曝光)

...以及一个很好的旧通用事实:

  • 事实(日期,值,加上我的每个维度的外键可为空列)

我很乐意得到一些建议的某些方面:

  • 可能构成什么灵活的检索界面?
  • 如果某个维度的事实同时具有 NULL(即全部或不关心)值和 NOT NULL(特定)值,会发生什么情况? 像 :all 这样的伪值? 或者应该应用一些约定?
  • 如何仅选择维度值的子集? 或者排除一个子集? :仅和:排除?
  • 有没有人有创建 named_scope 来处理这个问题的经验? 能够为每个感兴趣的维度链接一个维度是很明显的吸引力,但是如果我们达到 7 或 8 维度,它会变得太笨重吗?

(我知道有一个 acts_as_fact 插件据说以某种形式存在(至少,在 RailsConf 2006 上有一些小讨论),但我找不到任何代码或说明它如何实现已工作。)

版本:Rails、ActiveRecord 2.1.2、Oracle 增强适配器 1.2.0

编辑:我查看了 ActiveWarehouse 并有一些保留: - 主分支自 11 月 8 日以来没有提交,自 1 月 9 日以来根本没有任何活动; - 该教程可以追溯到 2006 年,被承认已经过时,并且对我进行了 404 错误; - 它似乎想要摆脱 ActiveRecord - 我的应用程序的大部分将保留在 AR 中,我认为目前我想要一个 AR 解决方案。

所以我会避开这个,谢谢!

As my Rails app matures, it's becoming increasingly apparent that it has a strong data warehouse flavour, lacking only a facts table to make everything explicit.

On top of that, I just read Chapters 2 (Designing Beautiful APIs) and 3 (Mastering the Dynamic Toolkit) of Ruby Best Practices.

Now I'm trying to figure out how best to design the fact-retrieving part...

Say I have the following dimensions (existing Models in the app):

  • Product (contains funds)
  • Fund
  • Measure (e.g. total holding, average holding, average exposure)

... and a good old general-purpose Fact:

  • Fact (date, value, plus a foreign-key NULLable column for each of my dimensions)

Some aspects on which I'd be grateful to get some advice:

  • What might constitute a flexible retrieval interface?
  • What happens if I have Facts with both NULL (i.e. all, or don't care) and NOT NULL (specific) values for a dimension? A pseudo-value like :all? Or should some convention apply?
  • How to select only a subset of dimension values? Or exclude a subset? :only and :exclude?
  • Has anyone had experience with creating named_scopes to deal with this? There's the obvious attraction of being able to chain one for each dimension of interest, but does it get too clunky if we get to 7 or 8 dimensions?

(I'm aware that an acts_as_fact plugin is reputed to exist in some form (at least, there was some small buzz at RailsConf 2006) but I couldn't find any code or description of how it might have worked.)

Versions: Rails, ActiveRecord 2.1.2, Oracle Enhanced Adapter 1.2.0

EDIT: I looked at ActiveWarehouse and have some reservations:
- the main branch hasn't had a commit since Nov-08 and no there's activity at all since Jan-09;
- the tutorial dates to 2006, is admitted to be out of date, and 404s on me;
- it seems to be wanting to get away from ActiveRecord - much of my app will stay in AR and I think at present that I want an AR solution.

So I'll steer clear of that one, thanks!

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

百合的盛世恋 2024-08-06 10:20:30

如果我两者都有事实会发生什么
NULL(即全部,或不关心)和 NOT
a 的 NULL(特定)值
方面? 像 :all? 这样的伪值
或者应该应用一些约定?

NULL 会产生误导,因为它代表没有关联。 我会使用像 -1 这样的值(如果它是一个整数foreign_key,且只有值> 0)。

如何仅选择一个子集
尺寸值? 或者排除一个子集?

with_scope()

你也可以覆盖 find 函数

   def self.find(*args)
    if  anything
      with_scope(a_scope) do
         result = super *args
      end
    else
      result = super *args
    end
   end

   def self.a_scope
    {:find => { :conditions => ["person_id  = ?", me] , :readonly => true}}
   end

有没有人有过这样的经历
创建named_scopes来处理
这? 有明显的吸引力
能够为每个人链接一个
感兴趣的维度,但它得到
如果我们达到 7 或 8 就太笨重了
尺寸?

我们有一个 4 维的 Olap 数据库,它运行得很好。 我认为如果您为 active_record 实现一些自定义方法,您的应用程序将会很有趣。

我还发现了这个: http://github.com/aeden/activewarehouse/tree/master

What happens if I have Facts with both
NULL (i.e. all, or don't care) and NOT
NULL (specific) values for a
dimension? A pseudo-value like :all?
Or should some convention apply?

NULL would be a bid misleading because it stands for no association. I would use a value like -1 (if it is an integer foreign_key with only values > 0).

How to select only a subset of
dimension values? Or exclude a subset?

with_scope()

you could also overwrite the find function

   def self.find(*args)
    if  anything
      with_scope(a_scope) do
         result = super *args
      end
    else
      result = super *args
    end
   end

   def self.a_scope
    {:find => { :conditions => ["person_id  = ?", me] , :readonly => true}}
   end

Has anyone had experience with
creating named_scopes to deal with
this? There's the obvious attraction
of being able to chain one for each
dimension of interest, but does it get
too clunky if we get to 7 or 8
dimensions?

We have an olap database with 4 dimension and it works nice. I think if you implement some custom methods for active_record, you will have fun with your app.

I also found this: http://github.com/aeden/activewarehouse/tree/master

开始看清了 2024-08-06 10:20:30

不久前,在我开始关注其他事情之前,我正在考虑使用 ActiveWarehouse ,所以我不能告诉您它的效果如何,但需要将其添加到您的列表中进行检查。 它具有事实、维度和多维数据集生成器以及 ETL 工具包。

I was looking into using ActiveWarehouse a while ago before I got pull of onto other things, so I can't tell you how well it works, but it's something to add to your list to check out. It has generators for facts, dimensions, and cubes as well as an ETL toolkit.

画中仙 2024-08-06 10:20:30

因此,我们有许多复杂程度不同的宝石或插件,但似乎没有一个正在积极开发(或者有一些事情正在发生,但在雷达之下)。

无论如何,我不想构建一个数据仓库,只是为了以星型模式样式实现一两个事实表。 我想要的是关于如何访问这样一个表的想法。

我倾向于放弃多级别的想法,其中跨维度的聚合将在该维度的外键中具有 NULL。 虽然它会减少查询中涉及的行数,但好处很小而且不是免费的:我将拥有更大的事实表和更复杂的代码。

检索看起来可能会使用一组 named_scopes 进行处理,每个维度一个,用于过滤。 或者使用适当构造的哈希的自定义查找器可能会更好。

当我构建它时,我会回来提供一些更好的信息......

So we have a number of gems or plugins of varying degrees of complexity, none of which seem to be very actively under development (or there's something going on but it's under the radar).

In any case, I'm not looking to build a data warehouse, just to implement a fact table or two in star schema style. What I was asking for was ideas on how to access such a table.

I'm leaning towards dropping the idea of multiple-levels, where an aggregate across a dimension would have a NULL in the foreign key for that dimension. While it would reduce the number of rows involved in a query the benefit would be small and not free: I'd have a larger fact table and more complex code.

Retrieval looks like it might then be handled with a set of named_scopes, one for each dimension, for filtering. Or a custom finder, fed with a suitably-constructed Hash, might be better.

When I've built it, I'll come back with some better information...

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文