关于BI维度衡量的思考

发布于 2024-12-17 14:10:05 字数 189 浏览 0 评论 0原文

我正在与一位顾问合作,他建议创建一个度量维度,然后将度量维度键添加到我们的事实表中。

我可以看到如何通过仅添加行而不是在事实表中物理创建列来更轻松地添加新度量。我还可以看到这如何向 ETL 流程添加工作、向星型模式添加另一个联接、事实表中的一个通用列来保存所有度量数据等。

我对其他人如何处理这种情况感兴趣。目前我们有近二十项措施。

I am working with a consultant who recommends creating a measure dimension and then adding the measure dimension key to our fact table.

I can see how this can make adding new measures easier by just adding rows instead of physically creating columns in the fact table. I can also see how this can add work to the ETL process, adds another join to the star schema, one generic column in fact table to hold all measure data etc.

I'm interested in how others have dealt with this situation. We currently have close to twenty measures.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

阳光下的泡沫是彩色的 2024-12-24 14:10:05

本能地,我不喜欢它:它是 EAV 模型,不是很受欢迎(你可以 Google 一下原因)。

  • EAV模型一般被认为查询和维护比较头疼。
  • 不同的度量搭配不同的维度;这种方法可以很容易地变成“一个巨大的事实表用于所有事情”,而不是针对特定报告区域的多个较小的事实表
  • 我怀疑您最终会创建视图来提供多个事实表的外观无论如何
  • 您将乘以您的行数事实表按度量数量计算,从而产生更大的物理表
  • 即使使用良好的索引/分区方案,包含多个度量的查询也必须读取更多行才能获取数据,那么
  • 具有不同数据类型的度量又如何呢? ?
  • 您的报告工具很容易支持这一点吗?

我确信还有其他问题,但这些是立即浮现在我脑海中的问题。根据经验,如果有人建议在任何情况下实施 EAV,您都应该非常谨慎,并准确询问他们它提供了哪些优势,以及随着数据和复杂性的增加将如何管理它。但我认为您已经确定了一些关键的关注领域。

Instinctively, I don't like it: it's the EAV model, which is not very popular (you can Google the reasons why).

  • The EAV model is generally considered to be a headache to query and maintain
  • Different measures go together with different dimensions; this approach could easily turn into "one giant fact table for everything" instead of multiple smaller fact tables for specific reporting areas
  • I suspect you would end up creating views to give the appearance of multiple fact tables anyway
  • You will multiply the number of rows in your fact table by the number of measures, resulting in a much bigger physical table
  • Even with a good indexing/partitioning scheme, queries that include more than one measure will have to read a lot more rows to get the data
  • What about measures with different data types?
  • Is this easily supported in your reporting tool?

I'm sure there are other issues, but those are the ones that come to mind immediately. As a rule of thumb, if someone suggests an EAV implementation in any context, you should be very wary and ask them exactly what advantages it offers and how it will be managed as the data and complexity increase. But I think you've already identified some key areas of concern.

强辩 2024-12-24 14:10:05

SSAS 会做到这一点,我知道一家主要的保单管理软件供应商为他们的系统提供了一个 MI 解决方案,其工作原理如下。您确实从该方法中获得了一些灵活性,因为您可以添加度量而无需部署多维数据集的构建,尽管对于 20 个度量,我认为您不需要担心这一点。

“措施”本质上是另一个维度(并且在文档中经常这样提及)。我相信 SSAS 在幕后使用了很大程度上面向列的结构。

然而,这种方法的简单应用确实会带来一些问题,这些问题可能会或多或少地困扰您。

  • 您只有一个度量,[值]、[金额] 或任何名称。如果您的工具不允许您在前端注入计算的度量,那么您无法根据其中一种属性类型的值对整个数据集进行排序。 ProClarity 和报表生成器 >=2.0 会执行此操作,但 Excel 不会。

  • 您不能以这种方式进行比率或其他计算度量。您必须将它们嵌入到多维数据集脚本中(这意味着您需要部署构建来添加它们),或者使用可让您在客户端中定义它们的工具。

  • 尽管它对多维数据集没有太大区别,但数据库查询速度会很慢,并且会增加存储需求。在数据库上查询也很繁琐。

SSAS will do this, and I know of a major vendor of insurance policy administration software that provided a M.I. solution for their system that works like this. You do get some flexibility from the approach in that you can add measures without having to deploy a build of the cube, although for 20 measures I don't think you need to worry about that.

'Measures' is essentially another dimension (and often referred to as such in the documentation). I believe SSAS uses a largely column-oriented structure behind the scenes.

However, a naive application of this approach does have some issues that could come and bite you to a greater or lesser extent.

  • You only have one measure, [Value], [Amount] or whatever it's called. If your tool won't let you inject calculated measures at the front-end then you can't sort the whole data set on the value of one of your attribute types. ProClarity and report builder >=2.0 will do this but Excel won't.

  • You can't do ratios or other calculated measures in this way. You will have to either embed them in the cube script (meaning you need to deploy a build to add them) or use a tool that lets you define them in the client.

  • Although it doesn't make a lot of differece to the cube it will be slow to query on the database and increase storage requirements. It's also fiddly to query on the database.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文