我可以在事实表中将非度量代码与度量混合吗?
我们正在做一些复杂的数据积累。 我们的客户向我们发送了一些包含两个维度(时间和业务单位)的内容。 时间大多是年月。 业务单位维度只有几个属性:名称以及出于报告和分析目的而为业务单位所属的几个类别。
他们发送给我们的内容包括一些当前状态信息(日期和代码)。 这些看起来都是事实。 他们还发送一些描述与业务部门关系的信息(主要是附加代码)。 同样,这些对于业务部门和时间段来说是独特的。
最后,他们向我们发送的东西显然是附加事实。 它包括货币和具有适当单位的计数。
我应该将这些定性信息与附加事实混合在一个事实表中吗? 或者我应该将定性的东西(只能与计数一起使用)与定量的东西(可以与总和一起使用)分开?
We're doing a complex bit of data accumulation. Our customer sends us some stuff that includes two dimensions (time and a business unit). Time is mostly year-month. The business unit dimension has just a few attributes: a name, and a few categories to which BU's can belong for reporting and analysis purposes.
The stuff they send us includes some current state information (dates and codes). These seem fact-like. They also send some information that characterizes the relationship with the business unit (mostly additional codes). Again, these are unique to the business unit and time period.
Finally, they send us stuff that is clearly additive facts. It includes currency and counts that have proper units.
Should I commingle this qualitative information in a single fact table with the additive facts? Or should I separate the qualitative stuff (which can only be used with counts) from the quantitative stuff (which can be used with sum)?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
Brad Wilson 准确地描述了将它们添加到事实表中的风险。 过去,我向事实表添加了垃圾属性,只是为了稍后需要重构。
这些日期有什么商业目的? 顺便说一句,我建议将它们设为自己的尺寸并准确地描述它们。
进来的额外代码有多不稳定? 如果事实表的粒度是日期和 BU,为什么不能将它们包含在 BU 维度中并将其视为缓慢变化的属性?
如果没有更多细节,我无法提出明确的建议,但这些将是我问自己的第一个问题。
Brad Wilson accurately describes the risk of adding them to your fact table. In the past, I've added junk attributes to my fact table only to require refactoring later.
What business purpose do the dates serve? Offhand, I'd recommend making these their own dimensions and describe them accurately.
How volatile are the extra codes that come in? If the grain of your fact table is date and BU, why can't they be included in the BU dimension and treated as slowly changing attributes?
Without more details I can't make a firm recommendation but these would be the first questions I'd ask myself.
如果数据与附加事实直接相关,并且不是您想要分组/排序/搜索的内容,那么将其放入事实表中就可以了。
但请注意,事实表中的非附加数据要么会阻止汇总,要么会成为有损操作。
If the data is both directly related to the additive fact and is not something you want to be grouping/sorting/search on, then putting it in the fact table is okay.
Be aware, though, that non-additive data in the fact table will either prevent roll-ups or will become a lossy operation.
仅将退化的事物放入事实表中(在维度中导致高基数/唯一性问题,使维度与事实表形成 1-1 关系)。 金博尔建议避免将除简并维度之外的任何内容与事实(例如,唯一的订单号)放在一起。
你总是可以将它们放入金博尔所说的“垃圾”维度。 所有这些代码都可以简单地归入垃圾维度。 大多数日期将作为特定角色中日期维度的键进入事实表(通常使用 YYYYMMDD 形式的自然 int 键 - 这是我们不使用非身份无意义代理键的唯一一次)
我喜欢天真地将星星视为所有事实,然后哪些列进入哪些维度只是由方便决定的。 人们不一定将它们视为对应于特定的业务实体 - 请记住,明星并不是 ERD 风格的标准化 OLTP 数据库。
Only put things in the fact table if they are degenerate (causing a high-cardinality/uniqueness problems in your dimension where it takes the dimension to a 1-1 relationship to the fact table). Kimball recommends avoiding the temptation to put anything but degenerate dimensions in with the facts (unique order number, for instance).
You can always put these in what Kimball calls a "junk" dimension. All those codes can simply be lumped into a junk dimension. Most dates would go in the fact table as keys into your date dimension in a particular role (usually with a natural int key of the form YYYYMMDD - one of the only times we don't use a non-identity meaningless surrogate key)
I like to naively view the star as all the facts and then which columns go into which dimensions is simply determined by convenience. One should not necessarily view them as corresponding to a particular business entity - remember, the star is not an ERD-style normalized OLTP database.