数据仓库设计中星型模式的度量到底是什么?
星型模式由维度表和事实表组成。
事实表包含每个维度的外键,除此之外,它还包含“度量”。这项措施具体包括哪些内容?
是否存储了某个聚合函数的答案?
Star schema consists of dimension and fact tables.
Fact tables contain foreign keys for each dimension and in addition to that, it contains "measures". What exactly comprises of this measure ?
Is it some aggregate function's answer that is stored ?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
基本上是的。
如果您有一个简单的网格,
那么时间是一个具有两个级别的分层维度(如图所示)。
显示的另一个维度是“EmployeeID”。另一个维度(未显示)可以在观点中(例如预算/实际)。
金额(例如 1100)是衡量标准,它构成您的事实(事实的非识别部分)。这些维度定义了各个级别上每个度量的合并函数(例如 Amount(Q1) == SUM(Amount(January...March)))。请注意,根据度量的不同,合并的行为也会有所不同(例如,不会对所得税百分比进行求和,而是以某种方式进行合并:OLAP 多维数据集的设计艺术究竟如何)。
(琐事:您可以计算出度量值,使用 MDX 来查询,例如与上一季度相比的金额偏差、整个季度的平均工资等;很明显,再次需要考虑合并公式) 。
此时,您将开始看到,设计合并规则取决于计算规则的顺序(如果首先计算“工资偏差%”的公式,然后进行合并,则需要对其进行平均;但是,如果原始工资度量首先合并(求和)到 Q1、Q2 级别,然后可以像在最低级别一样计算派生度量。
现在,在决定如何存储多维数据集时,事情变得更有趣 基本上存在两种方法:
大多数 OLAP 引擎都融合在混合方法 (HOLAP) 上,这不会令任何人感到惊讶,其中经常访问的合并级别的重要部分是预先计算和存储的,而其他部分是预先计算和存储的。 。
有些会将底层数据存储在标准 RDBMS (ROLAP) 中,而另一些则不会 (OLAP)。专注于高性能的引擎倾向于将所有数据保存在预先计算的多维数据集中(仅诉诸“许多小数据集”) 子立方体用于非常稀疏的维度)。
好吧,无论如何,这有点咆哮。我喜欢漫无目的地讲述我曾经在数据仓库和 OLAP 方面学到的东西
Basically yes.
If you had a simple grid
Time is a hierarchical dimension with two levels (shown).
The other dimension shown is 'EmployeeID'. The other dimension (not shown) could be in the PointOfView (e.g. Budget/Actual).
The Amount (1100, e.g.) is the Measure and it constitutes your facts (the non-identifying parts of the facts). The dimensions define consolidation functions for each measure on the various levels (E.g. Amount(Q1) == SUM(Amount(January...March))). Note that the consolidation will behave differently depending on the measure (e.g. the income tax % will not be summed, but somehow consolidated: how exactly is the art of OLAP Cube design).
(trivia: you can have calculated measures, that use MDX to query e.g. the deviation of Amount in comparison the the preceding Quarter, the Average salary acoss the whole quarter etc.; it will be pretty clear that again, the consolidation formulas require thought).
At this point you will start to see that designing the consolidation rules depends on the order in which the rules are calculated (if the formula for 'salary deviation %' is is evaluated FIRST and then consolidated, you need to average it; however if the raw SALARY measure is consolidated (summed) to the Q1,Q2 level first, then the derived Measure can be calculated like it was at the lowest level.
Now things become more fun when deciding how to store the cube. Basically two ways exist:
It won't surprise anyone that most OLAP engines have converged on hybrid methods (HOLAP), where significant parts of frequently accessed consolidation levels are pre-calculated and stored, and other parts are calculated on the fly.
Some will store the underlying data in a standard RDBMS (ROLAP) other won't (OLAP). The engines focused on high performance tend to keep all data in precalculated cubes (only resorting to 'many small sub-cubes' for very sparse dimensions).
Well, anywyas, this was a bit of a rant. I liked rambling off what I once learned when doing datawarehousing and OLAP
事实和衡量标准是同义词。事实是数据:销售、生产、交付等。维度是与事实(时间、地点、部门)相关的信息。
Fact and measure are synonyms afaik. Facts are data: sales, production, deliveries, etc. Dimensions are information tied to the fact (time, location, department).
措施是两种事物之一。
措施。测量。带单位的数字。美元、重量、体积、尺寸等。测量。
聚合。数据的总和(有时是平均值)。它可能是仓库中的数据:出于性能原因预先计算的聚合。或者它可能是无法获取(或不需要)的数据,因为它太详细了。体积太大或其他什么。
事实表最重要的一点是,非关键度量是带单位的实际度量。
Measures are one of two kinds of things.
Measures. Measurements. Numbers with units. Dollars, weights, volumes, sizes, etc. Measurements.
Aggregates. Sums (or sometimes averages) of data. It might be data in the warehouse: pre-computed aggregates for performance reasons. Or it might be data that can't be acquired (or isn't needed) because it's too detailed. Too high volume or something.
The most important thing about a fact table is that the non-key measures are actual measurements with units.
如果它是相邻的树模型,则它将是标题字段或包含数据的任何其他字段。
If it would be an adjacent tree model it would be the title-field or any other field that contains the data.