Weekly summary table: how to reference the time dimension
We're thinking about adding a weekly summary table to our little data warehouse. We have a classic time dimension down to the daily level (Year/Month/Day), with the appropriate Week, Quarter, etc. columns.
We'd like the time key in this new weekly summary table to reference our time dimension. What's the best practice here: should the time key reference the id of the first day of the week it represents? The last day? Or something else entirely?
Comments (4)
By convention, fact tables with date-period aggregations (week, month, ...) reference the DateKey of the last day of the period, so for this example you would reference the last day of the week.
It is kind of logical too: the week must end before it can be aggregated.
It is important to state clearly (somewhere) that the grain of the fact table is one week, so that report designers are aware of this.
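To make the last-day convention concrete, here is a minimal Python sketch that rolls daily rows up to one total per week, keyed by the week's final day. It assumes ISO weeks (Monday through Sunday); the names `week_end`, `summarize_by_week_end`, and `daily_sales` are hypothetical and not taken from the question's schema.

```python
from datetime import date, timedelta

def week_end(d: date) -> date:
    """Return the Sunday that closes the ISO week (Monday-Sunday) containing d."""
    # isoweekday(): Monday=1 ... Sunday=7, so 7 - isoweekday() days remain in the week.
    return d + timedelta(days=7 - d.isoweekday())

def summarize_by_week_end(daily_rows):
    """Sum (date, amount) pairs into one total per week, keyed by the week's last day."""
    totals = {}
    for day, amount in daily_rows:
        key = week_end(day)
        totals[key] = totals.get(key, 0) + amount
    return totals

# Hypothetical daily facts: (date, amount)
daily_sales = [
    (date(2010, 9, 20), 10),  # Monday
    (date(2010, 9, 22), 5),   # Wednesday of the same week
    (date(2010, 9, 27), 7),   # Monday of the following week
]
weekly = summarize_by_week_end(daily_sales)
```

If your warehouse defines weeks differently (for example Sunday through Saturday), only `week_end` would need to change.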
Days are a good example of an entity best identified by natural keys: their representations in the Gregorian calendar.
To identify a week or a month, it's best to use its first day. In Oracle, you can easily retrieve it with a call to TRUNC. In other systems it's a little more complex, but still quite easy.
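The snippet that accompanied this answer did not survive on this page, so the following is not the author's Oracle code. It is a small Python sketch of the same idea, truncating a date to the first day of its week or month, assuming weeks start on Monday; `week_start` and `month_start` are hypothetical helper names.

```python
from datetime import date, timedelta

def week_start(d: date) -> date:
    """Return the Monday of the week containing d."""
    # weekday(): Monday=0 ... Sunday=6, so subtracting it lands on Monday.
    return d - timedelta(days=d.weekday())

def month_start(d: date) -> date:
    """Return the first day of the month containing d."""
    return d.replace(day=1)

some_day = date(2010, 9, 23)          # a Thursday
first_of_week = week_start(some_day)   # Monday of that week
first_of_month = month_start(some_day)
```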
What about making a new dimension, "Week"?
You can create a relation between the time and week dimensions if you need one.
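One way this could look, sketched with an in-memory SQLite database from Python. The table and column names (`dim_week`, `dim_date`, `week_key`) and the ISO year-plus-week key format are assumptions made for illustration, not part of the question's schema.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_week (
        week_key   INTEGER PRIMARY KEY,  -- assumed format: ISO year * 100 + ISO week
        week_start TEXT,
        week_end   TEXT
    );
    CREATE TABLE dim_date (
        date_key  INTEGER PRIMARY KEY,   -- assumed format: YYYYMMDD
        full_date TEXT,
        week_key  INTEGER REFERENCES dim_week(week_key)
    );
""")

# Populate two weeks of days, creating each week row the first time it is seen.
start = date(2010, 9, 20)  # a Monday
for offset in range(14):
    day = start + timedelta(days=offset)
    monday = day - timedelta(days=day.weekday())
    iso = day.isocalendar()
    week_key = iso[0] * 100 + iso[1]
    conn.execute(
        "INSERT OR IGNORE INTO dim_week VALUES (?, ?, ?)",
        (week_key, monday.isoformat(), (monday + timedelta(days=6)).isoformat()),
    )
    conn.execute(
        "INSERT INTO dim_date VALUES (?, ?, ?)",
        (day.year * 10000 + day.month * 100 + day.day, day.isoformat(), week_key),
    )

# Each day row points at exactly one week row.
rows = conn.execute(
    "SELECT w.week_key, COUNT(*) FROM dim_date d "
    "JOIN dim_week w ON w.week_key = d.week_key "
    "GROUP BY w.week_key ORDER BY w.week_key"
).fetchall()
```

A weekly fact table could then reference `dim_week.week_key` directly, while daily facts keep referencing `dim_date`.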
Apropos an earlier answer: when data relates to an atomic measurement for an interim level of the time dimension hierarchy (a week, a month), I would actually expect to store it against the key of the first day of the period. This makes loading much more straightforward (especially with months; I guess weeks will always require some calculation), and reporting too. Nonetheless, it is a convention, and as long as you pick a common-sense option and stick to it, you will be fine.
BTW, do not create a week dimension. You should be using a rich time dimension with all the hierarchies available within it for year, quarter, month, week, day, etc. (bearing in mind there are often multiple, exclusive hierarchies). In this instance only, I would also recommend a meaningful (non-meaningless) surrogate key in the form 20100920. Dates are immutable and in this format fit easily in int columns, so there is little value in using meaningless keys for dates (or in dim_time either). If you have ever had to write queries to dereference data where meaningless SKs are used for the time dimension, you know the (unnecessary) pain...
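As a rough illustration of why keys of the form 20100920 are easy to work with, here is a minimal Python sketch that converts between a date and its integer key using plain arithmetic. The function names `date_to_key` and `key_to_date` are hypothetical.

```python
from datetime import date

def date_to_key(d: date) -> int:
    """Encode a date as an integer of the form YYYYMMDD."""
    return d.year * 10000 + d.month * 100 + d.day

def key_to_date(key: int) -> date:
    """Decode a YYYYMMDD integer key back into a date."""
    year, rest = divmod(key, 10000)
    month, day = divmod(rest, 100)
    return date(year, month, day)

key = date_to_key(date(2010, 9, 20))
round_trip = key_to_date(key)
```

Because the key can be read directly, a query can filter on a date range without first joining to the time dimension to look up surrogate values.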
M