当前位置：文江博客话题详情

日期/时间维度

发布于 2024-10-16 12:30:23 字数 323 浏览 9 评论 0原文

我正在设计一个数据仓库，但我遇到了一个棘手的时间问题。我需要的粒度是每小时的（计算每小时的事件总数），而且我还必须适应一种不方便适合 24 小时周期的轮班模式（事实上，“蓝色”轮班可能不会涵盖相同的情况）一天中的某个时间，持续几天）。

考虑到这一点，我正在考虑 3 种方法之一，

其中包含 175K 行的单一时间维度。
雪花时间维度，日历维度中有 7300 行，时间维度有 175k 行，
独立的维度使得事实表具有事件日期和事件时间的外键。

我倾向于方法 3，因为它允许在连接中单独引用小日历维度，但我将不胜感激任何想法。

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

巴黎盛开的樱花 2024-10-23 12:30:23

是的，制造班次很棘手，并且会随着时间的推移而发生变化，通常一个班次在前一天开始，等等。

请记住，这里有两个日历。一种是标准日历，另一种是生产日历——班次属于生产日历。一般来说，生产日历中的一天可能持续多于（或少于）24 小时。

例如：

2011-02-07 23:45 星期一生成的部分可能看起来像

TimeOfProduction = '2011-02-07 23:45'
DateKey = 20110207
TimeKey = 2345
ProductionDateKey = 20110208 (the first shift of the next day started at 22:00)
ProductionTimeKey = 145 (1 hour and 45 minutes of the current production date)     
ShiftKey = 1
ShiftTimeKey = 145 (1 hour and 45 minutes of the current shift)

这样所以，我的建议是：

纯日期维度（每个日期一行）
纯时间维度 （24 小时每分钟一行 = 1440 行 + 请参阅下面的注释）
Shift Dimension -- 使用 rw_ValidFrom、(rw_ValidTo)、rw_IsCurrent 输入 2
维度DateKey 转换为 ProductionDateKey
将 TimeKey 角色扮演为 ProductionTimeKey 和 ShiftTimeKey。
将 TimeOfProduction (datetime) 也保留在事实表中。
在 ETL 过程中，应用当前的班次逻辑将 ProductionDateKey、ProductionTimeKey、ShiftKey、ShiftTimeKey 附加到 factPart 表的每一行。

请注意，如果生产日持续超过 24 小时，您可能需要向时间维度添加额外的行。如果使用当地时间并且存在夏令时跳跃，则通常可以。

因此，星星可能看起来像这样

在此处输入图像描述

Yes, manufacturing shifts are tricky and do change over time, often one shift starts day before, etc.

Keep in mind that there are two calendars here. One is the standard calendar and the other is the production calendar -- the shift belongs to the production calendar. In general, a day in production calendar may last more (or less) than 24 hours.

For example:

Part produced on Monday, 2011-02-07 23:45 may look like

TimeOfProduction = '2011-02-07 23:45'
DateKey = 20110207
TimeKey = 2345
ProductionDateKey = 20110208 (the first shift of the next day started at 22:00)
ProductionTimeKey = 145 (1 hour and 45 minutes of the current production date)     
ShiftKey = 1
ShiftTimeKey = 145 (1 hour and 45 minutes of the current shift)

So, my suggestion is:

Plain Date Dimension (one row per date)
Plain Time Dimension (one row per minute for 24 hours = 1440 rows + see note below)
Shift Dimension -- type 2 dimension with rw_ValidFrom, (rw_ValidTo) , rw_IsCurrent
Role-play the DateKey into ProductionDateKey
Role-play the TimeKey into a ProductionTimeKey and ShiftTimeKey.
Keep the TimeOfProduction (datetime) in the fact table too.
During the ETL process, apply the current shift logic to attach ProductionDateKey, ProductionTimeKey, ShiftKey, ShiftTimeKey to each row of the factPart table.

Note that you may need to add extra rows to the Time Dimension if a production day can last more than 24 hours. It usually can if a local time is used and there is a daylight savings time jump.

So, the star may look something like this

enter image description here

回复收藏 0 原文

凉风有信 2024-10-23 12:30:23

我的 0.02 英镑的价值：

假设没有因考虑转变而产生额外问题（@Andriy M 的问题）：

我倾向于打折选项 2，除非有特定的好处（性能、一类的简化）查询等）你可以通过采用它来看到。您没有描述任何此类好处，因此您似乎是为了本身而增加了复杂性。

我个人更喜欢选项 1——概念上最简单、最直接并且 (IMO) 最适合数据仓库方法。

选项 3 具有您提到的优点，但我一直怀疑它涵盖了两种选择：日历维度正如您所描述的那样，但时间维度的选择是 175k 行或 24。我目前无法提供对于这两种选择中的任何一种的争论，只是一种直觉，认为有两种这样的选择。如果转变问题与此相关，它可能会影响这些替代方案之间的选择（如果它们是真正的替代方案）。

如果您希望进一步采用选项 2，选项 3 中列出的替代方案也适用。

回复收藏 0 原文

阿楠 2024-10-23 12:30:23

我会选择选项 3。 - 单独的尺寸。优点：

简单性 - 两个相对较小的表 - 时间维度仅加载一次，因为一天中有固定的分钟数。
重用 - 两个独立的维度更有可能与仅具有日期或时间维度的其他事实表共享
通过在事实表中为日期维度提供单独的属性来轻松分区
可扩展性 - 考虑可以添加到日期和时间的属性时间维度随着您的报告需求的增长而变化。对于日期维度，这可能是（以避免每次从日期中提取此信息）：年、季度、月、日、周、日期标签（如“2011 年 9 月 12 日”）、月份名称、工作日名称、各种指标（假日指标、季末、月末等）。对于时间维度（为了准确性，可以包含一天中的每一秒），这可以是：小时、分钟、秒、日期部分标签（如“早上”、“晚上”）、工作时间指示器（从 8 开始的秒数： 00:00 到 17:00:00）等。但是，将所有内容都集中在一个维度中将意味着大量冗余。

在我看来，与日开始/结束不一致的班次是记录每个班次开始和结束时间戳的单独事实寓言的良好候选者 - 我的意思是具有以下外键的（无事实）事实表：id_date_start，id_time_start，id_date_end， id_time_end。然后，您可以从事件事实表“钻取”到班次表，以获得每个班次的汇总结果。

编辑：或者模型转变就像另一个维度一样 - 这取决于这样一个事实：对于您而言，转变是否是一个重要的业务流程，您希望通过其属性独立跟踪该流程（但目前我不认为任何其他属性，然后是日期和时间...位置，也许？）或者它只是事件的上下文（因此应该只是一个维度）。