数据仓库中的时间和日期维度

发布于 2024-08-26 12:30:41 字数 472 浏览 7 评论 0原文

我正在构建一个数据仓库。每个事实都有它的时间戳。我需要按天、月、季度创建报告,但也需要按小时创建报告。查看示例,我发现日期往往保存在维度表中。 alt starexample
(来源:etl-tools.info

但我认为,这对于时间来说没有意义。维度表会不断增长。另一方面,使用日期维度表进行 JOIN 比在 SQL 中使用日期/时间函数更有效。

您的意见/解决方案是什么?

(我正在使用Infobright)

I'm building a data warehouse. Each fact has it's timestamp. I need to create reports by day, month, quarter but by hours too. Looking at the examples I see that dates tend to be saved in dimension tables. alt starexample
(source: etl-tools.info)

But I think, that it makes no sense for time. The dimension table would grow and grow. On the other hand JOIN with date dimension table is more efficient than using date/time functions in SQL.

What are your opinions/solutions ?

(I'm using Infobright)

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

没有伤那来痛 2024-09-02 12:30:41

Kimball 建议 单独的时间和日期维度

在之前的工具包书籍中,我们有
推荐建造这样的尺寸
带有分钟或秒部分
时间作为从午夜的偏移量
每一天,但我们已经意识到
最终用户
申请变得太困难,
尤其是在尝试计算时间时
跨度。另外,与日历日不同
维度,数量很少
的描述性属性
某个时间段内的特定分钟或秒
天。如果企业有良好的
时间片的定义属性
一天之内,例如班次名称,或
广告时段,额外
可以添加一天中的时间维度
该尺寸所在的设计
定义为分钟数(或
甚至几秒)过了午夜。因此这个
一天中的时间维度要么有
1440条记录若粒分分钟
或 86,400 条记录(如果谷物是)
秒。

Kimball recommends having separate time- and date dimensions:

In previous Toolkit books, we have
recommended building such a dimension
with the minutes or seconds component
of time as an offset from midnight of
each day, but we have come to realize
that the resulting end user
applications became too difficult,
especially when trying to compute time
spans. Also, unlike the calendar day
dimension, there are very few
descriptive attributes for the
specific minute or second within a
day. If the enterprise has well
defined attributes for time slices
within a day, such as shift names, or
advertising time slots, an additional
time-of-day dimension can be added to
the design where this dimension is
defined as the number of minutes (or
even seconds) past midnight. Thus this
time-ofday dimension would either have
1440 records if the grain were minutes
or 86,400 records if the grain were
seconds.

黎夕旧梦 2024-09-02 12:30:41

我的猜测是,这取决于您的报告要求。
如果您需要

WHERE "Hour" = 10

每天 10:00:00 到 10:59:59 之间的含义,那么我会使用时间维度,因为它比

WHERE date_part('hour', TimeStamp) = 10  

为每一行评估 date_part() 函数更快。
您仍然应该将时间戳保留在事实表中,以便在天数范围内进行聚合,例如:

WHERE TimeStamp between '2010-03-22 23:30' and '2010-03-23 11:15' 

在使用维度字段时这会变得很尴尬。

通常,时间维度具有分钟分辨率,因此有1440行。

My guess is that it depends on your reporting requirement.
If you need need something like

WHERE "Hour" = 10

meaning every day between 10:00:00 and 10:59:59, then I would use the time dimension, because it is faster than

WHERE date_part('hour', TimeStamp) = 10  

because the date_part() function will be evaluated for every row.
You should still keep the TimeStamp in the fact table in order to aggregate over boundaries of days, like in:

WHERE TimeStamp between '2010-03-22 23:30' and '2010-03-23 11:15' 

which gets awkward when using dimension fields.

Usually, time dimension has a minute resolution, so 1440 rows.

剩一世无双 2024-09-02 12:30:41

时间应该是数据仓库的一个维度,因为您经常需要对其进行聚合。您可以使用 snowflake-Schema 来减少开销。总的来说,正如我在评论中指出的那样,时间似乎是一个异常高的分辨率。如果你坚持这样做,将一天中的某个时刻作为一个单独的维度可能会有所帮助,但我无法告诉你这是否是一个好的设计。

Time should be a dimension on data warehouses, since you will frequently want to aggregate about it. You could use the snowflake-Schema to reduce the overhead. In general, as I pointed out in my comment, hours seem like an unusually high resolution. If you insist on them, making the hour of the day a separate dimension might help, but I cannot tell you if this is good design.

烟若柳尘 2024-09-02 12:30:41

我建议为日期和时间设置单独的维度。日期维度将为每个日期提供 1 条记录,作为已识别的有效日期范围的一部分。例如:1980 年 1 月 1 日至 2025 年 12 月 31 日。

时间的单独维度有 86400 条记录,每秒有一条由时间键标识的记录。

在事实记录中,如果您需要日期和时间,请添加引用这些一致维度的两个键。

I would recommend having seperate dimension for date and time. Date Dimension would have 1 record for each date as part of identified valid range of dates. For example: 01/01/1980 to 12/31/2025.

And a seperate dimension for time having 86400 records with each second having a record identified by the time key.

In the fact records, where u need date and time both, add both keys having references to these conformed dimensions.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文