查询涉及时间维度的数据仓库数据

发布于 2024-11-29 06:26:48 字数 181 浏览 1 评论 0原文

我有两个时间维度

日期表(每天都有唯一行)
一天中的时间(一天中每一分钟的唯一行)

给定这个模式,如果一个人想要检索过去 X 个小时的事实,其中 X 可以是大于 0 的任何数字,那么查询会是什么样子。

当开始时间和结束时间恰好在一年中的不同日期。

编辑:我的事实表没有时间戳列

I have two tables for time dimension

date (unique row for each day)
time of the day (unique row for each minute in a day)

Given this schema what would a query look like if one wants to retrieve facts for last X hours where X can be any number greater than 0.

Things start to be become tricky when the start time and end time happen to be in two different days of the year.

EDIT: My Fact table does not have a time stamp column

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

夜司空 2024-12-06 06:26:48

事实表确实有(并且应该有)原始时间戳,以避免在一天的范围内发生奇怪的按时间查询。奇怪意味着 WHERE 子句中有某种类型的复杂日期时间函数。

在大多数 DW 中,这些类型的查询非常罕见,但您似乎将数据流式传输到 DW 中并同时使用它进行报告。

所以我建议:

  1. 在事实表中引入完整的时间戳。

  2. 对于旧记录,从日期和时间键重新创建时间戳。

DW 查询都是关于 WHERE 子句中没有任何函数,或者如果必须使用函数,请确保它是 SARGABLE

Fact tables do have (and should have) original timestamp in order to avoid weird by-time queries which happen over the boundary of a day. Weird means having some type of complicated date-time function in the WHERE clause.

In most DWs these type of queries are very rare, but you seem to be streaming data into your DW and using it for reporting at the same time.

So I would suggest:

  1. Introduce the full timestamp in the fact table.

  2. For the old records, re-create the timestamp from the Date and Time keys.

DW queries are all about not having any functions in the WHERE clause, or if a function has to be used, make sure it is SARGABLE.

深海蓝天 2024-12-06 06:26:48

Start DateEnd Date 列转换为 TIMESTAMP 并填充它们可能会更好。

对表进行切片需要在开始日期和结束日期之间采用适当的间隔。在 Oracle 中,间隔 类似于 SYSDATE - (4/24)SYSDATE - NUMTODSINTERVAL(4, 'HOUR')

这也可以重写为:

Start Date <= (SYSDATE - (4/24)) AND End Date >= (SYSDATE - (4/24))

You would probably be better served by converting the Start Date and End Date columns to TIMESTAMP and populating them.

Slicing the table would require taking the appropriate interval BETWEEN Start Date AND End Date. In Oracle the interval would be something along the lines of SYSDATE - (4/24) or SYSDATE - NUMTODSINTERVAL(4, 'HOUR')

This could also be rewritten as:

Start Date <= (SYSDATE - (4/24)) AND End Date >= (SYSDATE - (4/24))
掩耳倾听 2024-12-06 06:26:48

在我看来,鉴于您当前的架构,您将需要从时间维度表中检索满足您的搜索条件的适当时间 ID,然后在事实表中搜索匹配的行。根据时间维度的粒度,您可能需要检查执行以下任一操作的性能(SQL Server 示例):

  1. 子选择

    SELECT X FROM FOO WHERE TIMEID IN (SELECT ID FROM DIMTIME WHERE HOUR >= DATEPART(HOUR, CURRENT_TIMESTAMP()) AND DATEID IN (SELECT ID FROM DIMDATE WHERE DATE = GETDATE())

  2. 内部联接

    SELECT X FROM FOO INNER JOIN DIMTIME ON TIMEID = DIMTIME.ID WHERE HOUR >= DATEPART(HOUR, CURRENT_TIMESTAMP()) INNER JOIN DIMDATE ON DATEID = DIMDATE.ID WHERE DATE = GETDATE()

这些都不是真正有吸引力的选择。

您是否考虑过您可能正在查询用于汇总分析而不一定用于“最后 X”分析的多维数据集?

如果这不是一个“卷起”立方体,我会同意其他海报,因为你应该用更好的键重新标记你的事实表,如果你实际上打算经常在下班时间进行搜索,你可能应该也将其包含在事实表中,因为任何其他尝试都可能使查询不可控制(请参阅 什么使SQL 语句可控制?)。

Microsoft 建议访问 http://msdn.microsoft .com/en-us/library/aa902672%28v=sql.80%29.aspx 表示:

与其他维度表中使用的代理键相比,日期和时间维度键应该是“智能的”。日期维度的建议键采用“yyyymmdd”形式。这种格式很容易让用户记住并合并到查询中。对于按日期分区为多个表的事实表,这也是推荐的代理键格式。

祝你好运!

It seems to me that given the current schema you have, that you will need to retrieve the appropriate time IDs from the time dimension table which meet your search criteria, and then search for matching rows in the fact table. Depending on the granularity of your time dimension, you might want to check the performance of doing either (SQL Server examples):

  1. A subselect:

    SELECT X FROM FOO WHERE TIMEID IN (SELECT ID FROM DIMTIME WHERE HOUR >= DATEPART(HOUR, CURRENT_TIMESTAMP()) AND DATEID IN (SELECT ID FROM DIMDATE WHERE DATE = GETDATE())

  2. An inner join:

    SELECT X FROM FOO INNER JOIN DIMTIME ON TIMEID = DIMTIME.ID WHERE HOUR >= DATEPART(HOUR, CURRENT_TIMESTAMP()) INNER JOIN DIMDATE ON DATEID = DIMDATE.ID WHERE DATE = GETDATE()

Neither of these are truly attractive options.

Have you considered that you may be querying against a cube that is intended for roll-up analysis and not necessarily for "last X" analysis?

If this is not a "roll-up" cube, I would agree with the other posters in that you should re-stamp your fact tables with better keys, and if you do in fact intend to search off of hour frequently, you should probably include that in the fact table as well, as any other attempt will probably make the query non-sargable (see What makes a SQL statement sargable?).

Microsoft recommends at http://msdn.microsoft.com/en-us/library/aa902672%28v=sql.80%29.aspx that:

In contrast to surrogate keys used in other dimension tables, date and time dimension keys should be "smart." A suggested key for a date dimension is of the form "yyyymmdd". This format is easy for users to remember and incorporate into queries. It is also a recommended surrogate key format for fact tables that are partitioned into multiple tables by date.

Best luck!

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文