查询涉及时间维度的数据仓库数据
我有两个时间维度
日期表(每天都有唯一行)
一天中的时间(一天中每一分钟的唯一行)
给定这个模式,如果一个人想要检索过去 X 个小时的事实,其中 X 可以是大于 0 的任何数字,那么查询会是什么样子。
当开始时间和结束时间恰好在一年中的不同日期。
编辑:我的事实表没有时间戳列
I have two tables for time dimension
date (unique row for each day)
time of the day (unique row for each minute in a day)
Given this schema what would a query look like if one wants to retrieve facts for last X hours where X can be any number greater than 0.
Things start to be become tricky when the start time and end time happen to be in two different days of the year.
EDIT: My Fact table does not have a time stamp column
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
事实表确实有(并且应该有)原始时间戳,以避免在一天的范围内发生奇怪的按时间查询。奇怪意味着 WHERE 子句中有某种类型的复杂日期时间函数。
在大多数 DW 中,这些类型的查询非常罕见,但您似乎将数据流式传输到 DW 中并同时使用它进行报告。
所以我建议:
在事实表中引入完整的时间戳。
对于旧记录,从日期和时间键重新创建时间戳。
DW 查询都是关于 WHERE 子句中没有任何函数,或者如果必须使用函数,请确保它是 SARGABLE。
Fact tables do have (and should have) original timestamp in order to avoid weird by-time queries which happen over the boundary of a day. Weird means having some type of complicated date-time function in the WHERE clause.
In most DWs these type of queries are very rare, but you seem to be streaming data into your DW and using it for reporting at the same time.
So I would suggest:
Introduce the full timestamp in the fact table.
For the old records, re-create the timestamp from the Date and Time keys.
DW queries are all about not having any functions in the WHERE clause, or if a function has to be used, make sure it is SARGABLE.
将
Start Date
和End Date
列转换为TIMESTAMP
并填充它们可能会更好。对表进行切片需要在开始日期和结束日期之间采用适当的间隔。在 Oracle 中,
间隔
类似于SYSDATE - (4/24)
或SYSDATE - NUMTODSINTERVAL(4, 'HOUR')
这也可以重写为:
You would probably be better served by converting the
Start Date
andEnd Date
columns toTIMESTAMP
and populating them.Slicing the table would require taking the appropriate
interval BETWEEN Start Date AND End Date
. In Oracle theinterval
would be something along the lines ofSYSDATE - (4/24)
orSYSDATE - NUMTODSINTERVAL(4, 'HOUR')
This could also be rewritten as:
在我看来,鉴于您当前的架构,您将需要从时间维度表中检索满足您的搜索条件的适当时间 ID,然后在事实表中搜索匹配的行。根据时间维度的粒度,您可能需要检查执行以下任一操作的性能(SQL Server 示例):
子选择:
SELECT X FROM FOO WHERE TIMEID IN (SELECT ID FROM DIMTIME WHERE HOUR >= DATEPART(HOUR, CURRENT_TIMESTAMP()) AND DATEID IN (SELECT ID FROM DIMDATE WHERE DATE = GETDATE())
内部联接:
SELECT X FROM FOO INNER JOIN DIMTIME ON TIMEID = DIMTIME.ID WHERE HOUR >= DATEPART(HOUR, CURRENT_TIMESTAMP()) INNER JOIN DIMDATE ON DATEID = DIMDATE.ID WHERE DATE = GETDATE()
这些都不是真正有吸引力的选择。
您是否考虑过您可能正在查询用于汇总分析而不一定用于“最后 X”分析的多维数据集?
如果这不是一个“卷起”立方体,我会同意其他海报,因为你应该用更好的键重新标记你的事实表,如果你实际上打算经常在下班时间进行搜索,你可能应该也将其包含在事实表中,因为任何其他尝试都可能使查询不可控制(请参阅 什么使SQL 语句可控制?)。
Microsoft 建议访问 http://msdn.microsoft .com/en-us/library/aa902672%28v=sql.80%29.aspx 表示:
祝你好运!
It seems to me that given the current schema you have, that you will need to retrieve the appropriate time IDs from the time dimension table which meet your search criteria, and then search for matching rows in the fact table. Depending on the granularity of your time dimension, you might want to check the performance of doing either (SQL Server examples):
A subselect:
SELECT X FROM FOO WHERE TIMEID IN (SELECT ID FROM DIMTIME WHERE HOUR >= DATEPART(HOUR, CURRENT_TIMESTAMP()) AND DATEID IN (SELECT ID FROM DIMDATE WHERE DATE = GETDATE())
An inner join:
SELECT X FROM FOO INNER JOIN DIMTIME ON TIMEID = DIMTIME.ID WHERE HOUR >= DATEPART(HOUR, CURRENT_TIMESTAMP()) INNER JOIN DIMDATE ON DATEID = DIMDATE.ID WHERE DATE = GETDATE()
Neither of these are truly attractive options.
Have you considered that you may be querying against a cube that is intended for roll-up analysis and not necessarily for "last X" analysis?
If this is not a "roll-up" cube, I would agree with the other posters in that you should re-stamp your fact tables with better keys, and if you do in fact intend to search off of hour frequently, you should probably include that in the fact table as well, as any other attempt will probably make the query non-sargable (see What makes a SQL statement sargable?).
Microsoft recommends at http://msdn.microsoft.com/en-us/library/aa902672%28v=sql.80%29.aspx that:
Best luck!