Data warehouse for user data: design question
How should I best store user data against a date/time dimension? My use case is storing user actions per day, per hour, such as the number of shares, likes, friends, etc. I have a time table and a date table. For time it is easy: each row is a user_id and the columns are 1 to 24, one for each hour of the day. The problem is dates. If I give each day its own column, I will have 365 columns a year. I cannot archive the data away either, because the analytics need past data too. What other strategies are there?
First, state the "grain" of the fact table and then stick to it.
If the grain is one day, then TimeKey always points to the key of "23:59".
If the grain is one hour, then TimeKey points to entries of "HH:59".
If the grain is one minute, then TimeKey points to the respective "HH:MM".
If the grain is 15 minutes, then TimeKey points to the respective "HH:14", "HH:29", "HH:44" or "HH:59".
And so on: in every case TimeKey points to the last minute of the period the fact row covers.
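A minimal sketch of the rule above, assuming the time dimension is keyed by an "HH:MM" string for every minute of the day and that each fact row points at the last minute of its period. The function name time_key_for and the grain_minutes parameter are hypothetical, chosen for illustration only.

```python
from datetime import datetime


def time_key_for(ts: datetime, grain_minutes: int) -> str:
    """Return the "HH:MM" TimeKey for the grain period containing ts.

    Hypothetical helper: the key is the last minute of the period, so a
    one-hour grain maps to "HH:59" and a one-day grain (1440 min) to "23:59".
    """
    if grain_minutes <= 0 or 1440 % grain_minutes != 0:
        raise ValueError("grain must divide a day evenly")
    minute_of_day = ts.hour * 60 + ts.minute
    # First minute of the period that contains ts.
    period_start = (minute_of_day // grain_minutes) * grain_minutes
    # Last minute of that period: this is what TimeKey points to.
    last_minute = period_start + grain_minutes - 1
    return f"{last_minute // 60:02d}:{last_minute % 60:02d}"


ts = datetime(2024, 3, 5, 14, 37)
for grain in (1440, 60, 15, 1):
    print(grain, time_key_for(ts, grain))
```

For 14:37 this prints 23:59 (day), 14:59 (hour), 14:44 (15 minutes) and 14:37 (minute). Rejecting grains that don't divide 1440 keeps every period inside a single day.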
You would store the date in the date dimension and then add computed fields such as day_of_year.
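A minimal sketch of such a date dimension, assuming an integer surrogate key in YYYYMMDD form. Apart from day_of_year, the function and column names (build_date_dimension, date_key, full_date, day_of_week) are hypothetical. The point it illustrates is that the table grows by one row per day rather than one column per day.

```python
from __future__ import annotations

from datetime import date, timedelta


def build_date_dimension(start: date, end: date) -> list[dict]:
    """Return one row per calendar day from start to end, inclusive."""
    rows = []
    d = start
    while d <= end:
        rows.append({
            # Hypothetical integer surrogate key in YYYYMMDD form.
            "date_key": d.year * 10000 + d.month * 100 + d.day,
            "full_date": d.isoformat(),
            "year": d.year,
            "month": d.month,
            # Computed fields, stored once so queries don't recompute them.
            "day_of_year": d.timetuple().tm_yday,  # 1..366
            "day_of_week": d.isoweekday(),         # 1 = Monday .. 7 = Sunday
        })
        d += timedelta(days=1)
    return rows


for row in build_date_dimension(date(2024, 12, 30), date(2025, 1, 2)):
    print(row["date_key"], row["day_of_year"])
```

Across the year boundary day_of_year runs 365, 366 (2024 is a leap year), 1, 2, which is exactly the kind of value that is awkward to derive at query time and cheap to precompute.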
In the designs I've worked on we never had time slices more granular than a day, but I see no reason why you couldn't have a time dimension with date-hour as the grain.
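A minimal sketch of a fact table at date-hour grain, using Python's built-in sqlite3 module. The table and column names (dim_date, fact_user_activity, date_key, hour_key, shares, likes) are hypothetical, and for brevity the hour is stored as a plain integer 0 to 23 rather than as a key into a separate time dimension. Each user gets one row per hour, so neither hours nor days become columns, and a daily total is a GROUP BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (
    date_key    INTEGER PRIMARY KEY,  -- e.g. 20240305
    full_date   TEXT NOT NULL,
    day_of_year INTEGER NOT NULL
);
CREATE TABLE fact_user_activity (
    user_id  INTEGER NOT NULL,
    date_key INTEGER NOT NULL REFERENCES dim_date(date_key),
    hour_key INTEGER NOT NULL CHECK (hour_key BETWEEN 0 AND 23),
    shares   INTEGER NOT NULL DEFAULT 0,
    likes    INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (user_id, date_key, hour_key)  -- the grain: user x date x hour
);
""")

conn.executemany(
    "INSERT INTO dim_date VALUES (?, ?, ?)",
    [(20240305, "2024-03-05", 65), (20240306, "2024-03-06", 66)],
)
# Made-up sample rows: (user_id, date_key, hour_key, shares, likes).
conn.executemany(
    "INSERT INTO fact_user_activity VALUES (?, ?, ?, ?, ?)",
    [
        (1, 20240305, 9, 2, 5),
        (1, 20240305, 10, 1, 0),
        (1, 20240306, 9, 4, 3),
        (2, 20240305, 23, 0, 7),
    ],
)

# Roll the hourly rows up to day grain per user.
daily = conn.execute("""
    SELECT f.user_id, d.full_date, SUM(f.shares), SUM(f.likes)
    FROM fact_user_activity AS f
    JOIN dim_date AS d USING (date_key)
    GROUP BY f.user_id, d.full_date
    ORDER BY f.user_id, d.full_date
""").fetchall()
print(daily)
```

Adding another year of history only adds rows, so nothing needs to be archived to keep the schema manageable.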