Data warehouse for user data - design Q

Posted on 2024-10-16 22:58:56

How should I store user data against date and time dimensions? My use case is storing user actions per day and per hour, such as the number of shares, likes, friends, etc. I have a time table and a date table. Time is easy: each row is a user_id and the columns are 1 to 24, one for each hour of the day. Dates are the problem. If each day gets its own column, I end up with 365 columns a year. I cannot archive the data either, because analytics needs past data too. What other strategies are there?

Comments (2)

云朵有点甜 2024-10-23 22:58:56

dimDate : 1 row per date
dimTime : 1 row per minute

First, state the "grain" of the fact table, and then stick to it.

If the grain is one day, then TimeKey always points to the key of "23:59".

If the grain is one hour, then TimeKey points to the "HH:59" entries.

If the grain is one minute, then TimeKey points to the respective "HH:MM".

If the grain is 15 minutes, then TimeKey points to the respective "HH:14", "HH:29", "HH:44", or "HH:59".

And so on...
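
The grain rule above amounts to "point at the last minute of the bucket the event falls in". Here is a minimal Python sketch of that rule. The function name `time_key_for` and the choice to return an HH:MM value are assumptions for illustration; in a real warehouse you would look up the surrogate TimeKey of the matching dimTime row instead.

```python
from datetime import datetime, time


def time_key_for(ts: datetime, grain_minutes: int) -> time:
    """Return the HH:MM of the dimTime row a fact at `ts` should reference.

    Hypothetical helper: the bucket containing `ts` is identified by its
    last minute, e.g. HH:59 for an hourly grain or 23:59 for a daily grain.
    """
    if grain_minutes <= 0 or 1440 % grain_minutes != 0:
        raise ValueError("grain must divide a 24-hour day evenly")
    minute_of_day = ts.hour * 60 + ts.minute
    # Last minute of the bucket that contains minute_of_day.
    bucket_end = (minute_of_day // grain_minutes + 1) * grain_minutes - 1
    return time(bucket_end // 60, bucket_end % 60)


event = datetime(2010, 3, 8, 3, 20)
print(time_key_for(event, 1440))  # daily grain     -> 23:59:00
print(time_key_for(event, 60))    # hourly grain    -> 03:59:00
print(time_key_for(event, 15))    # 15-minute grain -> 03:29:00
print(time_key_for(event, 1))     # minute grain    -> 03:20:00
```

Whatever grain you pick, every row in the fact table should be keyed this way, so that rows are never mixed across grains.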

-- How many new friends did specific user gain
-- in first three months of years 2008, 2009 and 2010
-- between hours 3 and 5 in the morning
-- by day of week
-- not counting holidays ?

select
      d.DayOfWeek
    , sum(f.NewFriends) as FriendCount
from factUserAction as f
join dbo.dimUser    as u on u.UserKey = f.UserKey
join dbo.dimDate    as d on d.DateKey = f.DateKey
join dbo.dimTime    as t on t.TimeKey = f.TimeKey
where d.CalendarYear between 2008 and 2010
  and d.MonthNumberInYear between 1 and 3
  and t.Hour between 3 and 5
  and d.IsHoliday = 'no'
  and u.UserEmail = '[email protected]'
group by d.DayOfWeek
order by d.DayOfWeek ;
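
To try the query end to end, here is a small sketch using Python's built-in sqlite3 module. The column lists are assumptions inferred from the query (the original schema diagram is not available), the `dbo.` prefix is dropped because SQLite has no such schema, and the email address and sample rows are made up. The fact table uses an hourly grain, so each TimeKey points at an "HH:59" row.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table dimUser (UserKey integer primary key, UserEmail text);
create table dimDate (DateKey integer primary key, CalendarYear integer,
                      MonthNumberInYear integer, DayOfWeek text, IsHoliday text);
create table dimTime (TimeKey integer primary key, Hour integer, Minute integer);
create table factUserAction (
    DateKey    integer references dimDate(DateKey),
    TimeKey    integer references dimTime(TimeKey),
    UserKey    integer references dimUser(UserKey),
    NewFriends integer,
    primary key (DateKey, TimeKey, UserKey)
);
""")

# Hypothetical sample data, one user and four dates.
con.execute("insert into dimUser values (1, 'user@example.com')")
con.executemany("insert into dimDate values (?, ?, ?, ?, ?)", [
    (20080107, 2008, 1, "Monday", "no"),
    (20090101, 2009, 1, "Thursday", "yes"),  # holiday
    (20100308, 2010, 3, "Monday", "no"),
    (20100607, 2010, 6, "Monday", "no"),     # outside months 1-3
])
# dimTime: one row per minute; here TimeKey is simply the minute of the day.
con.executemany("insert into dimTime values (?, ?, ?)",
                [(m, m // 60, m % 60) for m in range(1440)])


def hh59(hour):
    """TimeKey of the HH:59 row, used because the grain is one hour."""
    return hour * 60 + 59


con.executemany("insert into factUserAction values (?, ?, ?, ?)", [
    (20080107, hh59(3), 1, 2),  # counted
    (20080107, hh59(9), 1, 7),  # outside hours 3-5
    (20090101, hh59(4), 1, 5),  # holiday, excluded
    (20100308, hh59(5), 1, 3),  # counted
    (20100607, hh59(4), 1, 4),  # June, excluded
])

rows = con.execute("""
    select d.DayOfWeek, sum(f.NewFriends) as FriendCount
    from factUserAction as f
    join dimUser as u on u.UserKey = f.UserKey
    join dimDate as d on d.DateKey = f.DateKey
    join dimTime as t on t.TimeKey = f.TimeKey
    where d.CalendarYear between 2008 and 2010
      and d.MonthNumberInYear between 1 and 3
      and t.Hour between 3 and 5
      and d.IsHoliday = 'no'
      and u.UserEmail = ?
    group by d.DayOfWeek
    order by d.DayOfWeek
""", ("user@example.com",)).fetchall()
print(rows)  # [('Monday', 5)]
```

Note that each day and hour is a row in the fact table rather than a column, so adding another year of history only adds rows.
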
能否归途做我良人 2024-10-23 22:58:56

You would store the date in the dimension, and then add computed fields such as day_of_year.

On the designs I've worked on, we've never had time slices finer than a day, but I can't see why one couldn't have a time dimension with date-hour as the grain.

user_activity_facts(
   time_key references time_dimension(time_key)
  ,user_key references user_dimension(user_key)
  ,measure1
  ,measure2
  ,measure3
  ,primary key(time_key, user_key)
)
partition by range(time_key)(
   ...
)
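
As a rough illustration of a date-hour grain, the sketch below generates one time_dimension row per hour, with computed fields such as day_of_year. The function name, the column names other than time_key and day_of_year, and the YYYYMMDDHH integer encoding of time_key are all assumptions, not part of the design above.

```python
from datetime import datetime, timedelta


def build_time_dimension(start: datetime, end: datetime):
    """Yield one dimension row per hour in the half-open range [start, end)."""
    current = start.replace(minute=0, second=0, microsecond=0)
    while current < end:
        yield {
            # Assumed key encoding: YYYYMMDDHH as an integer, which sorts
            # chronologically and so suits range partitioning on time_key.
            "time_key": int(current.strftime("%Y%m%d%H")),
            "calendar_date": current.date(),
            "hour_of_day": current.hour,
            "day_of_year": current.timetuple().tm_yday,
            "day_of_week": current.isoweekday(),  # 1 = Monday ... 7 = Sunday
        }
        current += timedelta(hours=1)


rows = list(build_time_dimension(datetime(2010, 1, 1), datetime(2011, 1, 1)))
print(len(rows))             # 8760 rows for a non-leap year (365 * 24)
print(rows[0]["time_key"])   # 2010010100
print(rows[-1]["time_key"])  # 2010123123
```

A full year at this grain is 8760 dimension rows rather than 365 columns, and the fact table only holds rows for the (time_key, user_key) pairs that actually have activity.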