Designing a scalable points leaderboard system with SQL Server
I'm looking for suggestions for scaling a points leaderboard system. I already have a working version using a very normalized strategy. This first version was essentially a table which looked something like this.
UserPoints - PK: (UserId, Date)
+------------+--------+---------------------+
| UserId     | Points | Date                |
+------------+--------+---------------------+
| 1          | 10     | 2011-03-17 07:16:36 |
| 2          | 35     | 2011-03-17 08:09:26 |
| 3          | 40     | 2011-03-17 08:05:36 |
| 1          | 65     | 2011-03-17 09:01:37 |
| 2          | 16     | 2011-03-17 10:12:35 |
| 3          | 64     | 2011-03-17 12:51:33 |
| 1          | 300    | 2011-03-17 12:19:21 |
| 2          | 1200   | 2011-03-17 13:24:13 |
| 3          | 510    | 2011-03-17 17:29:32 |
+------------+--------+---------------------+
I then have a stored procedure which basically groups by UserId and sums the Points. I can also pass @StartDate and @EndDate parameters to create a leaderboard for a specific time period, for example the top users for the Day / Week / Month / Lifetime.
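For readers who want something concrete to experiment with, here is a minimal sketch of that group-and-sum query. It is not the actual stored procedure, which was not posted. It uses Python's built-in sqlite3 module as a stand-in for SQL Server, so the SQL dialect differs slightly (LIMIT instead of TOP), and the `leaderboard` function, its `top_n` parameter, and the half-open date range are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE UserPoints ("
    " UserId INTEGER NOT NULL,"
    " Points INTEGER NOT NULL,"
    " Date TEXT NOT NULL,"  # UTC, ISO-8601 text so string order matches time order
    " PRIMARY KEY (UserId, Date))"
)
# The sample rows from the table above.
rows = [
    (1, 10, "2011-03-17 07:16:36"),
    (2, 35, "2011-03-17 08:09:26"),
    (3, 40, "2011-03-17 08:05:36"),
    (1, 65, "2011-03-17 09:01:37"),
    (2, 16, "2011-03-17 10:12:35"),
    (3, 64, "2011-03-17 12:51:33"),
    (1, 300, "2011-03-17 12:19:21"),
    (2, 1200, "2011-03-17 13:24:13"),
    (3, 510, "2011-03-17 17:29:32"),
]
conn.executemany("INSERT INTO UserPoints VALUES (?, ?, ?)", rows)


def leaderboard(conn, start_date, end_date, top_n=10):
    """Sum points per user for start_date <= Date < end_date, highest first."""
    return conn.execute(
        "SELECT UserId, SUM(Points) AS Total FROM UserPoints"
        " WHERE Date >= ? AND Date < ?"
        " GROUP BY UserId ORDER BY Total DESC, UserId LIMIT ?",
        (start_date, end_date, top_n),
    ).fetchall()


day_board = leaderboard(conn, "2011-03-17 00:00:00", "2011-03-18 00:00:00")
morning_board = leaderboard(conn, "2011-03-17 07:00:00", "2011-03-17 12:00:00")
print(day_board)      # [(2, 1251), (3, 614), (1, 375)]
print(morning_board)  # [(1, 75), (2, 51), (3, 40)]
```

Note that the filter is on Date alone while the primary key leads with UserId, so an additional index keyed on Date first may be worth testing before changing the design.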
This seemed to work well with a moderate amount of data, but things became noticeably slower as the number of points records passed a million or so. The test data I'm working with is just over a million point records created by about 500 users distributed over a timespan of 3 months.
Is there a different way to approach this? I have experimented with denormalizing the data by pre-grouping the points into hour datetime buckets to reduce the number of rows. But I'm starting to think the real problem I need to worry about is the increasing number of users that need to be accounted for in the leaderboard. The time window sizes will generally be small but more and more users will start generating points within any given window.
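The hourly pre-grouping I experimented with can be sketched roughly as follows. This is an illustration only: it again uses sqlite3 in place of SQL Server, the `UserPointsHourly` table and `HourStart` column are hypothetical names, and the sample rows are made up so that several events land in the same hour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE UserPoints (UserId INTEGER, Points INTEGER, Date TEXT);
CREATE TABLE UserPointsHourly (
    UserId INTEGER NOT NULL,
    HourStart TEXT NOT NULL,   -- UTC, truncated to the hour
    Points INTEGER NOT NULL,
    PRIMARY KEY (HourStart, UserId)
);
""")
# Made-up sample events (not real data).
conn.executemany(
    "INSERT INTO UserPoints VALUES (?, ?, ?)",
    [
        (1, 10, "2011-03-17 07:16:36"),
        (1, 65, "2011-03-17 07:41:02"),
        (2, 35, "2011-03-17 08:09:26"),
        (1, 300, "2011-03-17 08:19:21"),
        (2, 16, "2011-03-17 08:12:35"),
    ],
)

# Roll raw events up into one row per user per hour.
conn.execute(
    "INSERT INTO UserPointsHourly (UserId, HourStart, Points)"
    " SELECT UserId, strftime('%Y-%m-%d %H:00:00', Date), SUM(Points)"
    " FROM UserPoints"
    " GROUP BY UserId, strftime('%Y-%m-%d %H:00:00', Date)"
)

buckets = conn.execute(
    "SELECT UserId, HourStart, Points FROM UserPointsHourly"
    " ORDER BY HourStart, UserId"
).fetchall()

# A leaderboard over hour-aligned windows can now read the smaller table.
board = conn.execute(
    "SELECT UserId, SUM(Points) AS Total FROM UserPointsHourly"
    " WHERE HourStart >= ? AND HourStart < ?"
    " GROUP BY UserId ORDER BY Total DESC",
    ("2011-03-17 07:00:00", "2011-03-17 09:00:00"),
).fetchall()
```

Five events collapse into three bucket rows here. The saving grows with events per user per hour, but the row count still scales with the number of active users, which is the concern described above.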
Unfortunately I don't have access to 'Jobs' since I'm using SQL Azure and the Agent is not available (yet). But, I am open to the idea of scaling this using a different storage system if you are convincing enough.
My past work experience tells me I should look into data warehousing since this is almost a reporting problem. But at the same time I need it to be as real-time as possible.
Update
Ultimately, I would like to support custom leaderboards that could span from Monday 8am - Friday 6pm every week. But that's down the road and why I'm trying to not get too fancy with the aggregation. I'm willing to settle with basic Day/Week/Month/Year/AllTime windows for now.
The tricky part is that I really can't store them denormalized, because I need these windows to be convertible between time zones. The system is multi-tenant and therefore all data is stored as UTC. The problem is that a week starts at a different UTC hour for different customers, so aggregating the sums into fixed buckets will cause some points to fall into the wrong bucket.
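To make the bucket problem concrete, here is a small sketch in plain Python. It assumes points have already been rolled up into hourly UTC buckets and regroups them into calendar days for a given tenant offset. The `daily_totals` function and the sample values are hypothetical, and a fixed UTC-8 offset stands in for a real tenant time zone.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# Hourly UTC buckets: (UserId, hour start in UTC, points). Made-up sample values.
hourly_utc = [
    (1, datetime(2011, 3, 17, 2, tzinfo=timezone.utc), 10),
    (1, datetime(2011, 3, 17, 9, tzinfo=timezone.utc), 65),
    (2, datetime(2011, 3, 17, 23, tzinfo=timezone.utc), 35),
]


def daily_totals(buckets, tenant_tz):
    """Regroup hourly UTC buckets into calendar days in the tenant's zone."""
    totals = defaultdict(int)
    for user_id, hour_start, points in buckets:
        local_day = hour_start.astimezone(tenant_tz).date()
        totals[(local_day.isoformat(), user_id)] += points
    return dict(totals)


utc_days = daily_totals(hourly_utc, timezone.utc)
utc_minus_8_days = daily_totals(hourly_utc, timezone(timedelta(hours=-8)))
print(utc_days)          # {('2011-03-17', 1): 75, ('2011-03-17', 2): 35}
print(utc_minus_8_days)  # {('2011-03-16', 1): 10, ('2011-03-17', 1): 65, ('2011-03-17', 2): 35}
```

In UTC, user 1 has 75 points on March 17, but for a tenant at UTC-8 ten of those points belong to March 16. A table pre-aggregated by UTC day cannot recover that split, while hourly buckets can, at least for zones with whole-hour offsets. Zones with 30 or 45 minute offsets would need finer buckets, and a fixed offset ignores daylight saving time.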
Comments (3)
Here are a few thoughts:
If it were my problem, I'd ignore the timestamps and store the user and points totals by day.
I decided to go with the idea of storing points along with a timespan (StartDate and EndDate columns) localized to the customer's current TimeZone setting. I realized an extra benefit of this is that I can 'purge' old leaderboard round data after a few months without affecting the lifetime total of points.
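A rough sketch of how that could look, using sqlite3 in place of SQL Server. Only the StartDate and EndDate columns come from the description above. The `RoundPoints` and `LifetimePoints` tables, the `award` and `purge_rounds` functions, and the choice to keep the lifetime total in its own table are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE RoundPoints (
    UserId INTEGER NOT NULL,
    StartDate TEXT NOT NULL,   -- round boundaries, already localized per customer
    EndDate TEXT NOT NULL,
    Points INTEGER NOT NULL,
    PRIMARY KEY (UserId, StartDate, EndDate)
);
CREATE TABLE LifetimePoints (
    UserId INTEGER PRIMARY KEY,
    Points INTEGER NOT NULL
);
""")


def award(conn, user_id, points, start_date, end_date):
    """Add points to the user's row for one round and to the lifetime total."""
    with conn:  # one transaction so the two totals stay in step
        cur = conn.execute(
            "UPDATE RoundPoints SET Points = Points + ?"
            " WHERE UserId = ? AND StartDate = ? AND EndDate = ?",
            (points, user_id, start_date, end_date),
        )
        if cur.rowcount == 0:
            conn.execute(
                "INSERT INTO RoundPoints VALUES (?, ?, ?, ?)",
                (user_id, start_date, end_date, points),
            )
        cur = conn.execute(
            "UPDATE LifetimePoints SET Points = Points + ? WHERE UserId = ?",
            (points, user_id),
        )
        if cur.rowcount == 0:
            conn.execute(
                "INSERT INTO LifetimePoints VALUES (?, ?)", (user_id, points)
            )


def purge_rounds(conn, ended_before):
    """Delete rounds that ended before the cutoff; lifetime totals are untouched."""
    with conn:
        conn.execute("DELETE FROM RoundPoints WHERE EndDate < ?", (ended_before,))


award(conn, 1, 10, "2011-03-14 00:00:00", "2011-03-21 00:00:00")
award(conn, 1, 65, "2011-03-14 00:00:00", "2011-03-21 00:00:00")
award(conn, 1, 300, "2011-06-13 00:00:00", "2011-06-20 00:00:00")
purge_rounds(conn, "2011-06-01 00:00:00")

rounds_left = conn.execute(
    "SELECT UserId, StartDate, Points FROM RoundPoints"
).fetchall()
lifetime = conn.execute("SELECT UserId, Points FROM LifetimePoints").fetchall()
```

After the purge only the June round remains, while the lifetime row still holds all 375 points.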