Designing a scalable points leaderboard system with SQL Server

Posted 2024-10-23 20:17:43

I'm looking for suggestions for scaling a points leaderboard system. I already have a working version that uses a very normalized strategy. This first version was essentially a table that looked something like this:

UserPoints - PK: (UserId,Date)
+------------+--------+---------------------+  
| UserId     | Points | Date                |  
+------------+--------+---------------------+  
| 1          | 10     | 2011-03-17 07:16:36 |  
| 2          | 35     | 2011-03-17 08:09:26 |  
| 3          | 40     | 2011-03-17 08:05:36 |  
| 1          | 65     | 2011-03-17 09:01:37 |  
| 2          | 16     | 2011-03-17 10:12:35 |  
| 3          | 64     | 2011-03-17 12:51:33 |  
| 1          | 300    | 2011-03-17 12:19:21 |  
| 2          | 1200   | 2011-03-17 13:24:13 |  
| 3          | 510    | 2011-03-17 17:29:32 |  
+------------+--------+---------------------+  

I then have a stored procedure that basically does a GROUP BY on UserId and sums the Points. I can also pass @StartDate and @EndDate parameters to build a leaderboard for a specific time period, for example Top Users for the Day / Week / Month / Lifetime.
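A minimal sketch of the query such a procedure runs, using Python's built-in sqlite3 module as a stand-in for SQL Server so it runs anywhere. The extra index on Date is an assumption, not part of the original design: with a primary key of (UserId, Date), a filter on Date alone typically cannot seek and ends up scanning every user's rows.

```python
import sqlite3

# sqlite3 stands in for SQL Server; in T-SQL the SELECT would live in the
# stored procedure, with @StartDate/@EndDate in place of :start/:end.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE UserPoints (
    UserId INTEGER NOT NULL,
    Points INTEGER NOT NULL,
    Date   TEXT    NOT NULL,  -- UTC, 'YYYY-MM-DD HH:MM:SS'
    PRIMARY KEY (UserId, Date)
);
-- Assumed extra index: lets a date-range filter seek on Date instead of
-- scanning the whole (UserId, Date) primary key.
CREATE INDEX IX_UserPoints_Date ON UserPoints (Date, UserId, Points);
""")
conn.executemany(
    "INSERT INTO UserPoints (UserId, Points, Date) VALUES (?, ?, ?)",
    [
        (1, 10, "2011-03-17 07:16:36"), (2, 35, "2011-03-17 08:09:26"),
        (3, 40, "2011-03-17 08:05:36"), (1, 65, "2011-03-17 09:01:37"),
        (2, 16, "2011-03-17 10:12:35"), (3, 64, "2011-03-17 12:51:33"),
        (1, 300, "2011-03-17 12:19:21"), (2, 1200, "2011-03-17 13:24:13"),
        (3, 510, "2011-03-17 17:29:32"),
    ],
)

def leaderboard(conn, start, end, top=10):
    """Return [(UserId, Total), ...] for points earned in the half-open window [start, end)."""
    return conn.execute(
        """
        SELECT UserId, SUM(Points) AS Total
        FROM UserPoints
        WHERE Date >= :start AND Date < :end
        GROUP BY UserId
        ORDER BY Total DESC, UserId
        LIMIT :top
        """,
        {"start": start, "end": end, "top": top},
    ).fetchall()

print(leaderboard(conn, "2011-03-17 00:00:00", "2011-03-18 00:00:00"))
# [(2, 1251), (3, 614), (1, 375)]
```

Using a half-open window [start, end) means a point stamped exactly on a boundary is counted in only one of two consecutive windows.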

This seemed to work well with a moderate amount of data, but things became noticeably slower as the number of points records passed a million or so. The test data I'm working with is just over a million point records created by about 500 users distributed over a timespan of 3 months.

Is there a different way to approach this? I have experimented with denormalizing the data by pre-grouping the points into hourly datetime buckets to reduce the number of rows. But I'm starting to think the real problem I need to worry about is the growing number of users that have to be accounted for in the leaderboard. The time windows will generally be small, but more and more users will start generating points within any given window.
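One way to sketch the hourly pre-grouping, again with sqlite3 standing in for SQL Server. The UserPointsHourly table and both helper functions are hypothetical names; the rollup is rebuilt in full here for simplicity, whereas a production version would more likely update only the current hour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE UserPoints (UserId INTEGER, Points INTEGER, Date TEXT);  -- raw UTC events
-- Hypothetical rollup table: one row per user per UTC hour.
CREATE TABLE UserPointsHourly (
    UserId    INTEGER NOT NULL,
    HourStart TEXT    NOT NULL,  -- 'YYYY-MM-DD HH:00:00'
    Points    INTEGER NOT NULL,
    PRIMARY KEY (UserId, HourStart)
);
""")
conn.executemany("INSERT INTO UserPoints VALUES (?, ?, ?)", [
    (1, 10, "2011-03-17 07:16:36"), (1, 5, "2011-03-17 07:45:00"),
    (2, 35, "2011-03-17 08:09:26"), (3, 40, "2011-03-17 08:05:36"),
    (1, 65, "2011-03-17 09:01:37"), (2, 16, "2011-03-17 10:12:35"),
    (3, 64, "2011-03-17 12:51:33"), (1, 300, "2011-03-17 12:19:21"),
    (2, 1200, "2011-03-17 13:24:13"), (3, 510, "2011-03-17 17:29:32"),
])

def rebuild_hourly(conn):
    """Collapse raw point rows into per-user, per-hour sums (full rebuild)."""
    with conn:
        conn.execute("DELETE FROM UserPointsHourly")
        conn.execute("""
            INSERT INTO UserPointsHourly (UserId, HourStart, Points)
            SELECT UserId, strftime('%Y-%m-%d %H:00:00', Date), SUM(Points)
            FROM UserPoints
            GROUP BY UserId, strftime('%Y-%m-%d %H:00:00', Date)
        """)

def leaderboard_hourly(conn, start, end):
    """Leaderboard read from the rollup; start and end must fall on hour boundaries."""
    return conn.execute("""
        SELECT UserId, SUM(Points) AS Total
        FROM UserPointsHourly
        WHERE HourStart >= ? AND HourStart < ?
        GROUP BY UserId
        ORDER BY Total DESC, UserId
    """, (start, end)).fetchall()

rebuild_hourly(conn)
print(leaderboard_hourly(conn, "2011-03-17 00:00:00", "2011-03-18 00:00:00"))
```

Hourly UTC buckets can still be re-cut for any tenant whose UTC offset is a whole number of hours, because local midnight then lands on a bucket edge. Zones with half-hour offsets (for example UTC+05:30) cannot be served exactly from hourly buckets.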

Unfortunately I don't have access to 'Jobs', since I'm using SQL Azure and SQL Server Agent is not available there (yet). But I am open to scaling this with a different storage system if you are convincing enough.

My past work experience tells me I should look into data warehousing since this is almost a reporting problem. But at the same time I need it to be as real-time as possible.

Update

Ultimately, I would like to support custom leaderboards that span, for example, Monday 8am to Friday 6pm every week. That's down the road, but it's why I'm trying not to get too fancy with the aggregation. For now I'm willing to settle for basic Day/Week/Month/Year/AllTime windows.
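A custom recurring window like Monday 8am to Friday 6pm reduces to computing two wall-clock bounds per week. The sketch below is a hypothetical helper that works in a tenant's local time only; converting those bounds to UTC for querying is a separate step.

```python
from datetime import date, datetime, time, timedelta

def weekly_window(local_day: date, start_day=0, start_time=time(8),
                  end_day=4, end_time=time(18)):
    """Return the (start, end) local wall-clock bounds of a recurring weekly window.

    Days use date.weekday() numbering (Monday=0 ... Sunday=6); the defaults give
    Monday 08:00 to Friday 18:00 of the week containing local_day. Hypothetical helper.
    """
    monday = local_day - timedelta(days=local_day.weekday())
    start = datetime.combine(monday + timedelta(days=start_day), start_time)
    end = datetime.combine(monday + timedelta(days=end_day), end_time)
    return start, end

def in_window(local_ts: datetime, window):
    """True if a local timestamp falls in the half-open window [start, end)."""
    start, end = window
    return start <= local_ts < end

w = weekly_window(date(2011, 3, 17))  # 2011-03-17 was a Thursday
print(w)
```

Because the window is just a (start, end) pair, any shape of recurring window can go through the same range query as the basic Day/Week/Month windows.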

The tricky part is that I really can't store them denormalized, because I need these windows to be time-zone convertible. The system is multi-tenant, so all data is stored as UTC. The problem is that a week starts at a different UTC hour for different customers, so pre-aggregating the sums into fixed buckets would put some points into the wrong bucket.
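Keeping the raw data in UTC and converting only the window boundaries per tenant avoids the wrong-bucket problem. A minimal sketch, assuming the tenant's time zone is available as a Python tzinfo; the helper name is hypothetical.

```python
from datetime import datetime, time, timedelta, timezone, tzinfo

def local_week_bounds_utc(now_utc: datetime, tz: tzinfo, week_start: int = 0):
    """Return the UTC [start, end) of the tenant's current local week.

    week_start uses datetime.weekday() numbering (Monday=0 ... Sunday=6).
    Hypothetical helper: tz is the tenant's time zone, now_utc must be aware.
    """
    local_now = now_utc.astimezone(tz)
    days_back = (local_now.weekday() - week_start) % 7
    first_day = local_now.date() - timedelta(days=days_back)
    # Midnight in the tenant's zone; astimezone() applies the zone's offset for
    # that specific date, so DST-aware zones get the correct boundary.
    local_start = datetime.combine(first_day, time(0), tzinfo=tz)
    local_end = datetime.combine(first_day + timedelta(days=7), time(0), tzinfo=tz)
    return local_start.astimezone(timezone.utc), local_end.astimezone(timezone.utc)

# Same UTC instant, two tenants, two different local weeks (fixed offsets for brevity).
now = datetime(2011, 3, 17, 3, 0, tzinfo=timezone.utc)
print(local_week_bounds_utc(now, timezone(timedelta(hours=-5))))
print(local_week_bounds_utc(now, timezone(timedelta(hours=9))))
```

The resulting UTC pair plugs straight into a range query over the raw UTC rows. The fixed -5 hour offset ignores daylight saving time; passing zoneinfo.ZoneInfo("America/New_York") instead (Python 3.9+, requires the system time zone database or the tzdata package) would put the 2011-03-14 boundary at 04:00 UTC, since US daylight time began on 2011-03-13.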

Comments (3)

岛徒 2024-10-30 20:17:43

Here are a few thoughts:

  1. Sticking with SQL Azure: you can have another table, PointsTotals. Every time you add a row to your UserPoints table, also increment the TotalPoints value for a given UserId in PointsTotals (or insert a new row if they don't have a row to increment). Now you always have totals computed for each UserId.
  2. Going with Azure Table Storage: Create a UserPoints table with PartitionKey set to UserId. This keeps all of a user's point rows in one partition, so you can easily query them together and sum them on the client (Table Storage has no server-side aggregation). And you can borrow the idea from suggestion #1 by creating a separate PointsTotals table, with PartitionKey being UserId and RowKey probably being the total points.
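Suggestion #1 can be sketched as a write-time upsert inside the same transaction as the insert. sqlite3 stands in for SQL Azure here, using SQLite's INSERT ... ON CONFLICT DO UPDATE (SQLite 3.24+); on SQL Server the usual equivalents are MERGE, or an UPDATE followed by an INSERT when no row was updated. The helper names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE UserPoints (
    UserId INTEGER NOT NULL, Points INTEGER NOT NULL, Date TEXT NOT NULL,
    PRIMARY KEY (UserId, Date));
CREATE TABLE PointsTotals (
    UserId INTEGER PRIMARY KEY, TotalPoints INTEGER NOT NULL);
""")

def award_points(conn, user_id, points, date_utc):
    """Record one points event and bump the user's running total atomically."""
    with conn:  # both statements commit together or roll back together
        conn.execute(
            "INSERT INTO UserPoints (UserId, Points, Date) VALUES (?, ?, ?)",
            (user_id, points, date_utc))
        # Insert a first row for a new user, otherwise add to the existing total.
        conn.execute("""
            INSERT INTO PointsTotals (UserId, TotalPoints) VALUES (?, ?)
            ON CONFLICT(UserId) DO UPDATE
                SET TotalPoints = TotalPoints + excluded.TotalPoints
        """, (user_id, points))

def top_users(conn, n=10):
    """All-time leaderboard: an ordered read with no aggregation at query time."""
    return conn.execute(
        "SELECT UserId, TotalPoints FROM PointsTotals "
        "ORDER BY TotalPoints DESC, UserId LIMIT ?", (n,)).fetchall()

for event in [(1, 10, "2011-03-17 07:16:36"), (2, 35, "2011-03-17 08:09:26"),
              (3, 40, "2011-03-17 08:05:36"), (1, 65, "2011-03-17 09:01:37"),
              (2, 16, "2011-03-17 10:12:35"), (3, 64, "2011-03-17 12:51:33"),
              (1, 300, "2011-03-17 12:19:21"), (2, 1200, "2011-03-17 13:24:13"),
              (3, 510, "2011-03-17 17:29:32")]:
    award_points(conn, *event)
print(top_users(conn))
```

A single running total only covers the Lifetime leaderboard; windowed leaderboards still need either the raw rows or a per-period key on the totals table.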

只是在用心讲痛 2024-10-30 20:17:43

If it were my problem, I'd ignore the timestamps and store per-user point totals by day.
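A sketch of the per-day idea, with sqlite3 standing in for SQL Server: each event is folded into a (UserId, Day) row at write time, so a window query sums at most one row per user per day. The table and helper names are hypothetical. The trade-off is the one raised in the question's update: these are UTC days, so a tenant whose day starts at a different UTC hour cannot be served exactly from them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE UserDailyPoints (
        UserId INTEGER NOT NULL,
        Day    TEXT    NOT NULL,  -- UTC calendar day, 'YYYY-MM-DD'
        Points INTEGER NOT NULL,
        PRIMARY KEY (UserId, Day))
""")

def add_points(conn, user_id, points, date_utc):
    """Fold one event into its (user, day) row; the exact timestamp is discarded."""
    with conn:
        conn.execute("""
            INSERT INTO UserDailyPoints (UserId, Day, Points)
            VALUES (?, substr(?, 1, 10), ?)
            ON CONFLICT(UserId, Day) DO UPDATE SET Points = Points + excluded.Points
        """, (user_id, date_utc, points))

def leaderboard_days(conn, first_day, last_day):
    """Totals over whole UTC days, first_day..last_day inclusive."""
    return conn.execute("""
        SELECT UserId, SUM(Points) AS Total
        FROM UserDailyPoints
        WHERE Day BETWEEN ? AND ?
        GROUP BY UserId
        ORDER BY Total DESC, UserId
    """, (first_day, last_day)).fetchall()

for event in [(1, 10, "2011-03-17 07:16:36"), (2, 35, "2011-03-17 08:09:26"),
              (3, 40, "2011-03-17 08:05:36"), (1, 65, "2011-03-17 09:01:37"),
              (2, 16, "2011-03-17 10:12:35"), (3, 64, "2011-03-17 12:51:33"),
              (1, 300, "2011-03-17 12:19:21"), (2, 1200, "2011-03-17 13:24:13"),
              (3, 510, "2011-03-17 17:29:32"), (1, 50, "2011-03-18 09:00:00")]:
    add_points(conn, *event)
print(leaderboard_days(conn, "2011-03-17", "2011-03-18"))
```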

伤感在游骋 2024-10-30 20:17:43

I decided to go with storing points along with a timespan (StartDate and EndDate columns) localized to the customer's current TimeZone setting. An extra benefit I realized is that I can purge old leaderboard round data after a few months without affecting each user's lifetime point total.
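A minimal sketch of that design, with sqlite3 standing in for SQL Azure. It assumes one row per user per round keyed by the round's StartDate/EndDate, stored as UTC text, plus a separately maintained lifetime total; the answer does not say how the lifetime total is kept, so the LifetimePoints table and the helper names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical schema. StartDate/EndDate are the round's bounds, computed from
-- the tenant's time zone when the round is opened and stored as UTC text.
CREATE TABLE RoundPoints (
    UserId    INTEGER NOT NULL,
    StartDate TEXT    NOT NULL,
    EndDate   TEXT    NOT NULL,
    Points    INTEGER NOT NULL,
    PRIMARY KEY (UserId, StartDate, EndDate));
CREATE TABLE LifetimePoints (
    UserId INTEGER PRIMARY KEY, TotalPoints INTEGER NOT NULL);
""")

def add_points(conn, user_id, points, start, end):
    """Add points to the user's row for one round and to their lifetime total."""
    with conn:
        conn.execute("""
            INSERT INTO RoundPoints (UserId, StartDate, EndDate, Points) VALUES (?, ?, ?, ?)
            ON CONFLICT(UserId, StartDate, EndDate) DO UPDATE SET Points = Points + excluded.Points
        """, (user_id, start, end, points))
        conn.execute("""
            INSERT INTO LifetimePoints (UserId, TotalPoints) VALUES (?, ?)
            ON CONFLICT(UserId) DO UPDATE SET TotalPoints = TotalPoints + excluded.TotalPoints
        """, (user_id, points))

def purge_rounds_ending_by(conn, cutoff):
    """Delete rounds whose EndDate <= cutoff; LifetimePoints is left untouched."""
    with conn:
        return conn.execute("DELETE FROM RoundPoints WHERE EndDate <= ?", (cutoff,)).rowcount

# Weekly rounds for a tenant on US Eastern time (UTC-4 during these weeks).
week1 = ("2011-03-14 04:00:00", "2011-03-21 04:00:00")
week2 = ("2011-03-21 04:00:00", "2011-03-28 04:00:00")
add_points(conn, 1, 300, *week1)
add_points(conn, 1, 25, *week1)
add_points(conn, 2, 50, *week1)
add_points(conn, 1, 100, *week2)

deleted = purge_rounds_ending_by(conn, "2011-03-21 04:00:00")
print(deleted)  # 2
```

After the purge only the week-2 round remains in RoundPoints, while LifetimePoints still holds every point ever awarded.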
