实施排行榜
用户我的网站为说唱歌词创建注释 (示例)。我想创建一个排行榜来奖励创建最多注释的人。
排行榜应该跟踪每个用户总共创建了多少注释,以及他在过去一周、一天等中创建了多少注释。
实现总体排行榜没有问题:
@users = User.all
<table>
<tr>
<th>Contributor</th>
<th>Annotations</th>
</tr>
<% @users.sort_by{|u| u.annotations.size }.reverse.each do |u| %>
<tr>
<td><%= u %></td>
<td><%= u.annotations.size %></td>
</tr>
<% end %>
</table>
但是当我尝试实现(比如说)每日注释时,记分板,我正在重复代码,并且操作非常慢(因为它必须迭代内存中的每个注释,而不是依赖数据库排序/计数):
<table>
<tr>
<th>Contributor</th>
<th>Annotations</th>
</tr>
<% @users.sort_by{|u| u.annotations.select{|a| a.created_at > 1.day.ago }.size }.reverse.each do |u| %>
<tr>
<td><%= u %></td>
<td><%= u.annotations.select{|a| a.created_at > 1.day.ago }.size %></td>
</tr>
<% end %>
</table>
实现每日/每周记分板的最佳方法是什么?
Users on my site create annotations for rap lyrics (example). I want to create a leaderboard to reward the people who create the most annotations.
The leaderboard should track how many annotations each user has created overall, as well as how many he has created in the past week, day, etc.
I have no problem implementing the overall leaderboard:
@users = User.all
<table>
<tr>
<th>Contributor</th>
<th>Annotations</th>
</tr>
<% @users.sort_by{|u| u.annotations.size }.reverse.each do |u| %>
<tr>
<td><%= u %></td>
<td><%= u.annotations.size %></td>
</tr>
<% end %>
</table>
But when I try to implement (say) the daily scoreboard, I'm repeating code and the operation is painfully slow (because it has to iterate through every annotation in memory rather than relying on database sorting / counting):
<table>
<tr>
<th>Contributor</th>
<th>Annotations</th>
</tr>
<% @users.sort_by{|u| u.annotations.select{|a| a.created_at > 1.day.ago }.size }.reverse.each do |u| %>
<tr>
<td><%= u %></td>
<td><%= u.annotations.select{|a| a.created_at > 1.day.ago }.size %></td>
</tr>
<% end %>
</table>
What's the best way to implement a daily / weekly scoreboard?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
排行榜作为一个整体,实施起来很痛苦。嗯,根据我的经验,实际的实施相当简单,只是它们很难扩展。通常,您会发现自己必须运行许多数据库密集型的数据库查询。要处理每日/每周报告,人们可能会查询日期时间列,但这意味着您在该列上有一个索引。该索引实际上仅对排行榜查询有用,并且它使该表上的所有其他写入付出代价,因为必须重新计算索引。
另一种方法是按计划的时间间隔生成统计信息,并将该数据写入单独的表中,供排行榜查询使用。例如,您有一个每天晚上运行查询的后台作业(可能是一个昂贵的作业,因为它不使用日期时间索引,但由于它只运行一次并且通过后台作业,费用是“可以”),该查询依次写入统计表,该统计表在日期时间列上有索引,然后重写排行榜页面以达到预先计算的统计数据。根据您的需要,您可能会让 cron 脚本还执行其他数据处理和预计算,因此排行榜页面必须执行尽可能少的计算。
此时,您的排行榜页面已经开始工作,当它访问带有索引的表时,它仍然需要读取大量行。这是假设您有良好的流量。让索引查询在每个页面上命中大量行仍然是昂贵的。所以现在你考虑实现页面缓存,也许将数据存储在memcached中。也就是说,由于根据定义,每日排行榜数据每天的变化最少,因此在每个页面视图上重新运行这些数据库查询的成本很高。将日常数据缓存在 memcached 中更有意义,并且每个页面视图仅访问 memcached。
正如你所看到的,这是一个进化的过程。如果您的流量较低,您可能无需使用单独的表,而只需在日期时间列上建立索引。运行总和、计数和平均值可能没问题。但它无法扩展。因此,您必须考虑将其分解为更优化的结构。然后您会发现,每天一遍又一遍地运行相同的查询,而基础数据在 24 小时内不会发生变化,成本高昂,因此您转向缓存设置。有很多移动部件,它可能会变得复杂,嗯,真的很快就乏味了。
当谈到排行榜时,我是一个久经沙场的愤世嫉俗者,虽然它们非常适合游戏机制和激励人们(每个人都喜欢看到分数!),但大规模制作工作却很痛苦。
Leaderboards as a whole are a pain to implement. Well, in my experience, the actual implementation is fairly straight-forward its just that they get hard to scale. Often you find yourself having to run many DB queries which are pretty DB intensive. To handle daily/weekly reports one would likely query on a datetime column but then it implies that you have an index on said column. That index is really only useful for the leaderboard queries and it makes all other writes on that table pay the price because the index has to be recalculated.
Another approach is to generate your statistics on a scheduled interval and you write that data to a separate table, which is used by the leaderboard queries. For example, you have a background job that runs every night you run a query (maybe its an expensive one because it doesnt use the datetime index but since its run only once and via a background job the expense is "ok"), that query in turn writes to a statistics table that does have an index on the datetime column, then you rewrite your leaderboard page to hit your pre-calculated statistics. Depending on your needs you might have that cron script also do other data munging and pre-calculation so the leaderboard page has to do as little as computation as necessary.
At this point you have your leaderboard page working and while it hits a table with an index it will still be having to read a large number of rows. This is assuming you have decent traffic. Having an indexed query hit a large number of rows on every page is still expensive. So now you think about implementing page caching, maybe storing data in memcached. That is, since the daily leaderboard data changes at minimum every day, by definition, its expensive to re-run those DB queries on every page view. It makes more sense to cache that daily data in memcached and each page view only hits memcached.
So as you can see its an evolved process. If your traffic is low than you might get away with having no separate table and just having an index on your datetime column. Running sums, counts and averages might be OK. But it doesnt scale. So you then have to think about breaking it out into a more optimized structure. And then you see that running the same query over and over every day while the underlying data doesnt change in 24 hours is expensive so you move to a caching setup. There are lots of moving parts and it can get complicated, well, really just tedious fast.
I am a battle-hardended cynic when it comes to leaderboards and while they are great for game mechanics and motivating people (everyone loves to see a score!) its a pain in the ass to make work at a large scale.
您是否考虑过将这些统计数据保存在一个单独的表/模型中,并由观察者更新?您在此处的视图中执行了大量繁重的工作,这通常不是一个好的做法。
Have you considered keeping these statistics in a separate table/model that gets updated by an observer? You're doing a lot of heavy lifting in a view here, which isn't generally good practice.
我建议使用 Redis。您可以运行一个 cron 类型的任务,从数据库中提取数据,然后将其放入 Redis 排序集中。排序集功能可能是存储排行榜的最佳实用程序。
http://redis.io/topics/data-types
I would suggest using Redis. You could have a cron-type task run that pulls the data down from your DB, then puts it into a Redis sorted set. The sorted set feature is probably the best utility out there to store leaderboards.
http://redis.io/topics/data-types
除了 Jeff 关于使用 Redis 的建议之外,这里还有一个我一直在排行榜工作中使用的 ruby gem,它依赖于 Redis: https://github.com/agoragames/leaderboard
In addition to Jeff's suggestion on using redis, here's a ruby gem I've been using for my leaderboard work that piggy-backs off of redis: https://github.com/agoragames/leaderboard