Digg-like rotating homepage of popular content: how to include date as a factor?
I am building an advanced image sharing web application. As you may expect, users can upload images, and others can comment on them, vote on them, and favorite them. These events determine the popularity of an image, which I capture in a "karma" field.
Now I want to create a Digg-like homepage system showing the most popular images. That part is easy, since I already have the weighted karma score: I just sort on it in descending order to show the 20 most valued images.
The part that is missing is time. I do not want extremely popular images to stay on the homepage forever. I guess an easy solution is to restrict the result set to the last 24 hours. However, I'm also thinking that, to keep images rotating throughout the day, time could be a variable whose offset influences an image's position in the sort order.
Specific questions:
- Would you recommend the easy scenario (just sort the best images within the last 24 hours) or the more sophisticated one (use the datetime offset as part of the sorting)? If you advise the latter, any help with the mathematical solution?
- Would it be best to run a scheduled service that marks images for the homepage, or would you advise a direct query? (I'm using MySQL.)
- As an extra note, the homepage should support paging, and on a quiet day it should include entries from previous days so that it is always "filled".
I'm not asking the community to build this algorithm, just looking for some advice :)
Comments (4)
I would go with a function that decreases the "effective karma" of each item after a given amount of time elapses. This is a bit like Eric's method.
Determine how often you want the "effective karma" to be decreased, then multiply the karma by a scaling factor based on this period.
The factor depends on a percentage_decrease that is determined by your function. For instance, you could choose it so that the effective karma of each item decreases to 0 over 24 hours. Then use the effective karma to determine which images to show. This is a more stable solution than just subtracting the time since posting, as it scales the karma between 0 and its actual value. Capping the decrease with a min keeps the scaling factor at a lower bound of 0, because once a day passes, the raw ratio becomes greater than 1.
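The formulas from this answer did not survive on this page, so the following is only a sketch reconstructed from the prose, not the answerer's exact code. It assumes a linear decay over 24 hours; the names DECAY_PERIOD, percentage_decrease, and effective_karma are illustrative, and the extra max(0.0, ...) guard against future timestamps is an addition that the answer does not mention.

```python
from datetime import datetime, timedelta

DECAY_PERIOD = timedelta(hours=24)  # assumption: karma fades to 0 over one day


def percentage_decrease(posted_at, now, period=DECAY_PERIOD):
    """Fraction of karma lost so far, capped at 1 via min()."""
    elapsed = (now - posted_at) / period  # timedelta / timedelta gives a float
    # min() caps the decrease once the period has passed;
    # max() is an added guard for timestamps that lie in the future.
    return min(1.0, max(0.0, elapsed))


def effective_karma(karma, posted_at, now, period=DECAY_PERIOD):
    """Scale karma between its actual value (fresh) and 0 (one period old)."""
    return karma * (1.0 - percentage_decrease(posted_at, now, period))


now = datetime(2024, 1, 2, 12, 0)
print(effective_karma(100, now - timedelta(hours=6), now))   # 75.0
print(effective_karma(100, now - timedelta(hours=36), now))  # 0.0
```

Sorting by effective_karma in descending order would then give the homepage ordering. Note that with a hard drop to 0, every image older than the period ties at zero, so a quiet day would need a secondary sort key to fill the page.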
However, this doesn't take popularity in the strict sense into account. Tim's answer gives some ideas on how to factor in strict popularity (i.e. page views).
For your first question, I would go with the slightly more complicated method. You will want some "all-time favorites" in the mix. But don't go by time alone; go by the number of actual views the image has. Keep in mind that not everyone is going to log in and vote, but that doesn't make the image any less popular. An image that is two years old with 10 votes and 100k views is obviously more important to people than an image that is one year old with 100 votes and 1k views.
For your second question, yes, you want some kind of caching on your front page. That's a lot of queries to produce the entry point into your site. However, much like SO, your type of site will tend to draw traffic to inner pages through search engines, so try to watch and optimize your queries everywhere.
For your third question, going by factors other than time (i.e. number of views) helps to make sure you always have a full and dynamic page. I'm not sure about paginating on the front page; leading people to tags or searches might be a better strategy.
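As a rough illustration of the front-page caching suggested above, here is a minimal in-process sketch that recomputes the page at most once per time window. The class name HomepageCache, the five-minute window, and the fetch_top_images stand-in are all hypothetical; a real site might use a shared cache or a precomputed table instead.

```python
import time


class HomepageCache:
    """Keeps the last computed front page for ttl_seconds before recomputing."""

    def __init__(self, compute, ttl_seconds=300, clock=time.monotonic):
        self._compute = compute  # hypothetical callable that runs the expensive query
        self._ttl = ttl_seconds
        self._clock = clock
        self._value = None
        self._expires_at = None

    def get(self):
        now = self._clock()
        # Recompute on first use or once the cached page has expired.
        if self._expires_at is None or now >= self._expires_at:
            self._value = self._compute()
            self._expires_at = now + self._ttl
        return self._value


calls = []


def fetch_top_images():
    # Stand-in for the real MySQL query; records each call so reuse is visible.
    calls.append(1)
    return ["img-3", "img-1", "img-2"]


cache = HomepageCache(fetch_top_images, ttl_seconds=300)
cache.get()
cache.get()
print(len(calls))  # 1: the second get() is served from the cache
```

The trade-off is staleness: the homepage can lag behind new votes by up to the chosen window.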
You could just calculate an "adjusted karma" type field that takes the time into account.
You could then calculate and sort by that directly in your query, or you could make it an actual field in the database that you update via a nightly process or similar. Personally, I would go with a nightly process, since that will probably make it easier to make the algorithm more sophisticated in the future.
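The formula for the adjusted karma is missing from this page. Another answer in the thread describes a comparable approach as "subtracting the time since posting", so the sketch below assumes karma minus age in days. The penalty_per_day weight, the dict layout of an image, and the function names are illustrative assumptions, not the answerer's code.

```python
from datetime import datetime, timedelta


def adjusted_karma(karma, posted_at, now, penalty_per_day=1.0):
    """Karma minus a penalty that grows with age (assumed formula)."""
    age_days = (now - posted_at) / timedelta(days=1)
    return karma - penalty_per_day * age_days


def homepage_page(images, now, page=1, per_page=20):
    """Rank all images by adjusted karma and return one page of them."""
    ranked = sorted(
        images,
        key=lambda img: adjusted_karma(img["karma"], img["posted_at"], now),
        reverse=True,
    )
    start = (page - 1) * per_page
    return ranked[start:start + per_page]


now = datetime(2024, 1, 10)
images = [
    {"id": "old-hit", "karma": 50, "posted_at": now - timedelta(days=30)},
    {"id": "fresh", "karma": 30, "posted_at": now - timedelta(days=1)},
    {"id": "recent", "karma": 40, "posted_at": now - timedelta(days=5)},
]
print([img["id"] for img in homepage_page(images, now, per_page=2)])
# ['recent', 'fresh']
```

Because there is no hard cutoff, older images simply sink down the ranking, so paging still works and a quiet day is filled with entries from previous days. A nightly job could store the same value in a column so that the query only has to sort on it.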
I found this: the lower bound of the Wilson score confidence interval for a Bernoulli parameter.
Look at this: http://www.derivante.com/2009/09/01/php-content-rating-confidence/
In the second example, the author explains how to use time as a "freshness factor".
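For reference, the Wilson lower bound itself can be sketched as follows. This assumes each image has a count of positive votes and a total vote count (hypothetical inputs; the question only mentions a combined karma score). It does not reproduce the linked article's freshness weighting.

```python
import math


def wilson_lower_bound(positive, total, z=1.96):
    """Lower bound of the Wilson score interval for a Bernoulli parameter.

    z=1.96 corresponds to roughly 95% confidence.
    """
    if total == 0:
        return 0.0  # no votes yet: nothing to be confident about
    p_hat = positive / total
    z2 = z * z
    centre = p_hat + z2 / (2 * total)
    margin = z * math.sqrt((p_hat * (1 - p_hat) + z2 / (4 * total)) / total)
    return (centre - margin) / (1 + z2 / total)


# Same 90% approval, but more votes give a higher (more confident) bound.
print(wilson_lower_bound(9, 10) < wilson_lower_bound(90, 100))  # True
```

Ranking by this bound instead of the raw ratio keeps an image with a handful of votes from outranking one with a long, consistent record.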