流行度算法 - SQL / Django
我一直在研究 Reddit、Digg 甚至 Stackoverflow 等网站上使用的流行度算法。
Reddit 算法:
t = (time of entry post) - (Dec 8, 2005)
x = upvotes - downvotes
y = {1 if x > 0, 0 if x = 0, -1 if x < 0)
z = {1 if x < 0, otherwise x}
log(z) + (y * t)/45000
我一直在 SQL 中执行简单的排序,我想知道应该如何处理这样的排序。
是否应该使用它来定义表,或者我可以使用公式中的顺序构建 SQL(而不影响性能)?
我还想知道是否可以在不同场合使用多种排序算法,而不会产生性能问题。
我正在使用 Django 和 PostgreSQL。
非常感谢您的帮助! ^^
I've been looking into popularity algorithms used on sites such as Reddit, Digg and even Stackoverflow.
Reddit algorithm:
t = (time of entry post) - (Dec 8, 2005)
x = upvotes - downvotes
y = {1 if x > 0, 0 if x = 0, -1 if x < 0)
z = {1 if x < 0, otherwise x}
log(z) + (y * t)/45000
I have always performed simple ordering within SQL, I'm wondering how I should deal with such ordering.
Should it be used to define a table, or could I build an SQL with the ordering within the formula (without hindering performance)?
I am also wondering, if it is possible to use multiple ordering algorithms in different occasions, without incurring into performance problems.
I'm using Django and PostgreSQL.
Help would be much appreciated! ^^
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
您应该将受欢迎度评级缓存在自己的列中,并在基础值发生变化时更新它。您还应该在该列上设置数据库索引。如果您还缓存了最常见查询的结果,那么您就对流行度查询的性能采取了最有效的措施。
You should cache your popularity rating in an own column and update it when the underlying values change. You should also setup a database index on that column. If you then also cache the result of your most common queries, you took the most effective measures for the performance of your popularity queries.