Implementing the Hacker News ranking algorithm in SQL

Posted 2024-09-24 13:49:35

Here's how Paul Graham describes the ranking algorithm for Hacker News:

News.YC's is just

(p - 1) / (t + 2)^1.5

where p = points and t = age in hours
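As a worked example of the formula, using made-up values: a 1-hour-old post with 10 points scores 9 / 3^1.5 = sqrt(3) ≈ 1.732, while a 10-hour-old post with 50 points scores 49 / 12^1.5 ≈ 1.179, so the newer post ranks higher despite having fewer points. A minimal Python sketch of the same arithmetic (the function name `hn_score` is only illustrative):

```python
import math

def hn_score(points: int, age_hours: float) -> float:
    """Score = (p - 1) / (t + 2)^1.5, with p = points and t = age in hours."""
    return (points - 1) / math.pow(age_hours + 2, 1.5)

# Hypothetical posts: younger with fewer points vs. older with more points.
print(round(hn_score(10, 1), 3))   # 1.732
print(round(hn_score(50, 10), 3))  # 1.179
```

The + 2 keeps the divisor away from zero for brand-new posts, and the exponent 1.5 controls how quickly age outweighs points.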

I'd like to do that in pure MySQL, given the following tables:

  • Table Posts with fields postID (index) and postTime (timestamp).
  • Table Votes with fields voteID (index), postID, and vote (integer, 0 or 1).

The idea of the vote field is that votes can be rescinded.
For the purposes of the ranking, vote=0 is equivalent to no vote at all.
(All votes are upvotes, no such thing as downvotes.)
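Because a rescinded vote is stored as vote=0 rather than deleted, summing the vote column per post yields p directly: rows with 0 add nothing. A small Python sketch of that aggregation over hypothetical rows whose (voteID, postID, vote) layout mirrors the Votes table above; in SQL the same thing is `SUM(vote) ... GROUP BY postID`.

```python
from collections import defaultdict

# Hypothetical rows from the Votes table: (voteID, postID, vote)
votes = [
    (1, 101, 1),
    (2, 101, 0),  # rescinded vote: contributes nothing
    (3, 101, 1),
    (4, 102, 0),  # the only vote on post 102 was rescinded
]

def points_by_post(rows):
    """Sum the vote column per post; a vote=0 row counts the same as no vote."""
    points = defaultdict(int)
    for _vote_id, post_id, vote in rows:
        points[post_id] += vote
    return dict(points)

print(points_by_post(votes))  # {101: 2, 102: 0}
```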

The question is how to construct a query that returns the top N postIDs, sorted by Paul Graham's formula.
There are approximately 100k posts altogether, so if you think caching the scores (or anything else) will be needed, I'd love to hear advice about that.

(Obviously this is not rocket science and I can certainly figure it out but I figured someone who eats SQL for breakfast, lunch, and dinner could just rattle it off. And it seems valuable to have available on StackOverflow.)
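On the caching question: the score itself changes as t grows, so it cannot be computed once and stored, but p only changes when a vote is cast or rescinded, so p can be cached. A minimal sketch, assuming a hypothetical `points` column on Posts kept current by triggers. It runs on SQLite from Python as a stand-in; MySQL's `CREATE TRIGGER` is similar but not identical (for example, it requires `FOR EACH ROW`). Deleting vote rows is not handled here.

```python
import sqlite3

# SQLite stand-in for MySQL. Posts.points is a hypothetical cached column
# holding p for each post; triggers keep it in sync with Votes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Posts (
    postID   INTEGER PRIMARY KEY,
    postTime INTEGER NOT NULL,           -- Unix epoch seconds
    points   INTEGER NOT NULL DEFAULT 0  -- cached SUM(vote)
);
CREATE TABLE Votes (
    voteID INTEGER PRIMARY KEY,
    postID INTEGER NOT NULL,
    vote   INTEGER NOT NULL CHECK (vote IN (0, 1))
);
-- A new vote adds its value (0 or 1) to the cached total.
CREATE TRIGGER votes_after_insert AFTER INSERT ON Votes
BEGIN
    UPDATE Posts SET points = points + NEW.vote WHERE postID = NEW.postID;
END;
-- Rescinding (1 -> 0) or restoring (0 -> 1) a vote applies the difference.
CREATE TRIGGER votes_after_update AFTER UPDATE OF vote ON Votes
BEGIN
    UPDATE Posts SET points = points + NEW.vote - OLD.vote WHERE postID = NEW.postID;
END;
""")

conn.execute("INSERT INTO Posts (postID, postTime) VALUES (1, 0)")
conn.executemany("INSERT INTO Votes (voteID, postID, vote) VALUES (?, 1, 1)",
                 [(1,), (2,), (3,)])
conn.execute("UPDATE Votes SET vote = 0 WHERE voteID = 2")  # rescind one vote

cached = conn.execute("SELECT points FROM Posts WHERE postID = 1").fetchone()[0]
print(cached)  # 2
```

With this in place, a ranking query can read p from Posts.points instead of aggregating Votes, leaving only the (t + 2)^1.5 divisor to be computed at query time.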


Comments (2)

泅渡 2024-10-01 13:49:35

Untested:

  -- LEFT JOIN + COALESCE keeps posts that have no votes yet (p = 0)
  SELECT x.*
    FROM Posts x
    LEFT JOIN (SELECT v.postID,
                      SUM(v.vote) AS points  -- vote = 0 rows add nothing
                 FROM Votes v
             GROUP BY v.postID) y ON y.postID = x.postID
ORDER BY (COALESCE(y.points, 0) - 1)
         / POW((UNIX_TIMESTAMP(NOW()) - UNIX_TIMESTAMP(x.postTime)) / 3600 + 2, 1.5) DESC
   LIMIT 30  -- N: the number of top posts to return
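To check the ordering logic without a MySQL server, the query can be run against an in-memory SQLite database from Python. This is a sketch under assumptions: SQLite stands in for MySQL, postTime holds epoch seconds instead of a TIMESTAMP (SQLite has no `UNIX_TIMESTAMP()`), the current time is a fixed `:now` parameter, and the sample rows are made up. It uses a LEFT JOIN with COALESCE so that a post with no active votes is still ranked with p = 0 rather than dropped, which is what a plain inner join would do.

```python
import math
import sqlite3

# SQLite stand-in for MySQL: postTime holds Unix epoch seconds because
# SQLite has no UNIX_TIMESTAMP(); the current time is passed in as :now.
conn = sqlite3.connect(":memory:")
conn.create_function("POW", 2, math.pow)  # some SQLite builds lack POW()
conn.executescript("""
CREATE TABLE Posts (postID INTEGER PRIMARY KEY, postTime INTEGER NOT NULL);
CREATE TABLE Votes (voteID INTEGER PRIMARY KEY, postID INTEGER NOT NULL,
                    vote INTEGER NOT NULL);
""")

NOW = 100 * 3600  # arbitrary fixed "current time" for a reproducible example
conn.executemany("INSERT INTO Posts VALUES (?, ?)", [  # hypothetical rows
    (1, NOW - 1 * 3600),   # 1 hour old
    (2, NOW - 10 * 3600),  # 10 hours old
    (3, NOW),              # brand new, no votes yet
    (4, NOW - 2 * 3600),   # 2 hours old; two votes, one rescinded
])
votes = [(1, 1)] * 4 + [(2, 1)] * 10 + [(4, 1), (4, 0)]
conn.executemany("INSERT INTO Votes (postID, vote) VALUES (?, ?)", votes)

TOP_N = """
SELECT x.postID, COALESCE(y.points, 0) AS points
  FROM Posts x
  LEFT JOIN (SELECT postID, SUM(vote) AS points
               FROM Votes
              GROUP BY postID) y ON y.postID = x.postID
 ORDER BY (COALESCE(y.points, 0) - 1)
          / POW((:now - x.postTime) / 3600.0 + 2, 1.5) DESC
 LIMIT :n
"""
rows = conn.execute(TOP_N, {"now": NOW, "n": 10}).fetchall()
print(rows)  # [(1, 4), (2, 10), (4, 1), (3, 0)]
```

Post 1 outranks post 2 despite having fewer points because it is much younger; post 3, with no votes, scores negative and sorts last but is still returned.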

碍人泪离人颜 2024-10-01 13:49:35
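  // mysql_query() was removed in PHP 7.0; this assumes $mysqli is an open mysqli connection
  $result = $mysqli->query("SELECT * FROM news
                            ORDER BY (noOfLike - 1) / POW((UNIX_TIMESTAMP(NOW()) -
                                     UNIX_TIMESTAMP(created_at)) / 3600 + 2, 1.5) DESC
                            LIMIT 20");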

This code works for me for building an HN-style home page.

news: the table name.

noOfLike: the total number of users who liked this news item.

created_at: the timestamp when the news item was posted.
