Where do mathematical algorithms come from, for example Reddit's rankings?

Posted 2024-11-18 01:54:39

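Recently I've been studying Reddit's algorithm for determining what makes a post a "hot" topic and which content makes it to the Reddit front page.

The article I'm reading is here: http://amix.dk/blog/post/19588

I noticed that they use logarithms and have built some kind of mathematical function that determines the hotness/relevance of a post.

In the formulas used, where does each mathematical component come from, and how did they know to use it?

Thank you!

-- Bakz

Edit: Just to clarify, I have only just graduated from high school, so I apologize if the answer to this question seems obvious. Thanks again!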

稳稳的幸福 2024-11-25 01:54:39

I'll tackle the first formula, for "hotness" of posts. Formulas like this come from requirements. The designers of Reddit have thought about what they want to achieve, and designed formulas accordingly. I can't tell you exactly what requirements they had in mind, but I can look at the implementation and guess that they wanted a system along these lines:

(1) Scores shouldn't need to be recomputed unless the number of votes changes. This reduces the number of changes to the database, and makes it easier to achieve consistency if data is replicated. (So any scoring system based on scores getting lower as the article ages will be no good.)
(2) If two stories are equally old, the one with more upvotes should be higher. (So there needs to be a contribution from the votes.)
(3) The more upvotes a story gets, the longer it should remain near the top of the ranking.
(4) Old stories shouldn't stay at the top of the rankings forever, even if they had lots of upvotes. Fairly soon (after a day or two), new stories need to outrank them. (So there needs to be a contribution from the date, and this must outweigh the score due to votes fairly soon, no matter how many votes something gets.)
(5) Stories with more downvotes than upvotes should not appear in the rankings at all.

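Now let's look at the formula, log10(z) + y·t / 45000, and see how it satisfies these requirements. (As defined in the linked article, x is upvotes minus downvotes, z is |x|, or 1 if x is 0, y is the sign of x, and t is the story's submission time in seconds, measured from a fixed reference date.)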
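To make the formula concrete, here is a minimal Python sketch of it. The function name `hot` and its parameters are hypothetical, and the definitions of z, y, and t are assumptions taken from the linked article rather than from Reddit's actual source code.

```python
import math


def hot(ups: int, downs: int, t: float) -> float:
    """Score = log10(z) + y * t / 45000.

    t is assumed to be the submission time in seconds, measured from a
    fixed reference date, so it never changes after the story is posted.
    """
    x = ups - downs            # net votes
    z = max(abs(x), 1)         # assumption: clamp to 1 so log10 is defined
    y = (x > 0) - (x < 0)      # sign of x: 1, 0, or -1
    return math.log10(z) + y * t / 45000
```

For example, `hot(100, 0, 45000)` evaluates to 3.0: 2 from the hundred net upvotes plus 1 from the time term.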

If the number of votes does not change, then z, y and t are all unchanged. So the score is unchanged. This satisfies requirement (1).
If two stories have the same age, then they have the same value for t. But the one with more upvotes has a higher value of z, and since log is monotonic, it has a higher score. This satisfies requirement (2).
The more upvotes a story has, the higher its z, so the longer it will be until another story with higher t can outrank it. This satisfies requirement (3).
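The logarithm grows more slowly as its argument gets larger (take a look at its graph): each tenfold increase in votes adds only 1 to the score, while the time term grows linearly. So a story needs more and more upvotes over time to keep up with newer stories. This satisfies requirement (4).
If the story has more downvotes than upvotes, then y = −1, so the time term becomes −t / 45000. Because t / 45000 is far larger than log z for any realistic vote count, the score is negative. This satisfies requirement (5).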
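Requirements (3) and (4) can be seen numerically by asking how much later a one-vote story has to be submitted before it ties with a heavily upvoted one. The sketch below assumes a base-10 logarithm and the 45,000 divisor from the formula; the helper name `head_start_seconds` is made up for illustration.

```python
import math


def head_start_seconds(net_votes: int) -> float:
    """Seconds of submission-time advantage that net_votes upvotes are worth.

    A newer story with one net vote ties once its t is larger by this amount.
    """
    return 45000 * math.log10(max(net_votes, 1))


for votes in (10, 100, 1000, 10000):
    hours = head_start_seconds(votes) / 3600
    print(f"{votes:>6} votes buy about {hours:.1f} hours")
```

Each tenfold increase in votes buys the same extra 12.5 hours, so even a very popular story is overtaken within a couple of days.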

The constant 45,000 is a scale factor that brings the upvotes and the age into balance. There are 86,400 seconds in a day, so t is larger by this amount for a story submitted a day later. Dividing 86,400 by 45,000 gives 1.92, which means that one day's relative newness is worth 10^1.92 ≈ 83 votes, and two days' relative newness is worth roughly 7,000 votes.
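The arithmetic above can be checked with a few lines of Python. The names below are illustrative only, and the base-10 logarithm is an assumption carried over from the formula.

```python
SECONDS_PER_DAY = 86_400
SCALE = 45_000  # the divisor from the formula


def votes_equivalent(days: float) -> float:
    """Net votes an older story needs to tie a one-vote story posted `days` later."""
    return 10 ** (days * SECONDS_PER_DAY / SCALE)


print(round(votes_equivalent(1)))  # one day of newness
print(round(votes_equivalent(2)))  # two days of newness
```

One day comes out at about 83 votes and two days at about 6,900, which is the "roughly 7,000" quoted above.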
