Understanding the math behind PageRank and similar algorithms
I've looked at a bunch of resources provided by similar questions asked on this site. The most helpful so far have been this discussion and the resources linked here: PageRank Explained.
While this provides a detailed overview, I'm looking for something a bit more specific. I realize there are other factors in play, and that there have been multiple changes to the algorithm since its inception, but a good indication of the value passed through each link is supposedly this: PageRank divided by the total number of pages linked. So if a site (page) has a PR of 8 and links to 20 sites, the value passed to each site is 8 / 20. At least that is what I am led to believe. I know that PageRank is a value between 1 and 10 on a logarithmic scale, meaning that going from a PR 1 to a PR 2 is significantly less difficult than going from a PR 9 to a PR 10. Here's where I am confused: how would one calculate the amount of PR transferred to each link? I know I'm oversimplifying, because a page with a PR of 10 and around 10 outbound links should still be passing more value than a PR 5 site with 2 outbound links. What is the best way to understand the proper math behind this at a simple level?
Comments (1)
First, it's worth noting that PageRank as currently implemented is very different from the original idea in the paper, and since it changes all the time, even the other information in that SO question isn't entirely reliable. But I imagine the fundamentals are similar.
I think the PageRank is divided before conversion to the logarithmic scale. Dividing a value by n subtracts log_10 n from its base-10 logarithm, so if a page has a PageRank of P and n > 0 outbound links, the PR transferred through each link would be (somewhat less than, because of the decay factor) P - log_10 n. So with 10 links the PR would drop by 1, with 100 links it would drop by 2, and so on. Of course, if n is 0 then no PageRank is given to other pages; it's just wasted.
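To make that arithmetic concrete, here is a minimal Python sketch. It rests on assumptions, not on anything known about the real implementation: the displayed PR is taken to be the base-10 logarithm of the raw rank, and the decay factor is set to 0.85 (the value suggested in the original paper). The function name `per_link_log_value` is made up for illustration.

```python
import math

def per_link_log_value(displayed_pr, outbound_links, damping=0.85):
    """Log-scale value passed through each outbound link.

    Assumes displayed_pr = log10(raw rank), so splitting the raw rank
    across n links subtracts log10(n), and the decay factor subtracts
    a further -log10(damping).
    """
    if outbound_links <= 0:
        return None  # no outbound links: nothing is passed on
    return displayed_pr + math.log10(damping) - math.log10(outbound_links)

# The three cases from the question
print(per_link_log_value(8, 20))   # PR 8 page with 20 links
print(per_link_log_value(10, 10))  # PR 10 page with 10 links
print(per_link_log_value(5, 2))    # PR 5 page with 2 links
```

Under these assumptions the PR 10 page with 10 links still passes far more per link than the PR 5 page with 2 links, because the gap of 5 on the log scale outweighs the extra log_10 of the link count.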