Expiring page rank algorithm

Posted 2024-12-05 19:40:27


I'm looking for an algorithm that does some sort of page ranking, but gives less value to pages as they get older.

All algorithms I have seen do the opposite (give older domains more value).

Help finding such an algorithm would be much appreciated.

Edit:
Looking at my initial question, I think I was a bit unclear about what I was asking, and the question is more complicated than I originally thought.
Basically, what I want is some sort of ranking algorithm where, if Site A links to Site B immediately after Site B has made a post, Site B's page gets extra page rank (maybe "score" is a better word), but if Site A links to Site B a long time after the post was made, it adds very little to the page rank.

Hopefully this makes sense. Apologies for the initial question being wrong.


Comments (1)

染柒℉ 2024-12-12 19:40:27


You can use biased PageRank, as described by Haveliwala in this article.

The idea is simple: instead of using the regular random component [1/n, 1/n, ..., 1/n], use a biased one. When the random walk takes a random jump, instead of going to each page with probability 1/n, it goes to each page with probability f(doc), where f(doc) is higher for newer pages and Sigma(f(doc)) = 1 over all documents in the collection. Your random component is then [f(doc1), f(doc2), ..., f(docn)].

Note that f(doc) > 0 must hold for every document, otherwise convergence is not guaranteed [the Perron-Frobenius theorem won't apply].
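For illustration, here is a minimal NumPy sketch of this idea. The exponential half-life decay used for f(doc), and names like recency_bias and ages_days, are my own assumptions, not anything prescribed by the biased-PageRank method itself:

```python
import numpy as np

def recency_bias(ages_days, half_life=30.0):
    """Teleportation vector biased toward newer pages.

    The exponential half-life decay is an illustrative choice of f(doc);
    any function that is higher for newer pages works. A tiny floor keeps
    every entry strictly positive (f(doc) > 0, needed for convergence),
    and the vector is normalised so the entries sum to 1.
    """
    ages = np.asarray(ages_days, dtype=float)
    f = np.exp(-np.log(2.0) * ages / half_life) + 1e-12
    return f / f.sum()

def biased_pagerank(adjacency, ages_days, damping=0.85, iters=100, tol=1e-10):
    """Power iteration where the random jump follows the biased vector."""
    n = adjacency.shape[0]
    bias = recency_bias(ages_days)

    # Row-normalise the adjacency matrix into a transition matrix.
    # Dangling pages (no out-links) jump according to the bias vector.
    row_sums = adjacency.sum(axis=1, keepdims=True)
    safe = np.where(row_sums > 0, row_sums, 1.0)
    transition = np.where(row_sums > 0, adjacency / safe, bias)

    rank = np.full(n, 1.0 / n)
    for _ in range(iters):
        new_rank = damping * (rank @ transition) + (1.0 - damping) * bias
        if np.abs(new_rank - rank).sum() < tol:
            rank = new_rank
            break
        rank = new_rank
    return rank
```

Here ages_days is assumed to be the time since each page was posted; the older the page, the smaller its share of the random-jump probability.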


Another possibility is to calculate regular PageRank and multiply it by a function g: Collection -> R that assigns a numerical value to each page, with newer pages getting higher scores.
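A minimal sketch of that post-hoc rescaling, again assuming an exponential decay of page age as the (illustrative) choice of g:

```python
import numpy as np

def recency_adjusted_scores(pagerank_scores, ages_days, half_life=30.0):
    """Scale ordinary PageRank scores by a recency function g(page).

    g is an illustrative exponential decay of page age; any monotonically
    decreasing function of age gives the same "newer pages score higher"
    behaviour.
    """
    ages = np.asarray(ages_days, dtype=float)
    g = np.exp(-np.log(2.0) * ages / half_life)
    return np.asarray(pagerank_scores, dtype=float) * g
```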

EDIT:

As response to the original question's edit:

Another possibility: when generating the web graph, add extra information w: E -> [0, 1], i.e. a weight function for each edge denoting how important it is. If the link was made shortly after the original post, w(e) will be closer to 1; if it was made much later, the score will be closer to 0.

When creating the matrix you calculate PageRank on, set Matrix[v1][v2] <- w((v1, v2)) instead of a simple binary value indicating that the edge exists in the graph.

Once you have this matrix, calculate PageRank normally.
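A minimal sketch of building such a weighted matrix, assuming each link comes with the delay (in days) between the target's post and the link being created, and again using an illustrative exponential decay for w:

```python
import numpy as np

def build_time_weighted_matrix(links, n_pages, half_life_days=30.0):
    """Build Matrix[v1][v2] = w((v1, v2)) in [0, 1] instead of 0/1 entries.

    `links` is an iterable of (source, target, delay_days) tuples, where
    delay_days is the time between the target's post going up and the
    link to it being created. Mapping the delay to a weight via an
    exponential half-life is an illustrative choice; any monotone decay
    into [0, 1] would do.
    """
    matrix = np.zeros((n_pages, n_pages))
    for source, target, delay_days in links:
        matrix[source, target] = np.exp(-np.log(2.0) * delay_days / half_life_days)
    return matrix
```

The resulting matrix can then be row-normalised and fed to an ordinary PageRank power iteration (for example the sketch earlier in this answer), so fresh links contribute more rank than stale ones.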
