Algorithms/techniques for rating web pages (other than PageRank)

Posted 2024-12-10 14:17:28, 434 words, 2 views, 0 comments

I'm looking for algorithms/techniques that can rate the importance of a single web page. Leaving PageRank aside, are there any other methods of producing such a rating based on a page's content, its structure, and the hyperlinks between pages?

I'm not only talking about links from www.foo.com to www.bar.com, as PageRank does, but also about links from www.foo.com/bar to www.foo.com/baz and so on (apart from adapting PageRank to these needs).

How I "define" importance: in this context I think of importance as "how relevant this page is to the user, as well as how important it is to the rest of the site".
E.g. a Christmas raffle announced on the start page, with only a single link leading to it, is more important to the user as well as to the site. An imprint, which is linked from every page (since the link usually sits somewhere in the footer), is not important even though it has many links pointing to it. The imprint is also not important to the site as a "unit", since it doesn't add any real value to the site's purpose (= giving information, selling products, providing a general service, etc.).

Comments (2)

土豪我们做朋友吧 2024-12-17 14:17:28

There is also SALSA, which is more stable than HITS [so it suffers less from spam].

Since you are also interested in the context of pages, you might want to have a look at Haveliwala's work on Topic-Sensitive PageRank.
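
To give a rough idea of the topic-sensitive variant: the usual random-surfer iteration is kept, but the "teleport" step jumps only to pages known to belong to a topic, so rank flows mostly to pages reachable from that topic. The sketch below is a minimal illustration under my own assumptions, not Haveliwala's implementation: the function name, the tiny link graph, the page names, and the damping factor of 0.85 are all made up for the example, and it computes a single topic vector only.

```python
def topic_sensitive_pagerank(links, topic_pages, damping=0.85, iterations=100):
    """Power iteration with a teleport vector restricted to topic_pages.

    links maps each page to the pages it links to (hypothetical input format).
    """
    pages = set(links)
    for targets in links.values():
        pages |= set(targets)

    topic = [p for p in pages if p in topic_pages]
    if not topic:
        raise ValueError("topic_pages must contain at least one known page")

    # Teleport only to topic pages, uniformly.
    teleport = {p: (1.0 / len(topic) if p in topic_pages else 0.0) for p in pages}
    rank = dict(teleport)

    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) * teleport[p] for p in pages}
        for source in pages:
            targets = set(links.get(source, ()))
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: hand its rank back via the teleport vector.
                for p in pages:
                    new_rank[p] += damping * rank[source] * teleport[p]
        rank = new_rank
    return rank


# Hypothetical graph: two photography pages and two unrelated pages.
example_links = {
    "photo_blog": {"camera_review"},
    "camera_review": {"photo_blog"},
    "news": {"sports"},
    "sports": {"news"},
}
topic_rank = topic_sensitive_pagerank(example_links, topic_pages={"photo_blog"})
```

In this toy graph the unrelated pages are never reached from the topic set, so they end up with no rank at all, while both photography pages share the whole rank mass.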

忆伤 2024-12-17 14:17:28

Another famous algorithm is Hubs and Authorities (HITS). Basically, each page gets two scores: a hub score (high for pages with a lot of outbound links to good authorities) and an authority score (high for pages with a lot of inbound links from good hubs).
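
As a rough illustration of how the two scores reinforce each other, here is a minimal sketch of the HITS iteration. The function names, the input format (a dict mapping each page to the pages it links to), the example graph, and the fixed iteration count are assumptions made for the example.

```python
import math


def normalize(scores):
    """Scale scores to unit Euclidean length (left unchanged if all are zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {page: v / norm for page, v in scores.items()}


def hits(links, iterations=50):
    """Return (hub, authority) score dicts for the link graph in `links`."""
    pages = set(links)
    for targets in links.values():
        pages |= set(targets)

    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Authority score: sum of the hub scores of the pages linking in.
        new_auth = {p: 0.0 for p in pages}
        for source, targets in links.items():
            for target in set(targets):
                new_auth[target] += hub[source]
        # Hub score: sum of the authority scores of the pages linked to.
        new_hub = {p: sum(new_auth[t] for t in set(links.get(p, ()))) for p in pages}
        auth = normalize(new_auth)
        hub = normalize(new_hub)
    return hub, auth


# Hypothetical graph: two link lists pointing at two content pages.
example_graph = {
    "list_a": {"page_x", "page_y"},
    "list_b": {"page_x"},
}
hub_scores, auth_scores = hits(example_graph)
```

Here `page_x` ends up with the higher authority score because both lists link to it, and `list_a` ends up as the stronger hub because it links to both content pages.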

But you should really define what you mean by importance. What does "important" really mean? PageRank defines it in terms of inbound links; that is PageRank's definition.

If you define important as having photos, because you like photography, then you could come up with an importance metric like the number of photos on the page. Another metric could be the number of inbound links from photography sites (like flickr.com, 500px, ...).

Using your definition of important, you could use `1 - (number of inbound links / number of pages on the site)`. This gives you a number between 0 and 1, where 0 means not important and 1 means important.

Using this metric, your imprint, which is linked from all the pages of the site, has an importance of 0. Your Christmas sale page, which has only one link to it, has an importance of almost 1.
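
A minimal sketch of that metric, assuming the site's internal links are already available as a dict that maps each page to the set of pages it links to. The function name, the paths, and the four-page example site are hypothetical; each linking page is counted once per target, and a page's link to itself counts too.

```python
def inverse_inbound_importance(links):
    """Score each page as 1 - (inbound links / number of pages on the site)."""
    pages = set(links)
    for targets in links.values():
        pages |= set(targets)
    total_pages = len(pages)

    # Count how many distinct pages link to each page.
    inbound = {page: 0 for page in pages}
    for source, targets in links.items():
        for target in set(targets):
            inbound[target] += 1

    return {page: 1 - inbound[page] / total_pages for page in pages}


# Hypothetical site: the imprint is in every footer, the raffle only on the start page.
site = {
    "/": {"/raffle", "/products", "/imprint"},
    "/products": {"/", "/imprint"},
    "/raffle": {"/", "/imprint"},
    "/imprint": {"/", "/imprint"},
}
scores = inverse_inbound_importance(site)
```

On this toy site the imprint scores 0 and the raffle page scores 0.75. Note that the start page scores only 0.25, because most pages link back to it, so the metric would probably need to be combined with other signals in practice.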
