How does Wikipedia's "What links here" feature work?

Posted on 2024-08-14 18:49:23

I recently used Wikipedia's "What links here" function (found under the "Toolbox" section in the left-hand menu of any entry), and it got me wondering how this feature actually works.
I'm guessing that searching through all the article entries for links wouldn't be very efficient, so are all the links stored in a separate database? If so, is it updated when an article is edited, or at some other time?

Thanks.

Comments (5)

披肩女神 2024-08-21 18:49:23

Whenever a page on Wikipedia is edited, it is placed into a background queue that does some further processing. Some of the things that happen there are:

  • updates to the "what links here" for other pages
  • updates to category index pages
  • updates to the global cache of existing pages to help render "redlinks" on other pages

This sort of information doesn't need to be updated right away when you hit "Submit", so the background processing queue takes care of it. Sometimes this queue can grow quite large, but usually it's kept under control.

You can find more information about this at Help:Job Queue.
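
To make the queueing idea concrete, here is a rough Python sketch (not MediaWiki's actual code — the in-memory backlink table, the function names, and the single worker thread are all made up for illustration):

    import queue
    import threading
    from collections import defaultdict

    what_links_here = defaultdict(set)   # target title -> set of source titles (assumed store)
    job_queue = queue.Queue()

    def on_page_saved(title, links):
        # The edit request returns right away; backlink bookkeeping is deferred to the queue.
        job_queue.put((title, links))

    def worker():
        while True:
            title, links = job_queue.get()
            for target in links:
                what_links_here[target].add(title)
            # A real implementation would also drop backlinks for links removed in the edit.
            job_queue.task_done()

    threading.Thread(target=worker, daemon=True).start()

    on_page_saved("Article_A", ["Article_B", "Article_C"])
    job_queue.join()
    print(what_links_here["Article_B"])   # {'Article_A'}

The point is simply that the page save completes before the backlink data is touched; the background worker catches up whenever it gets to the job.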

对你而言 2024-08-21 18:49:23

You could think of this as a more general problem. If you have a link (or pointer, or whatever) from A to B, how can B know that A has a link pointing to it?

The answer is to store the information at the target location. That is, when page A is edited and a link to B is created, information about the link source is stored with B at the same time (a reverse link). In the case of a web page, the reverse link could be written directly into the "what links here" page. Just a single write to a static page. No need to perform any searches or database queries.
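
As a toy illustration of that single-write idea (the file layout and names here are invented, not how MediaWiki actually stores anything), saving page A could simply append its title to a per-target backlinks file:

    from pathlib import Path

    def record_backlink(source_title, target_title, data_dir="backlinks"):
        # One small file per target page; each line names one page that links to it.
        path = Path(data_dir) / f"{target_title}.whatlinkshere"
        path.parent.mkdir(parents=True, exist_ok=True)
        existing = set(path.read_text().splitlines()) if path.exists() else set()
        if source_title not in existing:          # avoid duplicate entries
            with path.open("a") as f:
                f.write(source_title + "\n")

    # When "Article_A" is saved with a link to "Article_B":
    record_backlink("Article_A", "Article_B")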

温柔一刀 2024-08-21 18:49:23

Pseudo code for a simple algorithm that would do it:

    def updateChanges(editedPage):
        # Walk every link found in the newly saved text of the edited page.
        for link in editedPage.links:
            if not link.isInternalWikiLink:       # only links to other wikipedia pages matter
                continue
            pageToUpdate = openPage(link)
            # Record the backlink once; skip it if it is already listed.
            if editedPage not in pageToUpdate.whatLinksHere:
                pageToUpdate.whatLinksHere.add(editedPage)

Sorry I just finished my algorithms class so I have an urge to write pseudo code. In this context, the updateChanges() procedure would be something called during the "update the 'what links here' for other pages" phase that Greg Hewgill referred to.

无法言说的痛 2024-08-21 18:49:23

The way I would implement it is to extract all the links after an edit, then store them in a separate table keyed by the current URL. Then I could just query the table with the URL the user is currently on and get all the pages that have been recorded as linking to that page.

It probably wouldn't be quite that straightforward, but that's the general, simplified idea. It would probably be wiser to store page IDs and so on instead of URLs.
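
A minimal sketch of that idea in Python with SQLite (the table and column names are made up for illustration; the real schema would differ):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE page_links (
                        source_url TEXT,
                        target_url TEXT,
                        PRIMARY KEY (source_url, target_url))""")
    # The "what links here" lookup is by target, so index that column too.
    conn.execute("CREATE INDEX idx_links_target ON page_links (target_url)")

    def on_edit(source_url, linked_urls):
        # Links are re-extracted on every edit: drop the old rows, insert the new ones.
        conn.execute("DELETE FROM page_links WHERE source_url = ?", (source_url,))
        conn.executemany("INSERT INTO page_links VALUES (?, ?)",
                         [(source_url, t) for t in set(linked_urls)])

    def what_links_here(target_url):
        rows = conn.execute(
            "SELECT source_url FROM page_links WHERE target_url = ?", (target_url,))
        return [row[0] for row in rows]

    on_edit("/wiki/Article_A", ["/wiki/Article_B"])
    print(what_links_here("/wiki/Article_B"))     # ['/wiki/Article_A']

The index on the target column is what keeps the lookup cheap even with an enormous number of links.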

趁年轻赶紧闹 2024-08-21 18:49:23

It would make sense for the "update event" of an article to trigger a link parser, as that is the only time an article changes. The update event would simply scan for links and record the ones that are internal to Wikipedia in the database.

I imagine each page has a primary key, and a simple association table is created to relate a page's PK to all the other pages that link to it.

There are likely some additional bits added to aid performance on such a large site, but that would be the basic mechanism.
