How does Wikipedia's "What links here" work?
I recently used Wikipedia's function "What links here" (which is found under the "Toolbox" element in any entry's left menu) and it got me started wondering how this function actually works.
I'm guessing that searching through every article for links wouldn't be very efficient, so are all the links stored in a separate database? If so, is it updated when an article is edited, or at some other time?
Thanks.
Comments (5)
Whenever a page on Wikipedia is edited, it is placed into a background queue for further processing. One of the things that happens there is updating the "what links here" information for other pages.
This sort of information doesn't need to be updated right away when you hit "Submit", so the background processing queue takes care of it. Sometimes this queue can grow quite large, but usually it's kept under control.
You can find more information about this at Help:Job Queue.
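As a rough illustration of the deferred-processing idea, here is a minimal sketch. It is not MediaWiki's actual job queue; the names save_edit, run_jobs and the "refresh_links" job type are made up for this example. Saving an edit only appends a job, and a separate step works through the backlog later.

```python
from collections import deque

# Hypothetical in-memory stand-ins; a real site would use persistent storage.
job_queue = deque()
processed = []

def save_edit(title, text):
    """Accept the edit right away and defer the slower follow-up work."""
    job_queue.append(("refresh_links", title, text))
    return "saved"

def run_jobs(limit=10):
    """Process up to `limit` queued jobs, oldest first; return how many ran."""
    done = 0
    while job_queue and done < limit:
        job, title, _text = job_queue.popleft()
        processed.append((job, title))  # the link-table update would happen here
        done += 1
    return done

save_edit("Cat", "A cat is a [[Mammal]].")
save_edit("Dog", "A dog is a [[Mammal]].")
run_jobs(limit=1)  # only the oldest job runs; the other one stays queued
```

Capping the number of jobs per run is one way such a queue can fall behind when many edits arrive at once.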
You could think of this as a more general problem. If you have a link (or pointer or whatever) from A to B, how can B know that A has a link pointing there?
The answer is to store the information at the target location. That is, when page A is edited and a link to B is created, information about the link source is stored with B at the same time (a reverse link). In the case of a web page, the reverse link could be written directly into the "what links here" page. That is just a single write into a static page, with no need to perform any searches or database queries.
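A minimal sketch of the reverse-link idea, using an in-memory dictionary in place of the static page. The names backlinks, add_link and what_links_here are invented for this example.

```python
from collections import defaultdict

# backlinks[target] holds the set of pages that link to `target`.
backlinks = defaultdict(set)

def add_link(source, target):
    """Record the reverse link at the target when `source` gains a link to it."""
    backlinks[target].add(source)

def what_links_here(target):
    """Look up the stored reverse links; no scan over other pages is needed."""
    return sorted(backlinks.get(target, set()))

add_link("A", "B")
add_link("C", "B")
print(what_links_here("B"))  # ['A', 'C']
```

The cost moves to edit time: every new link causes one extra write, and every lookup is a single read.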
Pseudo code for a simple algorithm that would do it
Sorry, I just finished my algorithms class, so I have an urge to write pseudo code. In this context, the updateChanges() procedure would be something called during the "update the 'what links here' for other pages" phase that Greg Hewgill referred to.
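The pseudo code from this answer is not reproduced here. As a stand-in, the following is one hypothetical way such an update step could look, under the assumption that the page's link targets before and after the edit are both known. The function name update_changes and its parameters are illustrative only and are not the original author's code.

```python
def update_changes(page, old_links, new_links, backlinks):
    """Bring `backlinks` (target -> set of source pages) up to date after an edit."""
    old, new = set(old_links), set(new_links)
    # Links that were added: register `page` as a source for each new target.
    for target in new - old:
        backlinks.setdefault(target, set()).add(page)
    # Links that were removed: drop `page`, and drop targets left with no sources.
    for target in old - new:
        sources = backlinks.get(target)
        if sources is not None:
            sources.discard(page)
            if not sources:
                del backlinks[target]
    return backlinks

backlinks = {"B": {"A"}, "C": {"A"}}
update_changes("A", old_links=["B", "C"], new_links=["C", "D"], backlinks=backlinks)
print(backlinks)  # {'C': {'A'}, 'D': {'A'}}
```

Only the differences between the two link sets are touched, so an edit that changes no links costs nothing here.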
The way I would implement it is to collect all the links after an edit, then store them in a separate table keyed by the current URL. Then I could query the table with the URL the user is currently on and get every page that has been recorded as linking to it.
It probably wouldn't be as straightforward as that, but that's the general, simplified idea. It would probably be wiser to store page IDs rather than URLs.
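Here is a small sketch of that table-based idea using Python's built-in sqlite3 module, with page IDs instead of URLs. The schema (tables page and link) and the helper names are assumptions made for this example, not Wikipedia's real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page (id INTEGER PRIMARY KEY, title TEXT UNIQUE NOT NULL)")
conn.execute(
    "CREATE TABLE link (from_id INTEGER NOT NULL, to_id INTEGER NOT NULL, "
    "PRIMARY KEY (from_id, to_id))"
)
# Index on the target column so the reverse lookup does not scan the table.
conn.execute("CREATE INDEX link_to ON link (to_id)")

def page_id(title):
    """Return the ID for `title`, creating the page row if it is missing."""
    conn.execute("INSERT OR IGNORE INTO page (title) VALUES (?)", (title,))
    return conn.execute("SELECT id FROM page WHERE title = ?", (title,)).fetchone()[0]

def store_links(source, targets):
    """Replace the stored outgoing links of `source` after an edit."""
    src = page_id(source)
    conn.execute("DELETE FROM link WHERE from_id = ?", (src,))
    conn.executemany(
        "INSERT OR IGNORE INTO link (from_id, to_id) VALUES (?, ?)",
        [(src, page_id(t)) for t in targets],
    )

def what_links_here(title):
    """Return the titles of all pages recorded as linking to `title`."""
    rows = conn.execute(
        "SELECT p.title FROM link l JOIN page p ON p.id = l.from_id "
        "WHERE l.to_id = (SELECT id FROM page WHERE title = ?) ORDER BY p.title",
        (title,),
    )
    return [r[0] for r in rows]

store_links("Cat", ["Mammal", "Pet"])
store_links("Dog", ["Mammal"])
print(what_links_here("Mammal"))  # ['Cat', 'Dog']
```

Deleting and re-inserting a page's links on every edit is the simplest approach; a larger site would more likely write only the differences.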
It would make sense for the "update event" of an article to trigger a links parser, as this is the only time an article is going to change. The update event in turn would simply scan for links and query the DB for the ones that are internal to Wikipedia.
I imagine each page has a primary key, and a simple association table is created to relate a page's PK to all the other pages that link to it.
There are likely some additional bits that get added to aid performance on such a large site, but that would be the basic mechanics.
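To make the link-scanning step concrete, here is a simplified sketch that pulls internal link targets out of wikitext with a regular expression. It only handles the plain [[Target]], [[Target|label]] and [[Target#Section]] forms and is not how MediaWiki's own parser works; the names WIKILINK and internal_links are made up for this example.

```python
import re

# Matches [[Target]], [[Target|label]] and [[Target#Section]]; captures the target.
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")

def internal_links(wikitext):
    """Return the distinct internal link targets, in order of first appearance."""
    seen = []
    for match in WIKILINK.finditer(wikitext):
        title = match.group(1).strip()
        if title and title not in seen:
            seen.append(title)
    return seen

text = "The [[cat]] is a [[Mammal|mammal]]; see [[Cat#Behavior]] and [https://example.org ext]."
print(internal_links(text))  # ['cat', 'Mammal', 'Cat']
```

The single-bracket external link is skipped because the pattern requires double brackets. Title normalization (for example treating "cat" and "Cat" as the same page) and namespace handling are left out of this sketch.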