Finding the shortest path between two web pages

Posted 2024-08-14 11:01:43

I need to find the shortest distance between two Wikipedia pages, measured in "hops".

I have a method that extracts all the internal wiki links on a page.

I know the start page and the end page, but I'm coming up blank on how to extract the hops from the data.

So far I've been using the link-extraction method to populate a dictionary, with the key being a link on the page and the value being the page it was taken from.

If anyone has any ideas on what a good data structure would be to hold the info, and then how to search through it, I would appreciate it very much.

Comments (5)

ペ泪落弦音 2024-08-21 11:01:43

Do you know anything about graph theory? You have the necessary data to build a graph, but you'll need to use Dijkstra's algorithm to traverse it to find the shortest path between your two points.
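
To make that concrete: the dictionary described in the question maps each link to the page it was taken from, which is the reverse of what a traversal needs. A minimal sketch of the usual alternative, an adjacency list keyed by page (ExtractLinks is a stand-in for the link-extraction method the asker already has):

using System.Collections.Generic;

// Adjacency list: page title -> the set of pages it links to.
var adjacency = new Dictionary<string, HashSet<string>>();

void AddPage(string page, IEnumerable<string> links)
{
    if (!adjacency.TryGetValue(page, out var set))
        adjacency[page] = set = new HashSet<string>();
    set.UnionWith(links);                      // record each outgoing link once
}

// Usage: AddPage("Start page", ExtractLinks("Start page"));

Any shortest-path traversal, Dijkstra or the plain breadth-first search suggested below, can then walk adjacency[page] directly.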

め七分饶幸 2024-08-21 11:01:43

Maybe it's a bit stupid since I'm not really a C# programmer, but a multidimensional array containing all the links would, depending on the depth of the dimensions, let you know which way contains fewer hops.

It's just a thought. While this is certainly doable in theory, since there's no language limit on the number of dimensions an array can have, I'm pretty sure it'll be really memory hungry!

Something like this:

[source] -> [source link] -> ['source link' link] -> etc
         -> [source link] -> ['source link' link] -> etc
         -> [source link] -> ['source link' link] -> etc
         -> [source link] -> ['source link' link] -> [target]
         -> [source link] -> ['source link' link] -> etc
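
The same idea fits more naturally as a list of levels than as a multidimensional array: level 0 holds the source, level n holds every page first reached after n hops, and the index of the first level containing the target is the hop count. A rough C# sketch, with PageLinks standing in for the asker's link extractor:

using System;
using System.Collections.Generic;

// Stand-in for the asker's link-extraction method.
IEnumerable<string> PageLinks(string page) => throw new NotImplementedException();

var levels = new List<HashSet<string>> { new HashSet<string> { "source" } };
var seen = new HashSet<string> { "source" };
while (!levels[levels.Count - 1].Contains("target"))
{
    var next = new HashSet<string>();
    foreach (var page in levels[levels.Count - 1])
        foreach (var link in PageLinks(page))
            if (seen.Add(link))            // keep only pages not seen at a shallower level
                next.Add(link);
    if (next.Count == 0) break;            // ran out of new pages: target unreachable
    levels.Add(next);
}
int hops = levels[levels.Count - 1].Contains("target")
    ? levels.Count - 1                     // depth of the level holding the target
    : -1;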

故人的歌 2024-08-21 11:01:43

I think the graph is sparse in this case, so it might be a good idea to use something like a HashSet for each Wikipedia page, holding the pages it links to.

You don't really need to implement Dijkstra's shortest-path algorithm in this case, since this is just the shortest-path problem with every edge weight equal to 1. You can do a breadth-first search and take the depth at which the destination page is found.
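
A queue-based version of that breadth-first search, carrying the depth along with each page, might look like this sketch (the pageLinks delegate stands in for the asker's link extractor):

using System;
using System.Collections.Generic;

// Returns the number of hops from start to target, or -1 if no path exists.
int ShortestHops(string start, string target, Func<string, IEnumerable<string>> pageLinks)
{
    var visited = new HashSet<string> { start };
    var queue = new Queue<(string page, int depth)>();
    queue.Enqueue((start, 0));
    while (queue.Count > 0)
    {
        var (page, depth) = queue.Dequeue();
        if (page == target) return depth;      // first visit is the shortest, by BFS order
        foreach (var link in pageLinks(page))
            if (visited.Add(link))             // Add returns false for pages already seen
                queue.Enqueue((link, depth + 1));
    }
    return -1;                                 // start and target aren't connected
}

Each page is enqueued at most once, so the search touches every page and link only once, which matters on a graph the size of Wikipedia.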

梅窗月明清似水 2024-08-21 11:01:43

Here's an implementation of Dijkstra's algorithm in Python: http://code.activestate.com/recipes/119466/
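
For anyone who'd rather stay in C#, a rough equivalent using .NET's PriorityQueue<TElement, TPriority> (available since .NET 6) could look like the sketch below; as the answer above notes, with every edge weighing 1, a plain breadth-first search gives the same result:

using System.Collections.Generic;

// Dijkstra over an adjacency list with unit edge weights (a sketch;
// graph maps each page to the pages it links to).
Dictionary<string, int> Dijkstra(Dictionary<string, List<string>> graph, string source)
{
    var dist = new Dictionary<string, int> { [source] = 0 };
    var pq = new PriorityQueue<string, int>();
    pq.Enqueue(source, 0);
    while (pq.TryDequeue(out var page, out var d))
    {
        if (d > dist[page]) continue;                    // stale entry, already improved
        if (!graph.TryGetValue(page, out var links)) continue;
        foreach (var next in links)
        {
            if (!dist.TryGetValue(next, out var old) || d + 1 < old)
            {
                dist[next] = d + 1;                      // every hop costs exactly 1
                pq.Enqueue(next, d + 1);
            }
        }
    }
    return dist;                                         // shortest hops to every reachable page
}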

夜空下最亮的亮点 2024-08-21 11:01:43

Assuming you have an IEnumerable<Link> PageLinks(Link link) method, the number of hops can be found as follows:

int Hops(Link currentPage, Link destinationPage)
{
    if (currentPage == destinationPage) return 0;
    int hops = 1;
    var visited = new HashSet<Link> { currentPage };           // pages already expanded
    List<Link> currentLinks = PageLinks(currentPage).ToList(); // current search frontier
    while (!currentLinks.Contains(destinationPage))
    {
        visited.UnionWith(currentLinks);                       // mark frontier before expanding it
        currentLinks = currentLinks
            .SelectMany(l => PageLinks(l))
            .Where(f => !visited.Contains(f))                  // skip pages already seen
            .Distinct()
            .ToList();                                         // materialize: no lazy re-enumeration
        if (currentLinks.Count == 0) return -1;                // pages aren't connected
        hops++;
    }
    return hops;
}

// e.g. int hops = Hops("somepage", "otherpage");

Edited to make it faster in the presence of cycles, although the algorithm would have worked without that. It could have run until StackOverflow or so if the pages weren't linked; the empty-frontier check above returns -1 instead.
