How to extract the full URL with HtmlAgilityPack - C#

Posted 2024-12-09 11:46:51, 576 characters, 4 views, 0 comments

Date: 2019-03-07. Tags: c#4.0, HtmlAgilityPack.1.4.0

With the code below, I only get the relative URL as written in the markup, like this:

The extraction code:

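HtmlNodeCollection anchors = hdDoc.DocumentNode.SelectNodes("//a[@href]");
if (anchors != null) // SelectNodes returns null when nothing matches
{
    foreach (HtmlNode link in anchors)
        lsLinks.Add(link.Attributes["href"].Value); // Value is already a string
}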

The link markup:

<a href="Login.aspx">Login</a>

The extracted value:

Login.aspx

But I want the real link, as the browser resolves it, like:

http://www.monstermmorpg.com/Login.aspx

I could check whether the URL contains "http" and, if not, prepend the domain, but that may cause problems in some cases, and I don't think it's a very wise solution.

Using C# 4.0 and HtmlAgilityPack.1.4.0.
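To make the concern above concrete, here is a minimal sketch comparing the "contains http, otherwise prepend the domain" check with resolution by System.Uri. The class and method names (ResolveDemo, NaiveResolve, ProperResolve) and the page address http://www.monstermmorpg.com/forum/thread.aspx are hypothetical, chosen only for illustration.

```csharp
using System;

// Hypothetical demo class, not part of HtmlAgilityPack or the question's code.
public static class ResolveDemo
{
    // Mimics the idea from the question: keep the href if it contains "http",
    // otherwise glue it onto the domain.
    public static string NaiveResolve(string domain, string href)
    {
        // Substring test: also matches relative paths such as "http-status.aspx".
        return href.Contains("http") ? href : domain + "/" + href;
    }

    // Lets System.Uri apply the relative-reference rules against the full
    // address of the page the link was found on.
    public static string ProperResolve(string pageUrl, string href)
    {
        return new Uri(new Uri(pageUrl), href).AbsoluteUri;
    }
}
```

With the page at http://www.monstermmorpg.com/forum/thread.aspx, the naive check turns Login.aspx into http://www.monstermmorpg.com/Login.aspx, while the browser would request http://www.monstermmorpg.com/forum/Login.aspx. A relative file named http-status.aspx passes the substring test unchanged, and ../Login.aspx becomes http://www.monstermmorpg.com/../Login.aspx instead of http://www.monstermmorpg.com/Login.aspx.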


浅暮の光 2024-12-16 11:46:51

Assuming you have the URL of the page you crawled, you can combine it with each parsed href like this:

// The address of the page you crawled
var baseUrl = new Uri("http://example.com/path/to-page/here.aspx");

// root relative
var url = new Uri(baseUrl, "/Login.aspx");
Console.WriteLine (url.AbsoluteUri); // prints 'http://example.com/Login.aspx'

// relative
url = new Uri(baseUrl, "../foo.aspx?q=1");
Console.WriteLine (url.AbsoluteUri); // prints 'http://example.com/path/foo.aspx?q=1'

// absolute
url = new Uri(baseUrl, "http://stackoverflow.com/questions/7760286/");
Console.WriteLine (url.AbsoluteUri); // prints 'http://stackoverflow.com/questions/7760286/'

// other...
url = new Uri(baseUrl, "javascript:void(0)");
Console.WriteLine (url.AbsoluteUri); // prints 'javascript:void(0)'

Note the use of AbsoluteUri rather than ToString(): ToString() unescapes the URL (to make it more "human-readable"), which is typically not what you want.
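A small sketch of the difference described above, assuming a hypothetical helper UriStringDemo.Render and the example base address http://example.com/docs/. AbsoluteUri keeps percent-escapes such as %20, while ToString() returns the unescaped form, so storing ToString() output can leave raw spaces in crawled URLs.

```csharp
using System;

// Hypothetical demo class: renders one resolved Uri both ways.
public static class UriStringDemo
{
    public static string[] Render(string href)
    {
        // Resolve href against an example base address.
        var uri = new Uri(new Uri("http://example.com/docs/"), href);

        // [0]: escaped form, safe to request or store.
        // [1]: unescaped, "human-readable" form.
        return new[] { uri.AbsoluteUri, uri.ToString() };
    }
}
```

For Render("my%20file.aspx"), AbsoluteUri gives http://example.com/docs/my%20file.aspx and ToString() gives http://example.com/docs/my file.aspx.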

痴梦一场 2024-12-16 11:46:51

I can do it with checking the url whether containing http and if not add the domain value

That's what you should do. Html Agility Pack has nothing to help you with this:

// baseUrl is the address (a string) of the page the HTML was downloaded from.
var url = new Uri(
    new Uri(new Uri(baseUrl).GetLeftPart(UriPartial.Path)),
    link.Attributes["href"].Value);
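Putting this together, here is a minimal sketch of a complete extraction loop that applies the advice using HtmlAgilityPack's HtmlDocument/HtmlNode API and System.Uri. The LinkExtractor class and its method names are hypothetical, and the syntax is kept to what C# 4.0 accepts. It also honours a <base href> element, because browsers resolve relative links against the document's base URL (the first <base href>, itself resolved against the page address) before falling back to the page address. Keeping only http/https results is an assumption suited to a crawler; drop that check if you want every scheme.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper class, not part of HtmlAgilityPack.
public static class LinkExtractor
{
    // Returns the URL relative links should be resolved against: the first
    // <base href> if present (resolved against the page address), else the page address.
    public static Uri GetEffectiveBase(HtmlDocument doc, Uri pageUri)
    {
        HtmlNode baseNode = doc.DocumentNode.SelectSingleNode("//base[@href]");
        if (baseNode == null)
            return pageUri;

        string baseHref = HtmlEntity.DeEntitize(baseNode.GetAttributeValue("href", string.Empty)).Trim();
        Uri resolved;
        return Uri.TryCreate(pageUri, baseHref, out resolved) ? resolved : pageUri;
    }

    // Returns absolute http(s) URLs for every <a href> in the HTML,
    // where pageUrl is the address the HTML was downloaded from.
    public static List<string> ExtractAbsoluteLinks(string html, string pageUrl)
    {
        var result = new List<string>();
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        Uri baseUri = GetEffectiveBase(doc, new Uri(pageUrl));

        // SelectNodes returns null, not an empty collection, when nothing matches.
        HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
        if (links == null)
            return result;

        foreach (HtmlNode link in links)
        {
            // Decode entities such as &amp; in case the value is returned as written in the markup.
            string href = HtmlEntity.DeEntitize(link.GetAttributeValue("href", string.Empty)).Trim();

            Uri absolute;
            if (!Uri.TryCreate(baseUri, href, out absolute))
                continue; // not a parseable URI reference

            // Keep web links only; this drops javascript:, mailto: and similar schemes.
            if (absolute.Scheme == Uri.UriSchemeHttp || absolute.Scheme == Uri.UriSchemeHttps)
                result.Add(absolute.AbsoluteUri);
        }
        return result;
    }
}
```

For example, with the page address http://example.com/path/to-page/here.aspx, the markup <a href="Login.aspx">Login</a> yields http://example.com/path/to-page/Login.aspx, and a javascript:void(0) link is skipped.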

About the author: 晌融
