How can I use Html Agility Pack to get img/src or a/hrefs?
I want to use the Html Agility Pack to parse image and href links from an HTML page, but I don't know much about XML or XPath. I have looked through help documents on many websites, but I still can't solve the problem. I am using C# in Visual Studio 2005. I also don't speak English fluently, so my sincere thanks go to anyone who can write some helpful code.
Comments (6)
The first example on the home page does something very similar, and the same pattern applies here: select the `a` elements and read the `href` attribute of each.
So you can imagine that for `img/@src`, you just replace each `a` with `img`, and `href` with `src`. You might even be able to simplify this further.
For relative URL handling, look at the `Uri` class.
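To make the `Uri` suggestion concrete, here is a minimal sketch of resolving an extracted `href` or `src` value against the address of the page it came from. It uses only `System.Uri`; the `LinkResolver` class, its `Resolve` method, and the example.com addresses are hypothetical names chosen for illustration.

```csharp
using System;

public static class LinkResolver
{
    // Resolves an href/src value against the URL of the page it was found on.
    // Absolute references are returned unchanged; relative ones are combined
    // with pageUrl. Returns null when the value cannot be parsed.
    public static Uri Resolve(Uri pageUrl, string reference)
    {
        Uri resolved;
        return Uri.TryCreate(pageUrl, reference, out resolved) ? resolved : null;
    }
}
```

For a page at `http://example.com/articles/index.html`, `Resolve(page, "images/logo.png")` gives `http://example.com/articles/images/logo.png`, and `Resolve(page, "/about")` gives `http://example.com/about`.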
The example in the accepted answer is wrong: it doesn't compile with the latest version. I tried something else, and this works for me.
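As a rough illustration of what this can look like against a recent HtmlAgilityPack release, the sketch below collects attribute values with `DocumentNode.SelectNodes` and `GetAttributeValue`. It is not the answerer's code: the `LinkExtractor` class and the `GetAttributeValues` method are hypothetical names, and the HtmlAgilityPack NuGet package is assumed to be referenced.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package

public static class LinkExtractor
{
    // Returns the value of `attribute` for every `tag` element that carries it,
    // for example ("a", "href") or ("img", "src").
    public static List<string> GetAttributeValues(string html, string tag, string attribute)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var values = new List<string>();

        // XPath such as //a[@href]: every <a> element that has an href attribute.
        HtmlNodeCollection nodes =
            doc.DocumentNode.SelectNodes("//" + tag + "[@" + attribute + "]");

        // By default SelectNodes returns null, not an empty collection,
        // when nothing matches.
        if (nodes == null) return values;

        foreach (HtmlNode node in nodes)
        {
            values.Add(node.GetAttributeValue(attribute, string.Empty));
        }
        return values;
    }
}
```

Calling `GetAttributeValues(html, "a", "href")` and `GetAttributeValues(html, "img", "src")` covers both cases from the question with the same helper.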
Maybe I am too late to post an answer here, but the SelectNodes approach from the source below worked for me.
Source: https://html-agility-pack.net/select-nodes
You also need to take into account the document base URL element (`<base>`) and protocol-relative URLs (for example `//www.foo.com/bar/`).
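One way to account for both points is to work out the effective base URL first and only then resolve each link. This is a hedged sketch rather than a complete implementation of the HTML rules: `BaseAwareResolver`, `GetEffectiveBase`, and `Resolve` are hypothetical names, and the HtmlAgilityPack NuGet package is assumed to be referenced.

```csharp
using System;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package

public static class BaseAwareResolver
{
    // Picks the URL that relative references should be resolved against:
    // the first <base href="..."> if it is present and parseable,
    // otherwise the URL the page was loaded from.
    public static Uri GetEffectiveBase(HtmlDocument doc, Uri pageUrl)
    {
        HtmlNode baseNode = doc.DocumentNode.SelectSingleNode("//base[@href]");
        if (baseNode != null)
        {
            string href = baseNode.GetAttributeValue("href", string.Empty);
            Uri declared;
            // A <base href> may itself be relative to the page URL.
            if (Uri.TryCreate(pageUrl, href, out declared)) return declared;
        }
        return pageUrl;
    }

    // Resolves one href/src value; returns null if it cannot be parsed.
    public static Uri Resolve(HtmlDocument doc, Uri pageUrl, string reference)
    {
        Uri resolved;
        Uri effectiveBase = GetEffectiveBase(doc, pageUrl);
        return Uri.TryCreate(effectiveBase, reference, out resolved) ? resolved : null;
    }
}
```

Protocol-relative references need no special case here, because `Uri` takes the scheme from the base: with a base of `https://example.com/page`, the reference `//www.foo.com/bar/` resolves to `https://www.foo.com/bar/`.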
Late post, but here's a 2021 update to the accepted answer (it accounts for the refactoring that HtmlAgilityPack has made since).