Getting a meta tag attribute using XPath with HTML Agility Pack
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1" />
<TITLE>Microsoft Corporation</TITLE>
<META http-equiv="PICS-Label" content="(PICS-1.1 "http://www.rsac.org/ratingsv01.html" l gen true r (n 0 s 0 v 0 l 0))" />
<META NAME="KEYWORDS" CONTENT="products; headlines; downloads; news; Web site; what's new; solutions; services; software; contests; corporate news;" />
<META NAME="DESCRIPTION" CONTENT="The entry page to Microsoft's Web site. Find software, solutions, answers, support, and Microsoft news." />
<META NAME="MS.LOCALE" CONTENT="EN-US" />
<META NAME="CATEGORY" CONTENT="home page" />

I'd like to know which XPath expression I need to get the value of the CONTENT attribute of the CATEGORY meta tag using HTML Agility Pack.
For a long time, HtmlAgilityPack could not select an attribute value directly with XPath; you had to loop over the list of meta nodes. Here's one way:
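A minimal sketch of such a loop (not the answer's original code), written in VB.NET like the asker's code later in this thread. It assumes the project references the HtmlAgilityPack package; `FindMetaContent` is a hypothetical helper name, not a library method.

```vbnet
Imports System
Imports HtmlAgilityPack

Module MetaLoopExample
    ' Hypothetical helper: loops over every <meta> node and returns the content
    ' attribute of the first one whose name attribute equals metaName
    ' (compared case-insensitively). Returns String.Empty when there is no match.
    Function FindMetaContent(html As String, metaName As String) As String
        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)

        ' By default SelectNodes returns Nothing, not an empty collection, when nothing matches.
        Dim metas As HtmlNodeCollection = doc.DocumentNode.SelectNodes("//meta")
        If metas Is Nothing Then Return String.Empty

        For Each meta As HtmlNode In metas
            ' Attribute lookup by name ignores case, so "name" also finds NAME="...".
            Dim name As String = meta.GetAttributeValue("name", String.Empty)
            If String.Equals(name, metaName, StringComparison.OrdinalIgnoreCase) Then
                Return meta.GetAttributeValue("content", String.Empty)
            End If
        Next

        Return String.Empty
    End Function

    Sub Main()
        Dim html As String = "<head><META NAME='CATEGORY' CONTENT='home page' /></head>"
        Console.WriteLine(FindMetaContent(html, "category"))
    End Sub
End Module
```

Looping keeps the XPath trivial and moves the name comparison into .NET code, which makes a case-insensitive match easy.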
But it looks like there is an experimental XPath release that will let you do that; with it, an expression that selects the attribute itself will return a list of HtmlAttribute objects.
Thank you for the quick response, Rohit Agarwal (I saw the answer only a few hours after I asked, but wasn't able to test it until today).
I originally implemented your suggestion as follows (in VB.NET):
Dim result As String = webClient.DownloadString(url)
Dim doc As New HtmlDocument()
doc.LoadHtml(result)
However, I've found that the XPath expression //meta[@name='title'] gives me the same result more directly:
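' Assumes Imports System.Net and Imports HtmlAgilityPack; webClient, url and title are declared elsewhere
Dim result As String = webClient.DownloadString(url)
Dim doc As New HtmlDocument()
doc.LoadHtml(result)
' Take the first <meta name="title"> node and read its content attribute
title = doc.DocumentNode.SelectNodes("//meta[@name='title']")(0).GetAttributeValue("content", String.Empty)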
Thanks for putting me on the right track =D
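One caveat with indexing the result of SelectNodes with (0), as in the code above: by default SelectNodes returns Nothing when the expression matches no nodes, so that line throws a NullReferenceException on a page without the tag. Also, XPath compares attribute values case-sensitively, so @name='category' does not match NAME="CATEGORY" from the question. A sketch that handles both, assuming HtmlAgilityPack is referenced; `GetMetaContent` is a hypothetical helper name.

```vbnet
Imports System
Imports HtmlAgilityPack

Module SafeMetaExample
    ' Hypothetical helper: returns the content attribute of the first
    ' <meta name="..."> whose name value matches metaName in any letter case,
    ' or String.Empty when no such tag exists.
    ' Assumes metaName contains no apostrophe, since it is spliced into the XPath string.
    Function GetMetaContent(doc As HtmlDocument, metaName As String) As String
        ' XPath 1.0 has no lower-case(), so translate() folds A-Z to a-z before comparing.
        Dim xpath As String = String.Format(
            "//meta[translate(@name, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz') = '{0}']",
            metaName.ToLowerInvariant())

        ' SelectSingleNode returns Nothing for no match, so there is no collection to index.
        Dim node As HtmlNode = doc.DocumentNode.SelectSingleNode(xpath)
        If node Is Nothing Then Return String.Empty

        Return node.GetAttributeValue("content", String.Empty)
    End Function

    Sub Main()
        Dim doc As New HtmlDocument()
        doc.LoadHtml("<head><META NAME='CATEGORY' CONTENT='home page' /></head>")
        Console.WriteLine(GetMetaContent(doc, "category"))
        Console.WriteLine(GetMetaContent(doc, "title").Length)   ' no such tag, so an empty string
    End Sub
End Module
```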
If you just want to get the title, description and keywords of the page, use:
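A sketch of one way to read those three values (not the answer's original code), assuming HtmlAgilityPack is referenced. `ReadSummary` and `ContentOf` are hypothetical helper names, and the sample HTML is illustrative only.

```vbnet
Imports System
Imports HtmlAgilityPack

Module PageSummaryExample
    ' Hypothetical helper: content attribute of the first node matching xpath, or "".
    Function ContentOf(doc As HtmlDocument, xpath As String) As String
        Dim node As HtmlNode = doc.DocumentNode.SelectSingleNode(xpath)
        If node Is Nothing Then Return String.Empty
        Return node.GetAttributeValue("content", String.Empty)
    End Function

    ' Hypothetical helper: returns (title, description, keywords); missing parts are "".
    Function ReadSummary(html As String) As Tuple(Of String, String, String)
        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)

        ' The page title is element text, not a meta attribute.
        Dim titleNode As HtmlNode = doc.DocumentNode.SelectSingleNode("//title")
        Dim title As String = If(titleNode Is Nothing, String.Empty, titleNode.InnerText.Trim())

        ' Attribute values compare case-sensitively: these match name="description",
        ' but not name="DESCRIPTION".
        Dim description As String = ContentOf(doc, "//meta[@name='description']")
        Dim keywords As String = ContentOf(doc, "//meta[@name='keywords']")
        Return Tuple.Create(title, description, keywords)
    End Function

    Sub Main()
        Dim html As String =
            "<html><head><title>Microsoft Corporation</title>" &
            "<meta name='description' content='Find software, solutions, answers, support, and Microsoft news.' />" &
            "<meta name='keywords' content='products; headlines; downloads' /></head></html>"
        Dim summary As Tuple(Of String, String, String) = ReadSummary(html)
        Console.WriteLine(summary.Item1)
        Console.WriteLine(summary.Item2)
        Console.WriteLine(summary.Item3)
    End Sub
End Module
```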
If you want to get the og: tags from the link as well, add this code after that:
This is a great experience... I like this code :)
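For the og: tags mentioned above: Open Graph tags are conventionally written with a property attribute rather than name, for example <meta property="og:title" content="..." />. A sketch that collects them with XPath's starts-with(), assuming HtmlAgilityPack is referenced; `GetOpenGraphTags` is a hypothetical helper name, not the answer's code.

```vbnet
Imports System
Imports System.Collections.Generic
Imports HtmlAgilityPack

Module OpenGraphExample
    ' Hypothetical helper: maps each og:* property to its content value.
    Function GetOpenGraphTags(html As String) As Dictionary(Of String, String)
        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)

        Dim tags As New Dictionary(Of String, String)()
        Dim nodes As HtmlNodeCollection =
            doc.DocumentNode.SelectNodes("//meta[starts-with(@property, 'og:')]")
        If nodes Is Nothing Then Return tags   ' the page has no og:* tags

        For Each node As HtmlNode In nodes
            Dim key As String = node.GetAttributeValue("property", String.Empty)
            ' Keep the first value when a property repeats (e.g. several og:image tags).
            If Not tags.ContainsKey(key) Then
                tags(key) = node.GetAttributeValue("content", String.Empty)
            End If
        Next
        Return tags
    End Function

    Sub Main()
        Dim html As String =
            "<head><meta property='og:title' content='Microsoft Corporation' />" &
            "<meta property='og:type' content='website' /></head>"
        For Each pair As KeyValuePair(Of String, String) In GetOpenGraphTags(html)
            Console.WriteLine(pair.Key & " = " & pair.Value)
        Next
    End Sub
End Module
```

Keeping only the first value per property is a simplifying choice; a page with several og:image tags would need a list per key instead.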
With no error checking:
Of course, if the node is null, or if the content attribute is not present, this will fail.
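The two failure cases named above can be checked explicitly. A sketch, assuming HtmlAgilityPack is referenced; `TryGetCategory` is a hypothetical helper name, and the XPath uses the exact value CATEGORY from the question's markup.

```vbnet
Imports System
Imports HtmlAgilityPack

Module CheckedMetaExample
    ' Hypothetical helper: returns True and sets value when <meta name="CATEGORY">
    ' exists and has a content attribute; returns False when either is missing.
    Function TryGetCategory(doc As HtmlDocument, ByRef value As String) As Boolean
        value = Nothing

        ' Case 1: no matching node, so SelectSingleNode returns Nothing.
        Dim node As HtmlNode = doc.DocumentNode.SelectSingleNode("//meta[@name='CATEGORY']")
        If node Is Nothing Then Return False

        ' Case 2: the node exists but has no content attribute, so the indexer returns Nothing.
        Dim attr As HtmlAttribute = node.Attributes("content")
        If attr Is Nothing Then Return False

        value = attr.Value
        Return True
    End Function

    Sub Main()
        Dim doc As New HtmlDocument()
        doc.LoadHtml("<head><META NAME='CATEGORY' CONTENT='home page' /></head>")

        Dim category As String = Nothing
        If TryGetCategory(doc, category) Then
            Console.WriteLine(category)
        Else
            Console.WriteLine("No CATEGORY meta tag with a content attribute.")
        End If
    End Sub
End Module
```

Unlike GetAttributeValue with a default, this distinguishes a missing attribute from an attribute whose value is empty.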