Using XPath to get meta tag attributes with HTML Agility Pack

Posted 2024-09-08 19:52:36

META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1" />
TITLE>Microsoft Corporation
META http-equiv="PICS-Label" content="(PICS-1.1 "http://www.rsac.org/ratingsv01.html" l gen true r (n 0 s 0 v 0 l 0))" />
META NAME="KEYWORDS" CONTENT="products; headlines; downloads; news; Web site; what's new; solutions; services; software; contests; corporate news;" />
META NAME="DESCRIPTION" CONTENT="The entry page to Microsoft's Web site. Find software, solutions, answers, support, and Microsoft news." />
META NAME="MS.LOCALE" CONTENT="EN-US" />
META NAME="CATEGORY" CONTENT="home page" />

I'd like to know what XPath expression I need to get the value of the CONTENT attribute of the CATEGORY meta tag using HTML Agility Pack. (I removed the first < of each line in the HTML above so that it would post.)

吲‖鸣 2024-09-15 19:52:36

For a long time HtmlAgilityPack didn't have the ability to directly query an attribute value. You had to loop over the list of meta nodes. Here's one way:

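using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml(htmlString);

// SelectNodes returns null (not an empty collection) when nothing matches.
var list = doc.DocumentNode.SelectNodes("//meta");
if (list != null)
{
    foreach (var node in list)
    {
        // Falls back to "" when the node has no content attribute.
        string content = node.GetAttributeValue("content", "");
    }
}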

But it looks like there is an experimental XPath release that will let you query the attribute directly.

doc.DocumentNode.SelectNodes("//meta/@content")

will return a list of HtmlAttribute objects.

八巷 2024-09-15 19:52:36

Thank you for the quick response Rohit Agarwal (I saw it answered only a few hours after I asked, but haven't been able to test it until today).

I originally implemented your suggestion as follows (it's in VB.NET):

Dim result As String = webClient.DownloadString(url)
Dim doc As New HtmlDocument()
doc.LoadHtml(result)

    Dim list = doc.DocumentNode.SelectNodes("//meta")

    ' SelectNodes returns Nothing when no node matches
    If list IsNot Nothing Then
        For Each node As HtmlNode In list
            Dim metaname As String = node.GetAttributeValue("name", String.Empty)
            If metaname <> String.Empty Then
                If metaname = "title" Then
                    title = node.GetAttributeValue("content", String.Empty)
                ' more ElseIf branches
                End If
            End If
        Next
    End If

However, I've found that //meta[@name='title'] will give me the same result

Dim result As String = webClient.DownloadString(url)

Dim doc As New HtmlDocument()
doc.LoadHtml(result)

title = doc.DocumentNode.SelectNodes("//meta[@name='title']")(0).GetAttributeValue("content", String.Empty)
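
For the CATEGORY tag from the question, the same predicate approach might look like the C# sketch below. The sample markup, the class name, and the wrapping `<html>`/`<head>` elements are made up for illustration. It relies on HTML Agility Pack exposing element and attribute names in lowercase to XPath while leaving attribute values untouched, which is why `translate()` is used to compare the `name` value case-insensitively (XPath 1.0 has no `lower-case()` function). If you know the exact casing, `//meta[@name='CATEGORY']` should be enough.

```csharp
using System;
using HtmlAgilityPack;

class CategoryMetaExample
{
    static void Main()
    {
        // Sample markup modelled on the question; the wrapping tags are assumed.
        const string html =
            "<html><head>" +
            "<META NAME=\"MS.LOCALE\" CONTENT=\"EN-US\" />" +
            "<META NAME=\"CATEGORY\" CONTENT=\"home page\" />" +
            "</head></html>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // translate() folds the attribute value to lowercase before comparing,
        // so "CATEGORY", "Category" and "category" all match.
        const string xpath =
            "//meta[translate(@name, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', " +
            "'abcdefghijklmnopqrstuvwxyz') = 'category']";

        // SelectSingleNode returns null when no node matches.
        HtmlNode node = doc.DocumentNode.SelectSingleNode(xpath);

        // The null-conditional call leaves category null if the tag is absent.
        string category = node?.GetAttributeValue("content", string.Empty);
        Console.WriteLine(category ?? "(no category meta tag)");
    }
}
```

Unlike indexing `SelectNodes(...)(0)`, this does not throw when the page has no matching meta tag.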

Thanks for putting me on the right track=D

终遇你 2024-09-15 19:52:36

If you just want to display the title, description, and keywords meta tags, use:

// Assumptions: metaTags comes from doc.DocumentNode.SelectNodes("//meta"), and
// divPage is a server-side <div runat="server"> (an HtmlGenericControl), because
// the WebControls Panel class has no InnerHtml property.
if (metaTags != null)
{
    foreach (var tag in metaTags)
    {
        // && short-circuits, so the second check is skipped when name is missing
        if (tag.Attributes["name"] != null && tag.Attributes["content"] != null)
        {
            divPage.InnerHtml += "<br /> <b> Page " + tag.Attributes["name"].Value + " </b>: " +
                tag.Attributes["content"].Value + "<br />";
        }
    }
}

If you also want the Open Graph (og:) tags from the page, add this inside the same foreach loop, after the block above:

            // Assumption: img is an Image control on the page.
            if (tag.Attributes["property"] != null && tag.Attributes["content"] != null)
            {
                if (tag.Attributes["property"].Value == "og:image")
                {
                    img.ImageUrl = tag.Attributes["content"].Value;
                }
            }

This was a great experience. I really like this code :)

终陌 2024-09-15 19:52:36

With no error checking:

doc.DocumentNode.SelectSingleNode("//meta[@name='description']").Attributes["content"].Value;

Of course, if the node is null or the content attribute is missing, this throws a NullReferenceException.
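
A version with the missing checks added could look like the sketch below. The helper name `GetMetaContent`, the class name, and the sample HTML are hypothetical. The sketch only relies on `SelectSingleNode` returning null when nothing matches and on the `Attributes` indexer returning null for an absent attribute.

```csharp
using System;
using HtmlAgilityPack;

class SafeMetaLookup
{
    // Hypothetical helper: returns the content attribute of the first <meta>
    // whose name attribute equals `name`, or null if the tag or attribute is absent.
    // `name` is spliced into the XPath, so it must not contain a single quote.
    static string GetMetaContent(HtmlDocument doc, string name)
    {
        HtmlNode node = doc.DocumentNode.SelectSingleNode("//meta[@name='" + name + "']");

        // Both steps are null-safe: a missing node or a missing attribute yields null.
        return node?.Attributes["content"]?.Value;
    }

    static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(
            "<html><head>" +
            "<meta name=\"description\" content=\"The entry page\" />" +
            "<meta name=\"keywords\" />" +
            "</head></html>");

        // Tag and attribute both present.
        Console.WriteLine(GetMetaContent(doc, "description") ?? "(not found)");
        // Tag present but without a content attribute.
        Console.WriteLine(GetMetaContent(doc, "keywords") ?? "(not found)");
        // Tag absent altogether.
        Console.WriteLine(GetMetaContent(doc, "category") ?? "(not found)");
    }
}
```

Returning null rather than an empty string lets the caller tell a missing tag apart from one whose content is empty.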

About the author

撞了怀
