Getting a meta tag attribute with HTML Agility Pack using XPath

Posted on 2024-09-08 19:52:36

<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1" />
<TITLE>Microsoft Corporation</TITLE>
<META http-equiv="PICS-Label" content="(PICS-1.1 "http://www.rsac.org/ratingsv01.html" l gen true r (n 0 s 0 v 0 l 0))" />
<META NAME="KEYWORDS" CONTENT="products; headlines; downloads; news; Web site; what's new; solutions; services; software; contests; corporate news;" />
<META NAME="DESCRIPTION" CONTENT="The entry page to Microsoft's Web site. Find software, solutions, answers, support, and Microsoft news." />
<META NAME="MS.LOCALE" CONTENT="EN-US" />
<META NAME="CATEGORY" CONTENT="home page" />

I'd like to know what XPath I need to get the value of the Content attribute of the Category meta tag using HTML Agility Pack.

Comments (4)

吲‖鸣 2024-09-15 19:52:36

For a long time HtmlAgilityPack didn't have the ability to directly query an attribute value. You had to loop over the list of meta nodes. Here's one way:

var doc = new HtmlDocument();
doc.LoadHtml(htmlString);

var list = doc.DocumentNode.SelectNodes("//meta");
// SelectNodes returns null (not an empty list) when nothing matches
if (list != null)
{
    foreach (var node in list)
    {
        string content = node.GetAttributeValue("content", "");
    }
}

But it looks like there is an experimental XPath release that will let you do that:

doc.DocumentNode.SelectNodes("//meta/@content")

With that release, this will return a list of HtmlAttribute objects.
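For the tag in the question, the loop above can be narrowed to a single lookup. The following is only a sketch of one way to do it: `MetaReader` and `GetMetaContent` are names made up for this example, and the name comparison ignores case because the sample page writes NAME="CATEGORY" in capitals.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class MetaReader
{
    // Returns the content attribute of the first <meta> whose name matches
    // metaName (ignoring case), or null when no such tag exists.
    public static string GetMetaContent(string html, string metaName)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNode match = doc.DocumentNode
            .Descendants("meta")
            .FirstOrDefault(m => string.Equals(
                m.GetAttributeValue("name", string.Empty),
                metaName,
                StringComparison.OrdinalIgnoreCase));

        // An empty string means the tag was found but has no content attribute.
        return match?.GetAttributeValue("content", string.Empty);
    }
}
```

Calling `MetaReader.GetMetaContent(htmlString, "category")` on the page from the question should give back the category value, and a null result means no matching tag was found.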

八巷 2024-09-15 19:52:36

Thank you for the quick response, Rohit Agarwal (I saw it answered only a few hours after I asked, but haven't been able to test it until today).

I originally implemented your suggestion as follows (in VB.NET):

Dim result As String = webClient.DownloadString(url)
Dim doc As New HtmlDocument()
doc.LoadHtml(result)

Dim list = doc.DocumentNode.SelectNodes("//meta")

For Each node As HtmlNode In list
    ' Skip meta tags that have no name attribute
    Dim metaname As String = node.GetAttributeValue("name", String.Empty)
    If metaname <> String.Empty Then
        If metaname = "title" Then
            title = node.GetAttributeValue("content", String.Empty)
        ' more ElseIf branches
        End If
    End If
Next node

However, I've found that //meta[@name='title'] will give me the same result:

Dim result As String = webClient.DownloadString(url)

Dim doc As New HtmlDocument()
doc.LoadHtml(result)

title = doc.DocumentNode.SelectNodes("//meta[@name='title']")(0).GetAttributeValue("content", String.Empty)

Thanks for putting me on the right track=D
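One caveat when applying the same predicate to the Category tag: XPath string comparison is case-sensitive, and the sample page has NAME="CATEGORY" in capitals, so `//meta[@name='category']` would not match that value. As far as I know, HTML Agility Pack normalizes element and attribute names to lowercase but leaves attribute values untouched. The sketch below uses the XPath 1.0 `translate()` function to lowercase the value before comparing it. `CategoryXPath` and `GetCategory` are illustrative names, not part of the library.

```csharp
using HtmlAgilityPack;

public static class CategoryXPath
{
    private const string Upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    private const string Lower = "abcdefghijklmnopqrstuvwxyz";

    // Returns the content of <meta name="category"> regardless of how the
    // name value is capitalized, or null when the tag is missing.
    public static string GetCategory(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // translate() maps each uppercase letter of @name to its lowercase form.
        string xpath = "//meta[translate(@name, '" + Upper + "', '" + Lower + "') = 'category']";

        HtmlNode node = doc.DocumentNode.SelectSingleNode(xpath);
        return node == null ? null : node.GetAttributeValue("content", string.Empty);
    }
}
```

If the capitalization of the page is known and stable, the plain predicate `//meta[@name='CATEGORY']` is shorter and should work just as well.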

终遇你 2024-09-15 19:52:36

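If you just want to display meta tags such as the title, description, and keywords, use:

// metaTags is assumed to come from doc.DocumentNode.SelectNodes("//meta").
// Panel has no InnerHtml property, so an HtmlGenericControl
// (System.Web.UI.HtmlControls) is used as the container instead.
var divPage = new HtmlGenericControl("div");
if (metaTags != null)
{
    foreach (var tag in metaTags)
    {
        // Only handle tags that have both a name and a content attribute
        if (tag.Attributes["name"] != null && tag.Attributes["content"] != null)
        {
            divPage.InnerHtml += "<br /> " +
                "<b> Page " + tag.Attributes["name"].Value + " </b>: " +
                tag.Attributes["content"].Value + "<br />";
        }
    }
}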

If you also want to get the Open Graph (og:) tags from the link, add this inside the same loop, after the block above:

if (tag.Attributes["property"] != null && tag.Attributes["content"] != null)
{
    if (tag.Attributes["property"].Value == "og:image")
    {
        img.ImageUrl = tag.Attributes["content"].Value;
    }
}

This was a great experience. I really like this code :)
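When the values are needed for something other than building markup, it may be simpler to collect them first and decide what to display afterwards. The sketch below gathers every meta tag into a dictionary keyed by its name attribute, falling back to property for Open Graph tags. `MetaCollector` and `Collect` are hypothetical names chosen for this example.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class MetaCollector
{
    // Maps each meta tag's name (or property, for og: tags) to its content.
    // Keys are compared without regard to case; later duplicates overwrite earlier ones.
    public static Dictionary<string, string> Collect(string html)
    {
        var result = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        foreach (HtmlNode tag in doc.DocumentNode.Descendants("meta"))
        {
            string key = tag.GetAttributeValue("name", string.Empty);
            if (key.Length == 0)
            {
                key = tag.GetAttributeValue("property", string.Empty);
            }

            // Skip tags with neither attribute, or with no content to report.
            if (key.Length == 0 || tag.Attributes["content"] == null)
            {
                continue;
            }

            result[key] = tag.Attributes["content"].Value;
        }

        return result;
    }
}
```

With this in place, the og:image lookup becomes a `TryGetValue("og:image", out var url)` call on the returned dictionary.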

终陌 2024-09-15 19:52:36

Without error checking:

doc.DocumentNode.SelectSingleNode("//meta[@name='description']").Attributes["content"].Value;

Of course, if the node is null or the content attribute is not present, this will throw a NullReferenceException.
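A guarded variant of the same one-liner might look like the sketch below. It relies on `SelectSingleNode` returning null when nothing matches and on `GetAttributeValue` returning the supplied default when the attribute is absent. `SafeMeta` and `GetDescription` are names invented for this example.

```csharp
using HtmlAgilityPack;

public static class SafeMeta
{
    // Returns the description meta tag's content, or an empty string when
    // either the tag or its content attribute is missing.
    public static string GetDescription(HtmlDocument doc)
    {
        HtmlNode node = doc.DocumentNode.SelectSingleNode("//meta[@name='description']");

        // node?. skips the call when no tag matched; ?? supplies the fallback.
        return node?.GetAttributeValue("content", string.Empty) ?? string.Empty;
    }
}
```

Returning an empty string keeps callers free of null checks. If you need to tell a missing tag apart from an empty one, return null instead.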
