通过 XPath 和 HtmlAgilityPack 获取属性的值

发布于 2024-12-23 10:51:42 字数 559 浏览 0 评论 0原文

我有一个 HTML 文档,我用 XPath 解析它。我想获取元素输入的值,但没有成功。

我的 Html:

<tbody>
  <tr>
    <td>
      <input type="text" name="item" value="10743" readonly="readonly" size="10"/>
    </td>
  </tr>
</tbody>

我的代码:

using HtmlAgilityPack;

HtmlAgilityPack.HtmlDocument doc; 
HtmlWeb hw = new HtmlWeb();
HtmlNodeCollection node = doc.DocumentNode.SelectNodes("//input/@value");
string s=node[0].InnerText;

所以我想获取值:“10743”(而且我不介意获取带有答案的其他标签。)

I have a HTML document and I parse it with XPath. I want to get a value of the element input, but it didn't work.

My Html:

<tbody>
  <tr>
    <td>
      <input type="text" name="item" value="10743" readonly="readonly" size="10"/>
    </td>
  </tr>
</tbody>

My code:

using HtmlAgilityPack;

HtmlAgilityPack.HtmlDocument doc; 
HtmlWeb hw = new HtmlWeb();
HtmlNodeCollection node = doc.DocumentNode.SelectNodes("//input/@value");
string s=node[0].InnerText;

So I want to get the value: "10743" (and I don't mind to get another tags with the answer.)

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

禾厶谷欠 2024-12-30 10:51:42

你可以在 .Attributes 集合中获取它:

var doc = new HtmlAgilityPack.HtmlDocument();
doc.Load("file.html");
var node = doc.DocumentNode.SelectNodes("//input") [0];
var val = node.Attributes["value"].Value; //10743

you can get it in .Attributes collection:

var doc = new HtmlAgilityPack.HtmlDocument();
doc.Load("file.html");
var node = doc.DocumentNode.SelectNodes("//input") [0];
var val = node.Attributes["value"].Value; //10743
挽袖吟 2024-12-30 10:51:42

如果您使用HtmlNavigator,您也可以直接获取该属性。

//Load document from some html string
HtmlDocument hdoc = new HtmlDocument();
hdoc.LoadHtml(htmlContent);

//load navigator for current document
HtmlNavigator navigator = (HtmlNodeNavigator)hdoc.CreateNavigator();

//Get value with given xpath
string xpath = "//input/@value";
string val = navigator.SelectSingleNode(xpath).Value;

You can also directly grab the attribute if you use the HtmlNavigator.

//Load document from some html string
HtmlDocument hdoc = new HtmlDocument();
hdoc.LoadHtml(htmlContent);

//load navigator for current document
HtmlNavigator navigator = (HtmlNodeNavigator)hdoc.CreateNavigator();

//Get value with given xpath
string xpath = "//input/@value";
string val = navigator.SelectSingleNode(xpath).Value;
仄言 2024-12-30 10:51:42

更新2:以下是如何使用 Html Agility Pack 获取属性值的代码示例:

http://htmlagilitypack.codeplex.com/wikipage?title=Examples

 HtmlDocument doc = new HtmlDocument();
 doc.Load("file.htm");
 foreach(HtmlNode link in doc.DocumentElement.SelectNodes("//a[@href"])
 {
    HtmlAttribute att = link.Attributes["href"];
    att.Value = FixLink(att);
 }
 doc.Save("file.htm");

您显然需要根据您的需要调整此代码 - 例如您不会修改属性,但只会使用att.Value。


更新:您还可以看看这个问题:

使用 html Agility Pack 选择属性值


您的问题很可能是默认命名空间问题 - 搜索“XPath default namespace c#”,您会发现很多好的解决方案(提示:使用重载SelectNodes()< /strong> 具有 XmlNamespaceManager 参数)。

以下代码显示了“无命名空间”中文档中属性的获取结果:

using System;
using System.IO;
using System.Xml;

public class Sample
{

    public static void Main()
    {

        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<input value='novel' ISBN='1-861001-57-5'>" +
                    "<title>Pride And Prejudice</title>" +
                    "</input>");

        XmlNode root = doc.DocumentElement;

        XmlNode value = doc.SelectNodes("//input/@value")[0];

        Console.WriteLine("Inner text: " + value.InnerText);
        Console.WriteLine("InnerXml: " + value.InnerXml);
        Console.WriteLine("OuterXml: " + value.OuterXml);
        Console.WriteLine("Value: " + value.Value);

    }
}

运行此应用程序的结果是

Inner text: novel
InnerXml: novel
OuterXml: value="novel"
Value: novel

现在,对于以下文档:位于默认命名空间中

using System;
using System.IO;
using System.Xml;

public class Sample
{

    public static void Main()
    {

        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<input xmlns='some:Namespace' value='novel' ISBN='1-861001-57-5'>" +
                    "<title>Pride And Prejudice</title>" +
                    "</input>");

        XmlNode root = doc.DocumentElement;

        XmlNamespaceManager nsmgr = new XmlNamespaceManager(doc.NameTable);
        nsmgr.AddNamespace("x", "some:Namespace");

        XmlNode value = doc.SelectNodes("//x:input/@value", nsmgr)[0];

        Console.WriteLine("Inner text: " + value.InnerText);
        Console.WriteLine("InnerXml: " + value.InnerXml);
        Console.WriteLine("OuterXml: " + value.OuterXml);
        Console.WriteLine("Value: " + value.Value);

    }
}

运行此应用程序会再次产生所需的结果

Inner text: novel
InnerXml: novel
OuterXml: value="novel"
Value: novel

Update2: Here is a code example how to get values of attributes using Html Agility Pack:

http://htmlagilitypack.codeplex.com/wikipage?title=Examples

 HtmlDocument doc = new HtmlDocument();
 doc.Load("file.htm");
 foreach(HtmlNode link in doc.DocumentElement.SelectNodes("//a[@href"])
 {
    HtmlAttribute att = link.Attributes["href"];
    att.Value = FixLink(att);
 }
 doc.Save("file.htm");

You obviously need to adapt this code to your needs -- for example you will not modify the attributes, but will just use att.Value .


Update: You may also look at this question:

Selecting attribute values with html Agility Pack


Your problem is most likely a default namespace problem -- search for "XPath default namespace c#" and you will find many good solutions (hint: use the overload of SelectNodes() that has an XmlNamespaceManager argument).

The following code shows what one gets for an attribute in a document in "no namespace":

using System;
using System.IO;
using System.Xml;

public class Sample
{

    public static void Main()
    {

        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<input value='novel' ISBN='1-861001-57-5'>" +
                    "<title>Pride And Prejudice</title>" +
                    "</input>");

        XmlNode root = doc.DocumentElement;

        XmlNode value = doc.SelectNodes("//input/@value")[0];

        Console.WriteLine("Inner text: " + value.InnerText);
        Console.WriteLine("InnerXml: " + value.InnerXml);
        Console.WriteLine("OuterXml: " + value.OuterXml);
        Console.WriteLine("Value: " + value.Value);

    }
}

The result from running this app is:

Inner text: novel
InnerXml: novel
OuterXml: value="novel"
Value: novel

Now, for a document that is in a default namespace:

using System;
using System.IO;
using System.Xml;

public class Sample
{

    public static void Main()
    {

        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<input xmlns='some:Namespace' value='novel' ISBN='1-861001-57-5'>" +
                    "<title>Pride And Prejudice</title>" +
                    "</input>");

        XmlNode root = doc.DocumentElement;

        XmlNamespaceManager nsmgr = new XmlNamespaceManager(doc.NameTable);
        nsmgr.AddNamespace("x", "some:Namespace");

        XmlNode value = doc.SelectNodes("//x:input/@value", nsmgr)[0];

        Console.WriteLine("Inner text: " + value.InnerText);
        Console.WriteLine("InnerXml: " + value.InnerXml);
        Console.WriteLine("OuterXml: " + value.OuterXml);
        Console.WriteLine("Value: " + value.Value);

    }
}

Running this app produces again the wanted results:

Inner text: novel
InnerXml: novel
OuterXml: value="novel"
Value: novel
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文