Can't figure out the XPath in HtmlAgilityPack

Posted on 2024-11-10 18:41:33


I am trying to write my first C# application (one that does more than just say "Hello world").

The HTML file has lots of tags (but only the two h4 tags shown below). Here is the part I am interested in:

```html
<table width="100%" height="400" border="0" align="center" cellpadding="0" cellspacing="0" bordercolor="#111111" background="images/page_bg.gif" style="BORDER-COLLAPSE: collapse">

<tbody valign="top">
<tr>
<td>

<table width="80%" border="0" valign=top background="images/page_bg.gif">
 <tr>
 <td>

  <div align="center">
   <h4 align="center">
      <font face="Verdana, Arial, Helvetica, sans-serif" size="2">
      <b>
      <font size="4" face="Arial, Helvetica, sans-serif">
      UNWANTED TEXT
       </font></b></font></h4>

  <p><br />
  Name  :  {NAME HERE} <br>Number : {NUMBERS HERE}<br>Number2 : {NUMBERS2}<br><br><h4>UNWANTED TEXT</h4><br>detail NO.  :  <span class=style7>{NUmbers3}</span><br><br><a href=http://test.xom>UNWANTED TEXT</a><br><br>
  </p>
  <p class="content"><em><strong>
  <p> </p>
```

I want to get the name, Numbers1, Numbers2 and Numbers3, so I guess I have to do something like this:

```
//div[@align = "centre"]/h4/followingsibling::Text();
```

But that is surely incomplete. Any ideas on how I should do it? I got this XPath from Firebug:

```
/html/body/table/tbody/tr[2]/td/table/tbody/tr/td/table/tbody/tr[2]/td/div/table/tbody/tr/td/table/tbody/tr/td/div/h4
```
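HtmlAgilityPack evaluates XPath through .NET's `XPathNavigator`, so the XPath 1.0 rules can be tried out with `System.Xml` on a hand-cleaned, well-formed copy of the fragment. The sketch below assumes such a copy; the `XPathSketch` class and its `Fragment` string are hypothetical. XPath is case-sensitive: the axis is spelled `following-sibling::`, the node test is `text()`, and `@align='centre'` does not match `align="center"`. In this cleaned-up tree the values are text children of the `<p>`, not siblings of the first `<h4>`; HtmlAgilityPack may nest the malformed `<p>`/`<h4>` markup differently, so inspect its parsed tree before relying on these paths. A short relative path anchored on attributes is also less brittle than Firebug's absolute path, which can contain `tbody` elements that Firefox inserts into the DOM but that may not exist in the raw HTML.

```csharp
using System.Collections.Generic;
using System.Xml;

// Sketch: evaluate XPath 1.0 expressions against a hand-cleaned, well-formed
// copy of the question's <div align="center"> block. HtmlAgilityPack parses the
// real (malformed) HTML itself, so its tree may be nested differently.
public static class XPathSketch
{
    // Hypothetical cleaned-up fragment: quoted attributes, self-closed <br/>,
    // no whitespace between tags.
    public const string Fragment =
        "<div align=\"center\">" +
        "<h4 align=\"center\">UNWANTED TEXT</h4>" +
        "<p><br/>Name  :  {NAME HERE} <br/>Number : {NUMBERS HERE}<br/>Number2 : {NUMBERS2}<br/><br/>" +
        "<h4>UNWANTED TEXT</h4><br/>detail NO.  :  <span class=\"style7\">{NUmbers3}</span><br/><br/>" +
        "<a href=\"http://test.xom\">UNWANTED TEXT</a><br/><br/></p>" +
        "</div>";

    // Returns the text of every node the expression selects:
    // the value of a text node, or the InnerText of an element.
    public static List<string> Select(string xml, string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        var results = new List<string>();
        var nodes = doc.SelectNodes(xpath);
        if (nodes == null)
        {
            return results;
        }
        foreach (XmlNode node in nodes)
        {
            results.Add(node.Value ?? node.InnerText);
        }
        return results;
    }
}
```

In this tree, `//div[@align='centre']` and the correctly spelled `//div[@align='center']/h4/following-sibling::text()` both select nothing, `//div[@align='center']/p/text()[contains(., ':')]` selects the four `label :` lines (the last one, `detail NO.  :`, without its value), and `//div[@align='center']/p/span[@class='style7']` selects `{NUmbers3}`.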

I have also tried this (just to get the raw data first and then trim it further):

```csharp
HtmlNodeCollection node = doc.DocumentNode.SelectNodes("//table[@height='400']//div[@align='centre']//p");
foreach (HtmlNode node1 in node)
    textBox1.Text += node1.InnerText;
```

But the node collection comes back as NULL here.
Any help is greatly appreciated.
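For the "get the raw data first, then trim it" step, one option is to skip XPath for the individual fields and parse the inner HTML of the block instead: split it on `<br>` tags, strip the remaining tags, and keep every `label : value` line. This is a minimal sketch that swaps in regular expressions for that last step; it assumes, as in the sample, that each field sits on its own `<br>`-separated line and that labels contain no colon. `FieldParser` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Sketch: turn the raw inner HTML of the block into label/value pairs.
// Assumes one "label : value" field per <br>-separated line.
public static class FieldParser
{
    private static readonly Regex LineBreak = new Regex(@"<br\s*/?>", RegexOptions.IgnoreCase);
    private static readonly Regex AnyTag = new Regex(@"<[^>]+>");
    private static readonly Regex Spaces = new Regex(@"\s+");
    private static readonly Regex LabelValue = new Regex(@"^(?<label>[^:]+?)\s*:\s*(?<value>.+)$");

    public static Dictionary<string, string> Parse(string innerHtml)
    {
        var fields = new Dictionary<string, string>(StringComparer.Ordinal);
        foreach (string piece in LineBreak.Split(innerHtml))
        {
            // Drop the remaining tags (<span>, <h4>, <a>, ...) but keep their text.
            string text = AnyTag.Replace(piece, string.Empty);
            // Decode entities such as &nbsp;, then collapse runs of whitespace.
            text = Spaces.Replace(WebUtility.HtmlDecode(text), " ").Trim();

            // Lines without a colon (e.g. "UNWANTED TEXT") are skipped.
            Match m = LabelValue.Match(text);
            if (m.Success)
            {
                fields[m.Groups["label"].Value] = m.Groups["value"].Value;
            }
        }
        return fields;
    }
}
```

With HtmlAgilityPack, the input could come from something like `doc.DocumentNode.SelectSingleNode("//div[@align='center']")`, passing its `InnerHtml` to `FieldParser.Parse`. Check for null first: HtmlAgilityPack's `SelectSingleNode` and `SelectNodes` return null when nothing matches, which is what happens with `@align='centre'`, since the attribute value in the HTML is `center`. For the sample `<p>` content, the result holds `Name`, `Number`, `Number2` and `detail NO.`.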
