Can't figure out the XPath in HtmlAgilityPack
I am trying to build my first C# application (one that can do more than just say "Hello world").
The HTML file has lots of tags (but only the two h4 tags shown below).
Here is the part that I am interested in:
<table width="100%" height="400" border="0" align="center" cellpadding="0" cellspacing="0" bordercolor="#111111" background="images/page_bg.gif" style="BORDER-COLLAPSE: collapse">
<tbody valign="top">
<tr>
<td>
<table width="80%" border="0" valign=top background="images/page_bg.gif">
<tr>
<td>
<div align="center">
<h4 align="center">
<font face="Verdana, Arial, Helvetica, sans-serif" size="2">
<b>
<font size="4" face="Arial, Helvetica, sans-serif">
UNWANTED TEXT
</font></b></font></h4>
<p><br />
Name : {NAME HERE} <br>Number : {NUMBERS HERE}<br>Number2 : {NUMBERS2}<br><br><h4>UNWANTED TEXT</h4><br>detail NO. : <span class=style7>{NUmbers3}</span><br><br><a href=http://test.xom>UNWANTED TEXT</a><br><br>
</p>
<p class="content"><em><strong>
<p> </p>
I wish to get NAME, Numbers1, Numbers2 and Numbers3, so I guess I have to do something like this:
//div[@align = "centre"]/h4/followingsibling::Text();
But surely it is incomplete. Any ideas on how I should do it? I got this XPath from Firebug: /html/body/table/tbody/tr[2]/td/table/tbody/tr/td/table/tbody/tr[2]/td/div/table/tbody/tr/td/table/tbody/tr/td/div/h4
I have also tried the following (just to get the raw data first and then trim it further):
HtmlNodeCollection node = doc.DocumentNode.SelectNodes("//table[@height='400']//div[@align='centre']//p");
foreach(HtmlNode node1 in node)
textBox1.Text += node1.InnerText;
But the node collection here comes back as null.
Any help is greatly appreciated.
Comments (2)
Firefox adds a tbody tag to tables (in the original HTML this tag can be absent). So I would suggest not writing out the whole path; find the most characteristic part of it and use //.
For example, //div[@class='data']/table//tr/td
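To illustrate the tbody point, here is a minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced. The sample markup, the `TbodyDemo` class and the `CountMatches` helper are made up for illustration. HtmlAgilityPack parses the markup as written rather than as the browser displays it, so a Firebug-style path containing tbody finds nothing here, while the `//` form still matches.

```csharp
using System;
using HtmlAgilityPack;

public static class TbodyDemo
{
    // Hypothetical sample: the raw HTML has no <tbody>, as is common in source markup.
    public const string Html =
        "<html><body><table><tr><td>cell</td></tr></table></body></html>";

    // Returns the number of nodes matching the XPath (0 when nothing matches).
    public static int CountMatches(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // By default SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        return nodes == null ? 0 : nodes.Count;
    }

    public static void Main()
    {
        // Path copied browser-style, including the tbody the browser inserted.
        Console.WriteLine(CountMatches(Html, "/html/body/table/tbody/tr/td"));

        // Shorter path that skips the intermediate levels with //.
        Console.WriteLine(CountMatches(Html, "//table//tr/td"));
    }
}
```

Anchoring on a distinctive attribute and using `//` for the rest keeps the expression independent of wrapper elements that only exist in the browser's DOM.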
Did you notice that you have
@align="centre"
but the HTML has align="center"
(as in, British vs US spelling)?
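A minimal sketch of that fix, assuming HtmlAgilityPack is referenced. The trimmed-down markup and the `AlignDemo.ReadParagraphs` helper are hypothetical, not the asker's real page or code. It also guards against the null that `SelectNodes` returns by default when nothing matches, which is the failure described in the question.

```csharp
using System;
using System.Text;
using HtmlAgilityPack;

public static class AlignDemo
{
    // Hypothetical, simplified stand-in for the page in the question.
    public const string Html =
        "<table height=\"400\"><tr><td><div align=\"center\">" +
        "<h4>UNWANTED TEXT</h4><p>Name : Alice</p></div></td></tr></table>";

    // Returns the concatenated text of all matching nodes, or null if none match.
    public static string ReadParagraphs(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        if (nodes == null)
        {
            // No match: iterating here would throw a NullReferenceException.
            return null;
        }

        var sb = new StringBuilder();
        foreach (HtmlNode node in nodes)
        {
            sb.Append(node.InnerText.Trim());
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // Attribute value misspelled relative to the markup: nothing matches.
        string miss = ReadParagraphs(Html, "//table[@height='400']//div[@align='centre']//p");
        Console.WriteLine(miss ?? "(no match)");

        // Attribute value spelled as in the markup.
        string hit = ReadParagraphs(Html, "//table[@height='400']//div[@align='center']//p");
        Console.WriteLine(hit ?? "(no match)");
    }
}
```

XPath string comparisons are exact, so the attribute value in the expression has to match the markup character for character.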