Parsing HTML with the HTML Agility Pack
Here is one table out of 5:
<h3>marec - maj 2009</h3>
<div class="graf_table">
<table summary="layout table">
<tr>
<th>DATUM</th>
<td class="datum">10.03.2009</td>
<td class="datum">24.03.2009</td>
<td class="datum">07.04.2009</td>
<td class="datum">21.04.2009</td>
<td class="datum">05.05.2009</td>
<td class="datum">06.05.2009</td>
</tr>
<tr>
<th>Maloprodajna cena [EUR/L]</th>
<td>0,96000</td>
<td>0,97000</td>
<td>0,99600</td>
<td>1,00800</td>
<td>1,00800</td>
<td>1,01000</td>
</tr>
<tr>
<th>Maloprodajna cena [SIT/L]</th>
<td>230,054</td>
<td>232,451</td>
<td>238,681</td>
<td>241,557</td>
<td>241,557</td>
<td>242,036</td>
</tr>
<tr>
<th>Prodajna cena brez dajatev</th>
<td>0,33795</td>
<td>0,34628</td>
<td>0,36795</td>
<td>0,37795</td>
<td>0,37795</td>
<td>0,37962</td>
</tr>
<tr>
<th>Trošarina</th>
<td>0,46205</td>
<td>0,46205</td>
<td>0,46205</td>
<td>0,46205</td>
<td>0,46205</td>
<td>0,46205</td>
</tr>
<tr>
<th>DDV</th>
<td>0,16000</td>
<td>0,16167</td>
<td>0,16600</td>
<td>0,16800</td>
<td>0,16800</td>
<td>0,16833</td>
</tr>
</table>
</div>
I have to extract the values from the rows whose header is DATUM and Maloprodajna cena [EUR/L].
I am using the HTML Agility Pack:
this.htmlDoc = new HtmlAgilityPack.HtmlDocument();
this.htmlDoc.OptionCheckSyntax = true;
this.htmlDoc.OptionFixNestedTags = true;
this.htmlDoc.OptionAutoCloseOnEnd = true;
this.htmlDoc.OptionOutputAsXml = true; // is this necessary?
this.htmlDoc.OptionDefaultStreamEncoding = System.Text.Encoding.Default;
I had a lot of trouble getting those values out.
I started with:
var query = from html in this.htmlDoc.DocumentNode.SelectNodes("//div[@class='graf_table']").Cast<HtmlNode>()
            // ".//table" searches inside the current div; "//table" would restart at the document root
            from table in html.SelectNodes(".//table").Cast<HtmlNode>()
            from row in table.SelectNodes("tr").Cast<HtmlNode>()
            from cell in row.SelectNodes("th|td").Cast<HtmlNode>()
            select new { Table = table.Id, CellText = cell.InnerHtml };
but I could not figure out how to select only the values from the rows whose header is DATUM or Maloprodajna cena [EUR/L]. Is it possible to do that with a where clause?
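One way to keep only the wanted rows is to filter on the text of each row's th cell, either directly in the XPath or in a LINQ where clause. The sketch below assumes the HTML Agility Pack is referenced and the page is already loaded into an HtmlDocument; the class HeaderRowSelector and its methods are hypothetical names made up for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper (not part of the HTML Agility Pack): returns the <td> texts
// of the row in `table` whose <th> text equals `header`.
public static class HeaderRowSelector
{
    // Variant 1: filter in XPath. tr[th='...'] keeps only rows that have a <th>
    // child whose text equals the header. The header is pasted into an XPath
    // string literal, so it must not contain a single quote.
    public static List<string> CellsViaXPath(HtmlNode table, string header)
    {
        HtmlNodeCollection cells = table.SelectNodes("./tr[th='" + header + "']/td");
        // SelectNodes returns null, not an empty collection, when nothing matches.
        if (cells == null) return new List<string>();
        return cells.Select(td => td.InnerText.Trim()).ToList();
    }

    // Variant 2: select all rows and filter them with a LINQ where clause.
    public static List<string> CellsViaWhere(HtmlNode table, string header)
    {
        HtmlNodeCollection rows = table.SelectNodes("./tr");
        if (rows == null) return new List<string>();
        return (from row in rows
                let th = row.SelectSingleNode("th")
                where th != null && th.InnerText.Trim() == header
                from td in row.ChildNodes
                where td.Name == "td"
                select td.InnerText.Trim()).ToList();
    }
}
```

The XPath variant does the filtering inside the parser and is shorter; the where variant is easier to extend if the header match needs to be looser (for example a StartsWith check instead of equality).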
I ended up with these two queries:
var date = (from d in htmlDoc.DocumentNode.SelectNodes("//div[@class='graf_table']//table//tr[1]/td")
select DateTime.Parse(d.InnerText)).ToArray();
var price = (from p in htmlDoc.DocumentNode.SelectNodes("//div[@class='graf_table']//table//tr[2]/td")
select double.Parse(p.InnerText)).ToArray();
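The two queries above call DateTime.Parse and double.Parse without a culture, so they depend on the regional settings of the machine they run on. On a culture whose decimal separator is a period, "0,96000" is silently read as 96000 rather than rejected. A minimal sketch that pins the page's formats explicitly, assuming dates are always dd.MM.yyyy and prices always use a decimal comma; CultureSafeParsing is a hypothetical name.

```csharp
using System;
using System.Globalization;

// Hypothetical helpers that fix the formats used on the page instead of
// relying on the current thread culture.
public static class CultureSafeParsing
{
    // Decimal comma as in "0,96000". A hand-built NumberFormatInfo avoids
    // depending on "sl-SI" culture data being available on the machine.
    static readonly NumberFormatInfo DecimalComma = new NumberFormatInfo
    {
        NumberDecimalSeparator = ",",
        NumberGroupSeparator = "."
    };

    public static double ParsePrice(string text) =>
        double.Parse(text.Trim(), NumberStyles.Float, DecimalComma);

    // Dates are written as day.month.year, e.g. "10.03.2009".
    public static DateTime ParseDate(string text) =>
        DateTime.ParseExact(text.Trim(), "dd.MM.yyyy", CultureInfo.InvariantCulture);

    // The pitfall: the invariant culture treats ',' as a thousands separator,
    // so "0,96000" is parsed as 96000 instead of throwing.
    public static double MisreadWithInvariantCulture(string text) =>
        double.Parse(text, CultureInfo.InvariantCulture);
}
```

In the original queries this would mean selecting CultureSafeParsing.ParseDate(d.InnerText) and CultureSafeParsing.ParsePrice(p.InnerText) instead of the culture-dependent calls.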
Is it possible to combine the date and price queries into one?
And how would I convert them to lambda expressions?
I have just started learning this, and I would like to know how it is done so that I do not run into the same questions in the future.
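The two queries can be merged by reading both rows from the same table and pairing the cells by position. In lambda (method) syntax, a query of the form from d in nodes select f(d) is exactly nodes.Select(d => f(d)), which is what the compiler produces from the query form. The sketch below is one possible way to write the combined query, assuming .NET 4.0 or later for Enumerable.Zip (on .NET 3.5 an index-based Select would be needed instead) and the HTML Agility Pack; PricePoint and PriceQuery are hypothetical names. It selects rows by their header text rather than by position (tr[1], tr[2]) and sorts by date, since the order of the five tables on the page is not known here.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical result type for one (date, price) pair.
public sealed class PricePoint
{
    public DateTime Date { get; set; }
    public double Price { get; set; }
}

// Hypothetical helper: the date and price queries as a single method-syntax query.
public static class PriceQuery
{
    // Decimal comma, as in "0,96000".
    static readonly NumberFormatInfo DecimalComma = new NumberFormatInfo
    {
        NumberDecimalSeparator = ",",
        NumberGroupSeparator = "."
    };

    public static List<PricePoint> Extract(HtmlDocument doc)
    {
        HtmlNodeCollection tables = doc.DocumentNode.SelectNodes("//div[@class='graf_table']/table");
        // SelectNodes returns null when nothing matches.
        if (tables == null) return new List<PricePoint>();

        return tables
            .SelectMany(table =>
            {
                // Pick the two rows by their <th> text instead of by position.
                HtmlNodeCollection dates = table.SelectNodes("./tr[th='DATUM']/td");
                HtmlNodeCollection prices = table.SelectNodes("./tr[th='Maloprodajna cena [EUR/L]']/td");
                if (dates == null || prices == null) return Enumerable.Empty<PricePoint>();

                // Zip pairs the i-th date cell with the i-th price cell
                // (and stops at the shorter row if the lengths differ).
                return dates.Zip(prices, (d, p) => new PricePoint
                {
                    Date = DateTime.ParseExact(d.InnerText.Trim(), "dd.MM.yyyy", CultureInfo.InvariantCulture),
                    Price = double.Parse(p.InnerText.Trim(), NumberStyles.Float, DecimalComma)
                });
            })
            .OrderBy(point => point.Date)
            .ToList();
    }
}
```

The result can then be bound to the chart as before, for example by passing points.Select(p => p.Date).ToArray() and points.Select(p => p.Price).ToArray() to DataBindXY.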
Oh, one more question: does anybody know a good graph control? I have to show these values in a graph.
I started with Microsoft Chart Controls, but I am having trouble configuring them.
If anyone has experience with them, I would like to know how to set them up so that the x axis shows every label, not every second one. For example,
if I have: 10.03.2009, 24.03.2009, 07.04.2009, 21.04.2009, 05.05.2009, 06.05.2009
it shows only: 10.03.2009, 07.04.2009, 05.05.2009, etc.
I bind the data to the chart like this:
chart1.Series["Series1"].Points.DataBindXY(date, price);
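When the X axis interval is left on automatic, the chart control picks its own spacing and may skip labels it expects to overlap. A sketch of the settings usually tried first, assuming the WinForms version of Microsoft Chart Controls (System.Windows.Forms.DataVisualization.Charting), a single chart area, and that the helper is called after DataBindXY; ChartAxisSetup is a hypothetical name. IsXValueIndexed places the points at equal spacing, so the 14-day and 1-day gaps between the dates are not drawn to scale; if a true time scale is wanted it can be left out, but then the interval is measured along the date scale rather than per point.

```csharp
using System.Windows.Forms.DataVisualization.Charting;

// Hypothetical helper; call it after DataBindXY has filled the series.
public static class ChartAxisSetup
{
    public static void ShowEveryXLabel(Chart chart, string seriesName)
    {
        Series series = chart.Series[seriesName];
        // Assumes a single chart area (the designer's default "ChartArea1").
        ChartArea area = chart.ChartAreas[0];

        // Place the points at equally spaced positions instead of on a
        // continuous date scale, so irregular gaps between dates don't matter.
        series.IsXValueIndexed = true;
        // Set explicitly; DataBindXY with DateTime values may already infer it.
        series.XValueType = ChartValueType.DateTime;

        // One label per point instead of an automatically chosen interval.
        area.AxisX.Interval = 1;
        area.AxisX.LabelStyle.Format = "dd.MM.yyyy";

        // Tilt the labels so that six dates fit next to each other.
        area.AxisX.LabelStyle.Angle = -45;
    }
}
```

With the original code this would be a call such as ChartAxisSetup.ShowEveryXLabel(chart1, "Series1") right after the DataBindXY line.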
A lot of questions for my first post... hehe, I hope I was not unclear.
Thanks for any reply!
Comment:
For such CodePlex projects, please consider posting your questions directly to their Discussion boards. Usually that's the best way to contact the developers.