Writing a query to parse an HTML document with HtmlAgilityPack
I want to get the href of the A element inside span class="floatClear" for the entry whose rating, shown in span class="star-img stars_4", is the lowest.
How can I achieve this with HtmlAgilityPack? Here is the HTML source of my file:
<div class="businessresult"> <!-- will repeat -->
    <div class="rightcol">
        <div class="rating">
            <span class="star-img stars_4">
                <img height="325" width="84" src="http://media1.px" alt="4.0 star rating" title="4.0 star rating">
            </span>
        </div>
    </div>
    <span class="floatClear">
        <a class="ybtn btn-y-s" href="/writeareview/biz/KaBw8UEm8u6war_loc%NY"></a>
    </span>
</div>
The query I have written:
var lowestreview =
    from main in htmlDoc.DocumentNode.SelectNodes("//div[@class='rightcol']")
    from rating in htmlDoc.DocumentNode.SelectNodes("//div[@class='rating']")
    from ratingspan in htmlDoc.DocumentNode.SelectNodes("//span[@class='star-img stars_4']")
    from floatClear in htmlDoc.DocumentNode.SelectNodes("//span[@class='floatClear']")
    select new { Rate = ratingspan.InnerText, AHref = floatClear.InnerHtml };
But I do not know how to apply the condition in the last line of the LINQ query.
Don't select "rating" from the entire htmlDoc, select it from the previously found "main".
I guess you need something like:
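A minimal sketch of such a scoped query, not necessarily the code that was originally posted here. It assumes the rating can be read from the leading number in the img alt text ("4.0 star rating"), and it scopes each lookup to the repeating div class="businessresult", since in the posted HTML the floatClear span sits next to the rightcol div rather than inside it. The names LowestReview, FindLowestRatedHref and ParseRating are made up for illustration. The rating span is matched with contains(@class,'star-img') on the assumption that other ratings use a different stars_N suffix.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using HtmlAgilityPack;

public static class LowestReview
{
    // Parses the leading number of an alt text such as "4.0 star rating".
    // Returns null when the text does not start with a number.
    public static double? ParseRating(string alt)
    {
        var first = (alt ?? "").Trim().Split(' ')[0];
        return double.TryParse(first, NumberStyles.Float, CultureInfo.InvariantCulture, out var value)
            ? value
            : (double?)null;
    }

    // Returns the href of the link in the lowest-rated business block,
    // or null when no block has both a parsable rating and a link.
    public static string FindLowestRatedHref(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes can return null when nothing matches, so fall back to an empty sequence.
        IEnumerable<HtmlNode> businesses =
            doc.DocumentNode.SelectNodes("//div[@class='businessresult']")
            ?? Enumerable.Empty<HtmlNode>();

        var hrefsByRating =
            from business in businesses
            // ".//" keeps each lookup inside the current business block.
            let img = business.SelectSingleNode(".//span[contains(@class,'star-img')]/img")
            let link = business.SelectSingleNode(".//span[@class='floatClear']/a")
            where img != null && link != null
            let rate = ParseRating(img.GetAttributeValue("alt", ""))
            where rate.HasValue
            orderby rate.Value
            select link.GetAttributeValue("href", "");

        // The first element after ordering belongs to the lowest rating.
        return hrefsByRating.FirstOrDefault();
    }
}
```

The rating is taken from the img attribute rather than from ratingspan.InnerText because the span in the posted HTML contains only an img element and no text. Ordering by the parsed rating and taking the first result is one way to express the "lowest rating" condition the question asks about.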
I hope it will not crash if some of those divs and spans are not present: a previous version of HtmlAgilityPack returned null instead of an empty list when SelectNodes didn't find anything.
EDIT
You probably also need to change the XPath query for the inner selects: change the "//" into ".//" (an extra . at the beginning) to signal that you really want a subnode. If the AgilityPack works the same as regular XML XPath (I'm not 100% sure), then a "//" at the beginning will search from the root of the document, even if you call it on a subnode. A ".//" will always search from the node you are searching from.
A main.SelectNodes("//div[@class='rating']") will (probably) also find <div class="rating">s outside the <div class="rightcol"> you found in the previous line. A main.SelectNodes(".//div[@class='rating']") should fix that.
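One way to check this on your own document is to count what each form of the expression finds when it is evaluated on a single rightcol node. The helper below is only a sketch; XPathScopeDemo and CountFromFirstRightcol are hypothetical names, and a null result from SelectNodes is counted as zero.

```csharp
using System;
using HtmlAgilityPack;

public static class XPathScopeDemo
{
    // Evaluates the given XPath on the first <div class="rightcol"> in the
    // document and returns how many nodes it finds (0 when nothing matches).
    public static int CountFromFirstRightcol(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var main = doc.DocumentNode.SelectSingleNode("//div[@class='rightcol']");
        if (main == null)
        {
            return 0;
        }

        // A leading "//" is an absolute path, a leading ".//" is relative to main.
        var found = main.SelectNodes(xpath);
        return found == null ? 0 : found.Count;
    }
}
```

For a document with two rightcol blocks that each hold one rating div, calling the helper with "//div[@class='rating']" should count both rating divs, while ".//div[@class='rating']" should count only the one inside the first rightcol.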