Writing a query with HtmlAgilityPack to parse an HTML document

Posted 2024-11-14 12:55:10

I want to get the href of the <a> element inside span class="floatClear" for the business whose rating, shown in span class="star-img stars_4", is the lowest.

How can I achieve this with HtmlAgilityPack? The HTML source of my file is given below.

<div class="businessresult">  <!-- this block repeats once per business -->


      <div class="rightcol">

       <div class="rating">

        <span class="star-img stars_4">
          <img height="325" width="84" src="http://media1.px" alt="4.0 star rating" title="4.0 star rating">
         </span>

        </div>
      </div>

        <span class="floatClear">
             <a class="ybtn btn-y-s" href="/writeareview/biz/KaBw8UEm8u6war_loc%NY">
        </span>
</div>

The query I have written

var lowestreview = 
      from main in htmlDoc.DocumentNode.SelectNodes("//div[@class='rightcol']") 
       from rating in htmlDoc.DocumentNode.SelectNodes("//div[@class='rating']")
         from ratingspan in htmlDoc.DocumentNode.SelectNodes("//span[@class='star-img stars_4']")
          from floatClear in htmlDoc.DocumentNode.SelectNodes("//span[@class='floatClear']")
       select new { Rate = ratingspan.InnerText, AHref = floatClear.InnerHtml };

But I do not know how to apply the condition in the last line of the LINQ query.

Comments (1)

メ斷腸人バ 2024-11-21 12:55:10

Don't select "rating" from the entire htmlDoc, select it from the previously found "main".

I guess you need something like:

var lowestreview = 
  from main in htmlDoc.DocumentNode.SelectNodes("//div[@class='rightcol']") 
   from rating in main.SelectNodes("//div[@class='rating']")
     from ratingspan in rating.SelectNodes("//span[@class='star-img stars_4']")
      from floatClear in ratingspan.SelectNodes("//span[@class='floatClear']")
   select new { Rate = ratingspan.InnerText, AHref = floatClear.InnerHtml };

I hope it will not crash if some of those divs and spans are not present: a previous version of the HtmlAgilityPack returned null instead of an empty list when SelectNodes didn't find anything.
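To make the null concern above concrete, here is a minimal sketch of a guard. It assumes SelectNodes returns null when nothing matches, which is the behaviour I have seen in HtmlAgilityPack; SelectNodesOrEmpty is a hypothetical helper name of my own, not part of the library.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class HtmlNodeExtensions
{
    // Hypothetical helper (not part of HtmlAgilityPack):
    // turns a "no match" null result into an empty sequence so that
    // LINQ "from ... in ..." clauses can iterate it without throwing.
    public static IEnumerable<HtmlNode> SelectNodesOrEmpty(this HtmlNode node, string xpath)
    {
        IEnumerable<HtmlNode> found = node.SelectNodes(xpath);
        return found ?? Enumerable.Empty<HtmlNode>();
    }
}
```

With such a helper, a clause like "from rating in main.SelectNodesOrEmpty(...)" simply yields no rows when a div is missing, instead of raising a NullReferenceException.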

EDIT
You probably also need to change the XPath query for the inner selects: change the "//" into ".//" (an extra . at the beginning) to signal that you really want a descendant of the current node. If the AgilityPack works the same as regular XML XPath (I'm not 100% sure), then a "//" at the beginning will search from the root of the document, even if you call it on a subnode. A ".//" will always search from the node you are calling it on.

A main.SelectNodes("//div[@class='rating']") will (probably) also find <div class="rating">s outside the <div class="rightcol"> you found in the previous line.
A main.SelectNodes(".//div[@class='rating']") should fix that.
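Putting the relative ".//" paths together with the original goal (the link of the lowest-rated business), one possible sketch follows. It rests on several assumptions of mine that are not in the question: the rating is read from the img alt text (e.g. "4.0 star rating"), the star span is matched with contains(@class,'star-img') because the stars_N part of the class differs per rating, and the names LowestRatingFinder, FindLowestRatedHref and ParseRating are purely illustrative.

```csharp
using System.Globalization;
using System.Linq;
using HtmlAgilityPack;

public static class LowestRatingFinder
{
    // Returns the href of the lowest-rated business, or null if nothing usable is found.
    public static string FindLowestRatedHref(string html)
    {
        var htmlDoc = new HtmlDocument();
        htmlDoc.LoadHtml(html);

        // Start from the repeating container so rating and link stay paired.
        var businesses = htmlDoc.DocumentNode.SelectNodes("//div[@class='businessresult']");
        if (businesses == null) return null; // no match at all

        var hrefsByRating =
            from business in businesses
            // ".//" keeps each search inside the current businessresult div.
            let img = business.SelectSingleNode(".//span[contains(@class,'star-img')]/img")
            let link = business.SelectSingleNode(".//span[@class='floatClear']//a")
            where img != null && link != null
            let rating = ParseRating(img.GetAttributeValue("alt", ""))
            where rating.HasValue
            orderby rating.Value
            select link.GetAttributeValue("href", "");

        return hrefsByRating.FirstOrDefault();
    }

    // Assumption: the alt text starts with the numeric rating, e.g. "4.0 star rating".
    private static double? ParseRating(string alt)
    {
        double value;
        string firstToken = alt.Split(' ')[0];
        return double.TryParse(firstToken, NumberStyles.Float,
                               CultureInfo.InvariantCulture, out value)
            ? value
            : (double?)null;
    }
}
```

Sorting by the parsed rating and taking the first element is what supplies the "minimum" condition the question asks about; businesses without a parsable rating or without a link are skipped rather than treated as errors.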
