Problem retrieving data with Html Agility Pack

Posted on 2024-10-30 09:35:51

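I am trying to parse data from the web page http://www.bbb.org/kitchener/accredited-business-directory?letter=a

I want to get all the categories, such as

Accountants - Certified Public (2)

Accounting Services (1)

and so on. The problem is that when I navigate to the node, the a tags are missing (the result is null). I don't know why, but Html Agility Pack does not pick up these tags. In the debugger's Watch window, the div contains only the commented-out line-break tags and none of the a tags, even though they are present in the page source: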

doc.DocumentNode.SelectNodes("//tr/td/table/tr/td/div/div")[0].OuterHtml    "<div style=\"font-size: 12px;line-height: 16px;\"><!--<br />-->\r\n<!--<br />-->\r\n</div>"    

Here is the start of that div as it appears in the page source. Note that I have included only 2 of the a tags from the HTML:

<div style="float: left; width: 305px;"> 
  <h5 style="margin: 0px; margin-bottom: 5px; border-bottom: 1px solid #cccccc; padding-bottom: 5px; font-size: 12px;">Categories Starting with letter 'a'</h5> 
   <div style="font-size: 12px;line-height: 16px;">
     <!--<br />-->
     <!--<br />-->       
     <a class="listingName" href="/kitchener/accredited-business-directory/accountants">Accountants (11)</a><br />   
     <a class="listingName" href="/kitchener/accredited-business-directory/accountants-certified-public">Accountants - Certified Public (2)</a><br /> 
   </div> 
</div>

How can I get the data?

Even the following does not reveal the links:

foreach (var test in doc.DocumentNode.SelectNodes("//a[@href]")) 
{ MessageBox.Show(test.InnerText+"\n"+test.InnerHtml); }

Comments (1)

不必了 2024-11-06 09:35:51

This worked fine for me using the following sample:

HtmlWeb web = new HtmlWeb();
HtmlAgilityPack.HtmlDocument doc = web.Load("http://www.bbb.org/kitchener/accredited-business-directory?letter=a");

foreach (var link in doc.DocumentNode.SelectNodes("//a[@href]"))
{
    Console.WriteLine(link.InnerText);
}

Output (shortened):

BBB
Home
Accredited Business Directory
Accountants (11)
Accountants - Certified Public (2)
Accounting Services (1)
Advertising - Direct Mail (3)
Advertising Agencies & Counselors (3)
Advertising Specialties (3)
...
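
The sample above prints every link on the page, including navigation links such as "Home". If only the category links are wanted, one option is to filter on the `listingName` class visible in the question's HTML. The following is a minimal sketch under stated assumptions, not a confirmed fix for the original problem: `CategoryExtractor`, `ExtractCategories`, and `SampleHtml` are hypothetical names, and the markup is the fragment copied from the question rather than the live page. It also guards against `SelectNodes` returning null, which it does by default when nothing matches.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class CategoryExtractor
{
    // Sample markup copied from the question; a real page would be loaded with HtmlWeb.Load.
    public const string SampleHtml = @"
<div style=""font-size: 12px;line-height: 16px;"">
  <!--<br />-->
  <!--<br />-->
  <a class=""listingName"" href=""/kitchener/accredited-business-directory/accountants"">Accountants (11)</a><br />
  <a class=""listingName"" href=""/kitchener/accredited-business-directory/accountants-certified-public"">Accountants - Certified Public (2)</a><br />
</div>";

    // Hypothetical helper: returns (link text, href) pairs for every <a class="listingName">.
    public static List<KeyValuePair<string, string>> ExtractCategories(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var results = new List<KeyValuePair<string, string>>();

        // By default SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection links =
            doc.DocumentNode.SelectNodes("//a[@class='listingName'][@href]");
        if (links == null)
        {
            return results;
        }

        foreach (HtmlNode link in links)
        {
            // DeEntitize converts entities such as &amp; back into plain characters.
            string name = HtmlEntity.DeEntitize(link.InnerText).Trim();
            string href = link.GetAttributeValue("href", string.Empty);
            results.Add(new KeyValuePair<string, string>(name, href));
        }

        return results;
    }

    public static void Main()
    {
        foreach (KeyValuePair<string, string> category in ExtractCategories(SampleHtml))
        {
            Console.WriteLine(category.Key + " -> " + category.Value);
        }
    }
}
```

To run this against the live page instead of the sample string, the document returned by `web.Load(...)` in the answer could be queried with the same XPath expression; whether the site still uses the `listingName` class is an assumption.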
