HtmlAgilityPack - How to grab data from a large web page

Posted on 2024-11-18 12:02:44

I am trying to extract data from a web page. The data sits in DIVs of a particular class, <div class="personal_info">. Every page contains 10 to 15 such DIVs, all with the same class "personal_info" (as shown in the HTML below), and I want to extract all of them.

<div class="personal_info"><span class="bold">Rama Anand</span><br><br> Mobile: 9916184586<br>[email protected]<br> Bangalore</div>

To do this I started using HTML Agility Pack, as someone on Stack Overflow suggested, but I got stuck right at the beginning because I don't know the library well. My C# code looks like this:

HtmlAgilityPack.HtmlDocument docHtml = new HtmlAgilityPack.HtmlDocument();
HtmlAgilityPack.HtmlWeb docHFile = new HtmlWeb();

docHtml = docHFile.Load("http://127.0.0.1/2.html");

How do I continue from here so that I can get the data from the DIVs whose class is "personal_info"? A suggestion with an example would be appreciated.


Comments (3)

月牙弯弯 2024-11-25 12:02:44

I can't check this right now, but shouldn't it be something like this:

var infos = from info in docHtml.DocumentNode.SelectNodes("//div[@class='personal_info']") select info; 
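A fuller sketch of the XPath approach above, for anyone who wants something to paste into a console project. It assumes the HtmlAgilityPack NuGet package is referenced. The inline HTML string (a simplified version of the div in the question) and the names html, divs and nameNode are made up for illustration, and LoadHtml is used instead of a network load so the snippet does not depend on a server. Two things to watch for: SelectNodes returns null rather than an empty collection when nothing matches, and @class='personal_info' only matches when the attribute value is exactly that string.

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Hypothetical sample markup modelled on the div in the question.
        string html =
            "<html><body>" +
            "<div class=\"personal_info\"><span class=\"bold\">Rama Anand</span>" +
            "<br><br> Mobile: 9916184586<br> Bangalore</div>" +
            "<div class=\"other\">ignored</div>" +
            "</body></html>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection divs =
            doc.DocumentNode.SelectNodes("//div[@class='personal_info']");
        if (divs == null)
        {
            Console.WriteLine("No matching divs found.");
            return;
        }

        foreach (HtmlNode div in divs)
        {
            // The leading "." makes the XPath relative to this div.
            HtmlNode nameNode = div.SelectSingleNode(".//span[@class='bold']");
            string name = nameNode != null ? nameNode.InnerText.Trim() : "(no name)";
            Console.WriteLine(name);

            // InnerText concatenates all text inside the div; DeEntitize decodes
            // HTML entities such as &amp; in that text.
            Console.WriteLine(HtmlEntity.DeEntitize(div.InnerText).Trim());
        }
    }
}
```

For the real page, replacing the LoadHtml call with the HtmlWeb.Load call from the question should give the same kind of node collection to loop over.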

撩心不撩汉 2024-11-25 12:02:44

To load a URL, you can do something like this:

 // Requires: using System.Net; using System.Text;
 var document = new HtmlAgilityPack.HtmlDocument();
 var url = "http://www.google.com";
 var request = (HttpWebRequest)WebRequest.Create(url);
 // Dispose the response stream once the document has been parsed.
 using (var responseStream = request.GetResponse().GetResponseStream())
 {
   document.Load(responseStream, Encoding.UTF8);
 }

Also note that there is a fork that lets you use jQuery-style selectors with Agility Pack.

IEnumerable<HtmlNode> myList = document.QuerySelectorAll(".personal_info");

http://yosi-havia.blogspot.com/2010/10/using-jquery-selectors-on-server-sidec.html

椵侞 2024-11-25 12:02:44

What happened to Where?

node.DescendantNodes().Where(node_it => node_it.Name=="div");

If you want to start from the top (root) node, use page.DocumentNode as "node".
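The Where call above keeps every div on the page. To narrow it down to the class from the question, the predicate can also test the class attribute. A minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced and using the question's URL as a stand-in; the names page and infos are illustrative only:

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        var web = new HtmlWeb();
        // URL taken from the question; replace it with the page being scraped.
        HtmlDocument page = web.Load("http://127.0.0.1/2.html");

        // Descendants("div") enumerates only div elements. GetAttributeValue
        // returns the supplied default ("") when the attribute is missing.
        var infos = page.DocumentNode
            .Descendants("div")
            .Where(d => d.GetAttributeValue("class", "")
                         .Split(' ')
                         .Contains("personal_info"));

        foreach (HtmlNode info in infos)
        {
            Console.WriteLine(info.InnerText.Trim());
        }
    }
}
```

Splitting the attribute on spaces means a div with several classes, such as class="personal_info highlighted", is still picked up, which an exact XPath comparison on @class would miss.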
