How can I access multiple tags with a single method using HtmlAgilityPack in C#?
I am writing a program that crawls thenextweb.com for its posts (link, post content, image, date, author, etc.).
The HTML for one post looks like this:
<div class="media-data">
<h4><a href="http://thenextweb.com/mobile/2012/01/05/nokia-reportedly-to-appoint-f-secure-founder-risto-siilasmaa-as-new-chairman/">Nokia to Name Risto Siilasmaa as New Chairman</a></h4>
<p class="article-meta"><a href="http://thenextweb.com/mobile/">TNW Mobile</a> • <a href="http://thenextweb.com/author/matt/" title="Posts by Matt Brian" rel="author">Matt Brian</a> • <span class="date" title="1325748846">January 5, 2012</span></a></p>
<p>Nokia is reportedly planning to nominate and name Risto Siilasmaa, founder of Finnish anti-virus and computer security F-Secure, as its new chairman by the end of the month, Finland’s Helsingin Sanomat reports…</p>
</div>
The next 15 posts on the home page use the same HTML.
To access the content, I used:
var webGet = new HtmlWeb();
var document = webGet.Load(url);
var infos = from info in document.DocumentNode.SelectNodes("//div[@class ='media-data']//h4//a")
select new
{
LinkURL = info.Attributes["href"].Value,
Text = info.InnerText
};
lvLinks.DataSource = infos;
lvLinks.DataBind();
To access the author, date, and similar information, I used:
var infos = from info in document.DocumentNode.SelectNodes("//div[@class ='media-data']//a[@rel = 'author']") // rel="author" is on the <a> element, not on <p>
select new
{
Author = info.InnerText
};
lvLinks.DataSource = infos;
lvLinks.DataBind();
I use a ListView control to display the data on the ASP.NET page, with an item template like <li> <%# Eval("Text") %> - <%# Eval("LinkUrl") %> </li>
But I want a way to access all of them in one go, without writing separate code for the links and content and again for the author, date, and so on.
Is there a way to retrieve the information under the <div class="media-data">... </div> tags for any node I want and store it?
Please advise, since it is very important to attach the author and date information to the post link itself, and I have not been able to do that.
Thanks
You could select the <div class="media-data"> nodes first, and then select all the necessary sub-nodes inside each of them.
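A minimal sketch of that approach, assuming the markup shown in the question. The sample HTML is a trimmed copy of the question's snippet, and the Date and Summary property names and the XPath for the summary paragraph are illustrative assumptions, not something the site guarantees. Each relative XPath starts with "." so that it searches only inside the current post node:

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Sample markup trimmed from the question. Against the live site this
        // would instead be: var document = new HtmlWeb().Load(url);
        const string html = @"
<div class=""media-data"">
  <h4><a href=""http://thenextweb.com/mobile/2012/01/05/nokia-reportedly-to-appoint-f-secure-founder-risto-siilasmaa-as-new-chairman/"">Nokia to Name Risto Siilasmaa as New Chairman</a></h4>
  <p class=""article-meta""><a href=""http://thenextweb.com/mobile/"">TNW Mobile</a> • <a href=""http://thenextweb.com/author/matt/"" title=""Posts by Matt Brian"" rel=""author"">Matt Brian</a> • <span class=""date"" title=""1325748846"">January 5, 2012</span></p>
  <p>Nokia is reportedly planning to nominate and name Risto Siilasmaa as its new chairman.</p>
</div>";

        var document = new HtmlDocument();
        document.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var postNodes = document.DocumentNode.SelectNodes("//div[@class='media-data']");
        if (postNodes == null)
        {
            Console.WriteLine("No posts found.");
            return;
        }

        // One pass over the posts: every sub-node is looked up relative to its
        // own <div>, so link, author, and date stay together in one object.
        var infos = from post in postNodes
                    let link = post.SelectSingleNode(".//h4/a")
                    let author = post.SelectSingleNode(".//a[@rel='author']")
                    let date = post.SelectSingleNode(".//span[@class='date']")
                    let summary = post.SelectSingleNode(".//p[not(@class)]") // assumed: the summary <p> has no class
                    select new
                    {
                        LinkURL = link?.GetAttributeValue("href", ""),
                        Text = link?.InnerText,
                        Author = author?.InnerText,
                        Date = date?.InnerText,
                        Summary = summary?.InnerText
                    };

        foreach (var info in infos)
        {
            Console.WriteLine("{0} | {1} | {2} | {3}", info.Text, info.LinkURL, info.Author, info.Date);
        }
    }
}
```

In the ASP.NET page, the same infos sequence could be assigned to lvLinks.DataSource as in the question, and the item template could then use Eval("Author") and Eval("Date") alongside Eval("Text") and Eval("LinkURL"). The null-conditional operators leave a property null when a post lacks that sub-node, rather than throwing.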