How do I parse a simple page using Html Agility Pack?
I am trying to parse this page, but there isn't much unique information I can use to identify the sections I want.
Basically I am trying to get most of the data to the right of the Flash video. So:
Alternating Floor Press
Type: Strength
Main Muscle Worked: Chest
Other Muscles: Abdominals, Shoulders, Triceps
Equipment: Kettlebells
Mechanics Type: Compound
Level: Beginner
Sport: No
Force: N/A
And also the image links that show the before and after states.
Right now I use this:
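// "web" was not declared in the original snippet; an HtmlWeb instance is assumed.
HtmlAgilityPack.HtmlWeb web = new HtmlAgilityPack.HtmlWeb ( );
HtmlAgilityPack.HtmlDocument doc = web.Load ( "http://www.bodybuilding.com/exercises/detail/view/name/alternating-floor-press" );
// Collect every <a> element in the document and print its inner HTML.
IEnumerable<HtmlNode> threadLinks = doc.DocumentNode.Descendants ( "a" );
foreach ( var link in threadLinks )
{
    string str = link.InnerHtml;
    Console.WriteLine ( str );
}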
This gives me a lot of stuff I don't need, but it also prints what I need. Should I parse this printed output and try to work out where my target data sits inside it?
Comments (2)
You can select the id of the nodes you are interested in:
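A minimal sketch of selecting a node by its id, assuming the target element carries an id such as exerciseDetails (the id mentioned elsewhere in this thread). The class name, method name, and sample markup are hypothetical; the markup only stands in for the real page so the example runs without a network call.

```csharp
using System;
using HtmlAgilityPack;

public static class SelectById
{
    // Returns the trimmed visible text of the element with the given id,
    // or null when no element in the document has that id.
    public static string TextOfId(string html, string id)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        HtmlNode node = doc.GetElementbyId(id);
        return node == null ? null : node.InnerText.Trim();
    }

    public static void Main()
    {
        // Hypothetical markup; the real page structure may differ.
        string html = "<div id=\"exerciseDetails\"><a href=\"#\">Type: Strength</a></div>";
        Console.WriteLine(TextOfId(html, "exerciseDetails"));
    }
}
```

For the live page, the same method could be fed the document returned by HtmlWeb.Load instead of an inline string.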
For a given <a> node, to get the text shown, try .InnerText.
Right now you are using the contents of all <a> tags within the document. Try narrowing down to only the ones you need. Look for other elements which contain the particular <a> tags you are after. For example, do they all sit inside a <div> with a certain class?
E.g. if you find the <a> tags you are interested in all sit within <div class="foolinks"> then you can do something like:-
--UPDATE--
Given the information in your comment, I would try:-
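As a rough sketch of that narrowing step, assuming the links of interest sit inside a <div> whose id is exerciseDetails (the id used in the lambda later in this answer). The class name, method name, and sample markup are hypothetical, and the real page may be structured differently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class ExerciseLinks
{
    // Collects the visible text of every <a> inside <div> elements with the given id.
    public static List<string> LinkTexts(string html, string id)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return doc.DocumentNode
            .Descendants("div")                      // assumption: the container is a <div>
            .Where(dn => dn.Id == id)                // keep only the matching container
            .SelectMany(dn => dn.Descendants("a"))   // the links inside it
            .Select(a => a.InnerText.Trim())         // displayed text, not inner HTML
            .ToList();
    }

    public static void Main()
    {
        // Hypothetical markup standing in for the real page.
        string html =
            "<div id=\"nav\"><a href=\"/home\">Home</a></div>" +
            "<div id=\"exerciseDetails\"><a href=\"/s\">Strength</a><a href=\"/c\">Chest</a></div>";
        Console.WriteLine(string.Join(", ", LinkTexts(html, "exerciseDetails")));
    }
}
```

Filtering on the container first means the navigation links elsewhere in the document never reach the output.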
--UPDATE--
If you are having trouble getting it to work, try splitting it up into variable assignments and stepping through the code, inspecting each variable to see if it holds what you expect.
E.g.,
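One way this could look, as a sketch only: each stage of the query is held in its own variable so it can be inspected in the debugger. The class name, method name, and sample markup are hypothetical, and the id exerciseDetails is taken from this thread rather than verified against the page.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class StepByStep
{
    // Returns how many nodes each stage found, so every stage can be checked on its own.
    public static int[] StageCounts(string html, string id)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Stage 1: every <div> in the document.
        List<HtmlNode> divs = doc.DocumentNode.Descendants("div").ToList();

        // Stage 2: only the <div> elements whose id attribute matches.
        List<HtmlNode> matches = divs
            .Where(dn => dn.GetAttributeValue("id", "") == id)
            .ToList();

        // Stage 3: the <a> tags inside those <div> elements.
        List<HtmlNode> links = matches.SelectMany(dn => dn.Descendants("a")).ToList();

        return new[] { divs.Count, matches.Count, links.Count };
    }

    public static void Main()
    {
        // Hypothetical markup standing in for the real page.
        string html =
            "<div id=\"nav\"><a href=\"/home\">Home</a></div>" +
            "<div id=\"exerciseDetails\"><a href=\"/s\">Strength</a><a href=\"/c\">Chest</a></div>";
        int[] counts = StageCounts(html, "exerciseDetails");
        Console.WriteLine("divs={0}, matches={1}, links={2}", counts[0], counts[1], counts[2]);
    }
}
```

If the second count comes back as zero, the id filter is the stage to look at; if only the third is zero, the links are probably not where you expected them to be.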
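BTW - I'm not sure if the .Id property maps to the id attribute of the node as you suggest it does. If not, you could try dn => dn.GetAttributeValue("id", "") == "exerciseDetails" instead. (Comparing dn.Attributes["id"] directly to a string compares the attribute object rather than its value, so read the value explicitly.)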