How do I parse a simple page with the HTML Agility Pack?

Posted on 2024-12-09 06:02:21


I am trying to parse this page, but there isn't much unique information I can use to identify the sections I want.

Basically, I am trying to get most of the data that sits right above the Flash video, i.e.:

Alternating Floor Press

Type: Strength
Main Muscle Worked: Chest 
Other Muscles: Abdominals, Shoulders, Triceps 
Equipment: Kettlebells 
Mechanics Type: Compound
Level: Beginner
Sport: No
Force: N/A

I also want the image links that show the before and after states.

Right now I use this:

using System;
using System.Collections.Generic;
using HtmlAgilityPack;

var web = new HtmlWeb();
HtmlDocument doc = web.Load("http://www.bodybuilding.com/exercises/detail/view/name/alternating-floor-press");
IEnumerable<HtmlNode> threadLinks = doc.DocumentNode.Descendants("a");

foreach (var link in threadLinks)
{
    string str = link.InnerHtml;
    Console.WriteLine(str);
}

This prints a lot of stuff I don't need, along with what I do need. Should I parse this printed output and try to work out where my target data sits inside it?


Answers (2)

随心而道 2024-12-16 06:02:21


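You can select the node you are interested in by its id:

        // Needs: using System; using System.Collections.Generic; using System.Linq; using HtmlAgilityPack;
        HtmlWeb web = new HtmlWeb();
        HtmlDocument doc = web.Load("http://www.bodybuilding.com/exercises/detail/view/name/alternating-floor-press");

        // SelectNodes returns null rather than an empty collection when the XPath matches nothing
        IEnumerable<HtmlNode> threadLinks = doc.DocumentNode.SelectNodes("//*[@id=\"exerciseDetails\"]")
            ?? Enumerable.Empty<HtmlNode>();

        foreach (var link in threadLinks)
        {
            // InnerText returns the node's text content without the markup
            string str = link.InnerText;
            Console.WriteLine(str);
        }
        Console.ReadKey();
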
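
The InnerText printed above is still one block of text. To get individual fields such as Type or Level, one option is to split it into "Label: Value" lines. Below is a minimal, stdlib-only sketch, assuming each field ends up on its own line in InnerText (not verified against the live page); ExerciseFieldParser and ParseFields are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Net;

public static class ExerciseFieldParser
{
    // Splits text such as "Type: Strength\nLevel: Beginner" into label/value pairs.
    // Lines without a colon (e.g. the exercise name) are skipped, and only the first
    // colon on a line is treated as the separator.
    public static Dictionary<string, string> ParseFields(string innerText)
    {
        var fields = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        var lines = innerText.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);

        foreach (var rawLine in lines)
        {
            // InnerText may still contain entities such as &nbsp;, so decode them first;
            // Trim() also strips the resulting non-breaking spaces.
            string line = WebUtility.HtmlDecode(rawLine).Trim();
            int colon = line.IndexOf(':');
            if (colon <= 0)
                continue; // no "Label:" prefix on this line

            string label = line.Substring(0, colon).Trim();
            string value = line.Substring(colon + 1).Trim();
            fields[label] = value; // a repeated label overwrites the earlier value
        }

        return fields;
    }

    public static void Main()
    {
        string sample = "Alternating Floor Press\nType: Strength\nMain Muscle Worked: Chest \nForce: N/A";
        foreach (var pair in ParseFields(sample))
            Console.WriteLine(pair.Key + " = " + pair.Value);
    }
}
```

With HtmlAgilityPack you would pass link.InnerText from the loop above. Because the dictionary ignores case, both fields["Level"] and fields["level"] work.
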
狼性发作 2024-12-16 06:02:21


To get the displayed text of a given <a> node, use .InnerText.

Right now you are using the contents of every <a> tag in the document. Try narrowing that down to only the ones you need: look for other elements that contain the particular <a> tags you are after. For example, do they all sit inside a <div> with a certain class?

Say the <a> tags you are interested in all sit within <div class="foolinks">. Then you can do something like:

IEnumerable<HtmlNode> threadLinks = doc.DocumentNode.Descendants("div")
    .First(dn => dn.GetAttributeValue("class", "") == "foolinks").Descendants("a");
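
One caveat with comparing the class attribute to a single string: the attribute holds a whitespace-separated list of class names, so <div class="foolinks wide"> would not match an exact comparison. Below is a small stdlib-only sketch of a token check. ClassAttribute and HasClassToken are hypothetical helpers, not part of HtmlAgilityPack.

```csharp
using System;
using System.Linq;

public static class ClassAttribute
{
    // Returns true if className appears as a whole token in the class attribute value,
    // e.g. "foolinks" in "foolinks wide", but not in "foolinks-old".
    public static bool HasClassToken(string classAttributeValue, string className)
    {
        if (string.IsNullOrEmpty(classAttributeValue))
            return false;

        return classAttributeValue
            .Split((char[])null, StringSplitOptions.RemoveEmptyEntries) // null separator: split on any whitespace
            .Contains(className, StringComparer.Ordinal);
    }

    public static void Main()
    {
        Console.WriteLine(HasClassToken("foolinks wide", "foolinks")); // True
        Console.WriteLine(HasClassToken("foolinks-old", "foolinks"));  // False
    }
}
```

In the query above this would become .First(dn => ClassAttribute.HasClassToken(dn.GetAttributeValue("class", ""), "foolinks")).
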

--UPDATE--

Given the information in your comment, I would try:

IEnumerable<HtmlNode> threadLinks = doc.DocumentNode.Descendants("div")
    .First(dn => dn.Id == "exerciseDetails").Descendants("a");

--UPDATE--

If you are having trouble getting it to work, try splitting it up into variable assignments and stepping through the code, inspecting each variable to see if it holds what you expect.

For example:

var divs = doc.DocumentNode.Descendants("div");
var div = divs.FirstOrDefault(dn => dn.Id == "exerciseDetails");
if (div == null)
{
    // couldn't find the node: do whatever is appropriate; here we throw
    throw new InvalidOperationException("Could not find <div id=\"exerciseDetails\">.");
}

IEnumerable<HtmlNode> threadLinks = div.Descendants("a");

By the way, I'm not sure whether the .Id property maps to the node's id attribute as you suggest. If it doesn't, you could try dn => dn.GetAttributeValue("id", "") == "exerciseDetails" instead.
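
The question also asks for the before/after image links. Once you have the container div, div.Descendants("img").Select(img => img.GetAttributeValue("src", "")) should give the raw src values, assuming (unverified) that the pictures are plain <img> tags inside it. Those values may be relative. The sketch below resolves them against the page URL with System.Uri. ImageLinks and ToAbsolute are hypothetical names, and the sample src values are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ImageLinks
{
    // Turns src attribute values (possibly relative, e.g. "/images/start.jpg")
    // into absolute URLs, using the page address as the base. Blank values are
    // dropped and duplicates are removed, keeping the first occurrence.
    public static List<string> ToAbsolute(Uri pageUri, IEnumerable<string> srcValues)
    {
        return srcValues
            .Where(src => !string.IsNullOrWhiteSpace(src))
            .Select(src => new Uri(pageUri, src.Trim()).AbsoluteUri)
            .Distinct()
            .ToList();
    }

    public static void Main()
    {
        var page = new Uri("http://www.bodybuilding.com/exercises/detail/view/name/alternating-floor-press");
        // Hypothetical src values; the real page's markup is not shown in the question.
        var srcs = new[] { "/images/start.jpg", "http://cdn.example.com/end.jpg" };
        foreach (var url in ToAbsolute(page, srcs))
            Console.WriteLine(url);
    }
}
```
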
