Correlating adjacent element values with HTML Agility Pack

Posted on 2024-11-26 17:35:30

I'm trying to grab each h2 element that follows an HTML comment containing the text "Results", along with the table element with the class name "stockfeed" that comes after it.

I've figured out how to pull the data I need (see below), but I'm not sure how to pull the two elements together at the same time. I know I can iterate both collections with the same index to correlate the values, but this seems error-prone, since one of my h2 elements might not have an adjacent table element (rare but possible).

Example HTML markup:

<h1>
    Results Page</h1>
<h2>
    Updated Daily @ 10:00 AM</h2>
<div class='someClass1'>
    <!-- Results -->
    <div class='something'>
    </div>
    <h2 style='display: inline;'>
        <a href='http://www.somesite.com'>Table 1</a>
    </h2>
    <div class='clr'>
    </div>
    <div class='resultBlock'>
        <table class='stockfeed'>
            <thead>
                <tr>
                    <th>
                        Part
                    </th>
                    <th>
                        Description
                    </th>
                    <th>
                        Stock
                    </th>
                    <th>
                        Price
                    </th>
                </tr>
            </thead>
            <tbody>
                <tr class='row1' valign='top'>
                    <td>
                        A 1234567890
                    </td>
                    <td class='description'>
                        Part Description
                    </td>
                    <td>
                        1,000,000
                    </td>
                    <td>
                        $1.99
                    </td>
                </tr>
                <tr class='row1' valign='top'>
                    <td>
                        B 1234567890
                    </td>
                    <td class='description'>
                        Part Description
                    </td>
                    <td>
                        1,000,000
                    </td>
                    <td>
                        $1.99
                    </td>
                </tr>
                <tr class='row1' valign='top'>
                    <td>
                        C 1234567890
                    </td>
                    <td class='description'>
                        Part Description
                    </td>
                    <td>
                        1,000,000
                    </td>
                    <td>
                        $1.99
                    </td>
                </tr>
            </tbody>
        </table>
    </div>
    <!-- Results -->
    <div class='something'>
    </div>
    <h2 style='display: inline;'>
        <a href='http://www.somesite.com'>Table 2</a>
    </h2>
    <div class='clr'>
    </div>
    <div class='resultBlock'>
        <table class='stockfeed'>
            <thead>
                <tr>
                    <th>
                        Part
                    </th>
                    <th>
                        Description
                    </th>
                    <th>
                        Stock
                    </th>
                    <th>
                        Price
                    </th>
                </tr>
            </thead>
            <tbody>
                <tr class='row1' valign='top'>
                    <td>
                        A 1234567890
                    </td>
                    <td class='description'>
                        Part Description
                    </td>
                    <td>
                        1,000,000
                    </td>
                    <td>
                        $1.99
                    </td>
                </tr>
                <tr class='row1' valign='top'>
                    <td>
                        B 1234567890
                    </td>
                    <td class='description'>
                        Part Description
                    </td>
                    <td>
                        1,000,000
                    </td>
                    <td>
                        $1.99
                    </td>
                </tr>
                <tr class='row1' valign='top'>
                    <td>
                        C 1234567890
                    </td>
                    <td class='description'>
                        Part Description
                    </td>
                    <td>
                        1,000,000
                    </td>
                    <td>
                        $1.99
                    </td>
                </tr>
            </tbody>
        </table>
    </div>
</div>

Current code to parse the values separately:

    HtmlNodeCollection titles = doc.DocumentNode.SelectNodes("//comment()[contains(.,'Results')]/following-sibling::h2");
    for (int tit = 0; tit < titles.Count; ++tit)
    {
        // Do Something
    }

    HtmlNodeCollection tables = doc.DocumentNode.SelectNodes("//table[@class='stockfeed']");
    for (int tab = 0; tab < tables.Count; ++tab)
    {
        // Do Something
    }

Comments (1)

尝蛊 2024-12-03 17:35:30

So if I'm reading this correctly, you are trying to get the table that corresponds to each result heading.

You can use an approach similar to the one you used to get the h2 elements: from each h2, select the table element that follows it.

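var query = doc.DocumentNode
    .SelectNodes("//comment()[contains(.,'Results')]/following-sibling::h2");

// SelectNodes returns null by default when nothing matches, so guard before iterating.
if (query != null)
{
    foreach (HtmlNode h2 in query)
    {
        // First stockfeed table that is a direct child of any later sibling of this h2.
        var table = h2.SelectSingleNode("following-sibling::*/table[@class='stockfeed']");
        // do stuff with h2 and table (table is null if no such table follows)
    }
}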
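
One caveat with `following-sibling::*/table[@class='stockfeed']`: it looks at every later sibling, so an h2 that has no table of its own would be paired with the table of the next section, which is the case the question worries about. Below is a minimal sketch of one way around that, assuming the layout in the sample markup (each h2 and its `resultBlock` div are siblings). `FindOwnTable` is a hypothetical helper name, not part of HTML Agility Pack, and the markup in `Main` is a cut-down, made-up variant of the sample in which "Table 1" deliberately has no table.

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    // Hypothetical helper (not part of HTML Agility Pack): walks the siblings after
    // an h2 and returns its stockfeed table, or null if the next h2 comes first.
    static HtmlNode FindOwnTable(HtmlNode h2)
    {
        for (HtmlNode sibling = h2.NextSibling; sibling != null; sibling = sibling.NextSibling)
        {
            if (sibling.NodeType != HtmlNodeType.Element) continue; // skip text and comments
            if (sibling.Name == "h2") return null;                  // the next section has started
            HtmlNode table = sibling.SelectSingleNode("table[@class='stockfeed']");
            if (table != null) return table;
        }
        return null;
    }

    static void Main()
    {
        // Made-up markup for illustration: "Table 1" has no resultBlock/table.
        var doc = new HtmlDocument();
        doc.LoadHtml(@"<div class='someClass1'>
<!-- Results -->
<h2><a href='http://www.somesite.com'>Table 1</a></h2>
<div class='clr'></div>
<!-- Results -->
<h2><a href='http://www.somesite.com'>Table 2</a></h2>
<div class='clr'></div>
<div class='resultBlock'><table class='stockfeed'><tr><td>A 1234567890</td></tr></table></div>
</div>");

        HtmlNodeCollection titles = doc.DocumentNode.SelectNodes(
            "//comment()[contains(.,'Results')]/following-sibling::h2");
        if (titles == null) return; // SelectNodes returns null by default when nothing matches

        foreach (HtmlNode h2 in titles)
        {
            HtmlNode table = FindOwnTable(h2);
            Console.WriteLine("{0}: {1}", h2.InnerText.Trim(),
                table != null ? "table found" : "no table");
        }
    }
}
```

With this markup the loop should report no table for "Table 1" and a table for "Table 2", because the walk from the first h2 stops when it reaches the second h2. If the real pages nest the table more deeply than one level inside a sibling, the relative XPath in the helper would need adjusting (for example to `.//table[@class='stockfeed']`).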