Can't figure out how to parse with HTML Agility Pack

Posted 2024-12-09 16:07:21 · 1891 characters · 1 view · 0 comments

I have the following chunk of HTML, but I can't figure out how to get the designated values:

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
</head>
<body >
<form name="form1" method="post" action="" id="form1">
    <div>
    <table class="tableclass" >
       <tbody>
        <tr>        
        <tr>
            <td colspan="5" class="myclass1"><span id="myclass2">value1</span></td>
        </tr>

        <tr id="idvalue" aa="1" class="myclass3a">
            <td><a href="" target="_blank">value2</a></td>
            <td>value3</td>
            <td>value4</td>
            <td>value5</td>
            <td>value6</td>
        </tr>

        <tr id="idvalue" aa="2" class="myclass3b">
            <td><a href="" target="_blank">value2</a></td>
            <td>value3</td>
            <td>value4</td>
            <td>value5</td>
            <td>value6</td>
        </tr>

        <tr id="idvalue" aa="3" class="myclass3c">
            <td><a href="" target="_blank">value2</a></td>
            <td>value3</td>
            <td>value4</td>
            <td>value5</td>
            <td>value6</td>
        </tr>
        </tbody>
        </table>

    </div>

</form>
</body>
</html>

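Let me explain this code a little.

The page has a table whose first row has a slightly different format, and from that row I want to extract value1. The remaining rows each have a variety of classes and different id values, and from each of those rows, through the end of the table, I want to extract value2, value3, value4, and value5.

Thanks for your time.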

Comments (2)

絕版丫頭 2024-12-16 16:07:21
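// Requires: using System.Linq; using HtmlAgilityPack;
// HtmlDocument.Load reads a file path or stream; HtmlWeb.Load fetches a URL.
var doc = new HtmlWeb().Load(url);

var table = doc.DocumentNode.SelectSingleNode("//table[@class='tableclass']");

// Skip the empty first <tr>; the second row holds value1.
var value1 = table.Descendants("tr").Skip(1)
    .Select(tr => tr.InnerText.Trim())
    .First();

// Every row after that is a data row with five cells.
var theRest =
    from tr in table.Descendants("tr").Skip(2)
    let values = tr.Elements("td")
        .Select(td => td.InnerText.Trim())
        .ToList()
    select new
    {
        Value2 = values[0],
        Value3 = values[1],
        Value4 = values[2],
        Value5 = values[3],
        Value6 = values[4],
    };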
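
As a possible variation on the code above, the rows can be picked by their markup instead of by position, which avoids counting the empty leading row. This is only a sketch under assumptions: the HTML string is a shortened copy of the question's table (without the empty first row), it treats the `aa` attribute as the marker of a data row, and the class name `Program` is just a placeholder.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Assumed input: a shortened copy of the table from the question.
        const string html = @"
<table class=""tableclass""><tbody>
  <tr><td colspan=""5"" class=""myclass1""><span id=""myclass2"">value1</span></td></tr>
  <tr id=""idvalue"" aa=""1"" class=""myclass3a"">
    <td><a href="""" target=""_blank"">value2</a></td>
    <td>value3</td><td>value4</td><td>value5</td><td>value6</td>
  </tr>
</tbody></table>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // value1 lives in a span with a known id, so select it directly.
        var value1 = doc.DocumentNode
            .SelectSingleNode("//span[@id='myclass2']")?.InnerText.Trim();
        Console.WriteLine(value1);

        // Assumption: only data rows carry the custom "aa" attribute.
        var rows = doc.DocumentNode
            .SelectNodes("//table[@class='tableclass']//tr[@aa]");

        // SelectNodes returns null (not an empty collection) when nothing matches.
        if (rows == null) return;

        foreach (var row in rows)
        {
            var cells = row.Elements("td").Select(td => td.InnerText.Trim());
            Console.WriteLine(row.GetAttributeValue("aa", "") + ": " + string.Join(", ", cells));
        }
    }
}
```

Selecting by attribute keeps working if extra header rows are added later, but it breaks if the site stops emitting `aa`, so which approach is more robust depends on the page.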
£噩梦荏苒 2024-12-16 16:07:21

The following...

var document = new HtmlDocument();
...

var nodes = document.DocumentNode.Descendants("td");
foreach(var node in nodes)
{
    Console.WriteLine(node.InnerText);
}

Produces...

value1
value2
value3
value4
value5
value6
value2
value3
value4
value5
value6
value2
value3
value4
value5
value6

Hopefully that is what you are after.
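
The flat list above mixes value1 with the data cells and also includes value6, which the question did not ask for. A minimal sketch of grouping the cells per row and keeping only value2 through value5 might look like this. The HTML string is an assumed, shortened copy of the question's table, and the "fewer than five cells" test is just one way to tell the value1 row apart from the data rows.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Assumed input: a shortened copy of the table from the question.
        const string html = @"
<table class=""tableclass""><tbody>
  <tr><td colspan=""5"" class=""myclass1""><span id=""myclass2"">value1</span></td></tr>
  <tr id=""idvalue"" aa=""1"" class=""myclass3a"">
    <td><a href="""" target=""_blank"">value2</a></td>
    <td>value3</td><td>value4</td><td>value5</td><td>value6</td>
  </tr>
  <tr id=""idvalue"" aa=""2"" class=""myclass3b"">
    <td><a href="""" target=""_blank"">value2</a></td>
    <td>value3</td><td>value4</td><td>value5</td><td>value6</td>
  </tr>
</tbody></table>";

        var document = new HtmlDocument();
        document.LoadHtml(html);

        foreach (var row in document.DocumentNode.Descendants("tr"))
        {
            // Direct <td> children only, so each row is handled on its own.
            var cells = row.Elements("td")
                .Select(td => td.InnerText.Trim())
                .ToList();

            // The value1 row has a single cell; skip it and any empty row.
            if (cells.Count < 5) continue;

            // Keep value2..value5 and drop the trailing value6.
            Console.WriteLine(string.Join(", ", cells.Take(4)));
        }
    }
}
```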
