HTML Agility Pack

Posted on 2024-08-24 21:24:44

I want to parse an HTML table using the HTML Agility Pack. I want to extract only some predefined columns' data from the table.

I am new to parsing and to the HTML Agility Pack. I have tried, but I don't know how to use it for what I need.

If anybody knows how, please give me an example if possible.

EDIT:

Is it possible to parse an HTML table and extract the data for chosen column names only? For example, the table has 4 columns such as name, address, and phno, and I want to extract only the name and address data.

Comments (1)

—━☆沉默づ 2024-08-31 21:24:44

There is an example of that in the discussion forums here. Scroll down a bit to see the table answer. I do wish they would provide better samples that were easier to find.

EDIT:
To extract data from specific columns you would have to first find the <th> tags that correspond to the columns you want and remember their indexes. You would then need to find the <td> tags for the same indexes. Assuming you know the indexes of the columns you could do something like this:

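// Requires: using HtmlAgilityPack;
// HtmlDocument.LoadHtml expects markup, not a URL, so fetch the page with HtmlWeb.
HtmlDocument doc = new HtmlWeb().Load("http://somewhere.com");
// SelectSingleNode and SelectNodes return null when nothing matches.
HtmlNode table = doc.DocumentNode.SelectSingleNode("//table");
// ".//tr[td]" keeps the search inside this table and skips header-only rows.
foreach (HtmlNode row in table.SelectNodes(".//tr[td]"))
{
    // XPath positions are 1-based: td[2] is the second cell in the row.
    HtmlNode addressNode = row.SelectSingleNode("td[2]");
    //do something with address here (null if the row has no such cell)
    HtmlNode phoneNode = row.SelectSingleNode("td[5]");
    // do something with phone here
}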

Edit2:
If you don't know the indexes of the columns you could do the whole thing like this. I have not tested this.

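// Requires: using System; using HtmlAgilityPack;
// HtmlDocument.LoadHtml expects markup, not a URL, so fetch the page with HtmlWeb.
HtmlDocument doc = new HtmlWeb().Load("http://somewhere.com");
// SelectNodes returns null when nothing matches, so check before looping in real code.
var tables = doc.DocumentNode.SelectNodes("//table");

foreach(var table in tables)
{
    int addressIndex = -1;
    int phoneIndex = -1;
    // ".//th" searches only inside this table; "//th" would search the whole document.
    var headers = table.SelectNodes(".//th");
    if (headers == null)
    {
        continue; // this table has no header cells
    }
    for (int headerIndex = 0; headerIndex < headers.Count; headerIndex++)
    {
        // Trim whitespace and ignore case so that " Address " still matches.
        string headerText = headers[headerIndex].InnerText.Trim();
        if (headerText.Equals("address", StringComparison.OrdinalIgnoreCase))
        {
            addressIndex = headerIndex;
        }
        else if (headerText.Equals("phone", StringComparison.OrdinalIgnoreCase))
        {
            phoneIndex = headerIndex;
        }
    }

    if (addressIndex != -1 && phoneIndex != -1)
    {
        // ".//tr[td]" selects only the rows of this table that contain data cells.
        var rows = table.SelectNodes(".//tr[td]");
        if (rows == null)
        {
            continue; // headers but no data rows
        }
        foreach (var row in rows)
        {
            // XPath positions are 1-based, so add 1 to the zero-based header index.
            HtmlNode addressNode = row.SelectSingleNode("td[" + (addressIndex + 1) + "]");
            //do something with address here
            HtmlNode phoneNode = row.SelectSingleNode("td[" + (phoneIndex + 1) + "]");
            // do something with phone here
        }
    }
}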
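
For the exact case in the question (keep only the name and address columns), the header-lookup idea can be sketched as a small self-contained program. This is a minimal sketch, not a reference implementation: the inline sample table, the `ColumnExtractor` class and the `ExtractColumns` helper are hypothetical names made up for illustration. It assumes the first table in the markup has a single header row of `<th>` cells, with no nested tables and no `colspan`.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

class ColumnExtractor
{
    // Hypothetical helper: returns, for each data row of the first table,
    // the text of the columns whose header matches one of the wanted names.
    static List<Dictionary<string, string>> ExtractColumns(string html, params string[] wanted)
    {
        var result = new List<Dictionary<string, string>>();

        var doc = new HtmlDocument();
        doc.LoadHtml(html); // LoadHtml takes markup, not a URL

        HtmlNode table = doc.DocumentNode.SelectSingleNode("//table");
        if (table == null) return result;

        HtmlNodeCollection headers = table.SelectNodes(".//th");
        if (headers == null) return result;

        // Map each wanted header name to its 1-based XPath position.
        var positions = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        for (int i = 0; i < headers.Count; i++)
        {
            string name = HtmlEntity.DeEntitize(headers[i].InnerText).Trim();
            if (Array.Exists(wanted, w => string.Equals(w, name, StringComparison.OrdinalIgnoreCase)))
            {
                positions[name] = i + 1;
            }
        }

        // Only rows that contain <td> cells are data rows.
        HtmlNodeCollection rows = table.SelectNodes(".//tr[td]");
        if (rows == null) return result;

        foreach (HtmlNode row in rows)
        {
            var record = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
            foreach (KeyValuePair<string, int> pair in positions)
            {
                HtmlNode cell = row.SelectSingleNode("td[" + pair.Value + "]");
                record[pair.Key] = cell == null ? "" : HtmlEntity.DeEntitize(cell.InnerText).Trim();
            }
            result.Add(record);
        }
        return result;
    }

    static void Main()
    {
        // Made-up sample markup, used only for illustration.
        string html =
            "<table>" +
            "<tr><th>name</th><th>address</th><th>phno</th></tr>" +
            "<tr><td>Ann</td><td>1 Main St</td><td>555-0100</td></tr>" +
            "</table>";

        foreach (Dictionary<string, string> record in ExtractColumns(html, "name", "address"))
        {
            Console.WriteLine(record["name"] + " | " + record["address"]); // prints: Ann | 1 Main St
        }
    }
}
```

Looking the columns up by header text means the code keeps working if the column order changes. A header that is missing from the table is simply left out of each record, so check `ContainsKey` before indexing when the headers are not guaranteed to be present.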