Html Agility Pack: merging queries

Posted on 2024-12-06 09:31:12

I have a table that looks roughly like this:

...some td elements with links I don't need
<td>1010</td>
<td>Building</td>
<td>Adress stree 55</td>
<td>00000 City</td>
<td>
<a href="http://www.adress.xy/file.kml" target="_self">
<img align="top" border="1" src="/custom/img/kml.gif" alt="Details" title="Details" />
</a>
</td>

I use this query to get the inner text of each cell:

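HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html);
// One string[] per tr, holding the InnerText of each td in that row.
var node = doc.DocumentNode.Descendants("table")
    // GetAttributeValue avoids a NullReferenceException on tables without a style attribute.
    .FirstOrDefault(x => x.GetAttributeValue("style", "") == "table-layout:auto")
    .Elements("tr")
    .Select(tr => tr.Elements("td").Select(td => td.InnerText).ToArray())
    .ToArray();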

But I would also like to add the URL of the .kml link to each row's array.
So the question is: how can I merge the queries to get both the inner text and the .kml link?

The result of this query is:

string[i][j]

where i is the index of the tr element (the row) and j is the index of the td element (the cell) within that row.

Example:

string[0][0]="1010"
string[0][1]="Building"

I would also like to have: string[i][4] = "http://www.adress.xy/file.kml"

P.S. the whole table is here.

Comments (1)

烟雨扶苏 2024-12-13 09:31:12

I wouldn't worry about getting arrays of arrays; it would be better to get lists instead.

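using System.Linq;
using HtmlAgilityPack;

const string url = "http://www.rwth-aachen.de/go/id/yvu/scol/1/sasc/1/pl/313";
const string kml = "http://www.adress.xy/file.kml";
var newKml = new[] { kml };

var web = new HtmlWeb();
var doc = web.Load(url);
// Select only the rows that have at least one td child.
var xpath = "//table[@style='table-layout:auto']/tr[td]";
// Note: SelectNodes returns null when nothing matches.
var rows = doc.DocumentNode.SelectNodes(xpath);
var table = rows
    .Select(row =>
        row.Elements("td")
           .Skip(1)          // skip the first cell
           .Take(4)          // keep the next four cells
           .Select(col => System.Net.WebUtility.HtmlDecode(col.InnerText))
           .Concat(newKml)   // append the link as a fifth element
           .ToList()
    ).ToList();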
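
The snippet above appends the same constant URL to every row. If each row carries its own .kml link, as in the question's markup, the link can be read from the anchor inside the cell instead. The following is only a minimal sketch under some assumptions: the wanted link is the first a element in a cell whose href ends in ".kml", and the class name KmlRows and the method names Extract and CellValue are hypothetical, not part of Html Agility Pack.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class KmlRows
{
    // Returns one list per table row. Each cell contributes its decoded text,
    // except a cell holding a .kml link, which contributes the link's href.
    public static List<List<string>> Extract(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null, not an empty collection, when nothing matches.
        var rows = doc.DocumentNode.SelectNodes("//table[@style='table-layout:auto']/tr[td]");
        if (rows == null)
            return new List<List<string>>();

        return rows
            .Select(row => row.Elements("td").Select(CellValue).ToList())
            .ToList();
    }

    // Assumption: the link of interest is the first <a> in the cell
    // whose href ends in ".kml"; any other cell falls back to its text.
    private static string CellValue(HtmlNode td)
    {
        var kml = td.Descendants("a")
            .Select(a => a.GetAttributeValue("href", ""))
            .FirstOrDefault(href => href.EndsWith(".kml", StringComparison.OrdinalIgnoreCase));

        return kml ?? System.Net.WebUtility.HtmlDecode(td.InnerText).Trim();
    }
}
```

Calling KmlRows.Extract(html) on the question's sample row gives a list whose fifth entry is the link URL, which corresponds to the string[i][4] the question asks for. On the real page, the leading cells with unwanted links would still need to be skipped, as the Skip(1) above does.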

I would consider using an anonymous type to represent your rows; that way you could give more useful names to your columns. Perhaps even put the results in a DataTable instead.
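
The anonymous-type idea could look roughly like the sketch below. The column names Id, Name, Street and City are guesses based on the sample values in the question; the source does not name the columns. The class NamedRows, the method Describe and the skip parameter are likewise hypothetical. Because an anonymous type cannot be returned from a method as a typed value, the sketch consumes it inside the method and returns formatted strings.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class NamedRows
{
    // Projects each row into an anonymous type with named columns and
    // formats it as one line of text. 'skip' is the number of leading
    // cells to ignore (the answer's query uses 1 for the real page).
    public static List<string> Describe(string html, int skip = 0)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null when nothing matches.
        var rows = doc.DocumentNode.SelectNodes("//table[@style='table-layout:auto']/tr[td]");
        if (rows == null)
            return new List<string>();

        return rows
            .Select(row => row.Elements("td")
                .Skip(skip)
                .Select(td => System.Net.WebUtility.HtmlDecode(td.InnerText).Trim())
                .ToList())
            // Ignore rows that are too short to fill all four columns.
            .Where(cells => cells.Count >= 4)
            // Hypothetical column names; adjust to what the columns really mean.
            .Select(c => new { Id = c[0], Name = c[1], Street = c[2], City = c[3] })
            .Select(r => $"{r.Id}: {r.Name}, {r.Street}, {r.City}")
            .ToList();
    }
}
```

Inside the query, r.Name and r.City read better than c[1] and c[3], which is the main benefit of the named projection.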

In case you can't use XPath for whatever reason (or you want to know the equivalent LINQ query), you could replace the line that uses the XPath with this:

var rows = doc.DocumentNode.Descendants("table")
    // GetAttributeValue avoids a NullReferenceException on tables without a style attribute.
    .Where(t => t.GetAttributeValue("style", "") == "table-layout:auto")
    .SelectMany(t => t.Elements("tr").Where(tr => tr.Elements("td").Any()));