HTML parsing returns no results
I am trying to parse this HTML document to get the flight, time, origin, and date, and then output them.
<div id="FlightInfo_FlightInfoUpdatePanel">
  <table cellspacing="0" cellpadding="0">
    <tbody>
      <tr class="">
        <td class="airline"><img src="/images/airline logos/US.gif" title="US AIRWAYS. " alt="US AIRWAYS. " /></td>
        <td class="flight">US5316</td>
        <td class="codeshare">NZ46</td>
        <td class="origin">Rarotonga</td>
        <td class="date">02 Sep</td>
        <td class="time">10:30</td>
        <td class="est">21:30</td>
        <td class="status">CHECK IN CLOSING</td>
      </tr>
I am using this code, based on the HTML Agility Pack for Windows Phone 7, to find and output the content of <td class="flight">US5316</td>:
void client_DownloadStringCompleted(object sender, DownloadStringCompletedEventArgs e)
{
    var html = e.Result;
    var doc = new HtmlDocument();
    doc.LoadHtml(html);
    var node = doc.DocumentNode.Descendants("div")
        .FirstOrDefault(x => x.Id == "FlightInfo_FlightInfoUpdatePanel")
        .Element("table")
        .Element("tbody")
        .Elements("tr")
        .Where(tr => tr.GetAttributeValue("td", "").Contains("class"))
        .SelectMany(tr => tr.Descendants("flight"))
        .ToArray();
    this.scrollViewer1.Content = node;
    //Added below
    listBox1.itemSource = node;
}
I get no results in either the ScrollViewer or the ListBox. Is the LINQ query I am using correct for the HTML I supplied?
Comments (2)
What do you intend to do with the Where line, tr.GetAttributeValue("td", "").Contains("class")?
GetAttributeValue(name, def) looks for an attribute with the key name in the node and returns the value of that attribute if it finds it. Otherwise, it returns the default value def.
So what's actually happening here is that <tr> doesn't have any attribute with the key td, so the call returns the default value (an empty string), which does not contain the substring "class", so your <tr> node is being filtered out.
Edit:
This will return an array where each entry is an array of 8 strings containing the contents of each td:
Examples:
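The code that originally accompanied this answer is not reproduced here. As a stand-in, the following is a minimal sketch of one query with the described result shape (one string[] per row, one entry per td), together with a check showing why the question's filter matches nothing. The class name FlightTableParser, the helpers ParseRows and QuestionFilterMatches, and the shortened sample HTML are assumptions made for illustration, not part of the original answer.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

static class FlightTableParser
{
    // Hypothetical helper: returns one string[] per <tr>,
    // where each entry is the trimmed text of one <td> child.
    public static string[][] ParseRows(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Locate the panel by its id attribute, as the question does.
        var panel = doc.DocumentNode.Descendants("div")
            .FirstOrDefault(d => d.GetAttributeValue("id", "") == "FlightInfo_FlightInfoUpdatePanel");
        if (panel == null)
            return new string[0][];

        return panel.Descendants("tr")
            .Select(tr => tr.Elements("td")
                .Select(td => td.InnerText.Trim())
                .ToArray())
            .ToArray();
    }

    // The question's filter: <tr> has no attribute named "td", so the
    // default "" is returned and Contains("class") is false.
    public static bool QuestionFilterMatches(HtmlNode tr)
    {
        return tr.GetAttributeValue("td", "").Contains("class");
    }

    static void Main()
    {
        // Shortened sample row (3 cells); the full row in the question has 8.
        const string html =
            "<div id=\"FlightInfo_FlightInfoUpdatePanel\"><table><tbody>" +
            "<tr class=\"\">" +
            "<td class=\"airline\"><img src=\"US.gif\" /></td>" +
            "<td class=\"flight\">US5316</td>" +
            "<td class=\"origin\">Rarotonga</td>" +
            "</tr></tbody></table></div>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        var firstRow = doc.DocumentNode.Descendants("tr").First();
        Console.WriteLine(QuestionFilterMatches(firstRow)); // False

        var rows = ParseRows(html);
        Console.WriteLine(rows[0][1]); // US5316
    }
}
```

The airline cell holds only an image, so its entry is an empty string; the flight number is the second entry of each row.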
You're trying to set the content of a ScrollViewer to a string[] (an array). So I'll repeat myself and say that you should take some time to learn basic C# before you continue this endeavour.
What you need to do is use a ListBox instead of the ScrollViewer, and then set ListBox.ItemsSource to your node string array.
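A minimal sketch of producing a flat string[] suitable for a ListBox, assuming the goal is the flight numbers from the question's table. The class FlightListSource and the helper FlightNumbers are hypothetical names; only the HtmlAgilityPack calls and the ItemsSource property come from the real APIs.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

static class FlightListSource
{
    // Hypothetical helper: collects the text of every <td class="flight">
    // cell into a flat string[], one item per flight.
    public static string[] FlightNumbers(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        return doc.DocumentNode.Descendants("td")
            .Where(td => td.GetAttributeValue("class", "") == "flight")
            .Select(td => td.InnerText.Trim())
            .ToArray();
    }

    static void Main()
    {
        const string html =
            "<table><tbody><tr class=\"\">" +
            "<td class=\"flight\">US5316</td>" +
            "<td class=\"origin\">Rarotonga</td>" +
            "</tr></tbody></table>";

        foreach (var flight in FlightNumbers(html))
            Console.WriteLine(flight); // US5316
    }
}
```

Inside the DownloadStringCompleted handler, an assignment along the lines of listBox1.ItemsSource = FlightListSource.FlightNumbers(e.Result); would then take the place of the ScrollViewer assignment, assuming listBox1 is the ListBox from the question.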