HTML Agility Pack parsing

Posted on 2024-12-02 07:47:22

I would like to parse an HTML table and display its contents in a bound ListBox using LINQ to XML.

I am using the HTML Agility Pack with this code:

    HtmlWeb web = new HtmlWeb();
    HtmlAgilityPack.HtmlDocument doc = web.Load("http://www.SourceURL");
    HtmlNode rateNode = doc.DocumentNode.SelectSingleNode("//div[@id='FlightInfo_FlightInfoUpdatePanel']");
    string rate = rateNode.InnerText;
    this.richTextBox1.Text = rate;

The HTML looks like this:

<div id="FlightInfo_FlightInfoUpdatePanel">

   <table cellspacing="0" cellpadding="0"><tbody>
     <tr class="">
     <td class="airline"><img src="/images/airline logos/NZ.gif" title="AIR NEW ZEALAND LIMITED. " alt="AIR NEW ZEALAND LIMITED. " /></td>
     <td class="flight">NZ8</td>
     <td class="codeshare"> </td>
     <td class="origin">San Francisco</td>
     <td class="date">01 Sep</td>
     <td class="time">17:15</td>
     <td class="est">18:00</td>
     <td class="status">DEPARTED</td>
     </tr>

But it returns this:

NZ8 San Francisco01 Sep17:1518:00DEPARTEDAC6103NZ8San Francisco01 Sep17:1518:00DEPARTEDCO6754NZ8San Francisco01 Sep17:1518:00DEPARTEDLH7157NZ8San Francisco01 Sep17:1518:00DEPARTEDUA6754NZ8San Francisco01 Sep17:1518:00DEPARTEDUS5308NZ8San Francisco01 Sep17:1518:00DEPARTEDVS7408NZ8San Francisco01 Sep17:1518:00DEPARTEDEK407 Melbourne/Dubai01 Sep17:5017:50DEPARTEDEK413 Sydney/Dubai01 Sep18:0018:00DEPARTEDQF44 Sydney01 

What I would like is to parse this into XML and then use LINQ to XML to turn that XML into the ItemsSource of a bound ListBox.

I think I need a variation of the following for each class, but would like some help:

HtmlNodeCollection cols = rows[i].SelectNodes(".//td[@class='flight']");

Comments (2)

嘿咻 2024-12-09 07:47:22

You are using InnerText, which strips out the HTML markup and returns only the text.

Use InnerHtml:

string rate = rateNode.InnerHtml;

You can create an XML document from this string (assuming it is valid XML).
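The "assuming it is valid XML" caveat matters in practice: HTML-only entities such as `&nbsp;` and unclosed tags make `XDocument.Parse` throw an `XmlException`. A minimal sketch of a guarded parse follows; the `XmlFragment.TryParse` helper is a hypothetical name for illustration, not part of any library.

```csharp
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper (not from the original answer).
public static class XmlFragment
{
    // Returns the parsed document, or null when the markup is not well-formed XML.
    public static XDocument TryParse(string markup)
    {
        try
        {
            return XDocument.Parse(markup);
        }
        catch (XmlException)
        {
            // Typical causes: undeclared entities (&nbsp;) or unclosed tags.
            return null;
        }
    }
}
```

If this returns null for `rateNode.InnerHtml`, querying the HtmlNode tree directly is likely the simpler route.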

You can also query the rateNode in the same way you retrieved it - selecting its child nodes:

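// XPath positions are 1-based, so the first row is tr[1]; tr[0] never matches.
var firstRow = rateNode.SelectSingleNode("./table/tbody/tr[1]");
// SelectSingleNode returns an HtmlNode; read InnerText to get the cell text.
string origin = firstRow.SelectSingleNode("./td[@class='origin']").InnerText;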
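The same child-node query can be applied to every row instead of only the first. This is a minimal sketch, assuming the HtmlAgilityPack package is referenced and the table markup matches the snippet in the question. The `Flight` class and `FlightTableParser` are hypothetical names introduced here for illustration.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical container for one table row; not part of the original question.
public class Flight
{
    public string Number { get; set; }
    public string Codeshare { get; set; }
    public string Origin { get; set; }
    public string Date { get; set; }
    public string Time { get; set; }
    public string Estimated { get; set; }
    public string Status { get; set; }
}

public static class FlightTableParser
{
    // Text of the <td> with the given class inside a row, with entities
    // such as &nbsp; decoded and surrounding whitespace trimmed.
    private static string Cell(HtmlNode row, string cssClass)
    {
        HtmlNode td = row.SelectSingleNode("./td[@class='" + cssClass + "']");
        return td == null ? string.Empty : HtmlEntity.DeEntitize(td.InnerText).Trim();
    }

    public static List<Flight> Parse(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var flights = new List<Flight>();
        // SelectNodes may return null rather than an empty collection.
        HtmlNodeCollection rows = doc.DocumentNode.SelectNodes(
            "//div[@id='FlightInfo_FlightInfoUpdatePanel']//tr");
        if (rows == null) return flights;

        foreach (HtmlNode row in rows)
        {
            // Skip rows that contain no data cells (for example header rows).
            if (row.SelectSingleNode("./td") == null) continue;

            flights.Add(new Flight
            {
                Number = Cell(row, "flight"),
                Codeshare = Cell(row, "codeshare"),
                Origin = Cell(row, "origin"),
                Date = Cell(row, "date"),
                Time = Cell(row, "time"),
                Estimated = Cell(row, "est"),
                Status = Cell(row, "status")
            });
        }
        return flights;
    }
}
```

With a list like this there is no need for an intermediate XML step: the result of `FlightTableParser.Parse(...)` can be assigned straight to the list box's data source.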

别把无礼当个性 2024-12-09 07:47:22

If you want to work with LINQ to XML, you can convert the HtmlDocument to an XML string:

HtmlWeb web = new HtmlWeb();
HtmlAgilityPack.HtmlDocument doc = web.Load("http://www.SourceURL");  
doc.OptionOutputAsXml = true;
System.IO.StringWriter sw = new System.IO.StringWriter();
System.Xml.XmlTextWriter xw = new System.Xml.XmlTextWriter(sw);
doc.Save(xw);
string result = sw.ToString();

Then you only need to create an XDocument object and load it from the XML string:

System.Xml.Linq.XDocument xDoc = System.Xml.Linq.XDocument.Parse(result);

Now you have an XDocument that you can query with LINQ.
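A sketch of the LINQ to XML step the question asks about, assuming the XML produced from the page keeps the table's `tr`/`td` elements and their `class` attributes. `FlightRow` and `FlightXml.ReadFlights` are hypothetical names. Elements are matched by local name so the query works whether or not the output carries an XML namespace.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical row type for binding; not part of the original answer.
public class FlightRow
{
    public string Flight { get; set; }
    public string Origin { get; set; }
    public string Time { get; set; }
    public string Status { get; set; }

    // A list box shows ToString() when no display member is configured.
    public override string ToString()
    {
        return Flight + " " + Origin + " " + Time + " " + Status;
    }
}

public static class FlightXml
{
    // Text of the <td> child whose class attribute equals cssClass, or "" if absent.
    private static string Cell(XElement row, string cssClass)
    {
        XElement td = row.Elements().FirstOrDefault(e =>
            e.Name.LocalName == "td" && (string)e.Attribute("class") == cssClass);
        return td == null ? string.Empty : td.Value.Trim();
    }

    public static List<FlightRow> ReadFlights(XDocument xDoc)
    {
        return xDoc.Descendants()
            .Where(e => e.Name.LocalName == "tr")
            // Keep only rows that actually contain data cells.
            .Where(tr => tr.Elements().Any(e => e.Name.LocalName == "td"))
            .Select(tr => new FlightRow
            {
                Flight = Cell(tr, "flight"),
                Origin = Cell(tr, "origin"),
                Time = Cell(tr, "time"),
                Status = Cell(tr, "status")
            })
            .ToList();
    }
}
```

Binding is then a one-liner, for example `listBox1.DataSource = FlightXml.ReadFlights(xDoc);` in WinForms or `listBox1.ItemsSource = ...` in WPF, where `listBox1` is an assumed control name.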
