Parsing a web page in C# with XmlDocument.LoadXml

Posted on 2024-12-21 20:36:39

I'm trying to download a web page and parse it. I need to reach every node of the HTML document, so I use WebClient to download it, which works perfectly. Then I use the following code segment to parse the document:

 WebClient client = new WebClient();

 Stream data = client.OpenRead("http://web.cs.hacettepe.edu.tr/~bil339/");
 StreamReader reader = new StreamReader(data);
 string xml = reader.ReadToEnd();

 // Close the reader first; closing it also closes the underlying stream.
 reader.Close();
 data.Close();

 XmlDocument xmlDoc = new XmlDocument();
 xmlDoc.LoadXml(xml);

On the last line, the program waits for some time and then crashes. The error messages say that there are errors in the HTML code: something was not expected, something else should not be there, and so on.
Any suggestions for fixing this? Other techniques for parsing HTML are welcome (in C#, of course).
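
A small experiment may help narrow this down. `XmlDocument.LoadXml` only accepts well-formed XML, and a lot of real-world HTML is not well-formed (unclosed `<br>` or `<p>` tags, unquoted attributes, and so on). The sketch below is an illustration under assumptions, not a diagnosis of the page above: the class name `HtmlXmlCheck`, the helpers `TryLoad` and `CountNodes`, and both sample strings are made up for this example. Setting `XmlResolver` to `null` is a guess at the cause of the long wait, assuming the page declares a DOCTYPE whose external DTD is being downloaded.

```csharp
using System;
using System.Xml;

public static class HtmlXmlCheck
{
    // Recursively counts the given node and every node below it.
    public static int CountNodes(XmlNode node)
    {
        int count = 1;
        foreach (XmlNode child in node.ChildNodes)
        {
            count += CountNodes(child);
        }
        return count;
    }

    // Returns true if the markup is well-formed XML and was loaded into doc.
    public static bool TryLoad(string markup, out XmlDocument doc)
    {
        doc = new XmlDocument();
        // Assumption: skip resolving external DTDs, which can be slow.
        doc.XmlResolver = null;
        try
        {
            doc.LoadXml(markup);
            return true;
        }
        catch (XmlException)
        {
            // Thrown for markup that is not well-formed XML.
            return false;
        }
    }

    public static void Main()
    {
        // Hypothetical sample: every tag is closed, so this is valid XML.
        string wellFormed = "<html><body><p>Hello</p><br /></body></html>";
        // Hypothetical sample: <p> and <br> are never closed, as in much HTML.
        string typicalHtml = "<html><body><p>Hello<br></body></html>";

        XmlDocument doc;
        if (TryLoad(wellFormed, out doc))
        {
            // Nodes visited: html, body, p, the text node, br.
            Console.WriteLine("Loaded, nodes: " + CountNodes(doc.DocumentElement));
        }

        if (!TryLoad(typicalHtml, out doc))
        {
            Console.WriteLine("Not well-formed XML; LoadXml threw XmlException.");
        }
    }
}
```

If the downloaded page turns out not to be well-formed, a dedicated HTML parser (for example the third-party Html Agility Pack library) is probably a better fit than `XmlDocument`; that is a suggestion, not something checked against this particular page.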
