How to get a page's HTML in C# when the header is set to HTTP/1.0 404 Not Found
Is there any way to get the HTML of a web page even when the response header is set to 404? Some pages still return text, and in my case I need to read that text.
Example C# code for getting HTML:
    public static string GetHtmlFromUri(string resource)
    {
        string html = string.Empty;
        HttpWebRequest req = (HttpWebRequest)WebRequest.Create(resource); //Errors here.
        using (HttpWebResponse resp = (HttpWebResponse)req.GetResponse())
        {
            bool isSuccess = (int)resp.StatusCode < 299 && (int)resp.StatusCode >= 200;
            if (isSuccess)
            {
                using (StreamReader reader = new StreamReader(resp.GetResponseStream()))
                {
                    html = reader.ReadToEnd();
                }
            }
        }
        return html;
    }
And here is a page that I've created to test this with 404 errors: http://bypass.rd.to/headertest.php
If you look in the header, you will see that it is a 404, but text can be read. Now try to get the page in C#...
MessageBox.Show(GetHtmlFromUri("http://bypass.rd.to/headertest.php"));
System.Net.WebException was unhandled
Message="The remote server returned an error: (404) Not Found."
Source="System"
StackTrace: at System.Net.HttpWebRequest.GetResponse()
Comments (1)
The exception contains the HttpWebResponse, from which you can access everything that was sent back. See this answer for an example.
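
The suggestion above can be sketched roughly as follows. This is a minimal illustration, not the code from the linked answer: the class name HtmlFetcher and the helper ReadBody are hypothetical names introduced here, and the sketch assumes the server actually sends a body along with the 404 status. It relies on the WebException.Response property, which holds the server's response when the failure is a protocol error such as 404.

```csharp
using System;
using System.IO;
using System.Net;

// Hypothetical wrapper class, introduced only for this sketch.
public static class HtmlFetcher
{
    // Returns the response body whether the status is a success or an
    // HTTP error such as 404, as long as the server sent a response.
    public static string GetHtmlFromUri(string resource)
    {
        HttpWebRequest req = (HttpWebRequest)WebRequest.Create(resource);
        try
        {
            using (WebResponse resp = req.GetResponse())
            {
                return ReadBody(resp);
            }
        }
        catch (WebException ex)
        {
            // No response at all (DNS failure, refused connection, timeout):
            // there is nothing to read, so let the exception propagate.
            if (ex.Response == null)
            {
                throw;
            }

            // For protocol errors the server's response is attached to the exception.
            using (WebResponse errorResp = ex.Response)
            {
                return ReadBody(errorResp);
            }
        }
    }

    // Hypothetical helper: reads the whole response stream as text.
    private static string ReadBody(WebResponse resp)
    {
        using (Stream stream = resp.GetResponseStream())
        using (StreamReader reader = new StreamReader(stream))
        {
            return reader.ReadToEnd();
        }
    }
}
```

With this version, MessageBox.Show(HtmlFetcher.GetHtmlFromUri("http://bypass.rd.to/headertest.php")) should display the page text instead of throwing, provided the page still responds as described in the question. If you also need the status code, casting ex.Response to HttpWebResponse and reading its StatusCode property is one option.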