链接检查器;如何避免误报
我正在开发一个链接检查器/损坏的链接查找器,我收到了很多误报,经过仔细检查,我注意到许多错误代码返回了 webExceptions,但它们实际上是可下载的,但在其他一些情况下,状态代码是 404 并且我可以从浏览器访问该页面。
这是代码,它非常丑陋,我想有更多的东西,我说实用。所有状态代码都那么大,如果用于过滤我不想添加到断开链接的状态代码,因为它们是有效链接(我测试了它们全部)。我需要修复的是结构(如果可能)以及如何不出现错误 404。
谢谢!
try
{
HttpWebRequest request = ( HttpWebRequest ) WebRequest.Create ( uri );
request.Method = "Head";
request.MaximumResponseHeadersLength = 32; // FOR IE SLOW SPEED
request.AllowAutoRedirect = true;
using ( HttpWebResponse response = ( HttpWebResponse ) request.GetResponse() )
{
request.Abort();
}
/* WebClient wc = new WebClient();
wc.DownloadString( uri ); */
_validlinks.Add ( strUri );
}
catch ( WebException wex )
{
if ( !wex.Message.Contains ( "The remote name could not be resolved:" ) &&
wex.Status != WebExceptionStatus.ServerProtocolViolation )
{
if ( wex.Status != WebExceptionStatus.Timeout )
{
HttpStatusCode code = ( ( HttpWebResponse ) wex.Response ).StatusCode;
if (
code != HttpStatusCode.OK &&
code != HttpStatusCode.BadRequest &&
code != HttpStatusCode.Accepted &&
code != HttpStatusCode.InternalServerError &&
code != HttpStatusCode.Forbidden &&
code != HttpStatusCode.Redirect &&
code != HttpStatusCode.Found
)
{
_brokenlinks.Add ( new Href ( new Uri ( strUri , UriKind.RelativeOrAbsolute ) , UrlType.External ) );
}
else _validlinks.Add ( strUri );
}
else _brokenlinks.Add ( new Href ( new Uri ( strUri , UriKind.RelativeOrAbsolute ) , UrlType.External ) );
}
else _validlinks.Add ( strUri );
}
I'm working a on a link checker/broken link finder and I am getting many false positives, after double checking I noticed that many error codes were returning webexceptions but they were actually downloadable, but in some other cases the statuscode is 404 and i can access the page from the browse.
So here is the code, its pretty ugly, and id like to have something more, id say practical. All the status codes are in that big if are used to filter the ones i dont want to add to brokenlink because they are valid links ( i tested them all ). What i need to fix is the structure (if possible) and how to not get false 404.
Thank you!
try
{
HttpWebRequest request = ( HttpWebRequest ) WebRequest.Create ( uri );
request.Method = "Head";
request.MaximumResponseHeadersLength = 32; // FOR IE SLOW SPEED
request.AllowAutoRedirect = true;
using ( HttpWebResponse response = ( HttpWebResponse ) request.GetResponse() )
{
request.Abort();
}
/* WebClient wc = new WebClient();
wc.DownloadString( uri ); */
_validlinks.Add ( strUri );
}
catch ( WebException wex )
{
if ( !wex.Message.Contains ( "The remote name could not be resolved:" ) &&
wex.Status != WebExceptionStatus.ServerProtocolViolation )
{
if ( wex.Status != WebExceptionStatus.Timeout )
{
HttpStatusCode code = ( ( HttpWebResponse ) wex.Response ).StatusCode;
if (
code != HttpStatusCode.OK &&
code != HttpStatusCode.BadRequest &&
code != HttpStatusCode.Accepted &&
code != HttpStatusCode.InternalServerError &&
code != HttpStatusCode.Forbidden &&
code != HttpStatusCode.Redirect &&
code != HttpStatusCode.Found
)
{
_brokenlinks.Add ( new Href ( new Uri ( strUri , UriKind.RelativeOrAbsolute ) , UrlType.External ) );
}
else _validlinks.Add ( strUri );
}
else _brokenlinks.Add ( new Href ( new Uri ( strUri , UriKind.RelativeOrAbsolute ) , UrlType.External ) );
}
else _validlinks.Add ( strUri );
}
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
您应该添加 UserAgent 标头,因为许多网站都需要它们。
You should add a UserAgent header, since many websites require them.