Why doesn't a 308 redirect automatically follow to the new location like a 301/302?
I have a bot that scrapes certain data, and sometimes the site gets its pages wrong: for example, for two different times it will list a different number of items but the same main info (date, time, place, etc.).
I log all of the URLs on the first crawl. Tonight, when I re-ran it, instead of the site just leaving those incorrect pages up with an error message (which I was handling), it returned a WebException with 308 Permanent Redirect, and the request did not follow the redirect the way my bot does for 301/302 redirects.
I've never come across a 308 before. I assumed it would be handled just like a 301/302 and take my bot to the new location, but I get an error instead.
Should I try to handle the 308, remove the URL from my dataset, and carry on?
Or set a flag in the recordset, since the times are so close together, be wary of crawling those pages in case a 308 comes up, and just ignore it?
Or use some other method?
I just don't understand why it wouldn't redirect automatically like 301/302s do.
The bot is written in C# and has thousands of lines of code, too much to post even 1% of it in this box if you're asking for examples.
It's just a normal WebRequest: it gets the response, checks for errors and status codes, and returns either the HTML or the error, or, if configured to, retries with a different random proxy/referer/user agent before giving up.
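To make the flow above concrete, here is a minimal sketch of handling the 308 by hand. It is not the bot's actual code. It assumes the request goes through `HttpWebRequest`, and that the runtime (older .NET Framework, as far as I can tell) surfaces 308 as a `WebException` instead of following it, which is worth verifying on your own runtime. `RedirectSketch`, `Fetch`, `ResolveLocation`, and the `maxHops` limit are hypothetical names made up for illustration.

```csharp
using System;
using System.IO;
using System.Net;

static class RedirectSketch
{
    // Hypothetical helper: resolves a Location header value (absolute or
    // relative) against the URL that was actually requested.
    public static Uri ResolveLocation(Uri requested, string location)
    {
        return new Uri(requested, location);
    }

    // Hypothetical fetch routine: lets the framework follow the redirects it
    // already handles and follows 308 manually, up to maxHops times.
    public static string Fetch(string url, int maxHops = 5)
    {
        var current = new Uri(url);
        for (int hop = 0; hop <= maxHops; hop++)
        {
            var request = (HttpWebRequest)WebRequest.Create(current);
            request.AllowAutoRedirect = true;
            try
            {
                using (var response = (HttpWebResponse)request.GetResponse())
                using (var reader = new StreamReader(response.GetResponseStream()))
                {
                    return reader.ReadToEnd();
                }
            }
            catch (WebException ex)
            {
                using (var response = ex.Response as HttpWebResponse)
                {
                    string location = response?.Headers["Location"];
                    // Anything other than a 308 with a Location header is
                    // rethrown for the existing error handling to deal with.
                    if (response == null
                        || (int)response.StatusCode != 308
                        || string.IsNullOrEmpty(location))
                    {
                        throw;
                    }
                    current = ResolveLocation(current, location);
                }
            }
        }
        throw new InvalidOperationException("Too many 308 redirects for " + url);
    }
}
```

Calling `RedirectSketch.Fetch(url)` would then replace the plain `GetResponse()` call. The hop limit is there so a redirect loop cannot keep the bot spinning forever.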