How to crawl an SNS that returns 403 Forbidden
I'm crawling an SNS with a crawler written in Python.
It worked for a long time, but a few days ago the pages fetched from my servers started coming back as ERROR 403 FORBIDDEN.
I tried changing the cookie, the browser, and the account, but nothing helped.
It also seems that all the forbidden servers are in the same network segment.
What can I do? Steal someone else's IP? = =...
Thanks a lot.
Comments (1)
Looks like you've been blacklisted at the router level for that subnet, perhaps because you (or somebody else in the subnet) violated the terms of use, robots.txt, the maximum crawl frequency specified in a sitemap, or something like that.
The solution is not technical but social: contact the webmaster, be properly apologetic, find out exactly what you (or one of your associates) did wrong, convincingly promise never to do it again, and keep apologizing until they remove the blacklisting. If you can give the webmaster any reason why they should want to let you crawl the site (e.g., your crawling feeds a search engine that will bring them traffic, or something like that), so much the better!-)
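If the blacklisting does get lifted, one way to back up the promise is to have the crawler read robots.txt itself and pace its requests. Below is a minimal sketch using the standard library's `urllib.robotparser`. The user-agent name, the `sns.example.com` URLs, the sample robots.txt content, and the 5-second fallback delay are all made up for illustration, and the site's terms of use may impose limits that robots.txt does not mention.

```python
import urllib.robotparser

USER_AGENT = "my-sns-crawler"   # hypothetical name; use whatever your crawler sends
DEFAULT_DELAY = 5.0             # assumed fallback (seconds) when no Crawl-delay is given


def load_rules(robots_lines):
    """Build a parser from the lines of a robots.txt file."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_lines)
    return parser


def delay_for(parser, user_agent=USER_AGENT, default=DEFAULT_DELAY):
    """Return the Crawl-delay that applies to user_agent, or the fallback."""
    delay = parser.crawl_delay(user_agent)
    return float(delay) if delay is not None else default


def allowed_urls(parser, urls, user_agent=USER_AGENT):
    """Keep only the URLs that robots.txt permits for user_agent."""
    return [url for url in urls if parser.can_fetch(user_agent, url)]


# Sample robots.txt, inlined so the sketch runs without network access.
# For a live site you would call parser.set_url(...) and parser.read() instead.
SAMPLE_ROBOTS = """\
User-agent: *
Crawl-delay: 10
Disallow: /private/
""".splitlines()

rules = load_rules(SAMPLE_ROBOTS)
candidates = [
    "https://sns.example.com/users/1",
    "https://sns.example.com/private/inbox",
]
to_fetch = allowed_urls(rules, candidates)  # drops the /private/ URL
pause = delay_for(rules)                    # seconds to sleep between requests
```

In the real crawl loop you would call `time.sleep(pause)` between requests to the same host, and back off further if the site starts answering with 403 or 429 again.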