Python urllib2.urlopen bug: timeout error brings down my Internet connection?
I don't know if I'm doing something wrong, but I'm 100% sure it's the Python script that brings down my Internet connection.
I wrote a Python script that sends HEAD requests to scrape the header info of thousands of files, mainly the Content-Length header, to get the exact size of each file.
Sample code:
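```python
import urllib2

# Request subclass that makes urlopen send HEAD instead of GET
class HeadRequest(urllib2.Request):
    def get_method(self):
        return "HEAD"

response = urllib2.urlopen(HeadRequest("http://www.google.com"))
print response.info()  # response headers, including Content-Length
```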
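For readers on Python 3, where urllib2 became urllib.request, here is a minimal sketch of the same HEAD lookup. It is an illustration, not the asker's code: the helper name head_content_length and the 10-second timeout are assumptions. It passes an explicit timeout, so a stalled server raises an error instead of blocking indefinitely, and it opens the response in a with-block so the socket is closed after each request.

```python
import urllib.request
from typing import Optional

def head_content_length(url: str, timeout: float = 10.0) -> Optional[int]:
    """Send a HEAD request and return the Content-Length header as an int, if present."""
    # Python 3's Request accepts the HTTP method directly; no subclass is needed.
    req = urllib.request.Request(url, method="HEAD")
    # An explicit timeout makes a stalled server raise an error instead of blocking;
    # the with-block closes the response, and with it the socket, after every request.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        length = resp.headers.get("Content-Length")
    return int(length) if length is not None else None
```

Servers that answer with chunked transfer encoding may omit Content-Length, in which case this helper returns None rather than a size.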
The thing is, after several hours of running, the script starts throwing "urlopen error timed out", and from then on my Internet connection is down. My connection always comes back immediately after I close the script. At first I thought the connection might just be unstable, but after several runs it turned out to be the script's fault.
I don't know why this happens. Should this be considered a bug? Or did my ISP ban me for doing this kind of thing? (I already set the program to wait 10 seconds between requests.)
BTW, I'm connecting through a VPN. Could that have something to do with it?
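The question mentions a fixed 10-second wait between requests. As a sketch of a more defensive loop (an assumption, not the asker's code): keep the pause, back off exponentially after each failed request, and stop after a few consecutive failures instead of retrying against a connection that is already failing. crawl, fetch and max_failures are hypothetical names; fetch can be any function that raises OSError on network errors, such as one wrapping urlopen. This only limits how hard the script pushes; it does not by itself explain or fix a block by the ISP or VPN.

```python
import time
from typing import Callable, Dict, Iterable, Optional

def crawl(urls: Iterable[str],
          fetch: Callable[[str], Optional[int]],
          delay: float = 10.0,
          max_failures: int = 3,
          sleep: Callable[[float], None] = time.sleep) -> Dict[str, Optional[int]]:
    """Fetch each URL in turn, pausing between requests and backing off on errors."""
    results: Dict[str, Optional[int]] = {}
    failures = 0  # consecutive failed requests
    for url in urls:
        try:
            results[url] = fetch(url)
        except OSError:  # urllib.error.URLError and socket timeouts are OSError subclasses
            failures += 1
            results[url] = None
            if failures >= max_failures:
                break  # stop rather than keep hitting a connection that is failing
            sleep(delay * 2 ** failures)  # exponential backoff: 2x, 4x, ... the base delay
            continue
        failures = 0
        sleep(delay)  # the fixed pause between successful requests
    return results
```

Passing sleep as a parameter keeps the loop easy to test without real waiting; in the actual script the default time.sleep is used.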
Comments (2)
I'd guess that either your ISP or VPN provider is limiting you because of high-volume suspicious traffic, or your router or VPN tunnel is getting clogged up with half-open connections. Consumer internet is REALLY not intended for spider-type activities.
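If half-open or lingering connections are part of the problem, one mitigation worth trying is to open fewer of them. A sketch, assuming many of the files live on the same host and the server supports HTTP/1.1 keep-alive: send all HEAD requests for that host over one http.client connection and close it when done. head_many is a hypothetical helper, not part of any library; if the server closes the connection after a response, http.client opens a new one for the next request.

```python
import http.client
from typing import Dict, Iterable, Optional

def head_many(host: str, paths: Iterable[str], port: int = 80,
              timeout: float = 10.0) -> Dict[str, Optional[int]]:
    """Send HEAD requests for several paths on one host, reusing a single connection."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    sizes: Dict[str, Optional[int]] = {}
    try:
        for path in paths:
            conn.request("HEAD", path)
            resp = conn.getresponse()
            resp.read()  # HEAD has no body, but the response must be finished before reuse
            length = resp.getheader("Content-Length")
            sizes[path] = int(length) if length is not None else None
    finally:
        conn.close()  # always release the socket, even if a request fails
    return sizes
```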
We can't even begin to guess.
You need to gather data on your computer and include that data in your question.
Get another computer. Run your script. Is the other computer's internet access blocked also? Or does it still work?
If both computers are blocked, it's not your software; it's your provider. Update your question with this information and how you got it.
If only the computer running the script is stopped, it's not your provider; it's your OS resources being exhausted. This is harder to diagnose because it could be memory, sockets or file descriptors. Usually it's sockets.
You need to find some network diagnostic software for your operating system that can list open sockets (a tool like netstat, rather than ifconfig/ipconfig, which only show interface configuration). You need to update your question to state exactly which operating system you're using. Then use this diagnostic software to see how many open sockets are cluttering up your system.
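To put a number on "how many open sockets", here is a minimal Linux-only sketch: every open descriptor of a process appears under /proc/self/fd, and sockets show up there as links named socket:[inode]. count_open_sockets is a hypothetical helper; on systems without /proc it returns None, and a tool such as netstat is the usual way to get a similar picture there.

```python
import os
from typing import Optional

def count_open_sockets() -> Optional[int]:
    """Count this process's open socket descriptors (Linux only; None elsewhere)."""
    fd_dir = "/proc/self/fd"
    if not os.path.isdir(fd_dir):
        return None  # no /proc: fall back to an OS tool such as netstat
    count = 0
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # the descriptor was closed while we were listing
        if target.startswith("socket:"):
            count += 1
    return count
```

Calling it every few hundred requests from the scraping script and logging the result would show whether the socket count keeps climbing, which would point to connections that are never closed.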