Python urllib2: how to avoid errors - need help

Posted 2024-10-04 21:43:53

I am using Python urllib2 to download pages from the web. I am not setting a User-Agent header or anything similar. I am getting errors like the samples below. Can someone tell me an easy way to avoid them?

http://www.rottentomatoes.com/m/foxy_brown/
The server couldn't fulfill the request.
Error code:  403


http://www.spiritus-temporis.com/marc-platt-dancer-/
The server couldn't fulfill the request.
Error code:  503

http://www.golf-equipment-guide.com/news/Mark-Nichols-(golfer).html!!
The server couldn't fulfill the request.
Error code:  500


http://www.ehx.com/blog/mike-matthews-in-fuzz-documentary!!
We failed to reach a server.
Reason:  timed out
IncompleteRead(5621 bytes read)
Traceback (most recent call last):
  File "download.py", line 43, in <module>
    localFile.write(response.read())
  File "/usr/lib/python2.6/socket.py", line 327, in read
    data = self._sock.recv(rbufsize)
  File "/usr/lib/python2.6/httplib.py", line 517, in read
    return self._read_chunked(amt)
  File "/usr/lib/python2.6/httplib.py", line 563, in _read_chunked
    raise IncompleteRead(value)
IncompleteRead: IncompleteRead(5621 bytes read)
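The messages in this log look like the common pattern of catching HTTPError before URLError. The traceback shows that response.read() was not protected, so a failure while streaming the body crashed the script. Below is a minimal sketch that also covers that case. It is written for Python 3, where urllib2 was split into urllib.request and urllib.error, and httplib became http.client. The helper name fetch_page, its (body, message) return shape and the 10-second timeout are assumptions for illustration.

```python
import http.client
import socket
import urllib.error
import urllib.request


def fetch_page(url, timeout=10):
    """Hypothetical helper: return (body, error_message).

    body is None when nothing usable was read; error_message is None on success.
    """
    try:
        # read() is inside the try, so failures while streaming the body
        # (timeouts, IncompleteRead) are caught as well, not only urlopen() errors.
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read(), None
    except urllib.error.HTTPError as err:
        # HTTPError is a subclass of URLError, so it must be caught first.
        return None, "The server couldn't fulfill the request. Error code: %d" % err.code
    except urllib.error.URLError as err:
        # Connection problems, including a timeout while connecting.
        return None, "We failed to reach a server. Reason: %s" % err.reason
    except socket.timeout:
        # A timeout during read() is raised directly, not wrapped in URLError.
        return None, "Timed out while reading the response"
    except http.client.IncompleteRead as err:
        # The connection closed early; err.partial holds the bytes that did arrive.
        return err.partial, "Incomplete read (%d bytes)" % len(err.partial)
```

Catching the errors only keeps the loop running; it does not make the server return the page. Whether a truncated body from IncompleteRead is worth keeping or should be re-fetched depends on the use case.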

Thank you
Bala

Comments (2)

白昼 2024-10-11 21:43:54

Many web resources require some kind of cookie or other authentication before they can be accessed; your 403 status codes are most likely the result of this.
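When the required cookie is one the server sets itself on an earlier response, an opener with a cookie jar will store it and send it back on later requests. The sketch below uses Python 3 names. In Python 2 the equivalents are cookielib.CookieJar, urllib2.HTTPCookieProcessor and urllib2.build_opener. Because the question mentions not setting a User-Agent, the sketch also replaces the default "Python-urllib/3.x" agent. This rests on the assumption that a site may be filtering that default. The helper name make_opener and the agent string are hypothetical.

```python
import http.cookiejar
import urllib.request

# Assumption: any descriptive, browser-like string; nothing here is site-specific.
USER_AGENT = "Mozilla/5.0 (compatible; my-downloader/0.1)"


def make_opener(user_agent=USER_AGENT):
    """Hypothetical helper: an opener that keeps cookies and sends a custom User-Agent."""
    jar = http.cookiejar.CookieJar()
    # HTTPCookieProcessor stores Set-Cookie headers in jar and replays them.
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    # Replace the default ("User-agent", "Python-urllib/3.x") header.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener, jar
```

Use it as `opener, jar = make_opener()` followed by `opener.open(url, timeout=10).read()`, reusing the same opener for every page on a site. This will not help when the page needs a real login or cookies created by JavaScript.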

A 503 (Service Unavailable) often means you are hitting the server rapidly in a loop and it is throttling you or is simply overloaded; wait briefly before attempting another request.
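A minimal sketch of "wait, then try again" is shown below, written for Python 3. It uses exponential backoff and honours a numeric Retry-After header if the server sends one. The helper name fetch_with_retry, the defaults (3 retries, 1-second base delay) and retrying only on 503 are assumptions. The fetch and sleep parameters exist only so the logic can be exercised without a network.

```python
import time
import urllib.error
import urllib.request


def _default_fetch(url):
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read()


def fetch_with_retry(url, retries=3, base_delay=1.0, retry_on=(503,),
                     fetch=_default_fetch, sleep=time.sleep):
    """Hypothetical helper: fetch url, backing off and retrying on the given codes."""
    for attempt in range(retries + 1):
        try:
            return fetch(url)
        except urllib.error.HTTPError as err:
            if err.code not in retry_on or attempt == retries:
                raise  # not a retryable status, or out of attempts
            # Exponential backoff: 1 s, 2 s, 4 s, ... with the defaults.
            delay = base_delay * 2 ** attempt
            # Respect a numeric Retry-After header if present.
            retry_after = err.headers.get("Retry-After") if err.headers else None
            if retry_after and retry_after.isdigit():
                delay = max(delay, int(retry_after))
            sleep(delay)
```

Adding a fixed pause between consecutive pages (for example `time.sleep(1)` in the download loop) may avoid many of the 503s in the first place.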

The URL in the 500 example doesn't appear to exist at all.

The URL in the timeout example probably shouldn't end in "!!"; I can only load the resource without it.

I recommend reading up on HTTP status codes.
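As a starting point, Python 3's http.HTTPStatus enum (3.5+) carries the standard reason phrase and a short description for each code. In Python 2, httplib.responses maps codes to reason phrases. The helper name describe_status below is hypothetical.

```python
from http import HTTPStatus


def describe_status(code):
    """Hypothetical helper: e.g. '403 Forbidden: <short description>'."""
    status = HTTPStatus(code)
    return "%d %s: %s" % (status.value, status.phrase, status.description)


# The three status codes from the question.
for code in (403, 500, 503):
    print(describe_status(code))
```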

忘羡 2024-10-11 21:43:54

For these more complex tasks, you might want to consider mechanize, twill, or even Selenium or Windmill, which handle harder scenarios such as cookies or JavaScript.

For arbitrary websites, working around these problems with urllib2 alone can be tricky (signed cookies, anyone?).
