Need urllib.urlretrieve and urllib2.OpenerDirector to work together
I'm writing a script in Python 2.7 which uses a urllib2.OpenerDirector instance, obtained via urllib2.build_opener(), to take advantage of the urllib2.HTTPCookieProcessor class, because I need to store and re-send the cookies I get:

opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookielib.CookieJar()))
However, after making several requests and moving the cookies around, eventually I need to retrieve a list of URLs. I wanted to use urllib.urlretrieve(), because I read that it downloads the file in chunks, but I cannot: I need to carry my cookies on the request, and urllib.urlretrieve() uses a urllib.URLopener, which doesn't support cookie handlers the way OpenerDirector does.
What's the reason for this strange split of functionality, and how can I achieve my goal?
urlretrieve is an old interface from urllib. It existed long before urllib2 came along, has no session-handling capabilities, and simply downloads files. The newer urllib2 provides a much better way to deal with sessions, passwords, proxies, and so on through its Handler interfaces and the OpenerDirector class. To download the URLs to files, just call open() on the opener you built (or install it with urllib2.install_opener() so that urllib2.urlopen() uses it); the cookie jar attached to the opener will maintain the session.
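As a minimal sketch of that approach: the response object returned by the opener's open() supports read(n), so you can stream the body to disk in chunks yourself, getting urlretrieve-style chunked downloads while the opener's HTTPCookieProcessor sends your stored cookies. The helper name retrieve_with_cookies and the chunk size are my own choices, not part of either library; the try/except import shim just lets the same sketch run under Python 3, where these classes live in urllib.request and http.cookiejar.

```python
try:
    # Python 2.7, as in the question
    from urllib2 import build_opener, HTTPCookieProcessor
    from cookielib import CookieJar
except ImportError:
    # the same classes under their Python 3 names
    from urllib.request import build_opener, HTTPCookieProcessor
    from http.cookiejar import CookieJar


def retrieve_with_cookies(opener, url, filename, chunk_size=8192):
    """Download url to filename in chunks through the given opener,
    so any cookies held by its HTTPCookieProcessor are sent along."""
    response = opener.open(url)
    try:
        with open(filename, 'wb') as out:
            while True:
                chunk = response.read(chunk_size)
                if not chunk:  # empty read means end of body
                    break
                out.write(chunk)
    finally:
        response.close()


# One opener carries the cookies for both the earlier requests
# and the final downloads.
opener = build_opener(HTTPCookieProcessor(CookieJar()))
# retrieve_with_cookies(opener, 'http://example.com/some/file', 'file.bin')
```

Since the download goes through the same opener as your earlier requests, no separate urllib.URLopener is involved and nothing about the session is lost.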