Downloading files with Python urllib / urllib2
I am trying to download files from a website using urllib as described in this thread: link text
import urllib
urllib.urlretrieve("http://www.example.com/songs/mp3.mp3", "mp3.mp3")
I am able to download the files (mostly PDFs), but all I get is corrupted files that cannot be opened. I suspect it's because the website requires a login.
How can the above function be modified to handle cookies? I already know the names of the form fields that carry the username & password information. When I print the return values of urlretrieve I get messages like:
a, b = urllib.urlretrieve("http://www.example.com/songs/mp3.mp3", "mp3.mp3")
print a, b
>> **cache-control:** no-cache, no-store, must-revalidate, s-maxage=300, proxy-revalidate
>> **connection:** close
I am able to download the files manually if I enter their URLs in the browser. Thanks
2 Answers
First of all, urllib2 actually supports cookies, and cookie handling should be easy. Second, you can check what kind of file you actually downloaded. E.g., as far as I know, all MP3 files start with the bytes "ID3".
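A minimal sketch of both ideas. The login URL and the form field names (`username`, `password`) here are hypothetical placeholders; the real names come from the site's login form, as the question notes. The `try`/`except` import lets the same sketch run under Python 2's `urllib2`/`cookielib` (as in the question) or their Python 3 equivalents:

```python
try:
    # Python 2 module names, as used in the question
    import urllib2
    import cookielib
    from urllib import urlencode
except ImportError:
    # Python 3 equivalents of the same modules
    import urllib.request as urllib2
    import http.cookiejar as cookielib
    from urllib.parse import urlencode

# The CookieJar stores whatever session cookie the site sets after login;
# the HTTPCookieProcessor sends it back automatically on later requests.
cookie_jar = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie_jar))

# Hypothetical login URL and form field names -- replace with the real ones.
login_data = urlencode({"username": "me", "password": "secret"})

# Uncommented, these lines would log in and then download with the session:
# opener.open("http://www.example.com/login", login_data)
# data = opener.open("http://www.example.com/songs/mp3.mp3").read()

def has_magic(data, magic):
    """True if the downloaded bytes start with the expected file signature."""
    return data[:len(magic)] == magic

# MP3 files with an ID3 tag start with b"ID3"; PDF files start with b"%PDF".
# A "corrupted" download that is really an HTML login page fails this check.
```

Checking the first few bytes this way quickly tells you whether you got a real PDF/MP3 or the site's login page served under the file's URL.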
It might be possible that the server you are requesting from is looking for certain headers, such as User-Agent. You can try mimicking browser behavior by sending additional headers.
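A sketch of that suggestion; the User-Agent string below is just an example of a browser-like value, and the compatibility import covers both Python 2's `urllib2` and Python 3's `urllib.request`. Extra headers can be attached either to a single request or to an opener:

```python
try:
    import urllib2                      # Python 2, as in the question
except ImportError:
    import urllib.request as urllib2    # Python 3 equivalent

UA = "Mozilla/5.0 (X11; Linux x86_64)"  # example browser-like value

# Option 1: set the header on one Request object.
req = urllib2.Request(
    "http://www.example.com/songs/mp3.mp3",
    headers={"User-Agent": UA},
)

# Option 2: set default headers on an opener, applied to every request
# it makes (this replaces the default "Python-urllib/x.y" User-Agent).
opener = urllib2.build_opener()
opener.addheaders = [("User-Agent", UA)]

# Either urllib2.urlopen(req) or opener.open(url) would then fetch the
# file with the browser-like header (not executed here).
```

If the cookie approach from the other answer is also needed, the same `addheaders` assignment works on an opener built with an `HTTPCookieProcessor`.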