Using urllib2 in Python: how do I get the name of the file I am downloading?
I am a Python beginner. I am using urllib2 to download files. When I download a file, I specify the filename under which to save it on my hard drive. However, if I download the file with my browser, a default filename is provided automatically.
Here is a simplified version of my code:
import urllib2

def downloadmp3(url):
    webFile = urllib2.urlopen(url)
    filename = 'temp.zip'
    # 'wb': the download is binary data; 'with' closes the file afterwards
    with open(filename, 'wb') as localFile:
        localFile.write(webFile.read())
The file downloads just fine, but if I type the string stored in the variable "url" into my browser, the browser gives the file a default filename when I download it. I want to use that filename for my downloaded file, not 'temp.zip' or whatever name I assign.
How do I use urllib2 (or some other Python library) to save the file with the filename that the server I am downloading from intends it to have?
If anyone doesn't understand this question, please say so, so that I can try to make it clearer.
Answers (4)
The filename is usually supplied by the server in the Content-Disposition header, for example:
Content-Disposition: attachment; filename="song.mp3"
You can access the headers through the response object's info() method.
See http://docs.python.org/library/urllib2.html
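A minimal sketch of reading that header, written for Python 3, where urllib2 became urllib.request and response.headers is an email.message.Message subclass. The helper name filename_from_headers is hypothetical; get_filename() is the standard-library method that parses the filename parameter. Stripping directory components is an added precaution, since the name comes from the server.

```python
import os
from email.message import Message
from typing import Optional


def filename_from_headers(headers: Message) -> Optional[str]:
    """Return the server-suggested filename, or None if there is none.

    `headers` is e.g. the .headers of a urllib.request.urlopen() response.
    """
    # get_filename() reads the "filename" parameter of Content-Disposition,
    # falls back to the "name" parameter of Content-Type, and unquotes it.
    name = headers.get_filename()
    if not name:
        return None
    # Never trust a server-supplied path: keep only its final component,
    # so a name like "../../x" cannot escape the download directory.
    return os.path.basename(name) or None
```

Usage: inside "with urllib.request.urlopen(url) as response:", call filename_from_headers(response.headers) and fall back to a name derived from the URL when it returns None.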
Be aware, though, that this header is optional. If it is missing, you need to generate a reasonable name yourself from the requested URL, e.g. from the last component of the URL path.
In that case, use Python's urlparse() function (from the urlparse module) to split the URL into its parts.
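A sketch of that fallback in Python 3, where the urlparse module became urllib.parse. The helper name and its "download" default are hypothetical choices for illustration.

```python
import posixpath
from urllib.parse import unquote, urlparse


def filename_from_url(url: str, default: str = "download") -> str:
    """Guess a filename from the last component of a URL's path."""
    # urlparse() splits off the query string and fragment, so
    # "?session=123" never ends up in the filename.
    path = urlparse(url).path
    # Decode %-escapes (e.g. %20 -> space), then take the last component.
    name = posixpath.basename(unquote(path))
    # URLs such as "http://example.com/" have no last component.
    return name or default


print(filename_from_url("http://example.com/files/my%20song.mp3?session=123"))
# my song.mp3
```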
My issue with the previous answers is that they used the original URL, which fails in the case of a redirect. Here's how I do it (note the use of result.url instead of url):
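The redirect-aware idea can be sketched in Python 3 as follows. This is an illustration, not the answerer's exact code: the function name download and the dest_dir parameter are hypothetical, and only the URL path is used for naming (no Content-Disposition handling) to keep the redirect point in focus.

```python
import os
import posixpath
import shutil
from urllib.parse import unquote, urlparse
from urllib.request import urlopen


def download(url: str, dest_dir: str = ".") -> str:
    """Save `url` into `dest_dir`, named after the final (post-redirect) URL."""
    with urlopen(url) as response:
        # urlopen() follows HTTP redirects; response.url is the URL that was
        # actually fetched, which can differ from the `url` passed in.
        final_path = urlparse(response.url).path
        name = posixpath.basename(unquote(final_path)) or "download"
        target = os.path.join(dest_dir, name)
        # Stream the body to disk in binary mode instead of read()-ing it all.
        with open(target, "wb") as local_file:
            shutil.copyfileobj(response, local_file)
    return target
```

Streaming with shutil.copyfileobj avoids holding a large MP3 or ZIP in memory, which webFile.read() in the question does.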
You can do that using urlretrieve:
http://docs.python.org/library/urllib.html
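Note that urlretrieve does not choose the server's filename by itself: called without a filename it writes to a temporary file and returns (local_path, headers). A Python 3 sketch that uses those returned headers to pick the final name follows; the function name and dest_dir parameter are hypothetical. The Python 3 docs list urlretrieve under the legacy interface, and this version still names the fallback after the original URL, not the post-redirect one.

```python
import os
import posixpath
import shutil
from urllib.parse import unquote, urlparse
from urllib.request import urlcleanup, urlretrieve


def retrieve_with_server_name(url: str, dest_dir: str = ".") -> str:
    """Download with urlretrieve, then name the file from the response headers."""
    # Without a filename argument, urlretrieve() saves the body to a
    # temporary file and returns (local_path, headers).
    local_path, headers = urlretrieve(url)
    # Prefer Content-Disposition; otherwise use the URL's last path component.
    name = headers.get_filename() or posixpath.basename(unquote(urlparse(url).path))
    name = os.path.basename(name) or "download"
    target = os.path.join(dest_dir, name)
    shutil.copyfile(local_path, target)
    # Remove the temporary files created by earlier urlretrieve() calls.
    urlcleanup()
    return target
```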
I had an issue where the server did not send any content-disposition header, so if that is also your case, you can extract the filename from the URL like this:
In my case, I used file_stream.headers.subtype, which contained the file extension, and I renamed the files based on my Django model's slug. Here's an example:
The last line saves the file using Django's save method, which also handles duplicate file names by adding random characters at the end :)
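In Python 3 the counterpart of headers.subtype is get_content_subtype() on the response headers. A minimal sketch of the slug-plus-extension naming; the slug argument and the helper name are hypothetical stand-ins for the Django model field used in the answer, and the Django save step is left out.

```python
from email.message import Message


def filename_from_slug(slug: str, headers: Message) -> str:
    """Build "<slug>.<subtype>" from a response's Content-Type header."""
    # get_content_subtype() returns the lowercased part after "/" in
    # Content-Type, e.g. "jpeg" for "image/jpeg". If the header is missing
    # it falls back to the default type "text/plain", giving "plain".
    extension = headers.get_content_subtype()
    return f"{slug}.{extension}"
```

A subtype is not always a usable extension (for example "svg+xml"); mimetypes.guess_extension() on the full content type is an alternative worth considering.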
Awesome.