Download a Zip File Returned from a URL

If I have a URL that, when submitted in a web browser, pops up a dialog box to save a zip file, how would I go about catching and downloading this zip file in Python?

Answers (9)

笑脸一如从前 2025-01-15 08:01:23

As far as I can tell, the proper way to do this in Python 2 is:

import requests, zipfile, StringIO
r = requests.get(zip_file_url, stream=True)
z = zipfile.ZipFile(StringIO.StringIO(r.content))
z.extractall()

Of course, you'd want to check that the GET was successful with r.ok.

For Python 3+, substitute the io module for the StringIO module and use BytesIO instead of StringIO; the release notes mention this change.

import requests, zipfile, io
r = requests.get(zip_file_url)
z = zipfile.ZipFile(io.BytesIO(r.content))
z.extractall("/path/to/destination_directory")
南笙 2025-01-15 08:01:23

Most people recommend using requests if it is available, and the requests documentation recommends this approach for downloading and saving raw data from a URL:

import requests 

def download_url(url, save_path, chunk_size=128):
    r = requests.get(url, stream=True)
    with open(save_path, 'wb') as fd:
        for chunk in r.iter_content(chunk_size=chunk_size):
            fd.write(chunk)
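
A hypothetical call, with a placeholder URL and save path:

download_url('https://example.com/archive.zip', 'archive.zip')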

Since the question asks about downloading and saving the zip file, I haven't gone into details about reading it. See one of the many other answers for possibilities.

If for some reason you don't have access to requests, you can use urllib.request instead. It may not be quite as robust as the above.

import urllib.request

def download_url(url, save_path):
    with urllib.request.urlopen(url) as dl_file:
        with open(save_path, 'wb') as out_file:
            out_file.write(dl_file.read())

Finally, if you are still using Python 2, you can use urllib2.urlopen.

import urllib2
from contextlib import closing

def download_url(url, save_path):
    with closing(urllib2.urlopen(url)) as dl_file:
        with open(save_path, 'wb') as out_file:
            out_file.write(dl_file.read())
貪欢 2025-01-15 08:01:23

With the help of this blog post, I've got it working with just requests. The point of the stream parameter is to avoid calling content on large requests, which would require the whole response to be processed at once, clogging memory. Streaming avoids this by iterating over the data one chunk at a time.

import requests

url = 'https://www2.census.gov/geo/tiger/GENZ2017/shp/cb_2017_02_tract_500k.zip'

response = requests.get(url, stream=True)
with open('alaska.zip', 'wb') as f:
    for chunk in response.iter_content(chunk_size=512):
        if chunk:  # filter out keep-alive chunks
            f.write(chunk)
最近可好 2025-01-15 08:01:23

Here's what I got to work in Python 3:

import zipfile, urllib.request, shutil

url = 'http://www....myzipfile.zip'
file_name = 'myzip.zip'

with urllib.request.urlopen(url) as response, open(file_name, 'wb') as out_file:
    shutil.copyfileobj(response, out_file)
    with zipfile.ZipFile(file_name) as zf:
        zf.extractall()
过潦 2025-01-15 08:01:23

Super lightweight solution to save a .zip file to a location on disk (using Python 3.9):

import requests

url = r'https://linktofile'
output = r'C:\pathtofolder\downloaded_file.zip'

r = requests.get(url)
with open(output, 'wb') as f:
    f.write(r.content)
浅紫色的梦幻 2025-01-15 08:01:23

I came here searching for how to save a .bzip2 file. Let me paste the code for others who might come looking for this.

url = "http://api.mywebsite.com"
filename = "swateek.tar.gz"

response = requests.get(url, headers=headers, auth=('myusername', 'mypassword'), timeout=50)
if response.status_code == 200:
with open(filename, 'wb') as f:
   f.write(response.content)

I just wanted to save the file as is.

勿忘初心 2025-01-15 08:01:23

Either use urllib2.urlopen, or you could try using the excellent Requests module and avoid urllib2 headaches:

import requests
results = requests.get('url')
#pass results.content onto secondary processing...
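
For a zip download, that "secondary processing" could be handing results.content to zipfile; a minimal sketch, assuming the URL returns a zip archive:

import io
import zipfile

import requests

results = requests.get('https://example.com/archive.zip')  # placeholder URL
zipfile.ZipFile(io.BytesIO(results.content)).extractall()
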
司马昭之心 2025-01-15 08:01:23

Thanks to @yoavram for the solution above. My URL pointed at a zipped folder, but I ran into a BadZipFile error (file is not a zip file); strangely, after retrying a few times the URL would suddenly retrieve and unzip correctly. So I amended the solution a little, using the is_zipfile method as described here:

import io
import zipfile

import requests

r = requests.get(url, stream=True)
check = zipfile.is_zipfile(io.BytesIO(r.content))
while not check:
    # retry until the response is a valid zip file
    r = requests.get(url, stream=True)
    check = zipfile.is_zipfile(io.BytesIO(r.content))
else:
    z = zipfile.ZipFile(io.BytesIO(r.content))
    z.extractall()
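
One caveat: if the URL never returns a valid zip, the loop above retries forever. A bounded variant, assuming a cap of five attempts:

import io
import zipfile

import requests

for attempt in range(5):  # assumed retry limit
    r = requests.get(url)
    data = io.BytesIO(r.content)
    if zipfile.is_zipfile(data):
        zipfile.ZipFile(data).extractall()
        break
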
清风疏影 2025-01-15 08:01:23

Use the requests package together with the standard-library zipfile and io modules.

In particular, BytesIO is used to hold the downloaded zip in memory rather than writing it to disk first.

import requests
from zipfile import ZipFile
from io import BytesIO

r = requests.get(zip_file_url)
z = ZipFile(BytesIO(r.content))
file = z.extract(a_file_to_extract, path_to_save)  # extract a single member; returns its path
with open(file) as f:
    print(f.read())