Creating a zip archive for instant download

Posted 2024-07-24 01:02:32 · 652 words · 6 views · 0 comments

In a web app I am working on, the user can create a zip archive of a folder full of files. Here's the code:

files = torrent[0].files
zipfile = z.ZipFile(zipname, 'w')
output = ""

for f in files:
    zipfile.write(settings.PYRAT_TRANSMISSION_DOWNLOAD_DIR + "/" + f.name, f.name)

downloadurl = settings.PYRAT_DOWNLOAD_BASE_URL + "/" + settings.PYRAT_ARCHIVE_DIR + "/" + filename
output = "Download <a href=\"" + downloadurl + "\">" + torrent_name + "</a>"
return HttpResponse(output)

But this has the nasty side effect of a long wait (10+ seconds) while the zip archive is being downloaded. Is it possible to skip this? Instead of saving the archive to a file, is it possible to send it straight to the user?

I believe torrentflux provides exactly the feature I am talking about: being able to zip GBs of data and start the download within a second.

给妤﹃绝世温柔 2024-07-31 01:02:32

As mandrake says, the constructor of HttpResponse accepts iterable objects.

Luckily, the ZIP format allows an archive to be created in a single pass: the central directory record is located at the very end of the file.

[Image omitted. Picture from Wikipedia.]

And luckily, zipfile indeed doesn't do any seeks as long as you only add files.

Here is the code I came up with. Some notes:

  • I'm using this code to zip up a bunch of JPEG pictures. There is no point in compressing them; I'm using ZIP only as a container.
  • Memory usage is O(size_of_largest_file), not O(size_of_archive). That is good enough for me: many relatively small files that add up to a potentially huge archive.
  • This code doesn't set the Content-Length header, so the user doesn't get a nice progress indication. It should be possible to calculate this in advance if the sizes of all files are known.
  • Serving the ZIP straight to the user like this means that resuming a download won't work.

So, here goes:

import zipfile

from django.http import StreamingHttpResponse

class ZipBuffer(object):
    """ A file-like object for zipfile.ZipFile to write into. """

    def __init__(self):
        self.data = []
        self.pos = 0

    def write(self, data):
        self.data.append(data)
        self.pos += len(data)

    def tell(self):
        # zipfile calls this so we need it
        return self.pos

    def flush(self):
        # zipfile calls this so we need it
        pass

    def get_and_clear(self):
        # hand back everything written so far and start a fresh list
        result = self.data
        self.data = []
        return result

def generate_zipped_stream():
    sink = ZipBuffer()
    archive = zipfile.ZipFile(sink, "w")
    for filename in ["file1.txt", "file2.txt"]:
        archive.writestr(filename, "contents of file here")
        for chunk in sink.get_and_clear():
            yield chunk

    archive.close()
    # close() generates some more data, so we yield that too
    for chunk in sink.get_and_clear():
        yield chunk

def my_django_view(request):
    # On Django 1.5+ a plain HttpResponse consumes the whole iterator up front;
    # StreamingHttpResponse sends each chunk as it is produced.
    # The old mimetype= argument was removed in Django 1.7; use content_type=.
    response = StreamingHttpResponse(generate_zipped_stream(), content_type="application/zip")
    response['Content-Disposition'] = 'attachment; filename=archive.zip'
    return response
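
The notes above say memory usage is bounded by the largest file. On Python 3.6+ that bound can probably be lowered to a single read block by writing each entry through ZipFile.open(..., "w"). The following is only a sketch of that idea, not part of the original answer: stream_zip, its chunk_size parameter and the drain() method are names made up for illustration, and entries are stored uncompressed (as in the JPEG use case).

```python
import os
import zipfile


class ZipBuffer:
    """Write-only sink (hypothetical helper): collects what zipfile writes."""

    def __init__(self):
        self.chunks = []
        self.pos = 0

    def write(self, data):
        self.chunks.append(bytes(data))
        self.pos += len(data)
        return len(data)

    def tell(self):
        return self.pos

    def flush(self):
        pass

    def drain(self):
        # Return the buffered chunks and start a new, empty list.
        chunks, self.chunks = self.chunks, []
        return chunks


def stream_zip(paths, chunk_size=64 * 1024):
    """Yield the bytes of a ZIP archive containing the files in `paths`."""
    sink = ZipBuffer()
    with zipfile.ZipFile(sink, "w") as archive:
        for path in paths:
            # ZipInfo.from_file defaults to ZIP_STORED (no compression).
            info = zipfile.ZipInfo.from_file(path, arcname=os.path.basename(path))
            with open(path, "rb") as src, archive.open(info, "w") as dest:
                while True:
                    block = src.read(chunk_size)
                    if not block:
                        break
                    dest.write(block)
                    yield from sink.drain()
            # Closing the entry appends its data descriptor.
            yield from sink.drain()
    # Leaving the with-block wrote the central directory.
    yield from sink.drain()
```

The generator can be passed to the response object in place of generate_zipped_stream(). Because the sink has no seek(), zipfile falls back to writing data descriptors after each entry, which is what makes the single-pass output possible.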
超可爱的懒熊 2024-07-31 01:02:32

Here's a simple Django view function which zips up (as an example) any readable files in /tmp and returns the zip file.

from django.http import HttpResponse
import zipfile
import os
from cStringIO import StringIO # Python 2 only; on Python 3 use io.BytesIO instead

def somezip(request):
    file = StringIO()
    zf = zipfile.ZipFile(file, mode='w', compression=zipfile.ZIP_DEFLATED)
    for fn in os.listdir("/tmp"):
        path = os.path.join("/tmp", fn)
        if os.path.isfile(path):
            try:
                zf.write(path)
            except IOError:
                # skip files that cannot be read
                pass
    zf.close()
    # content_type replaces the mimetype argument, which was removed in Django 1.7
    response = HttpResponse(file.getvalue(), content_type="application/zip")
    response['Content-Disposition'] = 'attachment; filename=yourfiles.zip'
    return response

Of course, this approach only works if the zip file fits comfortably in memory. If it doesn't, you'll have to use a disk file (which you're trying to avoid). In that case, replace file = StringIO() with file = open('/path/to/yourfiles.zip', 'wb') and replace file.getvalue() with code that reads the contents of the disk file.
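
A minimal sketch of that disk-backed variant, assuming Python 3. The helper names build_zip_tempfile and iter_file are invented for this example and are not part of Django or of the answer above. The archive goes into an anonymous temporary file, which is then read back in fixed-size blocks.

```python
import os
import tempfile
import zipfile


def build_zip_tempfile(directory):
    """Zip the regular files in `directory` into a temporary file on disk."""
    tmp = tempfile.TemporaryFile()
    with zipfile.ZipFile(tmp, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in sorted(os.listdir(directory)):
            path = os.path.join(directory, name)
            if os.path.isfile(path):
                zf.write(path, arcname=name)
    # ZipFile does not close a file object it was handed, so rewind and reuse it.
    tmp.seek(0)
    return tmp


def iter_file(fileobj, chunk_size=64 * 1024):
    """Yield the file's contents block by block, closing it at the end."""
    with fileobj:
        while True:
            block = fileobj.read(chunk_size)
            if not block:
                break
            yield block
```

In a view, iter_file(build_zip_tempfile("/tmp")) could be handed to the response instead of file.getvalue(). The whole archive is still built before the first byte is sent, so this trades memory for disk but does not remove the initial wait.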

枕头说它不想醒 2024-07-31 01:02:32

Does the zip library you are using allow for output to a stream? You could stream directly to the user instead of temporarily writing to a zip file and THEN streaming it to the user.

贪恋 2024-07-31 01:02:32

It is possible to pass an iterator to the constructor of an HttpResponse (see docs). That would allow you to create a custom iterator that generates data as it is being requested. However, I don't think that will work with a zip (you would have to send a partial zip as it is being created).

The proper way, I think, would be to create the files offline, in a separate process. The user could then monitor the progress and download the file when it's ready (possibly by using the iterator method described above). This would be similar to what sites like YouTube do when you upload a file and wait for it to be processed.
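
The offline approach could look roughly like the sketch below. It is only an illustration: a thread pool stands in for the separate process or task queue a real deployment would use, and the names start_job, job_status and build_archive are hypothetical.

```python
import os
import zipfile
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=2)
jobs = {}  # job id -> Future; an in-memory registry, for illustration only


def build_archive(source_dir, archive_path):
    """Zip the regular files in source_dir, publishing the result atomically."""
    partial = archive_path + ".part"
    with zipfile.ZipFile(partial, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in sorted(os.listdir(source_dir)):
            path = os.path.join(source_dir, name)
            if os.path.isfile(path):
                zf.write(path, arcname=name)
    # Rename only when complete, so a half-written archive is never served.
    os.replace(partial, archive_path)
    return archive_path


def start_job(job_id, source_dir, archive_path):
    """Queue the archive build and return immediately."""
    jobs[job_id] = executor.submit(build_archive, source_dir, archive_path)


def job_status(job_id):
    """Report 'working', 'failed' or 'ready' for a queued job."""
    future = jobs[job_id]
    if not future.done():
        return "working"
    return "failed" if future.exception() else "ready"
```

One view would call start_job and return at once, a second would poll job_status, and the download link would be shown only once the status is "ready".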
