How do I close file objects when downloading files over FTP with Twisted?

Posted 2024-09-13 12:06:15

I've got the following code:

for f in fileListProtocol.files:
    if f['filetype'] == '-':
        filename = os.path.join(directory['filename'], f['filename'])
        print 'Downloading %s...' % (filename)
        newFile = open(filename, 'w+')
        d = ftpClient.retrieveFile(filename, FileConsumer(newFile))
        d.addCallback(closeFile, newFile)

Unfortunately, after downloading several hundred of the 1000+ files in the directory in question I get an IOError about too many open files. Why is this when I should be closing each file after they've been downloaded? If there's a more idiomatic way to approach the whole task of downloading lots of files too, I'd love to hear it. Thanks.

Update: Jean-Paul's DeferredSemaphore example plus Matt's FTPFile did the trick. For some reason using a Cooperator instead of DeferredSemaphore would download a few files and then fail because the FTP connection would have died.

Comments (2)

送舟行 2024-09-20 12:06:15

Assuming that you're using FTPClient from twisted.protocols.ftp... and I certainly hesitate before contradicting JP..

It seems that the FileConsumer class you're passing to retrieveFile will be adapted to IProtocol by twisted.internet.protocol.ConsumerToProtocolAdapter, which doesn't call unregisterProducer, so FileConsumer doesn't close the file object.

I've knocked up a quick protocol that you can use to receive the files. I think it should only open the file when appropriate. Totally untested, you'd use it in place of FileConsumer in your code above and won't need the addCallback.

from twisted.python import log
from twisted.internet import interfaces
from zope.interface import implementer

@implementer(interfaces.IProtocol)
class FTPFile(object):
    """
    A consumer for FTP input that writes data to a file.

    @ivar filename: a filename to be opened for writing.
    """

    def __init__(self, filename):
        self.fObj = None
        self.filename = filename

    def makeConnection(self, transport):
        # Open the file only once the data connection is set up.
        self.fObj = open(self.filename, 'wb')
        log.msg('Opened %s for writing' % self.filename)

    def connectionLost(self, reason):
        # Close the file as soon as the transfer ends.
        self.fObj.close()
        log.msg('Closed %s' % self.filename)

    def dataReceived(self, data):
        self.fObj.write(data)

花辞树 2024-09-20 12:06:15

You're opening every file in fileListProtocol.files simultaneously, downloading contents to them, and then closing each when each download is complete. So, you have len(fileListProtocol.files) files open at the beginning of the process. If there are too many files in that list, then you'll try to open too many files.

You probably want to limit yourself to some fairly small number of parallel downloads at once (if FTP even supports parallel downloads, which I'm not entirely certain is the case).

http://jcalderone.livejournal.com/24285.html and the question "Queue remote calls to a Python Twisted perspective broker?" may be of some help in figuring out how to limit the number of downloads you start in parallel.
