How do I close the file objects when downloading files over FTP using Twisted?
I've got the following code:

    for f in fileListProtocol.files:
        if f['filetype'] == '-':
            filename = os.path.join(directory['filename'], f['filename'])
            print 'Downloading %s...' % (filename)
            newFile = open(filename, 'w+')
            d = ftpClient.retrieveFile(filename, FileConsumer(newFile))
            d.addCallback(closeFile, newFile)

Unfortunately, after downloading several hundred of the 1000+ files in the directory in question I get an IOError about too many open files. Why is this when I should be closing each file after they've been downloaded? If there's a more idiomatic way to approach the whole task of downloading lots of files too, I'd love to hear it. Thanks.

Update: Jean-Paul's DeferredSemaphore example plus Matt's FTPFile did the trick. For some reason using a Cooperator instead of DeferredSemaphore would download a few files and then fail because the FTP connection would have died.

Comments (2)

Assuming that you're using FTPClient from twisted.protocols.ftp ... and I certainly hesitate before contradicting JP. It seems that the FileConsumer class you're passing to retrieveFile will be adapted to IProtocol by twisted.internet.protocol.ConsumerToProtocolAdapter, which doesn't call unregisterProducer, so FileConsumer doesn't close the file object.

I've knocked up a quick protocol that you can use to receive the files. I think it should only open the file when appropriate. Totally untested; you'd use it in place of FileConsumer in your code above, and you won't need the addCallback.

You're opening every file in fileListProtocol.files simultaneously, downloading contents to them, and then closing each when its download is complete. So you have len(fileListProtocol.files) files open at the beginning of the process. If there are too many files in that list, then you'll try to open too many files.

You probably want to limit yourself to some fairly small number of parallel downloads at once (if FTP even supports parallel downloads, which I'm not entirely certain is the case).

http://jcalderone.livejournal.com/24285.html and "Queue remote calls to a Python Twisted perspective broker?" may be of some help in figuring out how to limit the number of downloads you start in parallel.