How to write a large amount of data into a tarfile in Python without using a temporary file
I wrote a small cryptographic module in Python whose task is to cipher a file and put the result in a tarfile. The original file to encrypt can be quite large, but that's not a problem because my program only needs to work with a small block of data at a time, which can be encrypted on the fly and stored.

I'm looking for a way to avoid doing it in two passes: first writing all the data to a temporary file, then inserting the result into a tarfile.
Basically I do the following (where generator_encryptor is a simple generator that yields chunks of data read from the source file):
import tarfile

t = tarfile.open("target.tar", "w")
tmp = open('content', 'wb')
for chunk in generator_encryptor("sourcefile"):
    tmp.write(chunk)
tmp.close()
t.add('content')
t.close()
I'm a bit annoyed at having to use a temporary file, as I feel it should be easy to write blocks directly into the tar file. But collecting every chunk in a single string and using something like t.addfile('content', StringIO(bigcipheredstring)) seems excluded, because I can't guarantee I have enough memory to hold bigcipheredstring.

Any hint on how to do that?
4 Answers
You can create your own file-like object and pass it to TarFile.addfile. Your file-like object will generate the encrypted contents on the fly in the fileobj.read() method.
Huh? Can't you just use the subprocess module to run a pipe through to tar? That way, no temporary file should be needed. Of course, this won't work if you can't generate your data in small enough chunks to fit in RAM, but if you have that problem, then tar isn't the issue.
Basically, using a file-like object and passing it to TarFile.addfile does the trick, but there were still some open issues.

The resulting code is below. Basically, I had to write a wrapper class that transforms my existing generator into a file-like object. I also added the GeneratorEncrypto class to my example to make the code complete. You can notice it has a len method that returns the length of the written file (but understand it's just a dummy placeholder that does nothing useful).
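The original listing is not preserved on this page. The following is a minimal sketch of such a wrapper under the same idea (GeneratorFile, add_encrypted, and the dummy generator_encryptor are illustrative names, and no real encryption is performed). Note that tar stores each member's size in its header, so the ciphered length must be known before writing begins:

```python
import io
import os
import tarfile
import time

class GeneratorFile(io.RawIOBase):
    """Read-only file-like wrapper around a generator of byte chunks,
    suitable for passing to TarFile.addfile."""

    def __init__(self, generator):
        self._generator = generator
        self._buffer = b""

    def readable(self):
        return True

    def read(self, size=-1):
        # Accumulate chunks until `size` bytes are available
        # (or the generator runs dry).
        while size < 0 or len(self._buffer) < size:
            try:
                self._buffer += next(self._generator)
            except StopIteration:
                break
        if size < 0:
            size = len(self._buffer)
        data, self._buffer = self._buffer[:size], self._buffer[size:]
        return data

def generator_encryptor(sourcefile, chunk_size=64 * 1024):
    # Dummy stand-in for the real cipher generator: yields chunks
    # read from the source file, unmodified.
    with open(sourcefile, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return
            yield chunk

def add_encrypted(tar, name, sourcefile):
    # The tar header needs the member size up front; here the dummy
    # cipher preserves length, so the plaintext size is used.
    info = tarfile.TarInfo(name)
    info.size = os.path.getsize(sourcefile)
    info.mtime = int(time.time())
    tar.addfile(info, GeneratorFile(generator_encryptor(sourcefile)))
```

With a real cipher that changes the data length (padding, headers), the final size would have to be computed before streaming, which is the role the dummy len method played in the original code.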
I guess you need to understand how the tar format works, and handle the tar writing yourself. Maybe this can be helpful?
http://mail.python.org/pipermail/python-list/2001-August/100796.html
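For reference, writing the tar stream by hand is not much code when the member size is known up front. A minimal sketch (stream_member and finish_archive are illustrative names; the header construction is delegated to tarfile.TarInfo.tobuf rather than built byte by byte):

```python
import tarfile
import time

BLOCKSIZE = 512  # tar archives are sequences of 512-byte blocks

def stream_member(out, name, size, chunks):
    # Emit a ustar header; `size` must match the total chunk length.
    info = tarfile.TarInfo(name)
    info.size = size
    info.mtime = int(time.time())
    out.write(info.tobuf(tarfile.USTAR_FORMAT))
    written = 0
    for chunk in chunks:
        out.write(chunk)
        written += len(chunk)
    assert written == size, "member size must match the header"
    # Pad the member data out to a full block.
    out.write(b"\0" * (-written % BLOCKSIZE))

def finish_archive(out):
    # An archive ends with two zero-filled blocks.
    out.write(b"\0" * (2 * BLOCKSIZE))
```

This streams each chunk straight to the output with no temporary file, at the cost of managing block padding and the end-of-archive marker yourself.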