Reading a gzip file that is currently being written
My program does a lot of file processing, and as the files are large I prefer to write them as GZIP. One challenge is that I often need to read files while they are being written. This is not a problem without GZIP compression, but when compression is on, reading complains about a failed CRC, which I presume has something to do with the compression info not being flushed properly during writing. Is there any way to use GZIP with Python such that, when I write and flush to a file (but do not necessarily close it), the file can be read as well?
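One workaround a reader might try is sketched below, under two assumptions: the writer flushes with zlib.Z_SYNC_FLUSH (the default mode of GzipFile.flush(), which pushes all pending compressed bytes through to the file), and the reader bypasses gzip.open, feeding the raw bytes to a zlib decompressor instead so the missing trailer never triggers a CRC check. The file name "events.gz" is hypothetical.

```python
import gzip
import zlib

# Writer side: GzipFile.flush() defaults to zlib.Z_SYNC_FLUSH, which
# pushes all pending compressed bytes through to the underlying file,
# so everything written so far is physically on disk.
log = gzip.open("events.gz", "wb")        # hypothetical file name
log.write(b"first record\n")
log.flush()

# Reader side: feed the raw bytes straight to a zlib decompressor.
# wbits = 16 + MAX_WBITS selects gzip framing; the CRC in the trailer
# is only verified once the trailer is actually seen, so a partially
# written file decodes cleanly as far as it goes.
d = zlib.decompressobj(16 + zlib.MAX_WBITS)
with open("events.gz", "rb") as f:
    print(d.decompress(f.read()))         # b'first record\n'
print(d.eof)                              # False: trailer not written yet

log.close()
```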
Comments (1)
I think flushing data to a file (compressed) just writes the data into the file, but the trailer (the CRC-32 and size) is written only on close(), so you need to close the file first, and only after that can you open it and read all the data you need. If you need to write large amounts of data, you can try using a database such as PostgreSQL or MySQL, where you can specify a compressed table (archive, compressed); you will then be able to insert data into the table and read it back, and the database software will do all the rest for you (compression on insert, decompression on select).
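For what it's worth, the failure the answer describes is easy to reproduce; the sketch below (with a hypothetical file name "demo.gz") shows a plain gzip.open read failing before close() and succeeding after it. On recent Python versions the error surfaces as an EOFError about a missing end-of-stream marker rather than a CRC complaint, but the cause is the same missing trailer.

```python
import gzip

w = gzip.open("demo.gz", "wb")            # hypothetical file name
w.write(b"hello\n")
w.flush()                                 # data is on disk, trailer is not

try:
    with gzip.open("demo.gz", "rb") as r:
        r.read()
except EOFError as e:                     # recent Pythons raise EOFError here
    print("read before close() failed:", e)

w.close()                                 # close() appends the CRC-32/size trailer

with gzip.open("demo.gz", "rb") as r:
    print(r.read())                       # b'hello\n'
```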