How can I write compressed files on the fly using Perl?
I am generating relatively large files using Perl. The files I am generating are of two kinds:
1. Table files, i.e. text files that I print line by line (row by row) and that contain mainly numbers. A typical line looks like:
   126891 126991 14545 12
2. Serialized objects that I create and then store in a file using Storable::nstore. These objects usually contain some large hash with numeric values. The values in the object might have been packed (with pack) to save space, in which case the object unpacks each value before using it.
Currently I'm usually doing the following:
use IO::Compress::Gzip qw(gzip $GzipError);
# create normal, uncompressed file ($out_file)
# ...
# compress file using gzip
my $gz_out_file = "$out_file.gz";
gzip $out_file => $gz_out_file or die "gzip failed: $GzipError";
# delete uncompressed file
unlink($out_file) or die "can't unlink file $out_file: $!";
This is quite inefficient, since I first write the large file to disk, and then gzip reads it again and compresses it. So my questions are as follows:
1. Can I create a compressed file without first writing an uncompressed file to disk? Is it possible to create a compressed file sequentially, i.e. printing line by line as in scenario (1) described earlier?
2. Does Gzip sound like an appropriate choice? Are there any other recommended compressors for the kind of data I have described?
3. Does it make sense to pack values in an object that will later be stored and compressed anyway?
My considerations are mainly saving on disk space and allowing fast decompression later on.
Comments (3)
You can use IO::Zlib or PerlIO::gzip to tie a file handle so that it compresses on the fly.

As for which compressors are appropriate, just try several and see how they do on your data. Also keep an eye on how much CPU and memory they use for compression and decompression.
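As a rough illustration of the on-the-fly approach, here is a minimal sketch using IO::Zlib. The temporary directory, the file name and the sample rows are made up for the example; only the `IO::Zlib->new($file, $mode)` handle and its `print`, `getline` and `close` operations come from the module itself.

```perl
use strict;
use warnings;
use IO::Zlib;
use File::Temp qw(tempdir);

# Hypothetical output location; any writable path would do.
my $dir     = tempdir(CLEANUP => 1);
my $gz_file = "$dir/table.txt.gz";

# Hypothetical rows standing in for the real table data.
my @rows = ([126891, 126991, 14545, 12], [126992, 127100, 14546, 7]);

# Open a handle that gzip-compresses everything printed to it,
# so no uncompressed copy is ever written to disk.
my $out = IO::Zlib->new($gz_file, "wb")
    or die "cannot open $gz_file for writing: $!";
print $out join(" ", @$_), "\n" for @rows;
$out->close;

# Read the rows back line by line, decompressing on the fly.
my $in = IO::Zlib->new($gz_file, "rb")
    or die "cannot open $gz_file for reading: $!";
my @lines;
while (defined(my $line = $in->getline)) {
    chomp $line;
    push @lines, $line;
}
$in->close;
```

The resulting file is an ordinary gzip file, so it can also be inspected with the usual command-line tools.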
Again, test to see how much pack helps with your data, and how much it affects your performance. In some cases it may be helpful; in others it may not. It really depends on your data.
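One way to run that kind of test is sketched below. It assumes a made-up sample of 10,000 integers and compares the same values stored as text and as `pack("N*", ...)` output, each compressed in memory with gzip and bzip2. It only measures size; timing and memory use would need to be measured separately, and real data may behave quite differently.

```perl
use strict;
use warnings;
use IO::Compress::Gzip  qw(gzip  $GzipError);
use IO::Compress::Bzip2 qw(bzip2 $Bzip2Error);

# Hypothetical sample data; replace with a slice of the real values.
my @values = map { ($_ * 7919) % 1_000_000 } 1 .. 10_000;

# The same values in two encodings: one per line as text,
# and packed as 32-bit big-endian unsigned integers.
my %encoded = (
    text   => join("\n", @values) . "\n",
    packed => pack("N*", @values),
);

my %size;
for my $name (sort keys %encoded) {
    my $raw = $encoded{$name};

    # Compress from one in-memory buffer to another.
    gzip  \$raw => \my $gz  or die "gzip failed: $GzipError";
    bzip2 \$raw => \my $bz2 or die "bzip2 failed: $Bzip2Error";

    $size{$name} = {
        raw   => length($raw),
        gzip  => length($gz),
        bzip2 => length($bz2),
    };
    printf "%-6s raw=%d gzip=%d bzip2=%d\n",
        $name, @{ $size{$name} }{qw(raw gzip bzip2)};
}
```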

You can also open() a filehandle to a scalar instead of a real file, and use this filehandle with IO::Compress::Gzip. I haven't actually tried it, but it should work. I use something similar with Net::FTP to avoid creating files on disk. (In-memory filehandles are described in the documentation for open().)
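A minimal sketch of that idea follows, with the caveat that the answer itself describes the approach as untried. The buffer name and sample lines are invented for the example. Note that IO::Compress::Gzip is also documented to accept a scalar reference directly as its output, which may be simpler than going through open().

```perl
use strict;
use warnings;
use IO::Compress::Gzip     qw($GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Open a write handle onto a scalar instead of a real file.
my $buffer = '';
open(my $mem, '>', \$buffer) or die "cannot open in-memory handle: $!";
binmode $mem;

# Wrap the in-memory handle so that everything printed is gzip-compressed.
my $z = IO::Compress::Gzip->new($mem)
    or die "IO::Compress::Gzip failed: $GzipError";
$z->print("126891 126991 14545 12\n");
$z->print("126992 127100 14546 7\n");
$z->close;      # finishes the gzip stream
close $mem;     # $buffer now holds the complete compressed data

# Decompress the buffer again to check the round trip.
gunzip \$buffer => \my $plain or die "gunzip failed: $GunzipError";
```

At this point `$buffer` can be handed to whatever consumes the compressed bytes (an upload, for instance) without a file ever being created.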

IO::Compress::Zlib has an OO interface that can be used for this.
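For what it is worth, here is a sketch of how that could look for both kinds of file from the question, assuming the answer refers to the IO::Compress family of modules (IO::Compress::Gzip is used here). The file names and sample data are hypothetical. For the serialized object, the sketch swaps Storable::nstore for Storable::nfreeze, which returns the serialized bytes in memory so they can be compressed before anything touches the disk.

```perl
use strict;
use warnings;
use IO::Compress::Gzip     qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);
use Storable               qw(nfreeze thaw);
use File::Temp             qw(tempdir);

# Hypothetical output directory and file names.
my $dir       = tempdir(CLEANUP => 1);
my $table_gz  = "$dir/table.txt.gz";
my $object_gz = "$dir/object.sto.gz";

# Scenario 1: print the table line by line through the OO interface.
my $z = IO::Compress::Gzip->new($table_gz)
    or die "cannot create $table_gz: $GzipError";
for my $row ([126891, 126991, 14545, 12], [126992, 127100, 14546, 7]) {
    $z->print(join(" ", @$row), "\n");
}
$z->close;

# Scenario 2: serialize in memory, then compress straight to the file.
# nfreeze output must be read back with thaw, not Storable::retrieve.
my $object = { counts => { a => 1, b => 2 } };   # hypothetical object
my $frozen = nfreeze($object);
gzip \$frozen => $object_gz or die "gzip failed: $GzipError";

# Reading both back.
gunzip $table_gz  => \my $table_text or die "gunzip failed: $GunzipError";
gunzip $object_gz => \my $bytes      or die "gunzip failed: $GunzipError";
my $restored = thaw($bytes);
```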