How can I write compressed files on the fly using Perl?

Posted on 2024-09-24 20:43:38

I am generating relatively large files using Perl. The files I am generating are of two kinds:

  1. Table files, i.e. textual files I print line by line (row by row), which contain mainly numbers. A typical line looks like:

    126891 126991 14545 12

  2. Serialized objects I create then store into a file using Storable::nstore. These objects usually contain some large hash with numeric values. The values in the object might have been packed to save on space (and the object unpacks each value before using it).

Currently I'm usually doing the following:

use IO::Compress::Gzip qw(gzip $GzipError);

# create normal, uncompressed file ($out_file)
# ...

# compress file using gzip
my $gz_out_file = "$out_file.gz";
gzip $out_file => $gz_out_file or die "gzip failed: $GzipError";

# delete uncompressed file
unlink($out_file) or die "can't unlink file $out_file: $!";

This is quite inefficient, since I first write the large file to disk, and then gzip reads it again and compresses it. So my questions are as follows:

  1. Can I create a compressed file without first writing a file to disk? Is it possible to create a compressed file sequentially, i.e. printing line-by-line like in scenario (1) described earlier?

  2. Does Gzip sound like an appropriate choice? Are there any other recommended compressors for the kind of data I have described?

  3. Does it make sense to pack values in an object that will later be stored and compressed anyway?

My considerations are mainly saving on disk space and allowing fast decompression later on.

Comments (3)

宛菡 2024-10-01 20:43:38
  1. You can use IO::Zlib or PerlIO::gzip to tie a file handle to compress on the fly.

  2. As for what compressors are appropriate, just try several and see how they do on your data. Also keep an eye on how much CPU/memory they use for compression and decompression.

  3. Again, test to see how much pack helps with your data, and how much it affects your performance. In some cases, it may be helpful. In others, it may not. It really depends on your data.
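
To make point 1 concrete, here is a minimal sketch using IO::Zlib, which ties the handle internally when you call new(). The file name and the sample rows are made up for illustration, and the read-back at the end is only there to check the round trip.

```perl
use strict;
use warnings;
use IO::Zlib;

# Hypothetical file name, chosen only for this example.
my $table_file = 'table.txt.gz';

# 'wb' opens the gzip file for writing; IO::Zlib ties the handle for us.
my $out = IO::Zlib->new($table_file, 'wb')
    or die "can't open $table_file for writing";

# Each row is compressed as it is printed, so no uncompressed
# copy of the table is ever written to disk.
print $out join(' ', $_, $_ + 100, 14545, 12), "\n" for 126891 .. 126893;
$out->close;    # finishes the gzip stream

# Read the rows back line by line to check the round trip.
my $in = IO::Zlib->new($table_file, 'rb')
    or die "can't open $table_file for reading";
my @rows;
while (defined(my $line = $in->getline)) {
    push @rows, $line;
}
$in->close;
```

Whether this is faster or smaller than the alternatives for your data is something you would still have to measure, as points 2 and 3 say.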

椵侞 2024-10-01 20:43:38

You can also open() a filehandle to a scalar instead of a real file, and use this filehandle with IO::Compress::Gzip. Haven't actually tried it, but it should work. I use something similar with Net::FTP to avoid creating files on disk.

Since v5.8.0, Perl has built using PerlIO by default. Unless you've changed this (i.e., Configure -Uuseperlio), you can open filehandles directly to Perl scalars via:

open($fh, '>', \$variable) || ..

(quoted from the documentation for open())
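
For the Storable case in the question, a related approach that skips the explicit in-memory filehandle is to hand IO::Compress::Gzip a scalar reference directly, which its functional interface accepts as input. The sketch below is an assumption about how that could be wired up, not something taken from the answer above: the object, the file name, and the use of nfreeze in place of nstore are illustrative choices. Note that the whole serialized object sits in memory before it is compressed.

```perl
use strict;
use warnings;
use Storable qw(nfreeze thaw);
use IO::Compress::Gzip     qw(gzip   $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Hypothetical object and file name, for illustration only.
my $object   = { counts => { 126891 => 12, 126991 => 7 } };
my $obj_file = 'object.sto.gz';

# Serialize into memory (network byte order, like nstore), then
# compress the in-memory buffer straight into the target file.
my $frozen = nfreeze($object);
gzip \$frozen => $obj_file
    or die "gzip failed: $GzipError";

# Later: decompress into a scalar and rebuild the object.
gunzip $obj_file => \my $buffer
    or die "gunzip failed: $GunzipError";
my $restored = thaw($buffer);
```

This never writes an uncompressed file to disk, at the cost of holding the serialized data in memory once during writing and once during reading.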

我不咬妳我踢妳 2024-10-01 20:43:38

IO::Compress::Gzip has an OO interface that can be used for this.

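use strict;
use warnings;
use IO::Compress::Gzip qw($GzipError);

# open a gzip stream that writes straight to out.gz
my $z = IO::Compress::Gzip->new('out.gz')
    or die "IO::Compress::Gzip failed: $GzipError";
$z->print($_, "\n") for 0 .. 10;
$z->close;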
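
Reading the data back works the same way with the matching IO::Uncompress::Gunzip class. The following is a rough sketch under assumed names (the file name and the row layout are invented to resemble the table in the question); it writes a few rows and then streams them back one line at a time.

```perl
use strict;
use warnings;
use IO::Compress::Gzip     qw($GzipError);
use IO::Uncompress::Gunzip qw($GunzipError);

# Hypothetical file name, for illustration only.
my $file = 'rows.gz';

# Write a few table rows through the OO interface.
my $z = IO::Compress::Gzip->new($file)
    or die "IO::Compress::Gzip failed: $GzipError";
$z->print("$_ ", $_ + 100, " 14545 12\n") for 126891 .. 126893;
$z->close;

# Stream the rows back; only one decompressed line is held at a time.
my $in = IO::Uncompress::Gunzip->new($file)
    or die "IO::Uncompress::Gunzip failed: $GunzipError";
my $sum = 0;
while (defined(my $line = $in->getline)) {
    my @fields = split ' ', $line;
    $sum += $fields[3];    # add up the last column of each row
}
$in->close;
```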