How can I open an array of files in Perl?

Posted on 2024-08-06 10:42:42

In Perl, I read in the files from a directory, and I want to open them all simultaneously (but read them line by line) so that I can perform a function that uses all of their nth lines together (e.g. concatenation).

my $text = `ls | grep ".txt"`;
my @temps = split(/\n/,$text);
my @files;
for my $i (0..$#temps) {
  my $file;
  open($file,"<",$temps[$i]);
  push(@files,$file);
}
my $concat;
for my $i (0..$#files) {
  my @blah = <$files[$i]>;
  $concat.=$blah;
}
print $concat;

I just get a bunch of errors: "use of uninitialized value" warnings and GLOB(..) output. So how can I make this work?

左耳近心 2024-08-13 10:42:42

There are a lot of issues here, starting with the call to "ls | grep" :)

Let's go through some code.

First, let's get the list of files:

my @files = glob( '*.txt' );

But it would be better to check that each name refers to a plain file rather than a directory:

my @files = grep { -f } glob( '*.txt' );

Now, let's open these files for reading:

my @fhs = map { open my $fh, '<', $_; $fh } @files;

But we need a way to handle errors. In my opinion, the best way is to add:

use autodie;

at the beginning of the script (and to install autodie if you don't have it yet; it has shipped with Perl since 5.10.1). Alternatively, you can use:

use Fatal qw( open );

Now that we have that, let's get the first line from each of the inputs (as you showed in your example) and concatenate them:

my $concatenated = '';

for my $fh ( @fhs ) {
    my $line = <$fh>;
    $concatenated .= $line;
}

This is perfectly fine and readable, but it can still be shortened, while remaining (in my opinion) readable, to:

my $concatenated = join '', map { scalar <$_> } @fhs;

The effect is the same: $concatenated contains the first line of every file.

So, the whole program would look like this:

#!/usr/bin/perl
use strict;
use warnings;
use autodie;
# use Fatal qw( open ); # uncomment if you don't have autodie

my @files        = grep { -f } glob( '*.txt' );
my @fhs          = map { open my $fh, '<', $_; $fh } @files;
my $concatenated = join '', map { scalar <$_> } @fhs;

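Now, it might be that you want to concatenate not just the first lines, but all of them. In that case, instead of the $concatenated = ... line, you'd need something like the following. It takes one line from each handle in turn and drops a handle once it has no lines left: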

my $concatenated = '';

while (my $fh = shift @fhs) {
    my $line = <$fh>;
    if ( defined $line ) {
        push @fhs, $fh;
        $concatenated .= $line;
    } else {
        close $fh;
    }
}
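
If the goal is to work with all the nth lines together, as the question describes, rather than to build one flat string, the same round-robin idea can collect one line per file on each pass. The following is only a minimal sketch, not the answer's original code: the helper name nth_lines_together and the in-memory sample data are made up for illustration, and it assumes that a file which has run out of lines should contribute an empty string.

```perl
use strict;
use warnings;

# Hypothetical helper: read one line from every handle per pass and
# return a list of array refs, one per line number.
sub nth_lines_together {
    my @fhs = @_;
    my @rows;
    while (1) {
        my @row = map { scalar readline($_) } @fhs;
        last unless grep { defined $_ } @row;    # every handle is exhausted
        for my $line (@row) {
            $line = '' unless defined $line;     # assumption: pad short files
            chomp $line;
        }
        push @rows, [@row];
    }
    return @rows;
}

# In-memory "files" stand in for the *.txt files on disk.
my @data = ( "a1\na2\n", "b1\n", "c1\nc2\nc3\n" );
my @fhs  = map { open my $fh, '<', \$_ or die "open: $!"; $fh } @data;

for my $row ( nth_lines_together(@fhs) ) {
    print join( ' ', @$row ), "\n";
}
```

Each element of the returned list holds the nth line of every file, so the print statement can be swapped for whatever function needs those lines together.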

少女七分熟 2024-08-13 10:42:42

Here is your problem:

for my $i (0..$#files) {
  my @blah = <$files[$i]>;
  $concat .= $blah;
}

First, <$files[$i]> isn't a valid filehandle read. This is the source of your GLOB(...) errors. See mobrule's answer for why this is the case. So change it to this:

for my $file (@files) {
  my @blah = <$file>;
  $concat .= $blah;
}

Second, you're mixing up @blah (an array named blah) and $blah (a scalar named blah). This is the source of your "uninitialized value" warnings: $blah (the scalar) has never been assigned, but you're using it. If you want the $n-th line from @blah, use this:

for my $file (@files) {
  my @blah = <$file>;
  $concat .= $blah[$n];
}

I don't want to keep beating a dead horse, but I do want to point out a better way to do this part:

my $text = `ls | grep ".txt"`;
my @temps = split(/\n/,$text);

This reads in a list of all files in the current directory whose names contain ".txt". It works, but it can be rather slow: we have to call out to the shell, which has to fork to run ls and grep, and that incurs a bit of overhead. Furthermore, ls and grep are simple and common programs, but not exactly portable. Surely there's a better way to do this:

my @temps;
# Open the current directory and report a failure instead of ignoring it.
opendir(my $dh, ".") or die "Cannot open current directory: $!";
while(my $file = readdir($dh)) {
  push @temps, $file if $file =~ /\.txt/;
}
closedir($dh);

Simple, short, pure Perl, no forking, no non-portable shells, and we don't have to read in a string and then split it: we store only the entries we really need. Plus, it becomes trivial to modify the conditions for files that pass the test. Say we end up accidentally reading the file test.txt.gz because our regex matches it; we can easily change that line to:

  push @temps, $file if $file =~ /\.txt$/;

We can do that one with grep (I believe), but why settle for grep's limited regular expressions when Perl has one of the most powerful regex libraries anywhere built-in?
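
The readdir approach can also be wrapped up with a plain-file test. This is a sketch under assumptions: the function name txt_files_in is hypothetical, and the example builds a throwaway directory with the core File::Temp module only so that it can run anywhere.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical helper: names of plain files in $dir that end in ".txt".
sub txt_files_in {
    my ($dir) = @_;
    opendir( my $dh, $dir ) or die "Cannot open directory $dir: $!";
    my @names;
    while ( defined( my $name = readdir $dh ) ) {
        next unless $name =~ /\.txt\z/;    # anchored, so test.txt.gz is skipped
        my $path = File::Spec->catfile( $dir, $name );
        push @names, $name if -f $path;    # keep plain files only
    }
    closedir $dh;
    return sort @names;
}

# Demo in a temporary directory so nothing real is touched.
my $dir = tempdir( CLEANUP => 1 );
for my $name (qw(a.txt b.txt test.txt.gz notes.md)) {
    open my $fh, '>', File::Spec->catfile( $dir, $name )
        or die "Cannot create $name: $!";
    close $fh;
}
print "$_\n" for txt_files_in($dir);
```

Because readdir returns bare names, the -f test is applied to the name joined with the directory, which matters as soon as the directory is not the current one.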

听风念你 2024-08-13 10:42:42

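Call readline explicitly on $files[$i] instead of putting it inside the <> operator:

my @blah = readline($files[$i]);

Perl treats <...> as the read-from-filehandle operator only when the brackets contain a bareword handle or a simple scalar variable such as <$fh>. Anything more complex, including <$files[$i]> and <{$files[$i]}>, is interpreted as the file glob operator instead.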
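
A form that does not depend on how <> parses its contents is an explicit readline call, which accepts any expression that yields a filehandle. Here is a minimal sketch of the question's second loop rewritten that way; the in-memory strings are invented sample data standing in for real files.

```perl
use strict;
use warnings;

# In-memory "files" so the example runs without touching the disk.
my @data  = ( "first\n", "second\n" );
my @files = map { open my $fh, '<', \$_ or die "open: $!"; $fh } @data;

my $concat = '';
for my $i ( 0 .. $#files ) {
    # readline() takes the array element directly, whereas
    # <$files[$i]> would be parsed as a glob pattern.
    my @blah = readline( $files[$i] );
    $concat .= join '', @blah;
}
print $concat;
```

Copying the handle into a plain scalar first (my $fh = $files[$i]; then <$fh>) is an equivalent alternative.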

原来是傀儡 2024-08-13 10:42:42

You've got some good answers already. Another way to tackle the problem is to create a list of lists containing all of the lines from the files (@content), which means every file is read fully into memory. Then use the each_arrayref function from the CPAN module List::MoreUtils, which creates an iterator that yields line 1 from all files, then line 2, and so on.

use strict;
use warnings;
use List::MoreUtils qw(each_arrayref);

my @content =
    map {
        open(my $fh, '<', $_) or die $!;
        [<$fh>]
    }
    grep {-f}
    glob '*.txt'
;
my $iterator = each_arrayref @content;
while (my @nth_lines = $iterator->()){
    # Do stuff with @nth_lines;
}
