What is the most defensive way to loop through lines in a file with Perl?

Posted on 2024-09-24 17:08:28

I usually loop through lines in a file using the following code:

open my $fh, '<', $file or die "Could not open file $file for reading: $!\n";
while ( my $line = <$fh> ) {
  ...
}

However, in answering another question, Evan Carroll edited my answer, changing my while statement to:

while ( defined( my $line = <$fh> ) ) {
  ...
}

His rationale was that if you have a line that's 0 (it'd have to be the last line, else it would have a carriage return) then your while would exit prematurely if you used my statement ($line would be set to "0", and the return value from the assignment would thus also be "0" which gets evaluated to false). If you check for defined-ness, then you don't run into this problem. It makes perfect sense.

So I tried it. I created a text file whose last line is 0 with no carriage return on it. I ran it through my loop and the loop did not exit prematurely.

I then thought, "Aha, maybe the value isn't actually 0, maybe there's something else there that's screwing things up!" So I used Dump() from Devel::Peek and this is what it gave me:

SV = PV(0x635088) at 0x92f0e8
  REFCNT = 1
  FLAGS = (PADMY,POK,pPOK)
  PV = 0X962600 "0"\0
  CUR = 1
  LEN = 80

That seems to tell me that the value is actually the string "0", as I get a similar result if I call Dump() on a scalar I've explicitly set to "0" (the only difference is in the LEN field -- from the file LEN is 80, whereas from the scalar LEN is 8).

So what's the deal? Why doesn't my while() loop exit prematurely if I pass it a line that's only "0" with no carriage return? Is Evan's loop actually more defensive, or does Perl do something crazy internally that means you don't need to worry about these things and while() actually only exits when you hit eof?

满地尘埃落定 2024-10-01 17:08:28

Because

 while (my $line = <$fh>) { ... }

actually compiles down to

 while (defined( my $line = <$fh> ) ) { ... }

It may have been necessary in a very old version of perl, but not any more! You can see this from running B::Deparse on your script:

>perl -MO=Deparse
open my $fh, '<', $file or die "Could not open file $file for reading: $!\n";
while ( my $line = <$fh> ) {
  ...
}

^D
die "Could not open file $file for reading: $!\n" unless open my $fh, '<', $file;
while (defined(my $line = <$fh>)) {
    do {
        die 'Unimplemented'
    };
}
- syntax OK

So you're already good to go!
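
A minimal sketch for checking this yourself, assuming a Perl recent enough to open a filehandle on an in-memory string. The helper name read_all_lines is hypothetical and only serves the illustration; the input deliberately ends in a bare "0" with no trailing newline.

```perl
use strict;
use warnings;

# Hypothetical helper: collect every line read from an in-memory file
# using the plain while (my $line = <$fh>) form, with no explicit defined().
sub read_all_lines {
    my ($content) = @_;
    open my $fh, '<', \$content
        or die "Could not open in-memory file: $!";
    my @lines;
    while ( my $line = <$fh> ) {    # Perl adds the defined() test here
        push @lines, $line;
    }
    close $fh;
    return @lines;
}

# The last line is a bare "0" with no newline after it.
my @lines = read_all_lines("first\nsecond\n0");
print scalar(@lines), " lines read; last line is '$lines[-1]'\n";
# prints: 3 lines read; last line is '0'
```

If the loop stopped on the false value "0", only two lines would be collected.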

故事↓在人 2024-10-01 17:08:28

BTW, this is covered in the I/O Operators section of perldoc perlop:

In scalar context, evaluating a filehandle in angle brackets yields the next line from that file (the newline, if any, included), or "undef" at end-of-file or on error. When $/ is set to "undef" (sometimes known as file-slurp mode) and the file is empty, it returns '' the first time, followed by "undef" subsequently.
Ordinarily you must assign the returned value to a variable, but there is one situation where an automatic assignment happens. If and only if the input symbol is the only thing inside the conditional of a "while" statement (even if disguised as a "for(;;)" loop), the value is automatically assigned to the global variable $_, destroying whatever was there previously. (This may seem like an odd thing to you, but you'll use the construct in almost every Perl script you write.) The $_ variable is not implicitly localized. You'll have to put a "local $_;" before the loop if you want that to happen.
The following lines are equivalent:
while (defined($_ = <STDIN>)) { print; }
while ($_ = <STDIN>) { print; }
while (<STDIN>) { print; }
for (;<STDIN>;) { print; }
print while defined($_ = <STDIN>);
print while ($_ = <STDIN>);
print while <STDIN>;
This also behaves similarly, but avoids $_ :
while (my $line = <STDIN>) { print $line }
In these loop constructs, the assigned value (whether assignment is automatic or explicit) is then tested to see whether it is defined. The defined test avoids problems where the line has a string value that would be treated as false by Perl, for example a "" or a "0" with no trailing newline. If you really mean for such values to terminate the loop, they should be tested for explicitly:
while (($_ = <STDIN>) ne '0') { ... }
while (<STDIN>) { last unless $_; ... }
In other boolean contexts, "<filehandle>" without an explicit "defined" test or comparison elicits a warning if the "use warnings" pragma or the -w command-line switch (the $^W variable) is in effect.
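
Outside a while condition the defined test is not added for you, so it has to be written out. The following is a small sketch of that, not taken from perlop; the helper name next_line_status is hypothetical and the in-memory filehandle is only there to keep the example self-contained.

```perl
use strict;
use warnings;

# Hypothetical helper: read one line and report whether it was real data
# or end-of-file, using an explicit defined() test instead of truthiness.
sub next_line_status {
    my ($fh) = @_;
    my $line = <$fh>;
    return defined($line) ? "data: '$line'" : 'eof';
}

my $content = "0";    # a single line "0" with no trailing newline
open my $fh, '<', \$content
    or die "Could not open in-memory file: $!";
print next_line_status($fh), "\n";    # the "0" line counts as data
print next_line_status($fh), "\n";    # nothing left, so eof
close $fh;
```

A plain truth test on $line would report the first read as end-of-file as well.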

千寻… 2024-10-01 17:08:28

While it is correct that the form while (my $line=<$fh>) { ... } gets compiled to while (defined( my $line = <$fh> ) ) { ... }, consider that there are a variety of situations in which a legitimate read of the value "0" is misinterpreted if you do not have an explicit defined in the loop or when testing the return of <>.

Here are several examples:

#!/usr/bin/perl
use strict; use warnings;

my $str = join "", map { "$_\n" } -10..10;
$str.="0";
my $sep='=' x 10;
my ($fh, $line);

open $fh, '<', \$str or 
     die "could not open in-memory file: $!";

print "$sep Should print:\n$str\n$sep\n";     

#Failure 1:
print 'while ($line=chomp_ln()) { print "$line\n"; }:',
      "\n";
while ($line=chomp_ln()) { print "$line\n"; } #fails on "0"
rewind();
print "$sep\n";

#Failure 2:
print 'while ($line=trim_ln()) { print "$line\n"; }',"\n";
while ($line=trim_ln()) { print "$line\n"; } #fails on "0"
print "$sep\n";
last_char();

#Failure 3:
# fails on last line of "0" 
print 'if(my $l=<$fh>) { print "$l\n" }', "\n";
if(my $l=<$fh>) { print "$l\n" } 
print "$sep\n";
last_char();

#Failure 4 and no Perl warning:
print 'print "$_\n" if <$fh>;',"\n";
print "$_\n" if <$fh>; #fails to print;
print "$sep\n";
last_char();

#Failure 5
# fails on last line of "0" with no Perl warning
print 'if($line=<$fh>) { print $line; }', "\n";
if($line=<$fh>) { 
    print $line; 
} else {
    print "READ ERROR: That was supposed to be the last line!\n";
}    
print "BUT, line read really was: \"$line\"", "\n\n";

sub chomp_ln {
# if I have "warnings", Perl says:
#    Value of <HANDLE> construct can be "0"; test with defined() 
    if($line=<$fh>) {
        chomp $line ;
        return $line;
    }
    return undef;
}

sub trim_ln {
# if I have "warnings", Perl says:
#    Value of <HANDLE> construct can be "0"; test with defined() 
    if (my $line=<$fh>) {
        $line =~ s/^\s+//;
        $line =~ s/\s+$//;
        return $line;
    }
    return undef;

}

sub rewind {
    seek ($fh, 0, 0) or 
        die "Cannot seek on in-memory file: $!";
}

sub last_char {
    seek($fh, -1, 2) or
       die "Cannot seek on in-memory file: $!";
}

I am not saying these are good forms of Perl! I am saying that they are possible, especially Failures 3, 4 and 5. Note that Failures 4 and 5 fail with no Perl warning. The first two have their own issues...
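
For comparison, here is one way the wrapper case (Failure 1) could be written so that "0" survives. This is a sketch under my own assumptions rather than part of the script above: the name chomp_ln_defined is hypothetical, it takes the filehandle as an argument instead of using a file-scoped variable, and both the helper and its caller test with defined.

```perl
use strict;
use warnings;

# Hypothetical replacement for chomp_ln(): returns the chomped line,
# and undef only when the read itself returned undef (end-of-file).
sub chomp_ln_defined {
    my ($fh) = @_;
    my $line = <$fh>;
    return undef unless defined $line;
    chomp $line;
    return $line;
}

# Same shape of input as above, shortened: a "0\n" in the middle
# and a bare "0" as the last line.
my $str = join '', map { "$_\n" } -2 .. 2;
$str .= "0";
open my $fh, '<', \$str
    or die "Could not open in-memory file: $!";

my @seen;
# The caller needs its own defined(): the condition is a sub call,
# not <$fh>, so Perl does not add the test automatically.
while ( defined( my $line = chomp_ln_defined($fh) ) ) {
    push @seen, $line;
}
close $fh;
print join( ',', @seen ), "\n";    # prints: -2,-1,0,1,2,0
```

Both zeros are kept because the loop ends only on undef, never on a false string.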
