How can I extract the lines between two line delimiters in Perl?

Posted on 2024-07-29 05:02:02

I have an ASCII log file with some content I would like to extract. I've never taken time to learn Perl properly, but I figure this is a good tool for this task.

The file is structured like this:

... 
... some garbage 
... 
... garbage START
what i want is 
on different
lines 
END 
... 
... more garbage ...
next one START 
more stuff I want, again
spread 
through 
multiple lines 
END 
...
more garbage

So, I'm looking for a way to extract the lines between each pair of START and END delimiter strings. How can I do this?

So far, I've only found examples of how to print the line containing the START string, or other documentation that is only loosely related to what I'm looking for.

生生漫 2024-08-05 05:02:02

You want the flip-flop operator (also known as the range operator) ..

#!/usr/bin/env perl
use strict;
use warnings;

while (<>) {
  if (/START/../END/) {
    next if /START/ || /END/;
    print;
  }
}

Replace the call to print with whatever you actually want to do (e.g., push the line into an array, edit it, format it, whatever). I'm next-ing past the lines that actually have START or END, but you may not want that behavior. See this article for a discussion of this operator and other useful Perl special variables.
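As an example of pushing lines into an array instead of printing them, here is a minimal sketch that collects each START..END block into its own array. Rather than re-running /START/ and /END/ to skip the delimiter lines, it uses the flip-flop's return value, which perlop documents as a sequence number starting at 1, with "E0" appended to the final number of each range. The in-memory sample log and the @blocks name are illustrative only; a real script would loop over <> as above.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Made-up sample log held in memory; a real script would read from <> instead.
my $log = <<'END_LOG';
... some garbage
... garbage START
what i want is
on different
lines
END
... more garbage ...
next one START
more stuff I want, again
spread
END
more garbage
END_LOG

my @blocks;    # one array reference per START..END block
open my $fh, '<', \$log or die "Cannot open in-memory log: $!";
while (my $line = <$fh>) {
    # In scalar context .. returns 1 on the START line, a string ending
    # in "E0" on the END line, and false outside a block.
    my $seq = $line =~ /START/ .. $line =~ /END/;
    next unless $seq;
    if ($seq == 1) {            # START line: begin a new block
        push @blocks, [];
        next;
    }
    next if $seq =~ /E0\z/;     # END line: skip the delimiter
    push @{ $blocks[-1] }, $line;
}
close $fh;

for my $i (0 .. $#blocks) {
    print "block ", $i + 1, ":\n", @{ $blocks[$i] };
}
```

Checking the sequence number instead of the patterns also means a content line that happens to contain the word START is not mistaken for a delimiter.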

帝王念 2024-08-05 05:02:02

From perlfaq6's answer to How can I pull out lines between two patterns that are themselves on different lines?


You can use Perl's somewhat exotic .. operator (documented in perlop):

perl -ne 'print if /START/ .. /END/' file1 file2 ...

If you wanted text and not lines, you would use

perl -0777 -ne 'print "$1\n" while /START(.*?)END/gs' file1 file2 ...

But if you want nested occurrences of START through END, you'll run up against the problem described in the question in this section on matching balanced text.
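The -0777 one-liner above reads each file as a single string and then walks through it with a /gs match. A minimal script equivalent, assuming the whole log fits in memory; the sample string is made up for illustration:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Stand-in for a whole file; a real script could slurp the first input file with
#   my $text = do { local $/; <> };
my $text = "noise START keep this\nand this END tail\nSTART second END\n";

my @chunks;
# /g walks through every match, /s lets . match newlines, and .*? is
# non-greedy, so each match stops at the first END after its START.
while ($text =~ /START(.*?)END/gs) {
    push @chunks, $1;
}
print "[$_]\n" for @chunks;
```

Note that this captures text, not whole lines: whatever follows START and precedes END on the delimiter lines is included.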

Here's another example of using ..:

while (<>) {
    $in_header =   1  .. /^$/;
    $in_body   = /^$/ .. eof;
    # now choose between them
} continue {
    $. = 0 if eof;  # fix $.
}
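The FAQ snippet above is a fragment: $in_header and $in_body are computed but never used. A self-contained sketch that fills in the "choose between them" step, assuming a made-up mail-like input (header, blank line, body) held in memory:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Made-up input: a header, one blank separator line, then a body.
my $input = "From: someone\nSubject: hi\n\nbody line 1\n\nbody line 2\n";

my (@header, @body);
open my $fh, '<', \$input or die "Cannot open in-memory input: $!";
while (<$fh>) {
    # A constant operand is compared against $. (the current line number),
    # so 1 .. /^$/ is true from line 1 up to and including the first blank line.
    my $in_header =    1  .. /^$/;
    my $in_body   = /^$/  .. eof;
    if    ($in_header) { push @header, $_ unless /^$/ }   # drop the separator line
    elsif ($in_body)   { push @body,   $_ }
} continue {
    $. = 0 if eof;    # reset the line counter at the end of each input
}
close $fh;

print "header: ", @header;
print "body:   ", @body;
```

Once the body range has started, the flip-flop no longer tests /^$/, so blank lines inside the body are kept. The $. reset only matters when reading several files through <>.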
木有鱼丸 2024-08-05 05:02:02

How can I grab multiple lines after a matching line in Perl?

How about that one? There, the END string is $^; you can change it to your own END string.

I am also a novice, but the answers there offer quite a few approaches. Let me know more specifically how what you want differs from the linked question.

樱&纷飞 2024-08-05 05:02:02
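my $f = 0;          # true while inside a START..END block
while (<>) {
    chomp;          # strip record separator
    if (/END/) { $f = 0; }
    if (/START/) {
        s/.*START//;    # keep only the text after START on this line
        $f = 1;
    }
    print $_ . "\n" if $f;
}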

Try to write some code next time round.
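The flag-based loop in this answer keeps whatever follows START on the opening line (via s/.*START//) but prints nothing from the END line. If you also want the text that precedes END on the closing line, here is a sketch that treats both delimiter lines symmetrically; that requirement, and the sample lines, are assumptions for illustration:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Made-up sample lines; a real script would loop over <> instead.
my @input = ("junk START first\n", "middle\n", "last END junk\n", "noise\n",
             "a START b END c\n");

my $inside = 0;    # true while between a START and its END
my @kept;
for my $raw (@input) {
    my $line = $raw;                           # work on a copy
    chomp $line;
    if (!$inside && $line =~ s/.*START//) {    # drop everything up to START
        $inside = 1;
    }
    next unless $inside;
    if ($line =~ s/END.*//) {                  # drop END and everything after it
        $inside = 0;
    }
    push @kept, $line if length $line;         # skip empty leftovers
}
print "$_\n" for @kept;
```

Checking for END after stripping START also handles a block that opens and closes on the same line.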

凉世弥音 2024-08-05 05:02:02

After Telemachus' reply, things started falling into place. In the end, this is the solution I was looking for.

  1. I'm trying to extract the lines between two delimiter lines (one ending with "CINFILE=", the other containing a single "#"), excluding the delimiter lines themselves. Telemachus' solution handles this.
  2. The first line of each block has a leading space that I want to remove. My code handles that too.
  3. I also want to write each set of lines to a separate file.

This works for me, although the code could fairly be called ugly; I'm practically a newcomer to Perl. Anyway, here it is:

#!/usr/bin/env perl
use strict;
use warnings;

my $start='CINFILE=$';
my $stop='^#$';
my $filename;
my $output;
my $counter=1;
my $found=0;

while (<>) {
    if (/$start/../$stop/) {
        $filename=sprintf("boletim_%06d.log",$counter);
        open($output,'>>'.$filename) or die $!;
        next if /$start/ || /$stop/;
        if($found == 0) {
            print $output (split(/ /))[1];
        } else {
            print $output $_;
        }
        $found=1;
    } else {
        if($found == 1) {
            close($output);
            $counter++;
            $found=0;
        }
    }
}

I hope it benefits others as well.
Cheers.
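One caveat with print $output (split(/ /))[1]; in the code above: split(/ /) cuts at every single space, so field [1] is only the text between the leading space and the next space. That works for a first line such as " foo\n", but silently drops the rest of a line such as " foo bar\n". If the first line may contain more than one word, a substitution that strips only the leading spaces is safer. A small comparison on a made-up line:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $first_line = " first line, with several words\n";   # made-up sample

# split / / cuts at every single space, so field [1] is just "first".
my $via_split = (split / /, $first_line)[1];

# Stripping only the leading spaces keeps the whole rest of the line.
(my $via_subst = $first_line) =~ s/^ +//;

print "split: [$via_split]\n";
print "subst: [$via_subst]";
```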

秋凉 2024-08-05 05:02:02

Not too bad for a "virtual newcomer". One thing you could do is put "$found=1" inside the "if($found == 0)" block, so that you don't repeat that assignment for every line between $start and $stop.

Another thing that is a bit ugly, in my opinion, is that you reopen the same filehandle for every line of the $start/$stop block.

This shows a way around that:

#!/usr/bin/perl

use strict;
use warnings;

my $start='CINFILE=$';
my $stop='^#$';
my $filename;
my $output;
my $counter=1;
my $found=0;

while (<>) {

    # Find block of lines to extract
    if( /$start/../$stop/ ) {

        # Start of block: open the output file once
        if( /$start/ ) {
            $filename=sprintf("boletim_%06d.log",$counter);
            open($output,'>>'.$filename) or die $!;
        }
        # End of block
        elsif ( /$stop/ ) {
            close($output);
            $counter++;
            $found = 0;
        }
        # Middle of block
        else{
            if($found == 0) {
                print $output (split(/ /))[1];
                $found=1;
            }
            else {
                print $output $_;
            }
        }

    }
}
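Going one step further, the flip-flop's return value (1 on the start line, a number ending in "E0" on the stop line) can replace both the $found flag and the repeated /$start/ and /$stop/ tests. This is a sketch under stated assumptions, not either poster's code: it reads a made-up in-memory sample, writes into a throwaway directory from File::Temp instead of the current directory, opens with '>' (overwrite) rather than '>>', and strips all leading spaces from the first line of each block.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

my $start = qr/CINFILE=$/;   # a line ending in "CINFILE="
my $stop  = qr/^#$/;         # a line holding a single "#"

# Made-up sample input; a real script would read from <> instead.
my $input = <<'END_LOG';
noise
CINFILE=
 first line of block one
second line of block one
#
more noise
CINFILE=
 block two
#
END_LOG

my $outdir  = tempdir(CLEANUP => 1);   # throwaway output directory for the demo
my $counter = 0;
my $out;

open my $in, '<', \$input or die "Cannot open in-memory input: $!";
while (my $line = <$in>) {
    # Scalar .. yields 1 on the start line, "nE0" on the stop line, false outside.
    my $seq = ($line =~ $start) .. ($line =~ $stop);
    next unless $seq;
    if ($seq == 1) {                   # start line: open the next file, once per block
        my $file = File::Spec->catfile($outdir, sprintf('boletim_%06d.log', ++$counter));
        open $out, '>', $file or die "Cannot open $file: $!";
    }
    elsif ($seq =~ /E0\z/) {           # stop line: this block is finished
        close $out or die "Cannot close output: $!";
    }
    else {
        $line =~ s/^ +// if $seq == 2;    # first content line: strip leading spaces
        print {$out} $line;
    }
}
close $in;
```

Because the start and stop patterns can never match the same line here, every file that is opened is also closed on its stop line.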