How do I print everything between a start line and an end line on a single line?

Posted on 2024-12-19 07:06:30


while(<FILE>)
{
    chomp $_;
    $line[$i]=$_;
    ++$i;
}

for($j=0;$j<$i;++$j)
{
    if($line[$j]=~/Syn_Name/)
    {
        do
        {
            print OUT $line[$j],"\n";
            ++$j;
        }
        until($line[$j]=~/^\s*$/)
    }
}

This is my code. I am trying to print the data between Syn_Name and a blank line.
My code extracts the chunk that I need,
but the data in each chunk is printed line by line. I want the data for each chunk to be printed on a single line.


君勿笑 2024-12-26 07:06:30

Here is a simplification of your code, using the flip-flop operator to control the printing. Note that printing the final (blank) line will not add a newline (unless the line contained more than one newline). At best, it prints the empty string; at worst, it prints whitespace. That is why the newline is printed explicitly when the blank terminator is reached.

You do not need a temporary array for the lines; a single while loop is enough. In case you want to store the lines anyway, I added a commented-out line showing how that is best done.

#chomp(my @line = <FILE>);
while (<FILE>) {
    chomp;
    if(/Syn_Name/ .. /^\s*$/) {
        print OUT;
        print OUT "\n" if /^\s*$/;   # send the newline to OUT too, not to STDOUT
    }
}
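A quick way to check this loop without touching real files is to point FILE and OUT at in-memory strings. The sketch below assumes some made-up sample input. It sends the newline to OUT as well, so each whole chunk ends up in the same place.

```perl
use strict;
use warnings;

# Assumption: an in-memory string stands in for the real input file,
# and OUT writes into $result so the output is easy to inspect.
my $text = "junk\n\nSyn_Name\nfoo\nbar\n\njunk\n";
open(FILE, '<', \$text) or die "Cannot open input: $!";
my $result = '';
open(OUT, '>', \$result) or die "Cannot open output: $!";

while (<FILE>) {
    chomp;
    if (/Syn_Name/ .. /^\s*$/) {
        print OUT;                   # prints $_ with no newline
        print OUT "\n" if /^\s*$/;   # the blank terminator ends the chunk
    }
}
close(OUT) or die "Cannot close output: $!";

print $result;    # prints "Syn_Namefoobar" followed by a newline
```

The blank line before Syn_Name is skipped because the range has not started yet. Only the blank line after bar closes the range.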
舟遥客 2024-12-26 07:06:30


Contents

  • Idiomatic Perl
  • Make errors easier to fix
    • Warnings about common programming errors
    • Don't execute unless variable names are consistent
    • Developing this habit will save you lots of time
  • Perl's range operator
  • Working demos
    • Print chomped lines immediately
    • Join lines with spaces
    • One more edge case

Idiomatic Perl

You seem to have a background with the C family of languages. This is fine because it gets the job done, but you can let Perl handle the machinery for you, namely

  • chomp defaults to $_ (also true with many other Perl operators)
  • push adds an element to the end of an array

to simplify your first loop:

while (<FILE>)
{
    chomp;
    push @line, $_;
}

Now you don't have to update $i to keep track of how many lines you've already added to the array.

For the second loop, instead of a C-style for loop, use a foreach loop. From the perlsyn documentation:

The foreach loop iterates over a normal list value and sets the variable VAR to be each element of the list in turn …

The foreach keyword is actually a synonym for the for keyword, so you can use foreach for readability or for for brevity. (Or because the Bourne shell is more familiar to you than csh, so writing for comes more naturally.) If VAR is omitted, $_ is set to each value.

This way, Perl handles the bookkeeping for you.

for (@line)
{
    # $_ is the current element of @line
    ...
}
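To make the placeholder concrete, here is a minimal sketch of both loops rewritten this way: push in the first loop, foreach in the second. It assumes an in-memory string in place of the real FILE. It tracks whether we are inside a chunk with an explicit variable, $chunk, a hypothetical name; the range operator discussed below removes even that bookkeeping.

```perl
use strict;
use warnings;

# Assumption: an in-memory string stands in for the real input file.
my $text = "junk\nSyn_Name\nfoo\nbar\n\njunk\n";
open(my $fh, '<', \$text) or die "Cannot open input: $!";

# First loop: chomp and push, so there is no $i to maintain.
my @line;
while (<$fh>) {
    chomp;
    push @line, $_;
}

# Second loop: foreach sets $_ to each element in turn, so there is no $j.
# $chunk is undef outside a Syn_Name chunk and holds the text so far inside one.
my ($chunk, @output);
for (@line) {
    if (defined $chunk) {
        if (/^\s*$/) {        # blank line: the chunk is complete
            push @output, $chunk;
            undef $chunk;
        }
        else {
            $chunk .= $_;     # append this line to the current chunk
        }
    }
    elsif (/Syn_Name/) {
        $chunk = $_;          # start a new chunk
    }
}

print "$_\n" for @output;    # prints "Syn_Namefoobar"
```

With this sketch, a chunk that is still open when the input ends is silently dropped.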

Make errors easier to fix

Sometimes Perl can be too accommodating. Say in the second loop you made an easy typographical error:

for (@lines)

Running your program now produces no output at all, even if the input contains Syn_Name chunks.

A human can look at the code and see that you probably intended to process the array you just created and pluralized the name of the array by mistake. Perl, being eager to help, creates a new empty @lines array, which leaves your foreach loop with nothing to do.

You may delete the spurious s at the end of the array's name and still have a program that produces no output! For example, you may have an unhandled combination of inputs that doesn't open the OUT filehandle.

Perl has a couple of easy ways to spare you these (and more!) kinds of frustration from dealing with silent failures.

Warnings about common programming errors

You can turn on an enormous list of warnings that help diagnose common programming problems. With my imagined buggy version of your code, Perl could have told you

Name "main::lines" used only once: possible typo at ./synname line 16.

and after fixing the typo in the array name

print() on unopened filehandle OUT at ./synname line 20, <FILE> line 8.
print() on unopened filehandle OUT at ./synname line 20, <FILE> line 8.
print() on unopened filehandle OUT at ./synname line 20, <FILE> line 8.
print() on unopened filehandle OUT at ./synname line 20, <FILE> line 8.
print() on unopened filehandle OUT at ./synname line 20, <FILE> line 8.

Right away, you see valuable information that may be difficult or at least tedious to spot unaided:

  1. variable names are inconsistent, and
  2. the program is trying to produce output but needs a little more plumbing.
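The second kind of warning is easy to reproduce. The sketch below assumes, as in the buggy program, that OUT is never opened. It collects warnings through a __WARN__ handler so you can see them. Perl may also print a separate compile-time "used only once" warning for main::OUT to STDERR; that one arrives before the handler is installed.

```perl
use strict;
use warnings;

# Collect warnings in an array instead of letting them go to STDERR.
my @caught;
$SIG{__WARN__} = sub { push @caught, $_[0] };

# Assumption: as in the buggy program, OUT is never opened.
for my $line ('Syn_Name', 'foo') {
    print OUT $line;    # warns: print() on unopened filehandle OUT
}

print "caught: $_" for @caught;
```

Each print to the unopened handle produces its own warning, which is why the list above repeats the same message.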

Don't execute unless variable names are consistent

Notice that even with the potential problems above, Perl tried to execute anyway. With some classes of problems, such as the variable-naming inconsistency, you may prefer that Perl not execute your program at all but stop and make you fix it first. You can tell Perl to be strict about variables; from the strict documentation:

This generates a compile-time error if you access a variable that wasn't declared via our or use vars, localized via my, or wasn't fully qualified.

The tradeoff is you have to be explicit about which variables you intend to be part of your program instead of allowing them to conveniently spring to life upon first use. Before the first loop, you would declare

my @line;

to express your intent. Then with the bug of a mistakenly pluralized array name, Perl fails with

Global symbol "@lines" requires explicit package name at ./synname line 16.
Execution of ./synname aborted due to compilation errors.

and you know exactly which line contains the error.
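You can watch strict vars reject the pluralization typo without writing a separate script. The sketch below compiles a small snippet with eval STRING, so the compile-time error lands in $@ and this program does not abort. The snippet itself is made up for illustration.

```perl
use strict;
use warnings;

# A snippet containing the typo: @line is declared, @lines is not.
my $code = q{
    use strict;
    my @line = ('Syn_Name', 'foo');
    for (@lines) { print "$_\n" }
    1;
};

# eval STRING catches compile-time errors in $@ instead of dying.
my $ok = eval $code;
print $ok ? "compiled and ran\n" : "refused to compile: $@";
```

It prints a message that begins with refused to compile: Global symbol "@lines" requires explicit package name. Newer Perls also add a hint suggesting the missing my declaration.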

Developing this habit will save you lots of time

I begin almost every non-trivial Perl program I write with

#! /usr/bin/env perl

use strict;
use warnings;

The first is the shebang line, an ordinary comment as far as Perl is concerned. The use lines enable the strict pragma and the warnings pragma.

Not wanting to be a strict-zombie, as Mark Dominus chided, I'll point out that use strict; as above, with no import list, makes Perl strict in three error-prone areas:

  1. strict vars, as described above;
  2. strict refs, which disallows symbolic references; and
  3. strict subs, which requires the programmer to be more careful in referring to subroutines.

This is a highly useful default. See the strict pragma's documentation for more details.

Perl's range operator

The perlop documentation describes .., Perl's range operator, which can help you greatly simplify the logic in your second loop:

In scalar context, .. returns a boolean value. The operator is bistable, like a flip-flop, and emulates the line-range (comma) operator of sed, awk, and various editors. Each .. operator maintains its own boolean state, even across calls to a subroutine that contains it. It is false as long as its left operand is false. Once the left operand is true, the range operator stays true until the right operand is true, AFTER which the range operator becomes false again. It doesn't become false till the next time the range operator is evaluated.

In your question, you wrote that you want “data between Syn_Name and a blank line,” which in Perl is spelled

/Syn_Name/ .. /^\s*$/

In your case, you also want to do something special at the end of the range, and the same documentation shows that .. provides for that too:

The final sequence number in a range has the string "E0" appended to it, which doesn't affect its numeric value, but gives you something to search for if you want to exclude the endpoint.
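To see both behaviors at once (the flip-flop state and the "E0" suffix), the sketch below records what the range expression returns for each line of some made-up sample input.

```perl
use strict;
use warnings;

# Assumption: sample lines standing in for your input, already chomped.
my @input = ('junk', 'Syn_Name', 'foo', 'bar', '', 'more junk');

my @states;
for (@input) {
    # One range operator inside the loop, so its state carries over
    # from one iteration to the next.
    my $state = /Syn_Name/ .. /^\s*$/;
    push @states, $state;
}

# Quote each value so the empty strings (false) are visible.
print join(',', map { "'$_'" } @states), "\n";
# prints: '','1','2','3','4E0',''
```

The empty string means "outside the range". The sequence numbers count the lines inside it, and "4E0" marks the blank line that closes it while still being numerically 4.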

Assigning the value returned from .. (which I usually do to a scalar named $inside or $is_inside) allows you to check whether you're at the end, e.g.,

my $is_inside = /Syn_Name/ .. /^\s*$/;
if ($is_inside =~ /E0$/) {
    ...
}

Writing it this way also avoids duplicating the code for your terminating condition (the right-hand operand of ..), so if you need to change the logic, you change it in only one place. When you have to remember to update two places, you will sometimes forget and create bugs.

Working demos

See below for code you can copy-and-paste to get working programs. For demo purposes, they read input from the built-in DATA filehandle and write output to STDOUT. Writing it this way means you can transfer my code into yours with little or no modification.
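To run these demos on real files, you would replace the two "for demo only" glob assignments with real open calls. That is also the plumbing the unopened-OUT warning was asking for. Here is a sketch of the first demo's loop set up that way. Temporary files from the core File::Temp module stand in for your actual input and output paths, which the question does not name.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Assumption: temporary files stand in for your real input and output
# paths so this sketch runs anywhere; substitute your own file names.
my ($tmp, $in_path) = tempfile(UNLINK => 1);
print {$tmp} "junk\n\nSyn_Name\nfoo\nbar\n\njunk\n";
close($tmp) or die "Cannot close $in_path: $!";
my (undef, $out_path) = tempfile(UNLINK => 1);

# The plumbing: open both handles and stop with a useful message on failure.
open(FILE, '<', $in_path)  or die "Cannot open $in_path: $!";
open(OUT,  '>', $out_path) or die "Cannot open $out_path: $!";

while (<FILE>) {
    chomp;
    if (my $is_inside = /Syn_Name/ .. /^\s*$/) {
        print OUT $_, $is_inside =~ /E0$/ ? "\n" : ();
    }
}

close(FILE);
close(OUT) or die "Cannot close $out_path: $!";
```

Checking the result of close on the output handle catches write errors that would otherwise go unnoticed.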

Print chomped lines immediately

For the problem as stated in your question, there's no need for one loop to collect the lines in a temporary array and then another loop to process that array. Consider the following code

#! /usr/bin/env perl

use strict;
use warnings;

# for demo only
*FILE = *DATA;
*OUT = *STDOUT;

while (<FILE>)
{
    chomp;
    if (my $is_inside = /Syn_Name/ .. /^\s*$/) {
        my $is_last = $is_inside =~ /E0$/;
        print OUT $_, $is_last ? "\n" : ();
    }
}

__DATA__
ERROR IF PRESENT IN OUTPUT!

Syn_Name
foo
bar
baz

ERROR IF PRESENT IN OUTPUT!

whose output is

Syn_Namefoobarbaz

We always print the current line, stored in $_. When we're at the end of the range, that is, when $is_last is true, we also print a newline. When $is_last is false, the ternary operator yields the empty list from its other branch, so we print $_ alone, with no newline.

Join lines with spaces

You didn't show us an example input, so I wonder whether you really want to butt the lines together rather than joining them with spaces. If you want the latter behavior, then the program becomes

#! /usr/bin/env perl

use strict;
use warnings;

# for demo only
*FILE = *DATA;
*OUT = *STDOUT;

my @lines;
while (<FILE>)
{
    chomp;
    if (my $is_inside = /Syn_Name/ .. /^\s*$/) {
        push @lines, $_;
        if ($is_inside =~ /E0$/) {
            print OUT join(" ", @lines), "\n";
            @lines = ();
        }
    }
}

__DATA__
ERROR IF PRESENT IN OUTPUT!

Syn_Name
foo
bar
baz

ERROR IF PRESENT IN OUTPUT!

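This code accumulates in @lines only the lines within a Syn_Name chunk; when it sees the terminator, it prints the chunk and clears out @lines. The output is now the following (it ends with one trailing space, because the blank terminator line is joined in too):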

Syn_Name foo bar baz

One more edge case

Finally, what happens if we see Syn_Name at the end of the file but without a terminating blank line? That may be impossible with your data, but in case you need to handle it, you'll want to use Perl's eof operator.

eof FILEHANDLE
eof

Returns 1 if the next read on FILEHANDLE will return end of file or if FILEHANDLE is not open … An eof without an argument uses the last file read.

So we terminate on either a blank line or end of file.

#! /usr/bin/env perl

use strict;
use warnings;

# for demo only
*FILE = *DATA;
*OUT = *STDOUT;

my @lines;
while (<FILE>)
{
    s/\s+$//;
    #if (my $is_inside = /Syn_Name/ .. /^\s*$/) {
    if (my $is_inside = /Syn_Name/ .. /^\s*$/ || eof) {
        push @lines, $_;
        if ($is_inside =~ /E0$/) {
            print OUT join(" ", @lines), "\n";
            @lines = ();
        }
    }
}

__DATA__
ERROR IF PRESENT IN OUTPUT!
Syn_Name
foo
bar

YOU CANT SEE ME!
Syn_Name
quux
potrzebie

Output:

Syn_Name foo bar 
Syn_Name quux potrzebie

Here instead of chomp, the code removes any trailing invisible whitespace at the ends of lines. This will make sure spacing between joined lines is uniform even if the input is a little sloppy.

Without the eof check, the program does not print the final chunk (Syn_Name quux potrzebie), which you can see by commenting out the active conditional and uncommenting the other one.
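If you want to reuse the last demo on several inputs, one option is to wrap it in a subroutine that takes a filehandle and returns the joined chunks. The sketch below does that with a hypothetical helper named syn_chunks. It also skips the empty terminator line when joining, so the trailing space seen above disappears. Both choices are mine, not part of the answer.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the answer): returns one string per
# Syn_Name chunk, with the chunk's lines joined by single spaces.
# A chunk ends at a blank line or at end of file.
sub syn_chunks {
    my ($fh) = @_;
    my (@chunks, @lines);
    local $_;    # keep the caller's $_ intact
    while (<$fh>) {
        s/\s+$//;    # drop the newline and any trailing whitespace
        if (my $is_inside = /Syn_Name/ .. /^\s*$/ || eof($fh)) {
            push @lines, $_;
            if ($is_inside =~ /E0$/) {
                # Skip the empty terminator so no trailing space is joined in.
                push @chunks, join(' ', grep { length } @lines);
                @lines = ();
            }
        }
    }
    return @chunks;
}

# Usage: an in-memory file whose last chunk has no terminating blank line.
my $text = "junk\nSyn_Name\nfoo\nbar   \n\njunk\nSyn_Name\nquux\npotrzebie";
open(my $in, '<', \$text) or die "Cannot open in-memory file: $!";
print "$_\n" for syn_chunks($in);
# prints:
# Syn_Name foo bar
# Syn_Name quux potrzebie
```

The quoted perlop text notes that each .. operator keeps its state even across calls to the subroutine that contains it. Here the eof test always closes a range that is still open, so the state is false again by the time the subroutine returns, and a second call starts cleanly.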

忆梦 2024-12-26 07:06:30

Another simplified version:

foreach (grep {chomp; /Syn_Name/ .. /^\s*$/ } <FILE>) {
    print OUT;
    print OUT "\n" if /^\s*$/;
}
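This version works because grep aliases $_ to each line, so chomp edits the lines in place. The range operator inside the block also keeps its state from one line to the next. A small sketch of what grep hands to the foreach loop, assuming a literal list in place of <FILE>:

```perl
use strict;
use warnings;

# Assumption: a literal list stands in for the lines <FILE> would return.
my @input = ("junk\n", "Syn_Name\n", "foo\n", "bar\n", "\n", "junk\n");

# chomp modifies each element in place (grep aliases $_), and the range
# operator's result decides whether grep keeps the line.
my @kept = grep { chomp; /Syn_Name/ .. /^\s*$/ } @input;

print join('|', @kept), "\n";    # prints "Syn_Name|foo|bar|"
```

The empty string at the end is the blank terminator line. It is the line that makes the foreach loop print its newline.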