awk to perl conversion

Posted 2024-10-14 21:48:44

I have a directory full of files containing records like:

FAKE ORGANIZATION
799 S FAKE AVE
Northern Blempglorff, RI 99xxx


                                                                      01/26/2011
     These items are being held for you at the location shown below each one.
     IF YOU ASKED THAT MATERIAL BE MAILED TO YOU, PLEASE DISREGARD THIS NOTICE.

     The Waltons. The complete  DAXXXX12118198
     Pickup at:CHUPACABRA LOCATION                                 02/02/2011







                                                  GRIMLY, WILFORD
                                                  29 FAKE LANE
                                                  S. BLEMPGLORFF RI  99XXX

I need to remove all entries with the expression Pickup at:CHUPACABRA LOCATION.

The "record separator" issue: I can't touch the input file's formatting; it must be retained as is. Each record is separated by roughly 40+ newlines.

Here's some awk (this works):

BEGIN { 
    RS="\n\n\n\n\n\n\n\n\n+" 
    FS="\n"
}
!/CHUPACABRA/{print $0}

My stab with perl:

perl -a -F\n -ne '$/ = "\n\n\n\n\n\n\n\n\n+";$\ = "\n";chomp;$regex="CHUPACABRA";print $_ if $_ !~ m/$regex/i;' data/lib51.000

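Nothing is returned. I'm not sure how to specify a 'field separator' in perl except at the command line. Tried the a2p utility, no dice. For the curious, here's what it produces: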

eval '$'.$1.'$2;' while $ARGV[0] =~ /^([A-Za-z
            # process any FOO=bar switches

#$FS = ' ';     # set field separator
$, = ' ';       # set output field separator
$\ = "\n";      # set output record separator

$/ = "\n\n\n\n\n\n\n\n\n+";
$FS = "\n";

while (<>) {
    chomp;  # strip record separator
    if (!/CHUPACABRA/) {
    print $_; 
   }   
}

This has to run on someone's Windows box, otherwise I'd stick with awk.

Thanks!

Bubnoff

EDIT (SOLVED)

Thanks mob! Here's a (working) perl script version (adjusted a2p output):

eval '$'.$1.'$2;' while $ARGV[0] =~ /^([A-Za-z
            # process any FOO=bar switches

#$FS = ' ';     # set field separator
$, = ' ';       # set output field separator
$\ = "\n";      # set output record separator

$/ = "\n"x10;
$FS = "\n";

while (<>) {
    chomp;  # strip record separator
    if (!/CHUPACABRA/) {
        print $_;
    }
}

Feel free to post improvements or CPAN goodies that make this more idiomatic and/or perl-ish. Thanks!

牵你手 2024-10-21 21:48:44

In Perl, the record separator is a literal string, not a regular expression. As the perlvar doc famously says:

Remember: the value of $/ is a string, not a regex. awk has to be better for something. :-)
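
A small sketch of what that means in practice. It is not taken from perlvar: the helper read_records and the sample data are made up for illustration. It reads the same string twice through an in-memory filehandle, once with a separator that ends in a literal + and once with a plain run of newlines.

```perl
use strict;
use warnings;

# Hypothetical helper: split $data into records using $sep as a
# literal input record separator, then chomp that separator off.
sub read_records {
    my ($data, $sep) = @_;
    local $/ = $sep;                       # literal string, never a regex
    open my $fh, '<', \$data
        or die "cannot open in-memory handle: $!";
    my @records = <$fh>;                   # one element per record
    chomp @records;                        # removes a trailing $sep, if present
    return @records;
}

my $data = "first\n\n\nsecond\n\n\nthird\n";

my @with_plus = read_records($data, "\n\n+");    # '+' is taken literally
my @plain     = read_records($data, "\n\n\n");   # exactly three newlines

printf "with '+': %d record(s), plain: %d record(s)\n",
    scalar @with_plus, scalar @plain;
```

With the + in the separator, Perl looks for two newlines followed by an actual plus sign, finds none, and hands back the whole input as a single record. That is consistent with the original one-liner printing nothing useful.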

Still, it looks like you can get away with $/="\n" x 10 or something like that:

perl -a -F\n -ne '$/="\n"x10;$\="\n";chomp;$regex="CHUPACABRA";
       print if /\S/ && !m/$regex/i;' data/lib51.000

Note the extra /\S/ &&, which will skip empty paragraphs from input that has more than 20 consecutive newlines.

Also, have you considered just installing Cygwin and having awk available on your Windows machine?
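
If a regex separator is really wanted, one possible approach (an assumption here, not something proposed in this answer) is to slurp each file and use split, which does accept a regex. In the sketch below keep_records is a hypothetical helper name, and /\n{9,}/ is meant to mirror the awk RS of eight newlines followed by one or more.

```perl
use strict;
use warnings;

# Hypothetical helper: return the records in $text that do NOT match
# $pattern. Records are delimited by runs of nine or more newlines.
sub keep_records {
    my ($text, $pattern) = @_;
    my @records = split /\n{9,}/, $text;   # split takes a real regex
    # Drop empty chunks and every record that matches the pattern.
    return grep { /\S/ && !/$pattern/i } @records;
}

# Only read input when file names were given on the command line.
if (@ARGV) {
    local $/;                              # undef: read each file in one gulp
    while (my $text = <>) {
        print "$_\n" for keep_records($text, qr/CHUPACABRA/);
    }
}
```

Saved under any name, say filter.pl, it would be run as perl filter.pl data/lib51.000, which also sidesteps shell quoting differences on Windows. Like the awk version, it prints each kept record followed by a single newline rather than reproducing the original run of blank lines.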

请远离我 2024-10-21 21:48:44

If you can download gawk for Windows, there is no need for (much) conversion.

寒冷纷飞旳雪 2024-10-21 21:48:44

Did you know that Perl comes with a program called a2p that does exactly what you described you want to do in your title?

And, if you have Perl on your machine, the documentation for this program is already there:

C> perldoc a2p

My own suggestion is to get the Llama book and learn Perl anyway. Despite what the Python people say, Perl is a great and flexible language. If you know shell, awk and grep, you'll understand many of the Perl constructs without any problems.
