awk to perl conversion
I have a directory full of files containing records like:
FAKE ORGANIZATION
799 S FAKE AVE
Northern Blempglorff, RI 99xxx
01/26/2011
These items are being held for you at the location shown below each one.
IF YOU ASKED THAT MATERIAL BE MAILED TO YOU, PLEASE DISREGARD THIS NOTICE.
The Waltons. The complete DAXXXX12118198
Pickup at:CHUPACABRA LOCATION 02/02/2011
GRIMLY, WILFORD
29 FAKE LANE
S. BLEMPGLORFF RI 99XXX
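I need to remove all entries containing the expression Pickup at:CHUPACABRA LOCATION.
The "record separator" issue: I can't touch the input file's formatting; it must be retained as is. Each record is separated by roughly 40 or more newlines.
Here's some awk (this works):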
BEGIN {
RS="\n\n\n\n\n\n\n\n\n+"
FS="\n"
}
!/CHUPACABRA/{print $0}
My stab with perl:
perl -a -F\n -ne '$/ = "\n\n\n\n\n\n\n\n\n+";$\ = "\n";chomp;$regex="CHUPACABRA";print $_ if $_ !~ m/$regex/i;' data/lib51.000
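Nothing is returned. I'm not sure how to specify a "field separator" in perl except on the command line. I tried the a2p utility, with no luck. For the curious, here's what it produces: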
eval '$'.$1.'$2;' while $ARGV[0] =~ /^([A-Za-z_0-9]+=)(.*)/ && shift;
# process any FOO=bar switches
#$FS = ' '; # set field separator
$, = ' '; # set output field separator
$\ = "\n"; # set output record separator
$/ = "\n\n\n\n\n\n\n\n\n+";
$FS = "\n";
while (<>) {
chomp; # strip record separator
if (!/CHUPACABRA/) {
print $_;
}
}
This has to run on someone's Windows box; otherwise I'd stick with awk.
Thanks!
Bubnoff
EDIT (SOLVED)
Thanks, mob! Here's a (working) perl script version (adjusted a2p output):
eval '$'.$1.'$2;' while $ARGV[0] =~ /^([A-Za-z_0-9]+=)(.*)/ && shift;
# process any FOO=bar switches
#$FS = ' '; # set field separator
$, = ' '; # set output field separator
$\ = "\n"; # set output record separator
$/ = "\n"x10;
$FS = "\n";
while (<>) {
chomp; # strip record separator
if (!/CHUPACABRA/) {
print $_;
}
}
Feel free to post improvements or CPAN goodies that make this more idiomatic and/or perl-ish. Thanks!
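One direction such an improvement could take, offered only as a sketch: since $/ cannot hold a regex, the whole input can be read into a string and split with a regex instead, which mirrors the awk RS (nine or more newlines) and FS ("\n") directly. The helper names records_without and fields_of are hypothetical, the sample data is made up, and the approach assumes each file is small enough to hold in memory.

```perl
use strict;
use warnings;

# Hypothetical helper: split $text wherever nine or more newlines occur in a
# row (the same pattern as the awk RS), then drop blank chunks and any record
# matching $pattern.
sub records_without {
    my ($text, $pattern) = @_;
    return grep { /\S/ && !/$pattern/ } split /\n{9,}/, $text;
}

# Hypothetical helper: the in-script counterpart of awk's FS="\n".
sub fields_of {
    my ($record) = @_;
    return split /\n/, $record;
}

# Small in-memory demonstration with invented records; real input would be
# read from a file instead.
my $sample = join("\n" x 40,
    "FAKE ORGANIZATION\nPickup at:OTHER LOCATION",
    "FAKE ORGANIZATION\nPickup at:CHUPACABRA LOCATION",
);
for my $record (records_without($sample, qr/CHUPACABRA/)) {
    my @fields = fields_of($record);
    print scalar(@fields), " line(s), first: $fields[0]\n";
}
```

Unlike a fixed-length $/, the regex consumes the whole run of newlines, so the surviving records carry no leading blank lines. The original spacing between records is not reproduced on output, so whatever prints the result has to decide how to separate them.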
Comments (3)
In Perl, the record separator is a literal string, not a regular expression. As the perlvar doc famously says:
Remember: the value of $/ is a string, not a regex. awk has to be better for something. :-)
Still, it looks like you can get away with $/="\n" x 10 or something like it, because every gap between records contains at least that many consecutive newlines. Adding an extra /\S/ && test in front of the match skips the empty paragraphs that appear when the input has more than 20 consecutive newlines.
Also, have you considered just installing Cygwin and having awk available on your Windows machine?
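The code that originally accompanied this comment did not survive on this page, so the following is only a sketch of the approach it describes, not the commenter's own script: set $/ to ten literal newlines, chomp each chunk, and keep a chunk only if it contains a non-whitespace character and does not match the pattern. The helper name filter_records is hypothetical, and the in-memory filehandle and sample records are assumptions made so the example runs without a data file.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the records in $text that do not match $pattern.
sub filter_records {
    my ($text, $pattern) = @_;
    local $/ = "\n" x 10;    # literal separator: exactly ten newlines, not a regex
    open my $fh, '<', \$text or die "cannot open in-memory handle: $!";
    my @kept;
    while (my $record = <$fh>) {
        chomp $record;                    # strips one trailing copy of $/
        next unless $record =~ /\S/;      # skip the blank chunks left by long newline runs
        next if $record =~ /$pattern/;    # drop the unwanted records
        push @kept, $record;
    }
    close $fh;
    return @kept;
}

# Small in-memory demonstration with invented records separated by 40 newlines.
my $sample = join("\n" x 40,
    "KEEP ME\nPickup at:OTHER LOCATION",
    "DROP ME\nPickup at:CHUPACABRA LOCATION",
    "KEEP ME TOO",
);
print "$_\n---\n" for filter_records($sample, qr/CHUPACABRA/);
```

Because the separator is a fixed string, a gap whose length is not a multiple of ten leaves a few leading newlines on the next record; the /\S/ check only removes chunks that are entirely blank.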
There is no need for (much) conversion if you can download gawk for Windows.
Did you know that Perl comes with a program called a2p that does exactly what the title of your question describes?
And if you have Perl on your machine, the documentation for this program is already installed with it.
My own suggestion is to get the Llama book and learn Perl anyway. Despite what the Python people say, Perl is a great and flexible language. If you know shell, awk and grep, you'll understand many of the Perl constructs without any problems.