Parsing multiple lines with a Perl regex and extracting a value

Posted on 2024-10-08 02:29:30

I am a beginner in Perl. I have a text file with contents similar to the sample below. I need to extract the value in VALUE="<NEEDED VALUE>". For example, for SPINACH I should get SALAD alone.

How can I use a Perl regex to get the value? I need to parse multiple lines to get it, i.e. the lines between each #ifonly --- #endifonly pair.

$ cat check.txt

#ifonly APPLE CARROT SPINACH
VALUE="SALAD" REQUIRED="yes"
QW RETEWRT OIOUR
#endifonly
#ifonly APPLE MANGO ORANGE CARROT
VALUE="JUICE" REQUIRED="yes"
as df fg
#endifonly

while (<$file>)
{
    if (m/#ifonly .+ SPINACH .+ VALUE=(")([\w]*)(") .+ #endifonly/g)
    {
        my $chosen = $2;
    }
}

Comments (5)

一生独一 2024-10-15 02:29:30
use strict;
use warnings;
use 5.010;

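while (<DATA>) {
   # In scalar context, .. is the flip-flop operator: it turns on at the
   # SPINACH #ifonly line and off at the first line where VALUE="..." matches.
   my $rc = /#ifonly .+ SPINACH/ .. (my ($value) = /VALUE="([^"]*)"/);
   # The sequence number of the last line in a range ends in "E0".
   next unless $rc =~ /E0$/;
   say $value;
}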

__DATA__
#ifonly APPLE CARROT SPINACH
VALUE="SALAD" REQUIRED="yes" 
QW RETEWRT OIOUR
#endifonly
#ifonly APPLE MANGO ORANGE CARROT
VALUE="JUICE" REQUIRED="yes" 
as df fg
#endifonly

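This uses a small trick described by brian d foy (linked from the original answer). It relies on the scalar range operator, also known as the flip-flop operator: in scalar context it returns a sequence number for each line in the range, and the number for the last line of the range has "E0" appended, which is what the /E0$/ test checks for.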
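
To make the return values of the scalar range operator concrete, here is a minimal sketch. It is not part of the answer above. The sub name range_states and the extra 'before the block' line are made up for illustration, and the range is delimited by #ifonly and #endifonly instead of the VALUE match the answer uses.

```perl
use strict;
use warnings;

# One line outside any block, followed by the first block from the question.
my @lines = (
    'before the block',
    '#ifonly APPLE CARROT SPINACH',
    'VALUE="SALAD" REQUIRED="yes"',
    'QW RETEWRT OIOUR',
    '#endifonly',
);

# Hypothetical helper: returns what the scalar range operator yields
# for each line, as strings.
sub range_states {
    my @states;
    for (@lines) {
        # Scalar assignment puts .. in scalar context, so it acts as
        # the flip-flop operator and tests both regexes against $_.
        my $rc = /^#ifonly/ .. /^#endifonly/;
        push @states, "$rc";
    }
    return @states;
}

print join(', ', map { "'$_'" } range_states()), "\n";
```

Lines outside the range yield the empty string, lines inside yield 1, 2, 3 and so on, and the closing line yields a number with E0 appended (here '4E0').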

恋竹姑娘 2024-10-15 02:29:30

If your file is very big (or you want to read it line by line for some other reason), you could do it as follows:

#!/usr/bin/perl

use strict;
use warnings;
use Getopt::Long;

my ($file, $keyword);

# now get command line options (see Usage note below)
GetOptions(
            "f=s" => \$file,
            "k=s" => \$keyword,
          );

# if either the file or the keyword has not been provided, display a
# help text and exit
if (! $file || ! $keyword) {
   print STDERR<<EOF;

   Usage: script.pl -f filename -k keyword

EOF
   exit(1);
}

my $found;         # true while inside a block whose header matched the keyword
my $returned_word; # will store the word you want to retrieve

open my $fh, '<', $file or die "Cannot open file '$file': $!";
while (<$fh>) {
   # at the start of each block, record whether its '#ifonly' line
   # contains the keyword (the keyword is treated as a regex)
   if (/^#ifonly/) {
      $found = /$keyword/ ? 1 : 0;
   }

   # the flip-flop is true from a line that starts with '#ifonly' through
   # the next line that starts with '#endifonly'; the parentheses matter
   # because && binds more tightly than ..
   if ((/^#ifonly/ .. /^#endifonly/) && $found) {
      if (/VALUE="(\w+)"/) {
         $returned_word = $1;
         print "looking for $keyword --> found $returned_word\n";

         last; # if you want the values of ALL matching blocks,
               # remove the 'last' statement, as it makes the script
               # exit the while loop
      }
   }
}
close $fh;
倾城°AllureLove 2024-10-15 02:29:30

You can read the file contents into a string and then search for the pattern in that string:

my $file;    
$file.=$_ while(<>);    
if($file =~ /#ifonly.+?\bSPINACH\b.+?VALUE="(\w*)".+?#endifonly/s) {
        print $1;
}

Your original regex needs some tweaking:

  • You need to make your quantifiers non-greedy.
  • Use the s modifier to make . match newlines as well.
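
The effect of the two tweaks can be seen in a small sketch using the sample data from the question. The variable names $greedy, $lazy and $no_s are made up for illustration, and the patterns are shortened versions of the one in this answer.

```perl
use strict;
use warnings;

# Sample data from the question, held in a string.
my $text = <<'END';
#ifonly APPLE CARROT SPINACH
VALUE="SALAD" REQUIRED="yes"
QW RETEWRT OIOUR
#endifonly
#ifonly APPLE MANGO ORANGE CARROT
VALUE="JUICE" REQUIRED="yes"
as df fg
#endifonly
END

# Greedy .+ with /s grabs as much as possible and then backtracks,
# so it ends up at the LAST VALUE in the text.
my ($greedy) = $text =~ /SPINACH.+VALUE="(\w*)"/s;

# Non-greedy .+? with /s stops at the FIRST VALUE after the keyword.
my ($lazy) = $text =~ /SPINACH.+?VALUE="(\w*)"/s;

# Without /s, . cannot cross the newline after SPINACH, so nothing matches
# and $no_s stays undefined.
my ($no_s) = $text =~ /SPINACH.+?VALUE="(\w*)"/;

printf "greedy: %s\nnon-greedy: %s\nwithout /s: %s\n",
    $greedy, $lazy, defined $no_s ? $no_s : '(no match)';
```

With this data the greedy pattern captures JUICE, the non-greedy one captures SALAD, and the pattern without the s modifier does not match at all.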

初见终念 2024-10-15 02:29:30

Here's another answer based on the flip-flop operator:

use strict;
use warnings;
use 5.010;

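# check.txt is the sample file from the question
open my $file, '<', 'check.txt' or die "Cannot open check.txt: $!";

while (<$file>)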
{
  if ( (/^#ifonly.*\bSPINACH\b/ .. /^#endifonly/) &&
       (my ($chosen) = /^VALUE="(\w+)"/) )
  {
    say $chosen;
  }
}

This solution applies the second test to all of the lines in the range. The trick @Hugmeir used to exclude the start and end lines isn't needed because the "inner" regex, /^VALUE="(\w+)"/, can never match them anyway (I added the ^ anchor to all regexes to make doubly sure of that).
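
As a rough sketch of how the same idea could be wrapped for any keyword, the helper below collects the VALUE of every block whose #ifonly line names the keyword. The sub name values_for is hypothetical and not from this answer. It also assumes every block is closed by an #endifonly line, because the flip-flop keeps its state between calls.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the answer): returns the VALUE of every
# block whose #ifonly line contains $keyword as a whole word.
# Assumes each block is closed by #endifonly, since the flip-flop
# remembers its state from one call to the next.
sub values_for {
    my ($keyword, @lines) = @_;
    my @values;
    for (@lines) {
        if ( (/^#ifonly.*\b\Q$keyword\E\b/ .. /^#endifonly/)
             && /^VALUE="(\w+)"/ )
        {
            push @values, $1;
        }
    }
    return @values;
}

# Sample data from the question, split into lines (newlines kept).
my @lines = split /^/, <<'END';
#ifonly APPLE CARROT SPINACH
VALUE="SALAD" REQUIRED="yes"
QW RETEWRT OIOUR
#endifonly
#ifonly APPLE MANGO ORANGE CARROT
VALUE="JUICE" REQUIRED="yes"
as df fg
#endifonly
END

print 'CARROT: ', join(',', values_for('CARROT', @lines)), "\n";
print 'MANGO: ',  join(',', values_for('MANGO',  @lines)), "\n";
```

CARROT appears in both block headers, so both values come back. MANGO appears only in the second header, so only JUICE is returned.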

横笛休吹塞上声 2024-10-15 02:29:30

These two lines in one answer given two days ago

my $file;
$file.=$_ while(<>);

are not very efficient. Perl will likely read the file in big chunks, break those chunks into lines of text for the <>, and then the .= will join those lines back together to make one big string. It would be more efficient to slurp the file. The basic approach is to change $/, the input record separator.

undef $/;      # no record separator: the next read returns all of the input
$file = <>;    # the whole input as a single string

The module File::Slurp (see perldoc File::Slurp) may be even better.
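
One common variation, sketched here under the assumption that the rest of the program still needs line-by-line reads, is to undefine $/ with local inside a small sub, so the change is undone when the sub returns. The sub name slurp and the in-memory filehandle are made up to keep the example self-contained.

```perl
use strict;
use warnings;

# Hypothetical helper: reads everything left on an open filehandle at once.
sub slurp {
    my ($fh) = @_;
    local $/;               # undefined only until this sub returns
    return scalar <$fh>;    # with $/ undefined, one read returns it all
}

# An in-memory filehandle stands in for a real file.
my $data = "#ifonly APPLE CARROT SPINACH\n"
         . "VALUE=\"SALAD\" REQUIRED=\"yes\"\n"
         . "#endifonly\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $content = slurp($fh);
close $fh;

print length($content), " characters read\n";
```

After slurp returns, $/ is back to its previous value, so later reads with <> are split into lines as usual.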
