How to grep a multi-line string containing newline, tab, or space characters

Posted on 2025-01-21 06:40:52

My test file has text like:

> cat test.txt
new dummy("test1", random1).foo("bar1");
new dummy("
        test2", random2);
new dummy("test3", random3).foo("bar3");
new dummy = dummy(
            "test4", random4).foo("bar4");

I am trying to match every statement that ends with a semicolon (;) and contains the text "dummy(". Then I need to extract the string in double quotes inside dummy(...). I came up with the following command, but it matches only the first and third statements.

> perl -ne 'print if /dummy/ .. /;/' test.txt | grep -oP 'dummy\((.|\n)*,'
dummy("test1",
dummy("test3",

With the -o flag I expected to extract the string between the double quotes inside dummy(...), but that does not work either. Can you please give me an idea of how to proceed?

Expected output is:

test1
test2
test3
test4

Some of the answers below work for the basic file structure, but they break if a statement contains more than one newline character. For example, here is an input file with more newline characters:

new dummy("test1", random1).foo("bar1");
new dummy("
        test2", random2);
new dummy("test3", random3).foo("bar3");
new dummy = dummy(
            "test4", random4).foo("bar4");
new dummy("test5",
        random5).foo("bar5");
new dummy("test6", random6).foo(
        "bar6");
new dummy("test7", random7).foo("
        bar7");

I referred to the following SO links:

How to give a pattern for new line in grep?

how to grep multiple lines until ; (semicolon)

Comments (7)

夜血缘 2025-01-28 06:40:52

@TLP was pretty close:

perl -0777 -nE 'say for map {s/^\s+|\s+$//gr} /\bdummy\(\s*"(.+?)"/gs' test.txt
test1
test2
test3
test4

Using

  • -0777 to slurp the file in as a single string
  • /\bdummy\(\s*"(.+?)"/gs finds all the quoted string content after "dummy(" (with optional whitespace before the opening quote)
    • the s flag allows . to match newlines.
    • any string containing escaped double quotes will break this regex
  • map {s/^\s+|\s+$//gr} trims leading/trailing whitespace from each string.
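
For the escaped-quote limitation noted above, here is a rough Python sketch of the same slurp-and-match idea. It is not a drop-in replacement for the Perl one-liner: the helper name first_dummy_args and the escaped sample line are made up for illustration, and the only real change is that the quoted part also accepts backslash-escaped characters.

```python
import re

# Same shape as the Perl pattern, but the quoted part is
# (?:[^"\\]|\\.)* so a backslash-escaped quote does not end the match.
# re.DOTALL lets "." (used after the backslash) match a newline too.
DUMMY_ARG = re.compile(r'\bdummy\(\s*"((?:[^"\\]|\\.)*)"', re.DOTALL)

def first_dummy_args(text):
    """Return the first quoted argument of every dummy( call, trimmed.

    Hypothetical helper, not part of any library.
    """
    return [arg.strip() for arg in DUMMY_ARG.findall(text)]

# The sample file from the question, read in as one string.
sample = '''new dummy("test1", random1).foo("bar1");
new dummy("
        test2", random2);
new dummy("test3", random3).foo("bar3");
new dummy = dummy(
            "test4", random4).foo("bar4");
'''

# An invented line with escaped quotes, to exercise the limitation.
escaped = r'new dummy("say \"hi\"", random5);'

print(first_dummy_args(sample))   # ['test1', 'test2', 'test3', 'test4']
print(first_dummy_args(escaped))
```

The escapes are returned as written (backslashes included); whether to unescape them depends on what the strings are used for.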
姜生凉生 2025-01-28 06:40:52

This Perl one-liner should work:

perl -0777 -pe 's/(?m)^[^(]* dummy\(\s*"\s*([^"]+).*/$1/g' file

test1
test2
test3
test4

The following GNU grep + tr pipeline should also work:

grep -zoP '[^(]* dummy\(\s*"\s*\K[^"]+"' file | tr '"' '\n'

test1
test2
test3
test4
独自←快乐 2025-01-28 06:40:52

With your shown samples, please try the following awk code, written and tested in GNU awk.

awk -v RS='(^|\n)new[^;]*;' '
RT{
  rt=RT
  gsub(/\n+|[[:space:]]+/,"",rt)
  match(rt,/"[^"]*"/)
  print substr(rt,RSTART+1,RLENGTH-2)
}
'  Input_file
云淡风轻 2025-01-28 06:40:52

You can use Text::ParseWords to extract the quoted fields.

use strict;
use warnings;
use Data::Dumper;
use Text::ParseWords;

my $str = do {
    local $/;
    <DATA>;
};   # slurp the text into a variable
my @lines = quotewords(q("), 1, $str);   # extract fields
my @txt;

for (0 .. $#lines) {
    if ($lines[$_] =~ /\bdummy\s*\(/) {
        push @txt, $lines[$_+1];         # target text will be in fields following "dummy("
    }
}

s/^\s+|\s+$//g for @txt;     # trim leading/trailing whitespace
print Dumper \@txt;

__DATA__
new dummy("test1", random1).foo("bar1");
new dummy("
        test2", random2);
new dummy("test3", random3).foo("bar3");
new dummy = dummy(
            "test4", random4).foo("bar4");

Output:

$VAR1 = [
          'test1',
          'test2',
          'test3',
          'test4'
        ];
请持续率性 2025-01-28 06:40:52

Given:

$ cat file
new dummy("test1", random1).foo("bar1");
new dummy("
        test2", random2);
new dummy("test3", random3).foo("bar3");
new dummy = dummy(
            "test4", random4).foo("bar4");

You can use GNU grep this way:

$ grep -ozP '[^;]*\bdummy[^";]*"\s*\K[^";]*[^;]*;' file | tr '\000' '\n' | grep -oP '^[^"]*'
test1
test2
test3
test4

A somewhat more robust approach, if this is ;-delimited text, is to:

  1. split on the ;;
  2. filter for /\bdummy\b/;
  3. grab the first field in quotes;
  4. strip the whitespace.

Here is all of that in Ruby:

ruby -e 'puts $<.read.split(/(?<=;)/).
                select{|b| b[/\bdummy\b/]}.
                map{|s| s[/(?<=")[^"]*/].strip}' file 
# same output
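
The four steps listed above can also be sketched in Python, for anyone who prefers it to Ruby. This is only an illustration under a few assumptions: the function name dummy_strings is invented, Python 3.7 or later is needed (older versions do not split on zero-width matches), and a ; inside a quoted string would still cut a statement in two.

```python
import re

def dummy_strings(text):
    """Hypothetical helper: split on ';', filter, grab, strip."""
    results = []
    # 1. split after every ';' (the lookbehind keeps the delimiter)
    for statement in re.split(r'(?<=;)', text):
        # 2. keep only statements that contain the word "dummy"
        if not re.search(r'\bdummy\b', statement):
            continue
        # 3. grab the first field in double quotes
        quoted = re.search(r'"([^"]*)"', statement)
        if quoted:
            # 4. strip leading/trailing whitespace, newlines included
            results.append(quoted.group(1).strip())
    return results

# The extended input from the question (statements spanning several lines).
extended = '''new dummy("test1", random1).foo("bar1");
new dummy("
        test2", random2);
new dummy("test3", random3).foo("bar3");
new dummy = dummy(
            "test4", random4).foo("bar4");
new dummy("test5",
        random5).foo("bar5");
new dummy("test6", random6).foo(
        "bar6");
new dummy("test7", random7).foo("
        bar7");
'''

for s in dummy_strings(extended):
    print(s)
```

Because each statement is handled as a whole, extra newlines anywhere before the closing ; do not matter, which is the case the question says breaks some of the other answers.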
橘虞初梦 2025-01-28 06:40:52

An awk-based solution that handles everything via FS:

<test1.txt gawk -b -e 'BEGIN { RS="^$"

 FS="((^|\\n)?"(___="[^\\n")"]+y[(]"(_="[ \\t\\n]*")(__="[\\42]")(_)\
    "|"(_="[ \\t]*")(__)(_)"[,]"(___)";]+[;][\\n])+"} sub(OFS=ORS,"",$!--NF)'          

test1
test2
test3
test4

gawk was benchmarked at 5.15 secs for 2 million rows, so unless your input file is larger than 100 MB, this suffices.

*** caveat : avoid using mawk-1.9.9.6 with this solution

妄断弥空 2025-01-28 06:40:52

Suggesting a simple gawk script (gawk is the standard awk on Linux):

 awk '/dummy/{print gensub("[[:space:]]*","",1,$2)}' RS=';' FS='"'  input.txt

Explanation:

RS=';' sets the awk record separator to ;

FS='"' sets the awk field separator to "

/dummy/ keeps only the records matching the regexp dummy

gensub("[[:space:]]*","",1,$2) trims any whitespace from the beginning of the 2nd field

print gensub("[[:space:]]*","",1,$2) prints the trimmed 2nd field
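
To make the record/field model in the explanation above concrete, here is a small Python imitation of it. It is only a sketch: the function name awk_like and its rs/fs parameters are invented, and unlike awk it skips a matching record that has no second field instead of printing an empty line.

```python
def awk_like(text, rs=';', fs='"'):
    """Hypothetical imitation of the awk one-liner's record/field logic."""
    out = []
    for record in text.split(rs):          # RS=';'  one record per statement
        if 'dummy' not in record:          # /dummy/ keep matching records only
            continue
        fields = record.split(fs)          # FS='"'  split the record into fields
        if len(fields) > 1:
            # awk fields are 1-based, so $2 is fields[1];
            # lstrip() plays the role of gensub("[[:space:]]*","",1,$2)
            out.append(fields[1].lstrip())
    return out

# The sample file from the question.
sample = '''new dummy("test1", random1).foo("bar1");
new dummy("
        test2", random2);
new dummy("test3", random3).foo("bar3");
new dummy = dummy(
            "test4", random4).foo("bar4");
'''

print(awk_like(sample))   # ['test1', 'test2', 'test3', 'test4']
```

As in the awk version, only leading whitespace is removed from the field, and a ; or " inside the quoted string would shift the record or field boundaries.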
