How to grep a multi-line string containing newline, tab, or space characters

Posted 2025-01-21 06:40:52 · 1192 characters · 4 views · 0 comments

My test file has text like:

> cat test.txt
new dummy("test1", random1).foo("bar1");
new dummy("
        test2", random2);
new dummy("test3", random3).foo("bar3");
new dummy = dummy(
            "test4", random4).foo("bar4");

I am trying to match all single lines ending with semicolon (;) and having text "dummy(". Then I need to extract the string present in the double quotes inside dummy. I have come up with the following command, but it matches only the first and third statement.

> perl -ne 'print if /dummy/ .. /;/' test.txt | grep -oP 'dummy\((.|\n)*,'
dummy("test1",
dummy("test3",

With -o flag I expected to extract string between the double quotes inside dummy. But that is also not working. Can you please give me an idea on how to proceed?

Expected output is:

test1
test2
test3
test4

Some of the answers below work for basic file structures, but if a statement contains more than one newline character, the code breaks. For example, an input file with more newline characters:

new dummy("test1", random1).foo("bar1");
new dummy("
        test2", random2);
new dummy("test3", random3).foo("bar3");
new dummy = dummy(
            "test4", random4).foo("bar4");
new dummy("test5",
        random5).foo("bar5");
new dummy("test6", random6).foo(
        "bar6");
new dummy("test7", random7).foo("
        bar7");

I referred to following SO links:

How to give a pattern for new line in grep?

how to grep multiple lines until ; (semicolon)

夜血缘 2025-01-28 06:40:52

@TLP was pretty close:

perl -0777 -nE 'say for map {s/^\s+|\s+$//gr} /\bdummy\(\s*"(.+?)"/gs' test.txt

test1
test2

Using

-0777 to slurp the file in as a single string
/\bdummy\(\s*"(.+?)"/gs finds all the quoted string content after "dummy(" (with optional whitespace before the opening quote)
- the s flag allows . to match newlines.
- any string containing escaped double quotes will break this regex
map {s/^\s+|\s+$//gr} trims leading/trailing whitespace from each string.

姜生凉生 2025-01-28 06:40:52

This perl should work:

perl -0777 -pe 's/(?m)^[^(]* dummy\(\s*"\s*([^"]+).*/$1/g' file

test1
test2
test3
test4

Following gnu-grep + tr should also work:

grep -zoP '[^(]* dummy\(\s*"\s*\K[^"]+"' file | tr '"' '\n'

test1
test2
test3
test4

独自←快乐 2025-01-28 06:40:52

With your shown samples, please try the following awk code, written and tested in GNU awk.

awk -v RS='(^|\n)new[^;]*;' '
RT{
  rt=RT
  gsub(/\n+|[[:space:]]+/,"",rt)
  match(rt,/"[^"]*"/)
  print substr(rt,RSTART+1,RLENGTH-2)
}
'  Input_file

云淡风轻 2025-01-28 06:40:52

You can use Text::ParseWords to extract the quoted fields.

use strict;
use warnings;
use Data::Dumper;
use Text::ParseWords;

my $str = do {
    local $/;
    <DATA>;
};   # slurp the text into a variable
my @lines = quotewords(q("), 1, $str);   # extract fields
my @txt;

for (0 .. $#lines) {
    if ($lines[$_] =~ /\bdummy\s*\(/) {
        push @txt, $lines[$_+1];         # target text will be in fields following "dummy("
    }
}

s/^\s+|\s+$//g for @txt;     # trim leading/trailing whitespace
print Dumper \@txt;

__DATA__
new dummy("test1", random1).foo("bar1");
new dummy("
        test2", random2);
new dummy("test3", random3).foo("bar3");
new dummy = dummy(
            "test4", random4).foo("bar4");

Output:

$VAR1 = [
          'test1',
          'test2',
          'test3',
          'test4'
        ];

请持续率性 2025-01-28 06:40:52

Given:

$ cat file
new dummy("test1", random1).foo("bar1");
new dummy("
        test2", random2);
new dummy("test3", random3).foo("bar3");
new dummy = dummy(
            "test4", random4).foo("bar4");

You can use GNU grep this way:

$ grep -ozP '[^;]*\bdummy[^";]*"\s*\K[^";]*[^;]*;' file | tr '\000' '\n' | grep -oP '^[^"]*'
test1
test2
test3
test4

Somewhat more robust, if this is a ; delimited text, you can:

split on the ;;
filter for /\bdummy\b/;
grab the first field in quotes;
strip the whitespace.

Here is all of that in Ruby:

ruby -e 'puts $<.read.split(/(?<=;)/).
                select{|b| b[/\bdummy\b/]}.
                map{|s| s[/(?<=")[^"]*/].strip}' file 
# same output

橘虞初梦 2025-01-28 06:40:52

An awk-based solution that handles everything via FS:

<test1.txt gawk -b -e 'BEGIN { RS="^$"

 FS="((^|\\n)?"(___="[^\\n")"]+y[(]"(_="[ \\t\\n]*")(__="[\\42]")(_)\
    "|"(_="[ \\t]*")(__)(_)"[,]"(___)";]+[;][\\n])+"} sub(OFS=ORS,"",$!--NF)'          

test1
test2
test3
test4

gawk was benchmarked at 5.15 seconds for 2 million rows, so unless your input file exceeds 100 MB, this suffices.

*** Caveat: avoid using mawk-1.9.9.6 with this solution

妄断弥空 2025-01-28 06:40:52

Suggesting a simple gawk script (the standard Linux awk):

 awk '/dummy/{print gensub("[[:space:]]*","",1,$2)}' RS=';' FS='"'  input.txt

Explanation:

RS=';' sets the awk record separator to ;

FS='"' sets the awk field separator to "

/dummy/ keeps only the records matching the regexp dummy

gensub("[[:space:]]*","",1,$2) trims any whitespace from the beginning of the 2nd field

print gensub("[[:space:]]*","",1,$2) prints the trimmed 2nd field
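For systems where gawk is not installed, the same record/field idea can probably be sketched without gensub, which is a gawk extension. The version below uses gsub on a copy of the field and also trims trailing whitespace. The function name first_quoted_arg is invented for this example; the sample data is the extended input from the question.

```shell
#!/bin/sh
# Extended sample input from the question, written to a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
new dummy("test1", random1).foo("bar1");
new dummy("
        test2", random2);
new dummy("test3", random3).foo("bar3");
new dummy = dummy(
            "test4", random4).foo("bar4");
new dummy("test5",
        random5).foo("bar5");
new dummy("test6", random6).foo(
        "bar6");
new dummy("test7", random7).foo("
        bar7");
EOF

# first_quoted_arg FILE (hypothetical helper):
#   RS=';'  one record per statement
#   FS='"'  so $2 is the text inside the first pair of double quotes
#   gsub    strips leading and trailing blanks, tabs, and newlines
first_quoted_arg() {
    awk '/dummy/ {
        s = $2
        gsub(/^[ \t\n]+|[ \t\n]+$/, "", s)
        print s
    }' RS=';' FS='"' "$1"
}

first_quoted_arg "$sample"
# prints test1 through test7, one per line
```

Like the gawk version, this assumes no ; or escaped " appears inside the quoted strings.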
