How to grep a multi-line string containing newline, tab, or space characters
My test file has text like:
> cat test.txt
new dummy("test1", random1).foo("bar1");
new dummy("
test2", random2);
new dummy("test3", random3).foo("bar3");
new dummy = dummy(
"test4", random4).foo("bar4");
I am trying to match every statement that ends with a semicolon (;) and contains the text "dummy(", whether or not the statement is split across lines. Then I need to extract the string inside the double quotes within dummy. I have come up with the following command, but it matches only the first and third statements.
> perl -ne 'print if /dummy/ .. /;/' test.txt | grep -oP 'dummy\((.|\n)*,'
dummy("test1",
dummy("test3",
With the -o flag I expected to extract the string between the double quotes inside dummy, but that is not working either. Can you please give me an idea of how to proceed?
Expected output is:
test1
test2
test3
test4
Some of the answers below work for basic file structures, but they break if a statement contains more than one newline character, e.g. an input file with more newline characters:
new dummy("test1", random1).foo("bar1");
new dummy("
test2", random2);
new dummy("test3", random3).foo("bar3");
new dummy = dummy(
"test4", random4).foo("bar4");
new dummy("test5",
random5).foo("bar5");
new dummy("test6", random6).foo(
"bar6");
new dummy("test7", random7).foo("
bar7");
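I referred to the following SO link: How to give a pattern for new line in grep? (stackoverflow.com/questions/12652568/how-to-to-give-a-pattern-for-new-line-in-grep)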
Comments (7)
@TLP was pretty close:
Using -0777 to slurp the file in as a single string.
/\bdummy\(\s*"(.+?)"/gs finds all the quoted string content after "dummy(" (with optional whitespace before the opening quote); strings containing escaped double quotes are not handled.
The s flag allows . to match newlines.
map {s/^\s+|\s+$//gr} trims leading/trailing whitespace from each string.

This perl should work:
The following gnu-grep + tr should also work:
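
Neither command is preserved in this copy. As a rough sketch of the gnu-grep + tr idea only, and assuming a GNU grep built with PCRE support (-P), something along these lines may do the job; the pattern and the `out` variable are illustrative guesses, not the answerer's code.

```shell
# Recreate the sample input from the question in a temporary file.
input=$(mktemp)
cat > "$input" <<'EOF'
new dummy("test1", random1).foo("bar1");
new dummy("
test2", random2);
new dummy("test3", random3).foo("bar3");
new dummy = dummy(
"test4", random4).foo("bar4");
EOF

# grep -P is a GNU extension and is not compiled into every grep build.
if printf 'x' | grep -qP 'x' 2>/dev/null; then
    # -z: read the file as one NUL-terminated record, so \s can cross newlines
    # -o: print only the matched text; \K drops everything matched before it
    # tr turns the NUL terminators written by -z back into newlines
    out=$(grep -ozP 'dummy\(\s*"\s*\K[^"]+' "$input" | tr '\0' '\n')
else
    out=''
    echo 'this grep lacks -P support' >&2
fi
printf '%s\n' "$out"
rm -f "$input"
```

This sketch strips whitespace after the opening quote but not before the closing one.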

With your shown samples, please try the following awk code, written and tested in GNU awk.

You can use Text::ParseWords to extract the quoted fields.

Given the sample input, you can use GNU grep this way:
Somewhat more robust, if this is a ; delimited text, you can:
Split on ;
Filter for /\bdummy\b/
Here is all that in ruby:

awk-based solution handling everything via FS:
gawk was benchmarked on 2 million rows at 5.15 secs, so unless your input file is beyond 100 MB, this suffices.
*** caveat: avoid using mawk-1.9.9.6 with this solution

Suggesting a simple gawk script (standard Linux awk):
Explanation:
RS=';' Set the awk record separator to ;
FS='"' Set the awk field separator to "
/dummy/ Filter only records matching the RegExp dummy
gensub("[[:space:]]*","",1,$2) Trim any whitespace from the beginning of the 2nd field
print gensub("[[:space:]]*","",1,$2) Print the trimmed 2nd field
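
The script body is missing from this copy, but the explanation pins down its parts. A sketch along the same lines follows; as a deliberate deviation it uses plain sub() with an explicit whitespace class instead of gawk's gensub(), on the assumption that this lets it run on non-GNU awk as well. The temporary file and the `out` variable are illustrative.

```shell
# Recreate the extended sample input from the question in a temporary file.
input=$(mktemp)
cat > "$input" <<'EOF'
new dummy("test1", random1).foo("bar1");
new dummy("
test2", random2);
new dummy("test3", random3).foo("bar3");
new dummy = dummy(
"test4", random4).foo("bar4");
new dummy("test5",
random5).foo("bar5");
new dummy("test6", random6).foo(
"bar6");
new dummy("test7", random7).foo("
bar7");
EOF

# RS=';'  one record per statement, regardless of line breaks
# -F'"'   fields are split on double quotes, so $2 is the first quoted string
# /dummy/ keeps only records that mention dummy
out=$(awk -v RS=';' -F'"' '/dummy/ { s = $2; sub(/^[ \t\n]+/, "", s); print s }' "$input")
printf '%s\n' "$out"
rm -f "$input"
```

Like the script described above, this trims only leading whitespace from the second field.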