GNU AWK: print text between two patterns, not including the patterns
GNU AWK 5.1.1
I have an AWK expression below that I use to print the content between two patterns found in a .txt document, but I'm not particularly happy with it and would like something more elegant. Any suggestions? I don't want sed, just AWK.
I want to grab all the text between these tags but not including the tags.
<monDescription> and <endMonDescription>
If I simply use:
awk '/<monDescription>/,/<endMonDescription>/' ~mydocument.txt
It includes the <monDescription> and <endMonDescription> tags, which I don't want.
So to fix this I pipe the AWK output to another AWK command using gsub:
awk '/<monDescription>/,/<endMonDescription>/' ~mydocument.txt | awk '{gsub(/<monDescription>|<endMonDescription>|DAVE:/, "")}1' | awk '{$1=$1;print}'
Then I also gsub "DAVE: ", which is text that occurs before <monDescription> on the same line and that I don't want. It's tough to get just the clean text in between the patterns, with nothing before or after them and without the patterns themselves, without sloppy-looking piping. Suggestions?
Here's a sample of input text:
DAVE: <monDescription>Lorem ipsum dolor sit amet, consectetuer
adipiscing elit. Aenean commodo ligula eget dolor. Aenean massa. Cum
sociis natoque penatibus et magnis dis parturient montes, nascetur
ridiculus mus. Donec quam felis, ultricies nec, pellentesque eu,
pretium quis, sem. Nulla consequat massa quis enim. Donec pede justo,
fringilla vel, aliquet nec, vulputate eget, arcu. In enim justo,
rhoncus ut, imperdiet a, venenatis vitae, justo. Nullam dictum felis
eu pede mollis pretium.<endMonDescription>
Expected output should be:
Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Aenean
commodo ligula eget dolor. Aenean massa. Cum sociis natoque penatibus
et magnis dis parturient montes, nascetur ridiculus mus. Donec quam
felis, ultricies nec, pellentesque eu, pretium quis, sem. Nulla
consequat massa quis enim. Donec pede justo, fringilla vel, aliquet
nec, vulputate eget, arcu. In enim justo, rhoncus ut, imperdiet a,
venenatis vitae, justo. Nullam dictum felis eu pede mollis pretium.
Comments (2)
I would harness GNU AWK for this task, with the input stored in file.txt.
Explanation: I prepared a regular expression that matches both <monDescription> and <endMonDescription> (you might elect to use the two literal tags joined by | if the regular expression I provide gives false positives with your file), then I inform GNU AWK to use it as the record separator (RS) and to print only the even-numbered records. Disclaimer: this solution assumes that every starting tag has an ending tag, that tags are never nested, and that every ending tag has a starting tag somewhere before it. (tested in GNU Awk 5.0.1)
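The approach described above can be sketched roughly as follows. This is not the answerer's original command, only a minimal illustration under the same assumptions (every start tag is closed, no nesting). The sample text and the variable names `sample` and `between` are made up for the example, and the separator uses the two literal tags joined by `|`.

```shell
# Hypothetical sample; only the tag names are taken from the question.
sample='DAVE: <monDescription>Lorem ipsum dolor sit amet, consectetuer
adipiscing elit.<endMonDescription>
text after the block'

# A regex RS needs gawk (mawk also accepts one). Records then alternate:
# odd records lie outside the tags, even records lie between them.
between=$(printf '%s\n' "$sample" |
  awk 'BEGIN { RS = "<monDescription>|<endMonDescription>" } NR % 2 == 0')

printf '%s\n' "$between"
```

Line breaks inside the block are kept as they are, so the text would still need to be re-wrapped if a single paragraph is wanted.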
If there's leading or trailing, well, anything, try this non-proprietary awk solution:
INPUT (encapsulated, including trailing spaces)
CODE
OUTPUT (encapsulated)
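One way such a portable approach might look is sketched below. This is not the answerer's code: the function name `extract_between`, the awk variables `otag` and `ctag`, and the sample input are all assumptions made for illustration. It uses only `index` and `substr`, so it should not depend on GNU extensions, and it assumes the tags are never nested.

```shell
# Hypothetical helper: prints only the text found between the two tags,
# dropping anything before the start tag or after the end tag on a line.
extract_between() {
  awk -v otag='<monDescription>' -v ctag='<endMonDescription>' '
  {
    line = $0; out = ""; have = 0
    while (1) {
      if (!inside) {
        i = index(line, otag)
        if (i == 0) break                    # no start tag left on this line
        line = substr(line, i + length(otag))
        inside = 1
      }
      j = index(line, ctag)
      if (j == 0) { out = out line; have = 1; break }  # block continues
      out = out substr(line, 1, j - 1); have = 1
      line = substr(line, j + length(ctag))
      inside = 0                             # look for another start tag
    }
    if (have) print out
  }'
}

# Made-up input with leading text, trailing text and trailing spaces.
result=$(printf '%s\n' \
  'DAVE: <monDescription>Lorem ipsum dolor sit amet, consectetuer' \
  'adipiscing elit.<endMonDescription> trailing text   ' \
  'unrelated line' | extract_between)

printf '%s\n' "$result"
```

Because the state lives in the `inside` flag rather than in a range pattern, the same loop also copes with a start and end tag on one line.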