Extracting text between strings
How do I extract the text between two strings with a very specific pattern, from a file full of lines like this? For example:
input:a_log.gz:make=BMW&year=2000&owner=Peter
I essentially want to capture the make=BMW&year=2000 part. I know for a fact that each line starts with "input:(any number of characters).gz:" and ends with "owner=Peter".
Use the regex:
input:.*?\.gz:(.*?)&?owner=Peter
The capture group will contain everything between the second colon and "owner=Peter", with the trailing ampersand trimmed.
Try extracting everything between the second colon and the second ampersand, regardless of what comes before or after. If the line contains additional colons or ampersands, this may not work properly.
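The command this answer originally posted is not preserved in the text, so the following is only one possible way to do what it describes, using `cut`. The variable names `line` and `between` are made up for this example.

```shell
# Sample line from the question.
line='input:a_log.gz:make=BMW&year=2000&owner=Peter'

# First cut: split on ":" and keep field 3 (everything after the second colon,
# up to a third colon if there is one).
# Second cut: split on "&" and keep fields 1-2 (everything before the second "&").
between=$(printf '%s\n' "$line" | cut -d: -f3 | cut -d'&' -f1,2)

printf '%s\n' "$between"   # prints: make=BMW&year=2000
```

Because the fields are counted by position, an extra colon in the file name or a third key=value pair before the owner would give a truncated or incomplete result, which is the limitation the answer warns about.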
You can use the shell (bash/ksh), or sed if you prefer.
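The snippets from this answer are missing from the text, so what follows is a hedged sketch of what a shell version and a sed version might look like, not the author's code. The shell part assumes the "substring by parameter expansion" approach mentioned elsewhere in the thread. The variable names `line`, `rest`, `result` and `sed_result` are made up for this example.

```shell
# Sample line from the question.
line='input:a_log.gz:make=BMW&year=2000&owner=Peter'

# Shell version (bash/ksh, also valid in POSIX sh):
# "#" removes the shortest leading match of the glob input:*.gz:
rest=${line#input:*.gz:}
# "%" removes the trailing literal &owner=Peter
result=${rest%&owner=Peter}
printf '%s\n' "$result"       # prints: make=BMW&year=2000

# sed version: capture the text between ".gz:" and "&owner=Peter",
# and print only the lines where the substitution succeeded.
sed_result=$(printf '%s\n' "$line" |
  sed -n 's/^input:.*\.gz:\(.*\)&owner=Peter$/\1/p')
printf '%s\n' "$sed_result"   # prints: make=BMW&year=2000
```

The parameter expansion version avoids starting an external process per line, while the sed version is easier to run over a whole file.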
I didn't see an answer using awk. The method is something of a mix between the sh version (substrings via parameter expansion) and the sed version (regular expressions), because awk regular expressions lack backreferences.
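The awk program itself did not survive in the text. A minimal sketch of the idea described, using a regular expression to locate the prefix and then substring operations to cut it away, might look like this. It is an assumption about the approach, not the original code, and the names `line`, `rest` and `awk_result` are made up for this example.

```shell
# Sample line from the question.
line='input:a_log.gz:make=BMW&year=2000&owner=Peter'

awk_result=$(printf '%s\n' "$line" | awk '
  # match() sets RLENGTH to the length of the matched prefix "input:....gz:"
  match($0, /^input:.*\.gz:/) {
    # substring step: keep everything after the matched prefix
    rest = substr($0, RLENGTH + 1)
    # regex step: delete the trailing "&owner=Peter"
    sub(/&owner=Peter$/, "", rest)
    print rest
  }')

printf '%s\n' "$awk_result"   # prints: make=BMW&year=2000
```

Since awk cannot refer back to a capture group in a replacement, `match()` with `RLENGTH` and `substr()` stands in for the `\1` that the sed version would use.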