AWK: Access captured group from line pattern
If I have an awk command
pattern { ... }
and pattern uses a capturing group, how can I access the string so captured in the block?
With gawk, you can use the match function, passing an array as its third argument, to capture parenthesized groups. Example:
outputs cd.
Note the specific use of gawk, which implements the feature in question.
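A minimal sketch of the gawk form, assuming the input string abcdef and the pattern b(.*)e (both chosen for illustration so that the result is cd; they are not taken from the answer). The helper name first_group is hypothetical:

```shell
# Hypothetical helper (requires gawk): print capture group 1 of /b(.*)e/
# for every matching line. The three-argument match() is a gawk extension;
# it fills the array with the whole match (index 0) and each group (1, 2, ...).
first_group() {
    gawk 'match($0, /b(.*)e/, a) { print a[1] }'
}
```

With gawk installed, `echo "abcdef" | first_group` prints cd.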
For a portable alternative, you can achieve similar results with match() and substr. Example:
outputs cd.
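A comparable sketch for POSIX awk, again assuming the input abcdef. match() only sets RSTART and RLENGTH for the whole match, so the group is recovered by trimming the literal b and e by hand. The name inner_text is hypothetical:

```shell
# Hypothetical helper: portable awk has no capture array, so take the
# whole match of /b.*e/ and strip one character from each end.
inner_text() {
    awk 'match($0, /b.*e/) { print substr($0, RSTART + 1, RLENGTH - 2) }'
}

echo "abcdef" | inner_text    # prints: cd
```

This only works when the text around the group has a known, fixed length, which is the main limitation compared with the gawk array form.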
That was a stroll down memory lane...
I replaced awk with perl a long time ago.
Apparently the AWK regular expression engine does not capture its groups.
You might consider using something like:
The -n flag causes perl to loop over every line, like awk does.
This is something I need all the time so I created a bash function for it. It's based on glenn jackman's answer.
Definition
Add this to your .bash_profile etc.
Usage
Capture regex for each line in file
Capture 1st regex capture group for each line in file
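A sketch of how such a function could be written, assuming gawk is installed. The name awk_capture and the use of -v to pass the arguments are choices made for this illustration, not necessarily what the answer used. Note that awk applies escape-sequence processing to -v values, so patterns containing backslashes may need them doubled:

```shell
# Hypothetical helper (requires gawk).
# $1: regular expression
# $2: capture group number to print (default 0, the whole match)
awk_capture() {
    gawk -v re="$1" -v grp="${2:-0}" 'match($0, re, ary) { print ary[grp] }'
}
```

For example, `printf 'took 120ms\n' | awk_capture '([0-9]+)ms$'` prints 120ms, and adding a second argument of 1 prints 120.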
You can use GNU awk:
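As a minimal sketch of a gensub-based capture (the input abcdef, the pattern, and the helper name group_via_gensub are illustrative assumptions, not the answer's original command):

```shell
# Hypothetical helper (requires gawk): replace the whole line with group 1.
# "\\1" in the awk string is the backreference \1; the third argument 1
# means "replace the first match". Lines that do not match print unchanged.
group_via_gensub() {
    gawk '{ print gensub(/.*b(.*)e.*/, "\\1", 1) }'
}
```

With gawk installed, `echo "abcdef" | group_via_gensub` prints cd.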
NOTE: the use of gensub is not POSIX compliant.
You can simulate capturing in vanilla awk too, without extensions. It's not intuitive, though:
Step 1. Use gensub to surround matches with some character that doesn't appear in your string. (gensub is itself a gawk extension; in POSIX awk, gsub applied to a copy of the string does the same job.)
Step 2. Use split against that character.
Step 3. Every other element in the split array is your capture group.
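The three steps can be sketched in POSIX awk as follows. This version uses gsub rather than gensub so that it runs without gawk, uses SUBSEP as the marker character, and assumes the sample input ab cb ad with the pattern a. (all illustrative choices). It recovers whole matches rather than parenthesized subgroups:

```shell
# Hypothetical helper: print every match of /a./ on each line.
matches_of_a() {
    awk '{
        s = $0                              # work on a copy; gsub modifies its target
        gsub(/a./, SUBSEP "&" SUBSEP, s)    # step 1: wrap each match in SUBSEP
        n = split(s, cap, SUBSEP)           # step 2: split on the marker
        for (i = 2; i <= n; i += 2)         # step 3: even elements are the matches
            print cap[i]
    }'
}

echo 'ab cb ad' | matches_of_a
# prints:
# ab
# ad
```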
I struggled a bit with coming up with a bash function that wraps Peter Tillemans' answer but here's what I came up with:
I found this worked better than opsb's awk-based bash function for the following regular expression argument, because I do not want the "ms" to be printed.
I think gawk's match()-to-array form only captures the first instance of the capture group.
If there are multiple things you'd like to capture, and perform any complex operations upon them, perhaps:
This way you aren't constrained by either gensub(), which limits the complexity of your modifications, or by match().
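One way to post-process every match on a line with ordinary awk code is to mark the matches with gsub, split on the marker, and rebuild the line. This is a sketch under assumptions (the helper name double_numbers, the digit pattern, and the doubling operation are all illustrative), not the code from the answer:

```shell
# Hypothetical helper: double every run of digits on each input line.
double_numbers() {
    awk '{
        s = $0
        gsub(/[0-9]+/, SUBSEP "&" SUBSEP, s)   # mark every match
        n = split(s, part, SUBSEP)             # even-numbered pieces are matches
        out = ""
        for (i = 1; i <= n; i++) {
            # any awk code can run here on each match; doubling is one example
            piece = (i % 2 == 0) ? part[i] * 2 : part[i]
            out = out piece
        }
        print out
    }'
}

echo 'a1 b22 c333' | double_numbers    # prints: a2 b44 c666
```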
By pure trial and error, one caveat I've noted about gawk in Unicode mode: for a valid Unicode string 뀇꿬 with the 6 octal codes listed below: