Can a repeated part of a regular expression create multiple groups?
I'm using Ruby's regular expressions to process text such as
${1:aaa|bbbb}
${233:aaa | bbbb | ccc ccccc }
${34: aaa | bbbb | cccccccc |d}
${343: aaa | bbbb | cccccccc |dddddd ddddddddd}
${3443:a aa|bbbb|cccccccc|d}
${353:aa a| b b b b | c c c c c c c c | dddddd}
I want to get the trimmed text between the pipes. For example, for the first line above I want aaa and bbbb; for the second line I want aaa, bbbb and ccc ccccc. I have written a regular expression and a bit of Ruby code to test it:
array = "${33:aaa|bbbb|cccccccc}".scan(/\$\{\s*(\d+)\s*:(\s*[^\|]+\s*)(?:\|(\s*[^\|]+\s*))+\}/)
puts array
My problem is that the (?:\|(\s*[^\|]+\s*))+ part doesn't create multiple groups. I don't know how to solve this, because the number of pieces of text I need from each line varies. Can anyone help?
Comments (4)
When you repeat a capturing group in a regular expression, the capturing group stores only the text matched by its last iteration. If you need to capture multiple iterations, you'll need to use more than one regex. (.NET is the only exception to this. Its CaptureCollection provides the matches of all iterations of a capturing group.)
In your case, you could do a search-and-replace to replace ^\d+: with nothing. That strips off the number and colon at the start of your string. Then call split() using the regex \s*\|\s* to split the string into the elements delimited by vertical bars.
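A minimal Ruby sketch of both points, under some assumptions: the sample lines start with ${ and end with }, so the prefix pattern here is widened beyond ^\d+: to cover the braces, and the helper name pipe_fields is made up for illustration.

```ruby
line = '${233:aaa | bbbb | ccc ccccc }'

# A repeated capturing group keeps only its last iteration:
# group 3 runs twice here, but only the final match survives.
m = line.match(/\$\{\s*(\d+)\s*:([^|]+)(?:\|([^|}]+))+\}/)
last_only = m[3]   # " ccc ccccc " (the " bbbb " iteration is gone)

# Hypothetical helper: strip the "${N:" prefix and the closing "}",
# then split on pipes together with any surrounding whitespace.
def pipe_fields(line)
  body = line.sub(/\A\$\{\s*\d+\s*:/, "").sub(/\}\s*\z/, "")
  body.strip.split(/\s*\|\s*/)
end

fields = pipe_fields(line)   # ["aaa", "bbbb", "ccc ccccc"]
```

The strip before split removes the outer whitespace that the split pattern cannot reach, since that pattern only trims around the pipes.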
Why don't you split your string?
This might help you
Script
Output
Instead of trying to do everything at once, divide and conquer:
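One way to read this advice, as a sketch rather than the answerer's own code: first match the whole ${N:...} wrapper, then split its body in a separate step. The method name parse_placeholder and the [number, choices] return shape are assumptions made for illustration.

```ruby
# Hypothetical helper: step 1 isolates the number and the body,
# step 2 splits the body on "|" and trims each piece.
def parse_placeholder(line)
  m = line.match(/\$\{\s*(\d+)\s*:(.*)\}/)
  return nil unless m          # not a ${N:...} placeholder

  number  = m[1].to_i
  choices = m[2].split("|").map(&:strip)
  [number, choices]
end

result = parse_placeholder('${3443:a aa|bbbb|cccccccc|d}')
# result is [3443, ["a aa", "bbbb", "cccccccc", "d"]]
```

Each step uses a simple pattern, so the variable number of choices is handled by split instead of by capture groups.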
If you want to do it with a single regex, you can use
scan
, but this seems more difficult to grok:
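A possible single-regex version with scan, offered as a sketch under the assumption that every choice is preceded by : or | and followed by | or }. The pattern below is illustrative and is not the answerer's original.

```ruby
line = '${343: aaa | bbbb | cccccccc |dddddd ddddddddd}'

# Each match consumes one delimiter (":" or "|"), skips leading
# whitespace, lazily captures the choice, and stops before the next
# "|" or "}" so that delimiter can start the following match.
choices = line.scan(/[:|]\s*([^|}]*?)\s*(?=[|}])/).flatten
# choices is ["aaa", "bbbb", "cccccccc", "dddddd ddddddddd"]
```

Because the pattern has one capturing group, scan returns an array of one-element arrays, which is why flatten is applied. Note that this pattern does not check that the line is a well-formed ${N:...} placeholder.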