Matching optional parameters with non-capturing groups in Bash regular expressions
I want to parse strings similar to the following into separate variables using regular expressions from within Bash:
Category: entity;scheme="http://schemas.ogf.org/occi/core#";class="kind";title="Entity";attributes="occi.core.id occi.core.title";
or
Category: resource;scheme="http://schemas.ogf.org/occi/core#";class="kind";title="Resource";rel="http://schemas.ogf.org/occi/core#entity";attributes="occi.core.summary";
The first part, before "title", is common to all strings; the title and attributes parts are optional.
I managed to extract the mandatory parameters common to all strings, but I have trouble with the optional parameters, which are not necessarily present in every string. As far as I can tell, Bash doesn't support non-capturing parentheses, which I would use for this purpose.
Here is what I achieved thus far:
CATEGORY_REGEX='Category:\s*([^;]*);scheme="([^"]*)";class="([^"]*)";'
category_string='Category: entity;scheme="http://schemas.ogf.org/occi/core#";class="kind";title="Entity";attributes="occi.core.id occi.core.title";'
[[ $category_string =~ $CATEGORY_REGEX ]]
echo ${BASH_REMATCH[0]}
echo ${BASH_REMATCH[1]}
echo ${BASH_REMATCH[2]}
echo ${BASH_REMATCH[3]}
The regular expression I would like to use (and which is working for me in Ruby) would be:
CATEGORY_REGEX='Category:\s*([^;]*);\s*scheme="([^"]*)";\s*class="([^"]*)";\s*(?:title="([^"]*)";)?\s*(?:rel="([^"]*)";)?\s*(?:location="([^"]*)";)?\s*(?:attributes="([^"]*)";)?\s*(?:actions="([^"]*)";)?'
Is there any other solution to parse the string with command line tools without having to fall back on perl, python or ruby?
Comments (2)
I don't think non-capturing groups exist in Bash regex, so your options are to use a scripting language or to remove the ?: from all of the (?:...) groups and just be careful about which groups you reference.
Note that starting with the optional parameters we need to skip a group each time, because the even-numbered groups from 4 on contain the parameter name as well as the value (if the parameter is present).
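The example that accompanied this answer did not survive on this page. As a minimal sketch of the approach it describes, assuming the regex from the question with every (?:...) turned into a plain (...) group, and with [[:space:]] swapped in for \s (an assumption made here for portability, since \s is not part of POSIX ERE), the optional values end up in the odd-numbered groups from 5 on:

```shell
#!/usr/bin/env bash
# Regex from the question with "?:" removed, so every optional chunk is now
# a capturing group that wraps the group holding the actual value.
CATEGORY_REGEX='Category:[[:space:]]*([^;]*);[[:space:]]*scheme="([^"]*)";[[:space:]]*class="([^"]*)";'
CATEGORY_REGEX+='[[:space:]]*(title="([^"]*)";)?'       # groups 4 and 5
CATEGORY_REGEX+='[[:space:]]*(rel="([^"]*)";)?'         # groups 6 and 7
CATEGORY_REGEX+='[[:space:]]*(location="([^"]*)";)?'    # groups 8 and 9
CATEGORY_REGEX+='[[:space:]]*(attributes="([^"]*)";)?'  # groups 10 and 11
CATEGORY_REGEX+='[[:space:]]*(actions="([^"]*)";)?'     # groups 12 and 13

category_string='Category: resource;scheme="http://schemas.ogf.org/occi/core#";class="kind";title="Resource";rel="http://schemas.ogf.org/occi/core#entity";attributes="occi.core.summary";'

if [[ $category_string =~ $CATEGORY_REGEX ]]; then
    term=${BASH_REMATCH[1]}
    scheme=${BASH_REMATCH[2]}
    class=${BASH_REMATCH[3]}
    # Even groups (4, 6, ...) hold name="value"; and are skipped;
    # odd groups hold the bare value, or are empty if the part is absent.
    title=${BASH_REMATCH[5]}
    rel=${BASH_REMATCH[7]}
    location=${BASH_REMATCH[9]}
    attributes=${BASH_REMATCH[11]}
    actions=${BASH_REMATCH[13]}
fi

printf 'term=%s title=%s rel=%s attributes=%s\n' "$term" "$title" "$rel" "$attributes"
```

Parameters that are missing from the input (location and actions in this string) simply come back as empty strings, so the caller can test them with [[ -n $location ]].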
You can emulate non-capturing groups in Bash using a little bit of regexp magic.
The characters @ and / are parts of the string we parse. The regexp pipe | is used to match either the left part or the right (empty) part. For the curious, ${VAR:-<default value>} is variable expansion with a default value in case $VAR is unset or empty.
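The code this answer refers to, including the string containing @ and /, is not preserved on this page. The following is only a hypothetical illustration of the two ideas it names, applied to the string from the question instead: an alternation with an empty right-hand branch makes a group optional, and ${VAR:-default} supplies a fallback when the group did not match. The function name parse_title and the placeholder text are made up for this sketch, and it assumes a regex library that accepts an empty alternative (glibc does; some other platforms may reject it).

```shell
#!/usr/bin/env bash
# (title="...";|) matches either the title chunk or the empty string,
# which has the same effect as (title="...";)? in this position.
re='class="([^"]*)";(title="([^"]*)";|)'

# Hypothetical helper: prints the title, or a placeholder when it is absent.
parse_title() {
    local input=$1
    [[ $input =~ $re ]] || return 1
    # Group 2 is the whole optional chunk, group 3 the bare title value.
    printf '%s\n' "${BASH_REMATCH[3]:-<no title>}"
}

with_title=$(parse_title 'Category: entity;scheme="http://schemas.ogf.org/occi/core#";class="kind";title="Entity";')
without_title=$(parse_title 'Category: entity;scheme="http://schemas.ogf.org/occi/core#";class="kind";attributes="occi.core.id";')

printf '%s\n%s\n' "$with_title" "$without_title"
```

The group still captures, so the numbering caveat from the other answer applies here as well; the alternation only changes how the optional part is written, not how the groups are counted.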