Matching optional parameters with non-capturing groups in Bash regular expressions

Posted on 2024-12-24 16:37:57

I want to parse strings similar to the following into separate variables using regular expressions from within Bash:

Category: entity;scheme="http://schemas.ogf.org/occi/core#";class="kind";title="Entity";attributes="occi.core.id occi.core.title";

or

Category: resource;scheme="http://schemas.ogf.org/occi/core#";class="kind";title="Resource";rel="http://schemas.ogf.org/occi/core#entity";attributes="occi.core.summary";

The part before "title" is common to all strings; the title and attributes parts are optional.

I managed to extract the mandatory parameters common to all strings, but I have trouble with the optional parameters, which are not present in every string. As far as I can tell, Bash doesn't support non-capturing parentheses, which I would use for this purpose.

Here is what I have so far:

CATEGORY_REGEX='Category:\s*([^;]*);scheme="([^"]*)";class="([^"]*)";'
category_string='Category: entity;scheme="http://schemas.ogf.org/occi/core#";class="kind";title="Entity";attributes="occi.core.id occi.core.title";'
# Keep $CATEGORY_REGEX unquoted: a quoted right-hand side of =~ is matched as a literal string.
if [[ $category_string =~ $CATEGORY_REGEX ]]; then
  echo "${BASH_REMATCH[0]}"
  echo "${BASH_REMATCH[1]}"
  echo "${BASH_REMATCH[2]}"
  echo "${BASH_REMATCH[3]}"
fi

The regular expression I would like to use (it works for me in Ruby) is:

CATEGORY_REGEX='Category:\s*([^;]*);\s*scheme="([^"]*)";\s*class="([^"]*)";\s*(?:title="([^"]*)";)?\s*(?:rel="([^"]*)";)?\s*(?:location="([^"]*)";)?\s*(?:attributes="([^"]*)";)?\s*(?:actions="([^"]*)";)?'

Is there any other way to parse the string with command-line tools, without falling back on Perl, Python or Ruby?

Answers (2)

不一样的天空 2024-12-31 16:37:57

Non-capturing groups don't exist in Bash regex (the =~ operator uses POSIX extended regular expressions, which have no (?:...) syntax). Your options are to use a scripting language, or to remove the ?: from all of the (?:...) groups and be careful about which groups you reference, for example:

CATEGORY_REGEX='Category:\s*([^;]*);\s*scheme="([^"]*)";\s*class="([^"]*)";\s*(title="([^"]*)";)?\s*(rel="([^"]*)";)?\s*(location="([^"]*)";)?\s*(attributes="([^"]*)";)?\s*(actions="([^"]*)";)?'
category_string='Category: entity;scheme="http://schemas.ogf.org/occi/core#";class="kind";title="Entity";attributes="occi.core.id occi.core.title";'
[[ $category_string =~ $CATEGORY_REGEX ]]
echo "full:       ${BASH_REMATCH[0]}"
echo "category:   ${BASH_REMATCH[1]}"
echo "scheme:     ${BASH_REMATCH[2]}"
echo "class:      ${BASH_REMATCH[3]}"
echo "title:      ${BASH_REMATCH[5]}"
echo "rel:        ${BASH_REMATCH[7]}"
echo "location:   ${BASH_REMATCH[9]}"
echo "attributes: ${BASH_REMATCH[11]}"
echo "actions:    ${BASH_REMATCH[13]}"

Note that, starting with the optional parameters, we need to skip a group each time. The even-numbered groups from 4 onward contain the parameter name as well as the value (if the parameter is present). The odd-numbered groups from 5 onward contain only the value.
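The sketch below wraps the same idea in a function and assumes Bash 4 or later, because it uses an associative array. The names parse_category, CATEGORY and OPTIONAL_PARAMS are hypothetical. It also swaps \s for the POSIX class [[:space:]], since \s is not part of POSIX ERE and depends on the platform's regex library. Each optional parameter adds two groups, so the value of the i-th optional parameter (counting from 0) is in group 5 + 2*i. Its outer group, 4 + 2*i, is non-empty only if the parameter was present.

```shell
#!/usr/bin/env bash
# Sketch only: needs Bash 4+ for associative arrays (declare -A).
# parse_category, CATEGORY and OPTIONAL_PARAMS are hypothetical names.

# Optional parameters, in the order they may appear in the string.
OPTIONAL_PARAMS=(title rel location attributes actions)

# Same pattern as above, with \s replaced by the portable [[:space:]].
CATEGORY_REGEX='Category:[[:space:]]*([^;]*);[[:space:]]*scheme="([^"]*)";[[:space:]]*class="([^"]*)";'
for p in "${OPTIONAL_PARAMS[@]}"; do
  # Outer group = name and value (even number), inner group = value only (odd number).
  CATEGORY_REGEX+="[[:space:]]*($p=\"([^\"]*)\";)?"
done

declare -A CATEGORY

parse_category() {
  CATEGORY=()                      # clear results from a previous call
  [[ $1 =~ $CATEGORY_REGEX ]] || return 1
  CATEGORY[category]=${BASH_REMATCH[1]}
  CATEGORY[scheme]=${BASH_REMATCH[2]}
  CATEGORY[class]=${BASH_REMATCH[3]}
  local i
  for i in "${!OPTIONAL_PARAMS[@]}"; do
    # Only store the value if the outer group matched, i.e. the parameter is present.
    if [[ -n ${BASH_REMATCH[4+2*i]} ]]; then
      CATEGORY[${OPTIONAL_PARAMS[i]}]=${BASH_REMATCH[5+2*i]}
    fi
  done
}

# Usage: parse the second example string from the question.
parse_category 'Category: resource;scheme="http://schemas.ogf.org/occi/core#";class="kind";title="Resource";rel="http://schemas.ogf.org/occi/core#entity";attributes="occi.core.summary";' \
  || echo "no match" >&2
for key in category scheme class title rel location attributes actions; do
  printf '%-12s %s\n' "$key:" "${CATEGORY[$key]:--}"   # "-" marks a missing parameter
done
```

Building the pattern in a loop keeps the name list and the group arithmetic in one place. Adding another optional parameter then only means appending it to OPTIONAL_PARAMS in the order it appears in the string.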

笑红尘 2024-12-31 16:37:57

You can work around the lack of non-capturing groups in Bash with a little bit of regex magic:

#             _2__    _4__   _5__
[[ "fu@k" =~ ((.+)@|)((.+)/|)(.+) ]]
echo "${BASH_REMATCH[2]:--} ${BASH_REMATCH[4]:--} ${BASH_REMATCH[5]:--}"
# Output: fu - k

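The characters @ and / are part of the string being parsed; they separate the optional parts.
The regex alternation | lets each of the first two groups match either its left-hand part or the empty string on the right, so a missing part simply leaves that group empty.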

For the curious: ${VAR:-<default value>} is a parameter expansion that yields the default value if $VAR is unset or empty.
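The sketch below applies the same (…|) trick to the question's strings. It handles only title and attributes, the two optional parts the question names, and the variable names are illustrative. Like the example above, it relies on the regex library accepting an empty alternative, which glibc does. Each optional part still costs two groups, so the values land in groups 5 and 7. ${…:--} prints - for a missing part. The second input string is hypothetical, made up to show the case where both optional parts are absent.

```shell
#!/usr/bin/env bash
# Sketch: optional parts written as (x|) instead of (x)?.
# Only title and attributes are handled; rel, location, etc. would need more groups.
re='Category:[[:space:]]*([^;]*);scheme="([^"]*)";class="([^"]*)";'
re+='(title="([^"]*)";|)'       # group 4 = whole part, group 5 = value
re+='(attributes="([^"]*)";|)'  # group 6 = whole part, group 7 = value

for s in \
  'Category: entity;scheme="http://schemas.ogf.org/occi/core#";class="kind";title="Entity";attributes="occi.core.id occi.core.title";' \
  'Category: link;scheme="http://schemas.ogf.org/occi/core#";class="kind";'   # hypothetical input
do
  if [[ $s =~ $re ]]; then
    echo "${BASH_REMATCH[1]}: title=${BASH_REMATCH[5]:--} attributes=${BASH_REMATCH[7]:--}"
  fi
done
```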
