Removing parenthesized subexpressions

Posted on 2025-02-09 19:00:56

I have a String and want to delete from it all substrings with the following properties:

1. They start with an arbitrary (non-zero) number of opening parentheses.
2. Then follows an arbitrary sequence of word characters (`\w`).
3. Then follows the same number of closing parentheses as there were opening parentheses.

Pure regular expressions cannot match up opening and closing parentheses. My first attempt was to find a way to use backreferences dynamically. I know that this is not valid Ruby, but to give you an idea:

strrep = str.gsub(/([(]+) \w+ [)]#{\1.size}/x, '')

Of course the \1.size is invalid, but is there a way, using interpolation, to evaluate something based on a backreference?

Another possibility would be to call gsub repeatedly in a loop and remove one level of parentheses at a time:

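tmpstr = str
strrep = nil # declared outside the block so it is still visible after the loop
loop do
  # Collapse one level of nesting: ((word)) becomes (word)
  reduced = tmpstr.gsub(/[(] ([(]\w+[)]) [)]/x, '\1')
  if tmpstr == reduced
    # Only one level of parentheses is left to consider
    strrep = tmpstr.gsub(/[(]\w+[)]/, '')
    break
  else
    tmpstr = reduced
  end
end
# strrep is now the resulting string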

However, this seems to be an overly complicated solution. Any ideas (except, of course, writing my own string parser that loops over each character and counts the parentheses)?

UPDATE:

Example 1:

str = "ab((((cd))))ef((gh))ij(kl)mn"

strrep should contain abefijmn.

Example 2:

str = "((((abc));def;((ghi)))"

strrep should contain (;def;).


Comments (2)

千秋岁 2025-02-16 19:00:56

In general, to match the strings you described, you need to use regex subroutines:

(\((?:\w+|\g<1>)?\))

See the regex demo.

Details:

  • (\((?:\w+|\g<1>)?\)) - Group 1 (capturing is necessary for recursion purposes):
    • \( - a ( char
    • (?:\w+|\g<1>)? - an optional occurrence of one or more word chars or Group 1 pattern recursed
    • \) - a ) char.
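
One consequence of the breakdown above is that the `?` makes the inner part optional, so the pattern also matches an empty pair `()`. The question asks for word characters between the parentheses, so a stricter variant could simply drop the `?`. A minimal sketch, assuming empty pairs should be left alone; the constant names `LENIENT` and `STRICT` are made up for illustration:

```ruby
# LENIENT is the pattern from this answer.
# STRICT is a hypothetical variant without the `?`, so the innermost
# pair must contain at least one word character.
LENIENT = /(\((?:\w+|\g<1>)?\))/
STRICT  = /(\((?:\w+|\g<1>)\))/

sample = 'a()b((cd))e'

lenient_result = sample.gsub(LENIENT, '') # removes "()" and "((cd))"
strict_result  = sample.gsub(STRICT, '')  # removes only "((cd))"

puts lenient_result # => abe
puts strict_result  # => a()be
```

Which of the two is appropriate depends on whether `()` can occur in the input at all.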

To make it a bit more efficient, consider using an atomic group rather than a non-capturing group:

(\((?>\w+|\g<1>)?\))
    ^^
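
As a quick sanity check, the atomic variant can be run side by side with the non-capturing one on the inputs from the question. This is only a sketch, assuming Ruby's built-in regex engine, which supports both `(?>...)` and `\g<1>`; the constant names are illustrative:

```ruby
# Both patterns come from this answer; only the inner group type differs.
NON_ATOMIC = /(\((?:\w+|\g<1>)?\))/
ATOMIC     = /(\((?>\w+|\g<1>)?\))/

inputs = [
  'ab((((cd))))ef((gh))ij(kl)mn',
  '((((abc));def;((ghi)))',
  '(((foo)) , bar)'
]

non_atomic_results = inputs.map { |s| s.gsub(NON_ATOMIC, '') }
atomic_results     = inputs.map { |s| s.gsub(ATOMIC, '') }

puts atomic_results
puts atomic_results == non_atomic_results # => true
```

The atomic group does not change what is matched here, because a run of word characters can only be followed by `)` for the match to succeed, so giving characters back never helps.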

See the Ruby demo:

puts [
    'ab((((cd))))ef((gh))ij(kl)mn',
    '((((abc));def;((ghi)))',
    '(((foo)) , bar)'
].map {|x| x.gsub(/(\((?:\w+|\g<1>)?\))/, '')}

Output:

abefijmn
((;def;)
( , bar)
鹿! 2025-02-16 19:00:56

As far as I understand, you don't need to parse anything "complex" like arbitrary S-expressions etc - all you're interested in is just to eliminate things like (((foo))) and ((bar)) (they have the same number of opening/closing parens) but keep things like (((foo)) bar) intact.

If this assumption is correct, then a fairly simple gsub with a block can do the job:

def delete_parentheses(str)
  str.gsub(/(\(+)\w+(\)+)/) do |match|
    $1.size == $2.size ? "" : match
  end
end

delete_parentheses("Here ((be)) dragons") # => "Here  dragons" (both surrounding spaces remain)
delete_parentheses("Here ((be) dragons")  # => "Here ((be) dragons" (unbalanced, left unchanged)
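
Note that with the question's second example, `((((abc));def;((ghi)))`, the method above leaves the string unchanged, because `((((abc))` has four opening but only two closing parentheses. If the balanced inner part should still be removed, one possible tweak is to strip only as many pairs as both sides share. A sketch under that assumption; `delete_balanced_parentheses` is a made-up name, not part of the original answer:

```ruby
# Hypothetical variant: remove the word together with the innermost
# min(opening, closing) pairs and keep the surplus parentheses.
def delete_balanced_parentheses(str)
  str.gsub(/(\(+)\w+(\)+)/) do
    opening = $1.size
    closing = $2.size
    paired  = [opening, closing].min
    "(" * (opening - paired) + ")" * (closing - paired)
  end
end

puts delete_balanced_parentheses("ab((((cd))))ef((gh))ij(kl)mn") # => abefijmn
puts delete_balanced_parentheses("((((abc));def;((ghi)))")       # => ((;def;)
puts delete_balanced_parentheses("(((foo)) , bar)")              # => ( , bar)
```

On these three inputs the result agrees with the output of the recursive regex in the other answer.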