Remove unmatched parentheses from a string

Posted on 2024-10-26 04:44:54

I want to remove "un-partnered" parentheses from a string.

I.e., all ('s should be removed unless they're followed by a ) somewhere in the string. Likewise, all )'s not preceded by a ( somewhere in the string should be removed.

Ideally the algorithm would take into account nesting as well.

E.g.:

"(a)".remove_unmatched_parens # => "(a)"
"a(".remove_unmatched_parens # => "a"
")a(".remove_unmatched_parens # => "a"

瑶笙 2024-11-02 04:44:54

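Perhaps consider a pushdown automaton instead of a regular expression. (I'm not sure whether Ruby regular expressions can handle this; I believe Perl's can.)

A (very trivial) process may be:

For each character in the input string:

If it is not a "(" or ")", append it to the output.
If it is a "(", increment a saw_parens counter and append it.
If it is a ")" and saw_parens > 0, append it and decrement saw_parens. Otherwise skip it.

At the end of the process, if saw_parens > 0, remove that many "(" characters from the output, starting from the end. (This step can be merged into the process above by using a stack or recursion.)

The entire process is O(n), albeit with relatively high overhead.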
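A minimal Ruby sketch of the merged variant mentioned above, where a stack of output positions takes the place of the saw_parens counter. The method name strip_unmatched_parens and the variable names are assumptions made for illustration; they do not come from the answer.

```ruby
# Hypothetical helper: one pass over the input, using a stack instead of a counter.
def strip_unmatched_parens(input)
  output = []      # characters kept so far
  open_stack = []  # indices in `output` of "(" still waiting for a ")"

  input.each_char do |char|
    case char
    when "("
      open_stack << output.length
      output << char
    when ")"
      # Keep ")" only if an earlier "(" is still open; otherwise skip it.
      next if open_stack.empty?
      open_stack.pop
      output << char
    else
      output << char
    end
  end

  # Anything left on the stack never found a partner: blank those slots.
  open_stack.each { |index| output[index] = nil }
  output.compact.join
end

puts strip_unmatched_parens(")a(")  # prints "a"
```

Because the stack records where each open parenthesis sits, the final cleanup removes exactly the unpaired ones rather than the last few in the string; either choice leaves a balanced result.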

Happy coding.

不再让梦枯萎 2024-11-02 04:44:54

The following uses Oniguruma, the regex engine built into Ruby 1.9. If you are using Ruby 1.8, see: oniguruma.

Update

I was lazy and simply copied and pasted someone else's regex, which seemed to have a problem.

So now I have written my own. I believe it should work now.

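class String
    NonParenChar = /[^\(\)]/
    def remove_unmatched_parens
        # gsub visits every balanced run and every stray paren in turn;
        # self[regex] alone would return only the first balanced run.
        gsub(/
            (?:
                (?<balanced>
                    \(
                        (?:\g<balanced>|#{NonParenChar})*
                    \)
                )
                |#{NonParenChar}
            )+
            |[()]
        /x) { |m| m == "(" || m == ")" ? "" : m }
    end
end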

(?<name>regex1) names the (sub)regex regex1 as name and makes it possible to call it.
\g<name> is a subexpression call that stands for regex1. Note that \g<name> does not represent a particular string that matched regex1; it represents regex1 itself. In fact, \g<name> can be embedded within (?<name>...), which is what makes the pattern recursive.
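As a small illustration of the named group and subexpression call described above, the sketch below checks whether a whole string is a single balanced group. The constant name BALANCED and the group name group are assumptions chosen for this example only.

```ruby
# (?<group>...) defines the named group; \g<group> calls the same pattern
# recursively, so nested parentheses are matched to any depth.
BALANCED = /\A(?<group>\((?:[^()]|\g<group>)*\))\z/

puts BALANCED.match?("(a(b)c)")  # prints true: every "(" is closed
puts BALANCED.match?("(a(b)c")   # prints false: the outer "(" is never closed
```

The call refers to the pattern, not to previously captured text, which is why the inner "(b)" can differ from the outer group.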

Update 2

This is simpler.

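class String
    def remove_unmatched_parens
        # gsub rather than self[regex], which would return only the first
        # balanced run; a lone paren matched by the last branch is dropped.
        gsub(/
            (?<valid>
                \(\g<valid>*\)
                |[^()]
            )+
            |[()]
        /x) { |m| m == "(" || m == ")" ? "" : m }
    end
end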

尽揽少女心 2024-11-02 04:44:54

Build a simple LR parser:

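tokenize, token, stack = false, "", []

")(a))(()(asdf)(".each_char do |c|
  case c
  when '('
    # Start a new token; a second "(" before any ")" restarts the token.
    tokenize = true
    token = c
  when ')'
    # Close the current token only if a "(" opened it; otherwise drop the ")".
    if tokenize
      token << c
      stack << token
    end
    tokenize = false
  when /\w/
    # Word characters are kept only while inside a token.
    token << c if tokenize
  end
end

result = stack.join

puts result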

Running it yields:

wesbailey@feynman:~/code_katas> ruby test.rb
(a)()(asdf)

I don't agree with the folks modifying the String class, because you should never open a standard class. Regexes are pretty brittle for parsing and hard to maintain. I couldn't imagine coming back to the previous solutions six months from now and trying to remember what they were doing!
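One way to follow that advice is to keep the logic in a standalone module instead of reopening String. The sketch below is only an illustration under that assumption: the module name ParenCleaner and its method names are hypothetical, and it uses a two-pass scan (forward to drop stray ")", backward to drop stray "(") rather than the tokenizer above.

```ruby
# Hypothetical module; String itself is left untouched.
module ParenCleaner
  module_function

  # Drops every `close` that has no earlier unmatched `open` in `chars`.
  def drop_unmatched(chars, open, close)
    depth = 0
    chars.select do |char|
      if char == open
        depth += 1
        true
      elsif char == close
        next false if depth.zero?  # nothing open: discard this closer
        depth -= 1
        true
      else
        true
      end
    end
  end

  def remove_unmatched_parens(string)
    # Forward pass removes stray ")"; the reversed pass removes stray "(".
    forward = drop_unmatched(string.chars, "(", ")")
    drop_unmatched(forward.reverse, ")", "(").reverse.join
  end
end

puts ParenCleaner.remove_unmatched_parens(")a(")  # prints "a"
```

Unlike the tokenizer, this keeps text outside parentheses and handles nesting, since each pass only tracks depth.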

套路撩心 2024-11-02 04:44:54

Here's my solution, based on @pst's algorithm:

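require 'strscan'

class String
  def remove_unmatched_parens
    scanner = StringScanner.new(dup)
    output = +''
    paren_depth = 0

    # getch reads one character at a time, so multibyte text stays intact.
    while char = scanner.getch
      if char == "("
        paren_depth += 1
        output << char
      elsif char == ")"
        # Keep a ")" only when an earlier "(" is still open.
        if paren_depth > 0
          output << char
          paren_depth -= 1
        end
      else
        output << char
      end
    end

    # Each "(" still open has no partner: delete that many, starting from the end.
    paren_depth.times { output.slice!(output.rindex('(')) }
    output
  end
end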
