Remove unmatched parentheses from a string

Posted on 2024-10-26 04:44:54

I want to remove "un-partnered" parentheses from a string.

That is, every ( should be removed unless it is followed by a ) somewhere later in the string. Likewise, every ) that is not preceded by a ( somewhere earlier in the string should be removed.

Ideally, the algorithm would also take nesting into account.

E.g.:

"(a)".remove_unmatched_parens # => "(a)"
"a(".remove_unmatched_parens # => "a"
")a(".remove_unmatched_parens # => "a"

瑶笙 2024-11-02 04:44:54

Instead of a regex, perhaps consider a push-down automaton. (I'm not sure whether Ruby regular expressions can handle this; I believe Perl's can.)

A (very simplified) process might be:

For each character in the input string:

  1. If it is not a '(' or ')', just append it to the output.
  2. If it is a '(', increment a seen_parens counter and append it.
  3. If it is a ')' and seen_parens > 0, append it and decrement seen_parens. Otherwise, skip it.

At the end of the process, if seen_parens > 0, remove that many '(' characters, starting from the end. (This step can be merged into the above process by using a stack or recursion.)

The entire process is O(n), albeit with relatively high overhead.
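
A minimal Ruby sketch of the counter-based process above follows. It is written as a plain function rather than a String method. The two-pass structure is one reading of the steps, not code from this answer. The final loop implements "remove that many, starting from the end" by walking the output backwards and skipping the last seen_parens '(' characters.

```ruby
# Sketch of the counter-based process: one forward pass, then trim the
# unclosed "(" characters starting from the end of the output.
def remove_unmatched_parens(input)
  output = +""
  seen_parens = 0

  input.each_char do |c|
    case c
    when "("
      seen_parens += 1          # step 2: count it and keep it
      output << c
    when ")"
      next if seen_parens.zero? # step 3: no open "(" to close, skip it
      seen_parens -= 1
      output << c
    else
      output << c               # step 1: ordinary character
    end
  end

  # seen_parens now counts "(" that were never closed; drop that many,
  # scanning from the end of the output
  kept = []
  output.chars.reverse_each do |c|
    if c == "(" && seen_parens > 0
      seen_parens -= 1
    else
      kept << c
    end
  end
  kept.reverse.join
end
```

Trimming from the end always leaves a balanced string, but it can pick a different '(' than nearest-match pairing would. For example, "(a(b)" becomes "(ab)" here.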

Happy coding.

不再让梦枯萎 2024-11-02 04:44:54

The following uses Oniguruma, which is the built-in regex engine in Ruby 1.9. If you are using Ruby 1.8, see oniguruma.

Update

I had lazily copied and pasted someone else's regex, and it seemed to have problems.

So I wrote my own. I believe it should work now.

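class String
    NonParenChar = /[^\(\)]/
    def remove_unmatched_parens
        kept = []
        # self[re] would return only the first matching run, so scan for
        # every run of balanced/non-paren text and join them
        scan(/
            (?:
                (?<balanced>
                    \(
                        (?:\g<balanced>|#{NonParenChar})*
                    \)
                )
                |#{NonParenChar}
            )+
        /x) { kept << Regexp.last_match(0) }
        kept.join
    end
end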
  • (?<name>regex1) names the (sub)regex regex1 as name and makes it callable.
  • \g<name> is a subexpression call that stands for regex1 itself, not for the particular string that regex1 matched. In fact, \g<name> can be embedded within (?<name>...), which is what makes the pattern recursive.
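
The following sketch illustrates the \g<name> call in isolation. It defines a hypothetical balanced? check (BALANCED and balanced? are illustrative names, not part of the answer). A named group calls itself recursively, so the pattern accepts only strings whose parentheses are fully balanced. It assumes Ruby 2.4+ for Regexp#match?.

```ruby
# "pair" is a named group that calls itself via \g<pair>, so it can
# match arbitrarily deep nesting such as "(a(b)c)".
BALANCED = /\A(?:[^()]|(?<pair>\((?:[^()]|\g<pair>)*\)))*\z/

# Hypothetical helper: true only when every "(" has a matching ")".
def balanced?(str)
  BALANCED.match?(str)
end
```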

Update 2

This is simpler.

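class String
    def remove_unmatched_parens
        kept = []
        # collect every run of valid text (self[re] returns only the first)
        scan(/
            (?<valid>
                \(\g<valid>*\)
                |[^()]
            )+
        /x) { kept << Regexp.last_match(0) }
        kept.join
    end
end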
尽揽少女心 2024-11-02 04:44:54

Build a simple LR parser:

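tokenize, token, stack = false, "", []

")(a))(()(asdf)(".each_char do |c|
  case c
  when '('
    # start (or restart) a token; any token in progress is discarded
    tokenize = true
    token = c
  when ')'
    # a ")" completes the current token only if a "(" opened one
    if tokenize
      token << c
      stack << token
    end
    tokenize = false
  when /\w/
    # word characters are kept only while inside an open "("
    token << c if tokenize
  end
end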

result = stack.join

puts result

running yields:

wesbailey@feynman:~/code_katas> ruby test.rb
(a)()(asdf)

I don't agree with the folks modifying the String class, because you should never open a standard class. Regexes are pretty brittle for parsing and hard to maintain. I couldn't imagine coming back to the previous solutions six months from now and trying to remember what they were doing!

套路撩心 2024-11-02 04:44:54

Here's my solution, based on @pst's algorithm:

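require 'strscan'

class String
  def remove_unmatched_parens
    scanner = StringScanner.new(dup)
    output = ''
    paren_depth = 0

    # getch reads one character at a time (multibyte-safe)
    while (char = scanner.getch)
      if char == "("
        paren_depth += 1
        output << char
      elsif char == ")"
        # keep a ")" only if an earlier "(" is still open
        if paren_depth > 0
          output << char
          paren_depth -= 1
        end
      else
        output << char
      end
    end

    # drop the unclosed "(" characters, starting from the end
    paren_depth.times { output.reverse!.sub!('(', '').reverse! }
    output
  end
end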
哭了丶谁疼 2024-11-02 04:44:54

Algorithm:

  1. Traverse the given string.
  2. While doing so, keep track of the positions of "(" in a stack.
  3. If a ")" is found, pop the top element from the stack.
    • If the stack is empty, remove the ")" from the string instead.
  4. At the end, any positions left on the stack belong to unmatched "(" characters; remove those as well.
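
The linked Java code is not reproduced here. As a rough stand-in, here is a minimal Ruby sketch of the same stack-of-positions algorithm. It is written as a plain function, and the variable names are illustrative. Each ")" pairs with the most recent open "(", so "(a(b)" becomes "a(b)".

```ruby
require "set"

def remove_unmatched_parens(input)
  chars = input.chars
  open_positions = []   # stack of indexes of "(" still waiting for a ")"
  unmatched = Set.new   # indexes of parens to delete

  chars.each_with_index do |c, i|
    case c
    when "("
      open_positions.push(i)
    when ")"
      if open_positions.empty?
        unmatched << i          # step 3: no "(" to pair with
      else
        open_positions.pop      # pairs with the most recent open "("
      end
    end
  end

  unmatched.merge(open_positions) # step 4: leftover "(" never closed
  chars.each_with_index.reject { |_, i| unmatched.include?(i) }.map(&:first).join
end
```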

Java code is available at http://a2ajp.blogspot.in/2014/10/remove-unmatched-parenthesis-from-given.html
