从字符串中删除不匹配的括号
我想从字符串中删除“未配对”的括号。
即,所有 (
都应该被删除,除非它们后面跟有 )
字符串中的某处。同样,字符串中某处前面没有 (
的所有 )
都应该被删除。
理想情况下,算法也会考虑嵌套。
例如:
"(a)".remove_unmatched_parents # => "(a)"
"a(".remove_unmatched_parents # => "a"
")a(".remove_unmatched_parents # => "a"
I want to remove "un-partnered" parentheses from a string.
I.e., all (
's should be removed unless they're followed by a )
somewhere in the string. Likewise, all )
's not preceded by a (
somewhere in the string should be removed.
Ideally the algorithm would take into account nesting as well.
E.g.:
"(a)".remove_unmatched_parents # => "(a)"
"a(".remove_unmatched_parents # => "a"
")a(".remove_unmatched_parents # => "a"
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(5)
也许可以考虑下推自动机,而不是正则表达式。 (我不确定 Ruby 正则表达式是否可以处理这个问题,我相信 Perl 可以)。
一个(非常简单的)过程可能是:
对于输入字符串中的每个字符:
在该过程结束时,如果 saw_parens > 。 0 然后从末尾开始删除那么多括号。 (这一步可以使用栈或者递归的方式合并到上面的过程中。)
整个过程是
O(n)
,即使是开销相对较高的Happy编码。
Instead of a regex, consider a push-down automata, perhaps. (I'm not sure if Ruby regular expressions can handle this, I believe Perl's can).
A (very trivialized) process may be:
For each character in the input string:
At the end of the process, if seen_parens is > 0 then remove that many parens, starting from the end. (This step can be merged into the above process with use of a stack or recursion.)
The entire process is
O(n)
, even if a relatively high overheadHappy coding.
下面使用oniguruma。如果您使用 ruby1.9,Oniguruma 是内置的正则表达式引擎。如果您使用的是 ruby1.8,请参阅:oniguruma。
更新
我太懒了,只是复制并粘贴别人的正则表达式。好像有问题。
所以现在,我自己写了。我相信现在应该可以了。
(?regex1)
将(子)正则表达式regex1
命名为name
,并使其可以被调用。?g
将是代表regex1
的子正则表达式。请注意,?g
并不代表与regex1
匹配的特定字符串,而是代表regex1
本身。事实上,可以将?g
嵌入到(?...)
中。更新2
这更简单。
The following uses oniguruma. Oniguruma is the regex engine built in if you are using ruby1.9. If you are using ruby1.8, see this: oniguruma.
Update
I had been so lazy to just copy and paste someone else's regex. It seemed to have problem.
So now, I wrote my own. I believe it should work now.
(?<name>regex1)
names the (sub)regexregex1
asname
, and makes it possible to be called.?g<name>
will be a subregex that representsregex1
. Note here that?g<name>
does not represent a particular string that matchedregex1
, but it representsregex1
itself. In fact, it is possible to embed?g<name>
within(?<name>...)
.Update 2
This is simpler.
构建一个简单的 LR 解析器:
运行收益:
我不同意人们修改 String 类,因为你永远不应该打开标准类。正则表达式对于解析器来说非常脆弱并且难以支持。我无法想象六个月后回到以前的解决方案并试图记住他们在做什么!
Build a simple LR parser:
running yields:
I don't agree with the folks modifying the String class because you should never open a standard class. Regexs are pretty brittle for parser and hard to support. I couldn't imagine coming back to the previous solutions 6 months for now and trying to remember what they were doing!
这是我的解决方案,基于 @pst 的算法:
Here's my solution, based on @pst's algorithm:
算法:
Java代码:
当前@ http://a2ajp.blogspot.in/2014 /10/remove-unmatched-parenthesis-from-given.html
Algorithm:
Java code:
Present @ http://a2ajp.blogspot.in/2014/10/remove-unmatched-parenthesis-from-given.html