Unbounded quantifiers in a complex lookbehind
I'm having a lot of trouble writing this regular expression:
(?<=\s+|^\s*|\(\s*|\.)(?:item|item1|item2)(?=\s+|\s*$|\s*\)|\.)
It works very well in my regex editor (Expresso) and in the .NET environment, but in the Java environment (JRE 1.6.0.25 using Eclipse Helios R2) it doesn't work, because the Pattern.compile() method throws a "Syntax error U_REGEX_LOOK_BEHIND_LIMIT" exception.
That's because the lookbehind pattern (?<=\s+|^\s*|\(\s*|\.) must have a defined limit (unbounded quantifiers such as * and + are not allowed here, as far as I know).
I also tried to specify the range of repetition in this way with no luck:
(?<=\s{0,1000}|^\s{0,1000}|\(\s{0,1000}|\.)(?:item|item1|item2)(?=\s+|\s*$|\s*\)|\.)
So, how can I write an equivalent regex that also works in the Java environment?
I can't believe that there's no workaround for this kind of common situation....
Comments (1)
Keep in mind that the lookbehind will only look as far behind as it must. For example, (?<=\s+) will be satisfied if the previous character is whitespace; it doesn't need to look any farther back. The same is true of your lookbehind. If it's not the beginning of the string and the previous character is not whitespace, an open-parenthesis or a period, there's no point looking any farther back. It's equivalent to this:
(?<=^|[\s(.])
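To check that reasoning in Java, here is a minimal sketch. It assumes the condensed lookbehind is (?<=^|[\s(.]), which is derived from the description above, and it uses a bare "item" as the thing being matched. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookbehindDemo {
    // Bounded lookbehind: start of input, or exactly one whitespace, '(' or '.' character.
    // Every alternative has a known maximum length, so Pattern.compile() accepts it.
    static final Pattern CONDENSED = Pattern.compile("(?<=^|[\\s(.])item");

    // Returns the start index of every match of CONDENSED in the input.
    static List<Integer> matchStarts(String input) {
        List<Integer> starts = new ArrayList<Integer>();
        Matcher m = CONDENSED.matcher(input);
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        System.out.println(matchStarts("item"));      // [0]  start of input
        System.out.println(matchStarts("a    item")); // [5]  only the last space matters
        System.out.println(matchStarts("(  item"));   // [3]  the space before "item" is enough
        System.out.println(matchStarts("x.item"));    // [2]  preceded by a period
        System.out.println(matchStarts("xitem"));     // []   preceded by a letter
    }
}
```

However many spaces come before "item", the lookbehind only needs to inspect the single character immediately before the match, which is why the unbounded \s+ and \s* can be dropped.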
Your lookahead can be condensed in the same way. If it's not the end of the string, and the next character is not whitespace, a close-parenthesis or a period, there's no point looking any further:
(?=$|[\s).])
So the final regex is:
(?<=^|[\s(.])(?:item|item1|item2)(?=$|[\s).])
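As a rough usage sketch, the snippet below compiles the combined pattern (?<=^|[\s(.])(?:item|item1|item2)(?=$|[\s).]) with java.util.regex and lists what it matches. The pattern is assembled from the two condensed lookarounds described above; the class name, the method name and the sample strings are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ItemMatcher {
    // Lookbehind: start of input, or one whitespace, '(' or '.' character.
    // Lookahead:  end of input, or one whitespace, ')' or '.' character.
    static final Pattern ITEM = Pattern.compile(
            "(?<=^|[\\s(.])(?:item|item1|item2)(?=$|[\\s).])");

    // Collects the text of every match found in the input.
    static List<String> findAll(String input) {
        List<String> found = new ArrayList<String>();
        Matcher m = ITEM.matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // "items" is skipped because the character after "item" is a letter.
        System.out.println(findAll("item (item1) x.item2. items item"));
        // prints [item, item1, item2, item]

        // Neither candidate is delimited on both sides, so nothing matches.
        System.out.println(findAll("myitem item3"));
        // prints []
    }
}
```

Because both lookarounds examine at most one character, the pattern stays within the bounded-lookbehind restriction without changing which occurrences are accepted.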