Writing a better regex that avoids lazy repetition quantifiers
I have a regular expression:
(<select([^>]*>))(.*?)(</select\s*>)
Because it uses a lazy quantifier, on longer strings (more than 500 options) it backtracks more than 100,000 times and fails.
Please help me find a better regular expression that doesn't use a lazy quantifier.
Comments (2)
...or in human-readable form:
This is an example of the "unrolled loop" technique Friedl develops in his book, Mastering Regular Expressions. I did a quick test in RegexBuddy using a pattern based on reluctant quantifiers:
...and it took about 6,000 steps to find a match. The unrolled-loop pattern took only 500 steps. And when I removed the closing bracket from the end tag (</select), making a match impossible, it required only 800 steps to report failure.
If your regex flavor supports possessive quantifiers, go ahead and use them, too:
It takes about the same number of steps to achieve a match, but it can use a lot less memory in the process. And if no match is possible, it fails even more quickly; in my tests it took about 500 steps, the same number it took to find a match.
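The patterns this answer refers to did not survive in this copy of the page, so the following is only a sketch of what an unrolled-loop rewrite of the question's regex might look like in Python. It is not necessarily the answerer's exact pattern. The `build_select` helper and the sample data are hypothetical, and the match is case-sensitive. The idea is Friedl's "normal* (special normal*)*" shape: consume runs of characters that cannot start a tag, and step over a `<` only when it does not begin the closing tag.

```python
import re

# Lazy pattern from the question (DOTALL so '.' can cross line breaks).
LAZY = re.compile(r"(<select([^>]*>))(.*?)(</select\s*>)", re.DOTALL)

# Unrolled-loop sketch: normal* (special normal*)*
#   normal  = [^<]             any character that cannot start a tag
#   special = <(?!/select\b)   a '<' that does not begin the closing tag
UNROLLED = re.compile(
    r"(<select([^>]*>))"
    r"([^<]*(?:<(?!/select\b)[^<]*)*)"
    r"(</select\s*>)"
)


def build_select(n_options):
    """Hypothetical helper: build a <select> element with n_options options."""
    options = "".join(
        f'<option value="{i}">Item {i}</option>\n' for i in range(n_options)
    )
    return f'<select name="demo">\n{options}</select>'


html = build_select(600)
lazy_match = LAZY.search(html)
unrolled_match = UNROLLED.search(html)

# Same test as in the answer: drop the closing '>' so no match is possible.
broken_match = UNROLLED.search(html[:-1])
```

On this sample both patterns capture the same groups, so the unrolled version can stand in for the lazy one. Python's `re` module does not report step counts, so the figures quoted above would have to be reproduced in a tool such as RegexBuddy.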
Unfortunately, this won't work; see the answer by Alan Moore for a correct example!
From the Perl regexp manpage: