为什么基于文本的正则发动机无法处理懒惰的量词?
我遇到过很多次,基于文本的正则发动机无法支持懒惰量词。但是我找不到原因。但是我知道,他们在内部使用DFA,并且没有回溯。
然后我坐下来为懒惰的量词绘制DFA,这就是我得到的。\/\*。*\*\/
(其中*是懒惰)(正确匹配C注释):
( regex form dfa下方的dfa 以确保其表现如前所述)
进行比较,我还写了贪婪的量词版本。\/\*。*\*\*\/
(其中*是贪婪的)(不正确匹配C注释):
( regex form dfa以下以确保其表现如前所述)
对不起,我忘了在图中添加启动状态。将最左侧的状态视为两者的开始。
在这里,贪婪的版本实际上需要备份!此外,DFA可以完成懒惰的量词。我做错了吗?为什么有很多说基于文本的正则发动机不支持懒惰?
ps:为了保持简单,我没有完成以上两个DFA。要完成它们,只需添加从每个状态到新的错误状态的所有不存在的移动即可。一旦进入错误状态,就不会退出。
I have came across many times that, text-based regex engines cannot support lazy quantifiers. But I could not find why. But I came to know that, they internally use DFAs, and they do not backtrack.
Then I sat down to draw a DFA for lazy quantifiers and this is what I got.\/\*.*\*\/
(where * is lazy)(matches C comments correctly):
(regex form of below DFA to make sure it is behaving as expected)
For comparison, I also wrote the greedy quantifier version.\/\*.*\*\/
(where * is greedy)(matches C comments incorrectly):
(regex form of below DFA to make sure it is behaving as expected)
Sorry, I forgot to add start state in the diagram. Consider left-most state as start in both of them.
Here, greedy version actually requires backing-up! Also, lazy quantifiers can be done with DFA. Did I did anything wrong? Why are there many saying text-based regex engines don't support lazy?
PS: To keep it simple, I did not complete both of the above DFAs. To complete them, just add all the non-existent moves from every state to a new error state. Once entered error state, no exiting from it.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
data:image/s3,"s3://crabby-images/d5906/d59060df4059a6cc364216c4d63ceec29ef7fe66" alt="扫码二维码加入Web技术交流群"
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论