为什么基于文本的正则发动机无法处理懒惰的量词?

发布于 2025-01-30 18:15:26 字数 1222 浏览 3 评论 0原文

我遇到过很多次,基于文本的正则发动机无法支持懒惰量词。但是我找不到原因。但是我知道,他们在内部使用DFA,并且没有回溯。

然后我坐下来为懒惰的量词绘制DFA,这就是我得到的。
\/\*。*\*\/(其中*是懒惰)(正确匹配C注释):
regex form dfa下方的dfa 以确保其表现如前所述)

进行比较,我还写了贪婪的量词版本。
\/\*。*\*\*\/(其中*是贪婪的)(不正确匹配C注释):
regex form dfa以下以确保其表现如前所述)

对不起,我忘了在图中添加启动状态。将最左侧的状态视为两者的开始。

在这里,贪婪的版本实际上需要备份!此外,DFA可以完成懒惰的量词。我做错了吗?为什么有很多说基于文本的正则发动机不支持懒惰?

ps:为了保持简单,我没有完成以上两个DFA。要完成它们,只需添加从每个状态到新的错误状态的所有不存在的移动即可。一旦进入错误状态,就不会退出。

I have came across many times that, text-based regex engines cannot support lazy quantifiers. But I could not find why. But I came to know that, they internally use DFAs, and they do not backtrack.

Then I sat down to draw a DFA for lazy quantifiers and this is what I got.
\/\*.*\*\/(where * is lazy)(matches C comments correctly):
(regex form of below DFA to make sure it is behaving as expected)
enter image description here

For comparison, I also wrote the greedy quantifier version.
\/\*.*\*\/(where * is greedy)(matches C comments incorrectly):
(regex form of below DFA to make sure it is behaving as expected)
enter image description here

Sorry, I forgot to add start state in the diagram. Consider left-most state as start in both of them.

Here, greedy version actually requires backing-up! Also, lazy quantifiers can be done with DFA. Did I did anything wrong? Why are there many saying text-based regex engines don't support lazy?

PS: To keep it simple, I did not complete both of the above DFAs. To complete them, just add all the non-existent moves from every state to a new error state. Once entered error state, no exiting from it.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。
列表为空,暂无数据
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文