What is the Perl regex dialect/implementation called?
The string-parsing engine that Perl calls "regular expressions" is very different from what textbooks mean by the term "regular expressions".
So my question is: is there a document that describes Perl's regex implementation, how it works, and in what ways it really differs from the classic one (by classic I mean regular expressions that can genuinely be converted into an ordinary DFA/NFA)?
Thank you.
Perl regular expressions are of course called Perl regular expressions, or regexes for short. They may also be called patterns or rules. But what they are, or at least can be, is recursive descent parsers. They’re implemented using a recursive backtracker, although you can swap in a DFA engine if you prefer to offload DFA‐solvable tasks to it.
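To make "recursive backtracker" concrete, here is a minimal sketch in Python of a matcher in the spirit of the small one from Kernighan and Pike's *The Practice of Programming*. It is not Perl's engine. It supports only literals, `.`, `*`, `^` and `$`, and the function names are illustrative. Unlike the original, which tries the shortest `*` run first, this variant tries the longest run first (greedy, like Perl's default) and gives characters back on failure.

```python
def match(regexp: str, text: str) -> bool:
    """Return True if regexp matches anywhere in text."""
    if regexp.startswith("^"):
        return match_here(regexp[1:], text)
    # Unanchored: try every starting position, including the empty suffix.
    for i in range(len(text) + 1):
        if match_here(regexp, text[i:]):
            return True
    return False


def match_here(regexp: str, text: str) -> bool:
    """Return True if regexp matches at the beginning of text."""
    if not regexp:
        return True
    if len(regexp) >= 2 and regexp[1] == "*":
        return match_star(regexp[0], regexp[2:], text)
    if regexp == "$":
        return not text
    if text and (regexp[0] == "." or regexp[0] == text[0]):
        return match_here(regexp[1:], text[1:])
    return False


def match_star(c: str, regexp: str, text: str) -> bool:
    """Match c* followed by regexp, backtracking over the length of the c* run."""
    # Count how many leading characters c* could consume at most.
    n = 0
    while n < len(text) and (c == "." or text[n] == c):
        n += 1
    # Greedy: try the longest run first, then give characters back one at a time.
    for k in range(n, -1, -1):
        if match_here(regexp, text[k:]):
            return True
    return False


print(match("a*b", "aaab"))        # True
print(match("^a.c$", "abdc"))      # False
print(match("a*a*a*b", "aaaaaa"))  # False, after trying many splits
```

The "give characters back and recurse again" loop in `match_star` is the backtracking. A pattern like `a*a*a*b` against a run of `a`s with no `b` makes the matcher explore many ways of splitting the run before it fails. A DFA avoids that cost, but it cannot support features such as backreferences.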
Here are some relevant citations about these matters, with all emboldening — and some of the text :) — mine:
— Programming Perl, by Larry Wall, Tom Christiansen, and Jon Orwant
— Perl6 Apocalypse 5: Pattern Matching, by Larry Wall
— Perl6 Exegesis 5: Pattern Matching, by Damian Conway
— Perl6 Synopsis 5: Regexes and Rules,
by Damian Conway, Allison Randal, Patrick Michaud, Larry Wall, and Moritz Lenz
The O'Reilly book 'Mastering Regular Expressions' has a very good explanation of Perl's and other engines. For me this is the reference book on the topic.
There is no formal mathematical name for the language accepted by PCREs.
The term "regular expressions with backtracking" or "regular expressions with backreferences" is about as close as you will get. Anybody familiar with the difference will know what you mean.
(There are only two common types of regex implementations: DFA-based and backtracking-based. The former generally accept the "regular languages" in the traditional computer-science sense. The latter generally accept... more. Exactly how much more depends on the specific implementation, but backreferences are always one of the non-DFA features.)
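Backreferences are the easiest non-DFA feature to demonstrate. This sketch uses Python's `re` module as a stand-in for Perl; `re` is also a backtracking engine with backreferences. The names `PATTERN` and `accepts` are illustrative. The pattern `(a+)b\1` (in Perl, `/^(a+)b\1$/`) accepts exactly the strings aⁿbaⁿ with n ≥ 1. By the pumping lemma this language is not regular, so no DFA can recognise it.

```python
import re

# (a+) captures a run of a's; \1 must then repeat exactly the same run.
PATTERN = re.compile(r"(a+)b\1")


def accepts(s: str) -> bool:
    """True if the whole string has the form a^n b a^n with n >= 1."""
    return PATTERN.fullmatch(s) is not None


for s in ["aba", "aabaa", "aaabaaa", "aabaaa", "abaa", "b"]:
    print(f"{s!r}: {accepts(s)}")
```

Only the balanced strings (`aba`, `aabaa`, `aaabaaa`) are accepted. To match, the engine must remember how many `a`s it captured, which is exactly the unbounded memory that a finite automaton lacks.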
I asked the same question on the theoretical CS Stack Exchange (Regular expressions aren't), and the answer that got the most upvotes was “regex.”