What is the Perl regex dialect/implementation called?

Posted on 2024-11-09 09:26:01

The string-matching engine that Perl calls "regular expressions" is very different from what textbooks mean by that term.

So, my question is: is there a document that describes Perl's regexp implementation, how it works, and in what ways it really differs from the classic kind (by classic I mean regular expressions that can actually be transformed into an ordinary DFA/NFA)?

Thank you.

Comments (5)

淡淡绿茶香 2024-11-16 09:26:01

Perl regular expressions are of course called Perl regular expressions, or regexes for short. They may also be called patterns or rules. But what they are, or at least can be, is recursive descent parsers. They’re implemented using a recursive backtracker, although you can swap in a DFA engine if you prefer to offload DFA‐solvable tasks to it.
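
As a rough illustration of the "recursive descent" point, here is a minimal sketch that matches balanced parentheses, which no classic regular expression can do. It assumes Perl 5.10 or later for the `(?1)` recursion construct; the names `$balanced` and `is_balanced` are made up for this example.

```perl
use strict;
use warnings;

# Group 1 matches one parenthesized unit. Inside it, (?1) re-enters
# group 1, so nesting of any depth is handled by recursion.
my $balanced = qr/
    \A
    (                   # group 1: one balanced unit
        \(
        (?: [^()]++     # a run of non-parenthesis characters
          | (?1)        # or a nested balanced unit
        )*
        \)
    )
    \z
/x;

# Hypothetical helper: returns 1 if the whole string is one balanced unit.
sub is_balanced {
    my ($text) = @_;
    return $text =~ $balanced ? 1 : 0;
}

for my $s ('(a(b)c)', '((()))', '(()', '())') {
    printf "%-8s %s\n", $s, is_balanced($s) ? 'balanced' : 'not balanced';
}
```

The pattern is used directly rather than interpolated into a larger one, because `(?1)` refers to a group by number and the numbering would shift inside another pattern.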

Here are some relevant citations about these matters, with all emboldening — and some of the text :) — mine:

You specify a pattern by creating a regular expression (or regex),
and Perl’s regular expression engine (the “Engine”, for the rest of this
chapter) then takes that expression and determines whether (and how) the
pattern matches your data. While most of your data will probably be
text strings, there’s nothing stopping you from using regexes to search
and replace any byte sequence, even what you’d normally think of as
“binary” data. To Perl, bytes are just characters that happen to have
an ordinal value less than 256.

If you’re acquainted with regular expressions from some other venue, we
should warn you that regular expressions are a bit different in Perl.
First, they aren’t entirely “regular” in the theoretical sense of the
word, which means they can do much more than the traditional regular
expressions taught in computer science classes.
Second, they are used
so often in Perl that they have their own special variables, operators,
and quoting conventions which are tightly integrated into the language,
not just loosely bolted on like any other library.

      — Programming Perl, by Larry Wall, Tom Christiansen, and Jon Orwant

This is the Apocalypse on Pattern Matching, generally having to do with
what we call “regular expressions”, which are only marginally related to
real regular expressions.
Nevertheless, the term has grown with the
capabilities of our pattern matching engines, so I’m not going to try to
fight linguistic necessity here. I will, however, generally call them
“regexes” (or “regexen”, when I’m in an Anglo‐Saxon mood).

      — Perl6 Apocalypse 5: Pattern Matching, by Larry Wall

There’s a lot of new syntax there, so let’s step through it slowly, starting with:

    $file = rx/ ^  <$hunk>*  $ /;

This statement creates a pattern object. Or, as it’s known in Perl 6, a
“rule”. People will probably still call them “regular expressions” or
“regexes” too (and the keyword rx reflects that), but Perl patterns long
ago ceased being anything like “regular”,
so we’ll try and avoid those
terms.

[Update: We’ve resurrected the term “regex” to refer to these patterns in
general. When we say “rule” now, we’re specifically referring to the kind
of regex that you would use in a grammar. See S05.]

      — Perl6 Exegesis 5: Pattern Matching, by Damian Conway

This document summarizes Apocalypse 5, which is about the new regex syntax.
We now try to call them regex rather than “regular expressions” because
they haven’t been regular expressions for a long time, and we think the
popular term “regex” is in the process of becoming a technical term with a
precise meaning of: “something you do pattern matching with, kinda like a regular
expression”.
On the other hand, one of the purposes of the redesign
is to make portions of our patterns more amenable to analysis under
traditional regular expression and parser semantics, and that involves
making careful distinctions between which parts of our patterns and
grammars are to be treated as declarative, and which parts as procedural.

In any case, when referring to recursive patterns within a grammar, the
terms rule and token are generally preferred over regex.

      — Perl6 Synopsis 5: Regexes and Rules,
by Damian Conway, Allison Randal, Patrick Michaud, Larry Wall, and Moritz Lenz

毁虫ゝ 2024-11-16 09:26:01

The O'Reilly book 'Mastering Regular Expressions' has a very good explanation of Perl's and other engines. For me this is the reference book on the topic.

小…楫夜泊 2024-11-16 09:26:01

There is no formal mathematical name for the language accepted by PCREs.

The term "regular expressions with backtracking" or "regular expressions with backreferences" is about as close as you will get. Anybody familiar with the difference will know what you mean.

(There are only two common types of regexp implementations: DFA-based and backtracking-based. The former generally accept the "regular languages" in the traditional computer science sense. The latter generally accept more; exactly how much depends on the specific implementation, but backreferences are always one of the non-DFA features.)
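
To make the backreference point concrete, here is a small sketch in Perl. The pattern accepts strings of the form a^n b a^n (the same number of `a`s on both sides of a single `b`), a language that is not regular, so no DFA can recognize it. The helper name `same_run_both_sides` is made up for this example.

```perl
use strict;
use warnings;

# \1 must repeat exactly the text captured by group 1, so the run of
# "a" after the "b" has to be as long as the run before it.
sub same_run_both_sides {
    my ($text) = @_;
    return $text =~ /\A(a+)b\1\z/ ? 1 : 0;
}

for my $s ('aba', 'aabaa', 'aabaaa', 'aaabaa') {
    printf "%-7s %s\n", $s, same_run_both_sides($s) ? 'match' : 'no match';
}
```

A backtracking engine handles this by remembering what group 1 captured and retrying shorter captures on failure, which is exactly the bookkeeping a pure DFA has no room for.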

煮茶煮酒煮时光 2024-11-16 09:26:01

I asked the same question on the theoretical CS Stack Exchange (Regular expressions aren't), and the answer that got the most upvotes was “regex.”

冷夜 2024-11-16 09:26:01
  • The dialect is called PCRE (Perl-compatible Regular Expressions).
  • It's documented in the Perl manual.
  • Or in "Programming Perl" by Wall, Orwant and Christiansen.