Why is there no standard for regular expressions?
I know the Perl regex flavor is sort of a minor de facto standard, but why hasn't anyone come up with a universal set of standard symbols, syntax and behaviors?
There is a standard by IEEE associated with the POSIX effort. The real question is "why doesn't everyone follow it?" The answer is probably that it is not quite as complex as PCRE (Perl Compatible Regular Expressions) with respect to greedy matching and whatnot.
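One concrete difference behind "greedy matching and whatnot" is likely the matching rule itself: POSIX specifies leftmost-longest matching, while Perl-style backtracking engines take the first alternative that lets the match succeed. A minimal sketch, using Python's re module as the Perl-style engine; posix_longest is a hypothetical toy helper that only handles a flat alternation, not a real POSIX matcher.

```python
import re
from typing import List, Optional


def perl_style(pattern: str, text: str) -> Optional[str]:
    """Match at the start of text with Python's backtracking (Perl-style) re.

    For an alternation, the first branch that succeeds wins ("leftmost-first"),
    even if a later branch would match more text.
    """
    m = re.match(pattern, text)
    return m.group(0) if m else None


def posix_longest(branches: List[str], text: str) -> Optional[str]:
    """Hypothetical toy helper imitating POSIX "leftmost-longest" semantics.

    Only handles a flat alternation anchored at the start of text: try every
    branch and keep the longest match. A real POSIX engine applies this rule
    to the whole pattern, not just to top-level alternatives.
    """
    best: Optional[str] = None
    for branch in branches:
        m = re.match(branch, text)
        if m and (best is None or len(m.group(0)) > len(best)):
            best = m.group(0)
    return best


print(perl_style(r"a|ab", "ab"))         # a
print(posix_longest(["a", "ab"], "ab"))  # ab
```

The same pattern text, a|ab, therefore yields different matches depending on which semantics the engine follows, which is exactly the kind of behavioral detail a universal standard would have to pin down.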
Actually, there is a regular expression standard (POSIX), but it's crappy. So people extended their RE engines to fit the needs of their applications. PCRE (Perl-compatible regular expressions) is a pseudo-standard for regular expressions that are compatible with Perl's RE engine. This is particularly relevant because PCRE is a standalone C library, so it can be embedded into other applications.
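Lazy quantifiers and lookaround assertions are typical examples of the Perl-style extensions that go beyond POSIX extended regular expressions. A short sketch using Python's re module, which follows Perl-style syntax for these features (it is not PCRE itself, just another engine in the same family):

```python
import re

text = "<b>bold</b>"

# Greedy quantifier (available in both POSIX and Perl-style engines):
# .+ grabs as much as it can, so the match runs to the last '>'.
greedy = re.search(r"<.+>", text).group(0)

# Lazy quantifier +? is a Perl-style extension that POSIX ERE does not define:
# it grabs as little as possible, so the match stops at the first '>'.
lazy = re.search(r"<.+?>", text).group(0)

# Lookahead (?=...) is another Perl-style extension with no POSIX equivalent:
# match a run of word characters only when it is followed by "</b>".
ahead = re.search(r"\w+(?=</b>)", text).group(0)

print(greedy)  # <b>bold</b>
print(lazy)    # <b>
print(ahead)   # bold
```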
Because making standards is hard. It's nearly impossible to get enough people to agree on anything to make it an official standard, let alone something as complex as regex. De facto standards are much easier to come by.
Case in point: HTML 5 is not expected to become an official standard until the year 2022. But the draft specification is already available, and major features of the standard will begin appearing in browsers long before the standard is official.
I have researched this and could not find anything concrete. My guess is that it's because regex is so often a tool that works ON other tools, and therefore it's necessarily going to have platform- and tool-specific extensions.
For example, in Visual Studio, you can use regular expressions to find and replace strings in your source code. They've added stuff like :i to match an identifier. On other platforms in other tools, identifiers may not be an applicable concept. In fact, perhaps other platforms and tools reserve the colon character to escape the expression.
Differences like that make this one particularly hard to standardize.
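The colon example is not far-fetched: the same characters already mean different things in different flavors. In POSIX bracket expressions (and in PCRE), [[:digit:]] is a named class matching any digit, but CPython's re module has no named classes and reads the same text literally. A small sketch, assuming a recent CPython (3.7 or later, which emits a FutureWarning for this pattern):

```python
import re
import warnings

# In POSIX (and in PCRE), [[:digit:]] is a bracket expression containing the
# named class "digit", so it matches any single digit.
posix_style = r"[[:digit:]]"

# CPython's re has no POSIX named classes. It parses the same text as the set
# {'[', ':', 'd', 'i', 'g', 't'} followed by a literal ']'. Recent versions
# warn "Possible nested set" here, so the warning is silenced for the demo.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", FutureWarning)
    pattern = re.compile(posix_style)

print(pattern.fullmatch("5"))         # None: not treated as a digit class
print(bool(pattern.fullmatch("d]")))  # True: 'd' from the set, then ']'

# The Perl-style spelling that Python does understand:
print(bool(re.fullmatch(r"\d", "5")))  # True
```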
Perl was first (or damn near close to first), and while it's Perl and we all love it, it's old, and some people felt it needed more polish (i.e. features). This is where the new flavors came in.
They're starting to normalize: the regex used in .NET is very similar to the regex used in other languages. I think people are slowly starting to unify, but some are used to their Perl ways and don't want to change.
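Named groups show that partial convergence well: the feature exists almost everywhere, but the spelling still differs. Python writes a named group as (?P<name>...), while .NET and Perl 5.10+ write (?<name>...). A sketch, assuming a CPython version whose re module does not accept the .NET spelling (true of the versions I am aware of); the date string is just sample input:

```python
import re

date = "2009-06-15"

# Python's spelling for named groups: (?P<name>...)
m = re.match(r"(?P<year>\d{4})-(?P<month>\d{2})", date)
print(m.group("year"))  # 2009

# The .NET / Perl 5.10+ spelling (?<name>...) is rejected by CPython's re,
# which reads "(?<" as the start of a lookbehind assertion such as (?<=...).
try:
    re.compile(r"(?<year>\d{4})")
    dotnet_style_ok = True
except re.error:
    dotnet_style_ok = False
print(dotnet_style_ok)  # False
```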
Just a guess: there was never a version popular enough to be considered the canonical standard, and there was no standard implementation. Everyone who came and reimplemented it had their own ideas on how to make it "better".
Flavors
Source: https://www.oreilly.com/library/view/mastering-regular-expressions/0596528124/ch03.html#page_133
Where to find the documentation for each flavor:
- POSIX basic and extended: man 7 regex, man 7 standards
- Vim: :help perl-patterns and :help two-engines (if you're curious, try :help ft-posix-syntax, but it's not so relevant)
- Python: pydoc re, pydoc posix
Because too many people are scared of regular expressions, they haven't become widespread enough for enough sensible people to both think of the idea and be in a position to implement it.
Even if a standards body did form and try to unify the different flavours, too many people would argue stubbornly for their own approach, whether it was better or not, because lots of programmers are annoying like that.