Why isn't there a regular expression standard?

Posted 2024-09-01 04:50:04

I know Perl's regex syntax is sort of a minor de facto standard, but why hasn't anyone come up with a universal set of standard symbols, syntax, and behaviors?

Comments (8)

醉殇 2024-09-08 04:50:04

There is a standard by IEEE associated with the POSIX effort. The real question is "why doesn't everyone follow it?" The answer is probably that it is not nearly as feature-rich as PCRE (Perl Compatible Regular Expressions), for example with respect to greedy versus non-greedy matching and the like.
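Here is a small Python illustration of the greedy-matching point. Python's re module is itself a Perl-style flavor. The non-greedy quantifier +? is a Perl extension that POSIX extended regexes do not define. The last pattern shows the usual workaround that stays within POSIX-style syntax. The sample string is made up for the example.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ runs to the end of the string, then backtracks to the LAST '>'.
greedy = re.search(r"<.+>", html).group()

# Lazy (a Perl extension, not part of POSIX ERE): .+? stops at the FIRST '>'.
lazy = re.search(r"<.+?>", html).group()

# Without lazy quantifiers, a negated bracket expression gets the same result.
posix_style = re.search(r"<[^>]+>", html).group()

print(greedy)       # <b>bold</b> and <i>italic</i>
print(lazy)         # <b>
print(posix_style)  # <b>
```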

甜心 2024-09-08 04:50:04

Actually, there is a regular expression standard (POSIX), but it's crappy. So people extend their RE engines to fit the needs of their applications. PCRE (Perl Compatible Regular Expressions) is a pseudo-standard for regular expressions that are compatible with Perl's RE engine. This is particularly relevant because PCRE is a standalone library, so Perl-style regexes can be embedded in other applications.
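To make "people extend their RE engines" concrete, here is a sketch in Python that writes one password-style check two ways. The first uses Perl-style lookaheads and \d. The second uses only constructs that POSIX ERE also defines: bracket expressions, intervals, and anchors. The rule (at least 8 characters, one digit, one uppercase letter) and the function names are hypothetical. Both versions run on Python's re, which is not itself a POSIX engine.

```python
import re

# Perl-style extensions: lookaheads (?=...) and the \d shorthand.
# Neither is defined by POSIX ERE.
PCRE_STYLE = re.compile(r"^(?=.*\d)(?=.*[A-Z]).{8,}$")

def check_pcre_style(pw: str) -> bool:
    """One regex expresses all three conditions via lookaheads."""
    return PCRE_STYLE.search(pw) is not None

def check_posix_subset(pw: str) -> bool:
    """Same rule using only ERE-style constructs, so each condition
    needs its own regex and the results are combined in code."""
    return (re.search(r"[0-9]", pw) is not None
            and re.search(r"[A-Z]", pw) is not None
            and re.search(r"^.{8,}$", pw) is not None)

for pw in ("Passw0rdX", "password1", "Ab1"):
    print(pw, check_pcre_style(pw), check_posix_subset(pw))
# Passw0rdX True True
# password1 False False
# Ab1 False False
```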

娇柔作态 2024-09-08 04:50:04

Because making standards is hard. It's nearly impossible to get enough people to agree on anything to make it an official standard, let alone something as complex as regex. De facto standards are much easier to come by.

Case in point: at the time of writing, HTML 5 was not expected to become an official standard until 2022. But the draft specification was already available, and major features of the standard were expected to appear in browsers long before the standard became official.

北陌 2024-09-08 04:50:04

I have researched this and could not find anything concrete. My guess is that it's because regex is so often a tool that works ON tools, and therefore it necessarily has platform- and tool-specific extensions.

For example, in Visual Studio you can use regular expressions to find and replace strings in your source code. They've added shorthands like :i to match an identifier. On other platforms and in other tools, identifiers may not be an applicable concept. In fact, other platforms and tools might reserve the colon character for escaping.

Differences like that make this one particularly hard to standardize.
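Here is a minimal sketch, in Python, of how a tool might implement a shorthand like :i by rewriting it into ordinary regex syntax before compiling. The SHORTHANDS table, its expansion for :i, and expand_shorthands are all hypothetical. The expansion is an assumed C-style identifier, not Visual Studio's exact definition.

```python
import re

# Hypothetical tool-specific shorthand table (assumed expansion).
SHORTHANDS = {
    ":i": r"[A-Za-z_][A-Za-z0-9_]*",
}

def expand_shorthands(pattern: str) -> str:
    """Rewrite tool-specific shorthands into ordinary regex syntax.

    Naive textual replacement: it cannot tell ":i" apart from a literal
    colon followed by "i", which is exactly the kind of clash that makes
    such extensions hard to standardize.
    """
    for short, expansion in SHORTHANDS.items():
        pattern = pattern.replace(short, "(?:" + expansion + ")")
    return pattern

pattern = expand_shorthands(r"int (:i)")
print(pattern)                                            # int ((?:[A-Za-z_][A-Za-z0-9_]*))
print(re.findall(pattern, "int count = 0; int total;"))  # ['count', 'total']
```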

谜兔 2024-09-08 04:50:04

Perl was first (or damn near close to first), and while it's Perl and we all love it, it's old, and some people felt it needed more polish (i.e. features). This is where the newer flavors came in.

They're starting to normalize: the regex flavor used in .NET is very similar to the ones used in other languages. I think people are slowly starting to unify, but some are used to their Perl ways and don't want to change.

笛声青案梦长安 2024-09-08 04:50:04

Just a guess: there was never a version popular enough to be considered the canonical standard, and there was no standard implementation. Everyone who came and reimplemented it had their own ideas on how to make it "better".

戏蝶舞 2024-09-08 04:50:04

Flavors

from https://www.oreilly.com/library/view/mastering-regular-expressions/0596528124/ch03.html#page_133:

Now that you have a feel for regular expressions and a few diverse tools that use them, you might think we’re ready to dive into using them wherever they’re found. But even a simple comparison among the egrep versions of the first chapter and the Perl and Java in the previous chapter shows that regular expressions and the way they’re used can vary wildly from tool to tool.
When looking at regular expressions in the context of their host language or tool, there are three broad issues to consider:
• What metacharacters are supported, and their meaning. Often called the regex “flavor.”
• How regular expressions “interface” with the language or tool, such as how to specify regular-expression operations, what operations are allowed, and what text they operate on.
• How the regular-expression engine actually goes about applying a regular expression to some text. The method that the language or tool designer uses to implement the regular-expression engine has a strong influence on the results one might expect from any given regular expression.
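The third point is easy to see with alternation. Python's re, like Perl, is a backtracking engine that reports the first alternative that succeeds (leftmost-first). POSIX instead requires the leftmost-longest overall match. The leftmost_longest function below is a hypothetical brute-force helper that emulates the POSIX rule on top of re. It ignores POSIX's rules for subexpression matches and is far too slow for real use.

```python
import re
from typing import Optional

def leftmost_first(pattern: str, text: str) -> Optional[str]:
    """Perl/Python (backtracking) semantics: the first alternative that succeeds wins."""
    m = re.search(pattern, text)
    return m.group() if m else None

def leftmost_longest(pattern: str, text: str) -> Optional[str]:
    """Brute-force emulation of POSIX overall-match semantics:
    earliest start position first, then the longest match from there."""
    compiled = re.compile(pattern)
    for start in range(len(text) + 1):
        # Try end positions from longest to shortest.
        for end in range(len(text), start - 1, -1):
            if compiled.fullmatch(text, start, end):
                return text[start:end]
    return None

print(leftmost_first(r"a|ab", "abc"))    # a
print(leftmost_longest(r"a|ab", "abc"))  # ab
```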

Where to find

  • man 7 regex and man 7 standards (POSIX basic and extended regular expressions)
  • For Vim, see :help perl-patterns and :help two-engines (if you're curious, try :help ft-posix-syntax, though it's less relevant)
  • For Python, see pydoc re and pydoc posix

极致的悲 2024-09-08 04:50:04

Because too many people are scared of regular expressions, they haven't become widespread enough for enough sensible people both to think of the idea and to be in a position to implement it.

Even if a standards body did form and try to unify the different flavours, too many people would argue stubbornly towards their own approach, whether better or not, because lots of programmers are annoying like that.
