What support is there for PCRE (Perl Compatible Regular Expressions) in common languages?

Posted on 2024-08-05 08:55:13

I am interested in the power of PCRE (Perl Compatible Regular Expressions) and wonder whether they are likely to become a de facto approach in all major languages (I am interested in Java). I am prepared to use a library if necessary.

I also could not find a good page in SO describing the pros and cons of PCRE so if this does not exist it could be useful to include this in answers

EDIT I am interested in power beyond Java 1.6 regex, particularly named capture groups

Comments (5)

日记撕了你也走了 2024-08-12 08:55:13

It seems that more mainstream languages use their own implementation of "Perl-like" regexes than actually use libpcre. Languages that fall into this class include (at the very least) Java, JavaScript, and Python.

Java's java.util.regex library uses a syntax that's very heavily based on Perl (approx. version 5.8) regexes, including the rules for escaping, the \p and \P Unicode classes, non-greedy and "possessive" quantifiers, backreferences, \Q..\E quoting, and several of the (?...) constructs including non-capturing groups, zero-width lookahead/behind, and non-backtracking groups. In fact Java regexes seem to have more in common with Perl regexes than libpcre does. :)

The JavaScript language also uses regexes that are derived from Perl; Unicode classes, lookbehind, possessive quantifiers, and non-backtracking groups are absent, but the rest of what I mentioned for Java is present as well in JS.

Python's regex syntax is also based on Perl 5's, with non-greedy quantifiers, most of the (?...) constructs including non-capturing groups, look-ahead/behind and conditional patterns, as well as named capture groups (but with a different syntax than either Perl or PCRE). Non-backtracking groups and 'possessive' quantifiers are (as far as I can see) absent, as are \p and \P Unicode character classes, although the standard \d, \s, and \w classes are Unicode-aware if requested.
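To make the Python points concrete, here is a small sketch using only the standard `re` module on Python 3. The pattern names (`DATE`, `FIRST_TAG`, `BRACKETED`, `PRICE`) and the helper `parse_date` are made up for illustration. It shows the Python spelling of named groups, `(?P<name>...)`, along with a non-greedy quantifier, a conditional pattern, a lookbehind, and the Unicode-aware `\d`.

```python
import re

# Named capture groups use Python's own spelling: (?P<name>...)
DATE = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")


def parse_date(text):
    """Return the named groups of the first ISO-style date in text, or None."""
    match = DATE.search(text)
    return match.groupdict() if match else None


# Non-greedy quantifier: stop at the nearest closing tag, not the last one.
FIRST_TAG = re.compile(r"<b>(.*?)</b>")

# Conditional pattern: a closing '>' is required only if group 1 matched '<'.
BRACKETED = re.compile(r"^(<)?(\w+@\w+(?:\.\w+)+)(?(1)>)$")

# Fixed-width lookbehind: digits that directly follow a dollar sign.
PRICE = re.compile(r"(?<=\$)\d+")

# In Python 3, \d on str patterns is Unicode-aware unless re.ASCII is given.
ARABIC_INDIC_123 = "\u0661\u0662\u0663"
unicode_digits = re.fullmatch(r"\d+", ARABIC_INDIC_123) is not None
ascii_digits = re.fullmatch(r"\d+", ARABIC_INDIC_123, re.ASCII) is not None
```

For example, `parse_date("released 2024-08-05")` returns a dict keyed by `year`, `month`, and `day`, and `BRACKETED` accepts `<user@host.com>` and `user@host.com` but rejects the unbalanced `<user@host.com`.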

冬天旳寂寞 2024-08-12 08:55:13

This is an old question, but to update it, Java 7 added named capture groups.

不疑不惑不回忆 2024-08-12 08:55:13

I ... wonder whether they [PCRE] are likely to become a de facto approach in all major languages (I am interested in Java).

This calls for speculation, but I think that the answer is "No" ... in the case of Java. I base this on the fact that I couldn't find any worthwhile PCRE implementation for Java.

If there were a real need / demand for PCRE in Java, I'd have expected there to be more libraries out there.


UPDATE

Since I wrote the original answer, more people / groups have implemented Java libraries that provide (or claim to provide) PCRE compatible regexes.

And obviously the Java team may add (and has added) some Perl features to Java's regex support over time. For example, named capture groups were added in Java 7.

But full PCRE compatibility doesn't seem to be a high priority goal for the Java team. For example:

And given that full compatibility would likely break a subset of existing Java applications, I still think that the answer is No.

余罪 2024-08-12 08:55:13

Try doing a split off of this match:

(?:
  (?:'[\S\s]*?(?<!\\)') # Consume characters inside of a quoted string
  |(?:\/\*[\S\s]*?\*\/) # Consume multi-line comments
  |(?m:\/{2}[^\n]*$\n)  # Consume single-line comments
)(*SKIP)(*F)            # Fail match if any of the previous matches were found
|(?<=;)                 # Capture position right after semicolon

Be sure to use the 'x' and 'g' (if necessary) modifier(s).
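The `(*SKIP)(*F)` verbs are PCRE/Perl features, and Python's standard `re` module does not support them. As a rough sketch of the same idea in stdlib Python, the code below swaps in a different technique: it matches strings and comments first in an alternation and captures only a semicolon that falls outside them. The names `TOKEN` and `split_statements` are hypothetical, and the quoted-string sub-pattern is my own simplification, not the one from the regex above.

```python
import re

# Alternatives are tried left to right, so strings and comments are consumed
# whole before a bare semicolon can match. Only the last alternative captures.
TOKEN = re.compile(
    r"""
      '(?:\\.|[^'\\])*'      # single-quoted string with backslash escapes
    | /\*[\s\S]*?\*/         # multi-line comment
    | //[^\n]*               # single-line comment
    | (;)                    # a semicolon outside of the above
    """,
    re.VERBOSE,
)


def split_statements(source):
    """Split source after each semicolon that is not in a string or comment."""
    parts = []
    start = 0
    for match in TOKEN.finditer(source):
        # Group 1 is set only when the bare-semicolon alternative matched.
        if match.group(1) is not None:
            parts.append(source[start:match.end()])
            start = match.end()
    if start < len(source):
        parts.append(source[start:])  # trailing text with no final semicolon
    return parts
```

Like the lookbehind version, this keeps each semicolon at the end of its piece, so joining the pieces gives back the original input.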

Example

别把无礼当个性 2024-08-12 08:55:13

This sounds a lot like an "Is X the One True Way!?" kind of question. PCRE has many shortcomings, the most obvious of which are its complexity and questionable usefulness. Rarely does there exist a One True Way for anything, and in the realm of regexp libraries, PCRE most certainly is not it.

Perl regular expressions are utter junk in my opinion. Once you get much beyond the feature set offered by POSIX extended regexps (ERE), you may as well use something like a PEG implementation. The only reason PCRE is so widely used is that it's easy for people to solve a problem by just dropping in a library.
