How should I handle regex features labeled with a "warning"?

Posted 2024-10-04 08:16:02

How should I handle regex features that are labeled with a warning, such as "(?{ code })", "(??{ code })", or the "Special Backtracking Control Verbs"? How seriously should I take the warnings?

Comments (1)

阳光下慵懒的猫 2024-10-11 08:16:02

I kinda think they’re here to stay, one way or the other — especially code escapes. Code escapes have been with us for more than a decade.

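The scariness of them, that they can call code in unforeseen ways, is taken care of by use re "eval": a pattern that interpolates a plain string at run time refuses to compile a code block unless that pragma is in scope. Also, the regex matcher wasn’t reëntrant until 5.12 IIRC, which could limit their usefulness.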
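A minimal sketch of that safeguard, assuming a reasonably modern perl. The variable names are made up for illustration, and $fragment stands in for pattern text that arrives at run time:

```perl
use strict;
use warnings;

# Hypothetical pattern text that only becomes known at run time.
my $fragment = '(?{ 1 })';

# Default behaviour: interpolating a plain string that contains a code
# block makes the pattern compilation die, which eval {} traps here.
my $allowed_by_default = eval { my $re = qr/x$fragment/; 1 } ? 1 : 0;

# Opting in is lexical: only patterns compiled inside this block may
# pick up code blocks from interpolated text.
my $allowed_with_pragma = do {
    use re 'eval';
    eval { my $re = qr/x$fragment/; 1 } ? 1 : 0;
};

print "default: $allowed_by_default, with use re 'eval': $allowed_with_pragma\n";
```

Code blocks written literally in the source of a pattern do not need the pragma; it only matters when the code arrives through an interpolated string.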

The string-eval version, (??{ code }), used to be the only way to do recursion, but since 5.10 we have a much better way to do that; benchmarking the speed differences shows the eval way is way slower in most cases.
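To make the two styles concrete, here is a rough sketch that matches balanced parentheses both ways. The pattern and sub names are mine, not from any documentation, and no timing is attempted:

```perl
use strict;
use warnings;

# Perl 5.10+: (?1) re-enters capture group 1, so the group recurses.
my $recursive = qr{ \A ( \( (?: [^()]++ | (?1) )* \) ) \z }x;

# Older idiom: (??{ }) runs code on each visit and uses the result
# (here the pattern itself) as a sub-pattern at that point.
my $inner;
$inner = qr{ \( (?: [^()]++ | (??{ $inner }) )* \) }x;

# Hypothetical helpers returning 1 or 0.
sub balanced_recursive { my ($s) = @_; return $s =~ $recursive    ? 1 : 0 }
sub balanced_postponed { my ($s) = @_; return $s =~ /\A$inner\z/  ? 1 : 0 }

for my $s ('(a(b)c)', '()', '((x)') {
    printf "%-8s recursive=%d postponed=%d\n",
        $s, balanced_recursive($s), balanced_postponed($s);
}
```

Both forms should accept and reject the same strings; the difference is in how much work the engine does per recursion step.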

I mostly use the block-eval version, (?{ code }), for adding debugging, which happens at a different granularity than use re "debug". It used to vaguely bother me that the return value from the block-eval version wasn’t usable, until I realized that it was. You just have to use it as the test part of a conditional pattern, like this pattern for testing whether a number is made up of digits that increase by one at each position to the right:

qr{
  ^ (
      ( \p{Decimal_Number} )
      (?(?= ( \d )) | $)
      (?(?{ ord $3 == 1 + ord $2 }) (?1) | $)
    ) $
}x

Before I figured out conditionals, I would have written that this way:

qr{
   ^ (  
        ( \p{Decimal_Number} ) 
        (?= $ | (??{ chr(1+ord($2)) }) )
        (?: (?1) | $ ) 
    ) $
}x

which is much less efficient.
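For anyone who wants to poke at the two patterns above, here is a small harness, offered as a sketch. The variable names and the accepts helper are hypothetical, and the "no warnings" line is my addition because $3 may be unset when the last digit is reached. As written, both patterns accept strings whose digits go up by one from left to right:

```perl
use strict;
use warnings;

# Conditional form: the (?{ }) block is the test of a (?(cond)yes|no) group.
my $with_conditional = do {
    no warnings 'uninitialized';   # $3 may be unset on the last digit
    qr{
      ^ (
          ( \p{Decimal_Number} )
          (?(?= ( \d )) | $)
          (?(?{ ord $3 == 1 + ord $2 }) (?1) | $)
        ) $
    }x;
};

# Postponed-regex form: (??{ }) builds the expected next digit as a pattern.
my $with_postponed = qr{
   ^ (
        ( \p{Decimal_Number} )
        (?= $ | (??{ chr(1+ord($2)) }) )
        (?: (?1) | $ )
    ) $
}x;

# Hypothetical helper: 1 if the pattern accepts the string, else 0.
sub accepts { my ($re, $str) = @_; return $str =~ $re ? 1 : 0 }

for my $str (qw(123 6789 321 135)) {
    printf "%-5s conditional=%d postponed=%d\n",
        $str, accepts($with_conditional, $str), accepts($with_postponed, $str);
}
```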

The backtracking control verbs are newer. I use them mostly for getting all possible permutations of a match, and that requires only (*FAIL). I believe it is the (*ACCEPT) feature that is especially marked “highly experimental”. These have only been with us since 5.10.
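A minimal sketch of that (*FAIL) trick, assuming perl 5.10 or later. The @seen array is a name I made up, and it is a package variable only so that the code block can see it on older perls as well:

```perl
use strict;
use warnings;

our @seen;

# (*FAIL) rejects every candidate, so the engine backtracks through all
# of them; the code block records each one before it is thrown away.
my $input = "abc";
$input =~ / (\w+) (?{ push @seen, $1 }) (*FAIL) /x;

print join(" ", @seen), "\n";   # abc ab a bc b c
```

The match as a whole fails by design; the useful output is the side effect collected in @seen.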
