PCRE: Finding matching braces for code blocks

Posted on 2024-08-22 20:00:42

Is there a way for a PCRE regular expression to count how many occurrences (n) of one character it encounters, and to stop searching once it has found n occurrences of another character (specifically { and })?

This is to grab code blocks (which may or may not have code blocks nested inside them).

If it makes things simpler, the input will be a single-line string in which the only characters other than braces are digits, colons and commas. The input must pass the following check before any attempt is made to extract code blocks:

$regex = '%^(\\d|\\:|\\{|\\}|,)*$%';

All braces will have a matching pair and will be nested correctly.

I would like to know if this can be achieved before I start writing a script to check every character in the string and count each occurrence of a brace. Regular expressions would be much more memory friendly as these strings can be several kilobytes in size!

Thanks, mniz.

Solution

PCRE: Lazy and Greedy at the same time (Possessive Quantifiers)

Comments (4)

土豪我们做朋友吧 2024-08-29 20:00:42

PCRE has recursive patterns, so you can do something like this:

$code_is_valid = preg_match('~^({ ( (?>[^{}]+) | (?1) )* })$~x', '{' . $code .'}');

On the other hand, I don't think this will be faster or use less memory than a simple counter, especially on large strings.

And this is how to find all (valid) code blocks in a string:

preg_match_all('~ { ( (?>[^{}]+) | (?R) )* } ~x', $input, $blocks);
print_r($blocks);
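
To make the behaviour of the recursive pattern concrete, here is a minimal sketch that wraps it in a function and runs it on a sample string. The function name find_blocks and the sample input are made up for illustration; the braces are escaped and the inner group is made non-capturing only as a precaution, otherwise the pattern is the same idea as above.

```php
<?php
// Hypothetical helper wrapping the recursive pattern; the name is illustrative.
function find_blocks(string $input): array
{
    // \{ ... \} with, in between, any number of either an atomic run of
    // non-brace characters or a recursion into the whole pattern via (?R).
    $pattern = '~ \{ (?: (?>[^{}]+) | (?R) )* \} ~x';
    preg_match_all($pattern, $input, $matches);

    return $matches[0]; // full text of each outermost block, left to right
}

// Sample input using only the characters allowed in the question.
print_r(find_blocks('1:{2,3:{4}},5:{6}'));
```

For this input the result should contain two entries, {2,3:{4}} and {6}. Nested blocks are not listed separately, because matches returned by preg_match_all do not overlap; to get the inner blocks you would run the function again on the contents of each match.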
娇纵 2024-08-29 20:00:42

This is exactly what regular expressions are not good for. It's the classic example.

You should just iterate over the string character by character, and keep a count of the nesting level.
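
A minimal sketch of that counter approach, assuming the goal is to collect the outermost blocks and to reject unbalanced input. The function name extract_blocks and the choice to return null on unbalanced braces are assumptions made for this example, not something stated in the question.

```php
<?php
// Hypothetical helper: returns the top-level {...} blocks of $input,
// or null if the braces are not balanced.
function extract_blocks(string $input): ?array
{
    $blocks = [];
    $depth = 0;   // current nesting level
    $start = 0;   // offset of the '{' that opened the current top-level block
    $length = strlen($input);

    for ($i = 0; $i < $length; $i++) {
        $char = $input[$i];
        if ($char === '{') {
            if ($depth === 0) {
                $start = $i;          // a new top-level block begins here
            }
            $depth++;
        } elseif ($char === '}') {
            if ($depth === 0) {
                return null;          // closing brace without an opener
            }
            $depth--;
            if ($depth === 0) {       // the top-level block just closed
                $blocks[] = substr($input, $start, $i - $start + 1);
            }
        }
    }

    return $depth === 0 ? $blocks : null; // null if a block was left open
}

print_r(extract_blocks('1:{2,3:{4}},5:{6}'));
```

This is a single pass over the string with two integers of state, so memory use beyond the extracted substrings does not grow with nesting depth.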

灼痛 2024-08-29 20:00:42
$regex='%^(\\d|\\:|\\{|\\}|,){0,25}$%';
preg_match($regex,$target,$matches);

where 25 on the first line is the maximum number of occurrences. Then check:

$n=count($matches);
暮色兮凉城 2024-08-29 20:00:42

It is impossible since the language you are describing is not a regular language.

Use a parser instead.
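
As a rough illustration of what such a parser could look like for this input format, here is a small recursive-descent sketch. The names parse_blocks and parse_items and the output shape (strings for runs of non-brace characters, nested arrays for blocks) are assumptions chosen for the example.

```php
<?php
// Hypothetical recursive-descent parser: strings in the result are runs of
// non-brace characters, nested arrays are {...} blocks.
function parse_items(string $input, int &$pos, bool $insideBlock): array
{
    $items = [];
    $length = strlen($input);

    while ($pos < $length) {
        $char = $input[$pos];
        if ($char === '{') {
            $pos++;                                  // consume '{'
            $items[] = parse_items($input, $pos, true);
        } elseif ($char === '}') {
            if (!$insideBlock) {
                throw new InvalidArgumentException("Unexpected '}' at offset $pos");
            }
            $pos++;                                  // consume '}'
            return $items;
        } else {
            $run = strcspn($input, '{}', $pos);      // length of the non-brace run
            $items[] = substr($input, $pos, $run);
            $pos += $run;
        }
    }

    if ($insideBlock) {
        throw new InvalidArgumentException("Missing '}' at end of input");
    }
    return $items;
}

function parse_blocks(string $input): array
{
    $pos = 0;
    return parse_items($input, $pos, false);
}

print_r(parse_blocks('1:{2,3:{4}},5:{6}'));
```

Unlike a plain counter, this keeps the nesting structure, at the cost of one PHP call frame per nesting level, which may matter for very deeply nested input.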
