Regular expressions: counting characters

Posted on 2024-11-28 11:21:38

I'm writing a PHP script that searches for particular headlines inside a DokuWiki document.

My current pattern looks like this:

$pattern = "/.*=+ ". $header ." =+([^=]+)/m";
preg_match($pattern, $art->text, $m);
if (!empty($m[1])) {
   $art->text = $m[1];
} else {
   $art->text = "";
}

A sample document:

====== TestHeader ======
Testtext

===== Header2 =====
Testtext2

==== Header3 ====
Testtext3

====== Header4 ======
Testtext4

When searching for TestHeader, my current result is:

====== TestHeader ======
Testtext

I would like the pattern to return:

====== TestHeader ======
Testtext

===== Header2 =====
Testtext2

==== Header3 ====
Testtext3

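Or in other words: I would like to match the header I searched for together with all following headers that are surrounded by fewer = characters than it.

Is something like this possible with regular expressions?

Thanks in advance!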



累赘 2024-12-05 11:21:38

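Since I'm not a strong PHP coder, I don't know whether PHP has any special extensions to "normal" regular expressions that would do what you want. Leaving those aside, your problem cannot be solved with regular expressions.

If you're interested, there is some formal-language theory behind this: regular expressions can only recognize so-called "regular languages" (see the corresponding Wikipedia article). Without going too deep into the theory, the intuition is that regular expressions cannot "count" things (at least not in the sense of comparing two counts within a match).
To restate the Wikipedia example: you cannot write a regular expression that matches exactly the strings consisting of N a's followed by N b's, for arbitrary N.

Of course, this is not a mathematical proof that what you are looking for is impossible, but it should give you an idea of what regular expressions can and cannot do. HTH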
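
On the open question of PHP-specific extensions: PHP's preg_* functions are built on PCRE, whose syntax goes beyond regular languages in the formal sense, for example through recursive subpatterns. The snippet below is only a minimal sketch of that point, applied to the N a's followed by N b's example; the variable name $balanced is made up for illustration, and this is not a solution to the heading problem itself.

```php
<?php
// (?1) recurses into capture group 1, so each 'a' must be closed by a matching 'b'.
// A "textbook" regular expression cannot express this; PCRE recursion can.
$balanced = '/^(a(?1)?b)$/';

foreach (['ab', 'aabb', 'aaabbb', 'aab', 'abb'] as $s) {
    $result = preg_match($balanced, $s) === 1 ? 'match' : 'no match';
    echo $s, ' => ', $result, PHP_EOL;
}
```

The first three strings have equal counts and should be reported as matches, while the last two should not. Comparing the = runs of two different headings this way would be much harder to read than counting them in plain PHP code.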


莫言歌 2024-12-05 11:21:38

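You can do it in a couple of steps:

1. Use the code you already have to find the header you're looking for.
2. Count the = characters in that header.
3. Search for all headers with that many or fewer = characters.

Suppose you knew you were looking for $n or fewer = characters in the header:

$pattern = "/.*={1,$n} ". $header ." ={1,$n}([^=]+)/m";

Although you'd have to use two regular expressions and do a little processing, it should be pretty quick, and the second regular expression would do exactly what you're asking for.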
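
One possible way to put these steps together is sketched below. It is a variation on the idea rather than the exact pattern shown: instead of listing every header with fewer = characters, it cuts the text at the next header that has at least as many = characters as the one searched for. The function name extractSection is hypothetical, and the sketch assumes "\n" line endings and headings written as "=== Title ===" on their own line.

```php
<?php
/**
 * Return the section introduced by $header, including nested sub-sections,
 * or an empty string if the header is not found. Hypothetical helper.
 */
function extractSection(string $text, string $header): string
{
    // Step 1: find the heading line; group 1 captures its leading run of '='.
    $find = '/^(=+) ' . preg_quote($header, '/') . ' =+[ \t]*$/m';
    if (!preg_match($find, $text, $m, PREG_OFFSET_CAPTURE)) {
        return '';
    }
    $start = $m[0][1];          // byte offset where the heading line begins

    // Step 2: count the '=' characters of that heading.
    $n = strlen($m[1][0]);

    // Step 3: the section ends at the next heading with $n or more '='.
    $next = '/^={' . $n . ',} .* =+[ \t]*$/m';
    $searchFrom = $start + strlen($m[0][0]);
    if (preg_match($next, $text, $end, PREG_OFFSET_CAPTURE, $searchFrom)) {
        return rtrim(substr($text, $start, $end[0][1] - $start));
    }

    // No later heading of the same or a higher level: take the rest of the text.
    return rtrim(substr($text, $start));
}
```

Called with the sample document and 'TestHeader', this should return the TestHeader, Header2 and Header3 blocks and stop before Header4. preg_quote() is used so that characters in the title with a special meaning in a pattern are matched literally.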

