PHP PCRE - correct nested tag behavior
I want to write a simple forum parser (consisting of a single preg_replace call), but I've run into problems with nested tags.
For example, if someone quotes a post that itself quotes someone else, I cannot get the correct behaviour.
Given this input:
[quote=Tom]
[quote=Jerry]
Lorem
[/quote]
Ipsum
[/quote]
Dolor.
I want something like this:
<blockquote>
<p><strong>Tom wrote:</strong></p>
<blockquote>
<p><strong>Jerry wrote:</strong></p>
<p>Lorem</p>
</blockquote>
Ipsum
</blockquote>
Dolor.
I have this code:
$value = preg_replace('~\[quote=(.+)\](.+)\[/quote\]~is', '<blockquote><p><strong>$1</strong> wrote:</p><p>$2</p></blockquote>', $value);
This version is greedy. If I have two separate [quote] blocks, the regex wraps all the text between the first [quote] and the second [/quote].
If I add the U modifier, it's too ungreedy: the first [quote] tag is paired with the first (nested and irrelevant) [/quote] tag.
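To make the two failure modes concrete, here is a small hypothetical demo. It assumes the tag syntax [quote=Name] from the sample input, narrows the name group to [^\]]+ so that only the body's greediness is in play, and uses a simplified replacement template; none of these details come from the original code.

```php
<?php
// Hypothetical demo of the two failure modes described in the question.
// Assumptions: tags are written as [quote=Name], the name group is
// narrowed to [^\]]+, and the replacement template is simplified.

$template = '<blockquote>$1: $2</blockquote>';

// Two separate, non-nested quote blocks.
$separate = '[quote=A]x[/quote] y [quote=B]z[/quote]';
// One quote nested inside another.
$nested = '[quote=A][quote=B]x[/quote]y[/quote]';

// Greedy (.+) runs to the LAST [/quote], swallowing both blocks:
// <blockquote>A: x[/quote] y [quote=B]z</blockquote>
$greedy = preg_replace('~\[quote=([^\]]+)\](.+)\[/quote\]~is', $template, $separate);

// The U modifier makes every quantifier lazy, so [quote=A] is closed by
// the FIRST [/quote], which really belongs to the nested [quote=B]:
// <blockquote>A: [quote=B]x</blockquote>y[/quote]
$lazy = preg_replace('~\[quote=([^\]]+)\](.+)\[/quote\]~isU', $template, $nested);

echo $greedy, "\n", $lazy, "\n";
```

Neither quantifier setting can express "the [/quote] that balances this [quote]", which is why a flat pattern with or without U fails on nesting.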
Thanks for any help!
Comments (3)
There is the PEAR HTML_BBCodeParser package, and PHP also has a PECL extension (bbcode) for parsing markup like this; see this example: http://www.php.net/manual/en/function.bbcode-create.php
Don't use a regular expression for this. Use the official PECL extension provided:
Example (lifted from the docs):
The full docs.
With some help of recursive regular expressions:
The pattern matches only the outermost quote blocks, and the callback function replace_quotes_callback replaces the quotes inside each block by recursively calling replace_quotes.
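The answer's own code is not preserved here. The following is a hypothetical sketch of the same idea: the names replace_quotes and replace_quotes_callback are taken from the description above, but the pattern, the [quote=Name] syntax and the HTML template are assumptions. The (?R) construct recurses into the whole pattern, so a match can only contain balanced nested blocks and therefore always ends at the [/quote] that closes the outermost tag.

```php
<?php
// Sketch of a recursive-regex quote parser. The names replace_quotes and
// replace_quotes_callback come from the answer's description; the pattern,
// the [quote=Name] syntax and the HTML template are assumptions.

function replace_quotes(string $text): string
{
    // Group 1: the quoted author (everything up to the first "]").
    // Group 2: the body, built from any mix of
    //   - runs of characters that do not start "[quote=" or "[/quote]", and
    //   - complete nested quote blocks, matched by recursing into the whole
    //     pattern with (?R).
    // Because the body can only contain balanced nested blocks, each match
    // is an outermost [quote=...]...[/quote] pair.
    $pattern = '~\[quote=([^\]]+)\]((?:(?:(?!\[quote=|\[/quote\]).)++|(?R))*)\[/quote\]~is';

    // preg_replace_callback() returns null on a PCRE error (for example the
    // backtrack limit); fall back to the unchanged text in that case.
    return preg_replace_callback($pattern, 'replace_quotes_callback', $text) ?? $text;
}

function replace_quotes_callback(array $m): string
{
    // Convert any quotes nested inside this block first, then wrap it.
    $body = replace_quotes($m[2]);
    return '<blockquote><p><strong>' . $m[1] . ' wrote:</strong></p>'
        . $body . '</blockquote>';
}

$input = "[quote=Tom]\n[quote=Jerry]\nLorem\n[/quote]\nIpsum\n[/quote]\nDolor.";
echo replace_quotes($input), "\n";
```

For the sample input this yields a "Jerry wrote:" blockquote nested inside a "Tom wrote:" blockquote, with "Dolor." left outside. The sketch does not wrap body text in <p>, does no HTML escaping, and leaves unmatched [quote=...] or [/quote] tags unchanged.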