Simple BB parser in PHP that lets you replace content outside tags
I'm trying to parse strings that represent source code, something like this:
[code lang="html"]
<div>stuff</div>
[/code]
<div>stuff</div>
As you can see from my previous 20 questions, I tried to do it with PHP's regex functions, but ran into many problems, especially when the string is very big...
Do you guys know a BB parser class written in PHP that I can use for this, instead of regexes?
What I need it to do is:
- be able to convert all content within [code] tags to HTML entities
- be able to run some kind of filter (a callback function of mine) only on content outside of the [code] tags
thank you
edit:
I ended up using this:
1. Convert all <pre> and <code> tags to [pre] and [code]:
   str_replace(array('<pre>', '</pre>', '<code>', '</code>'), array('[pre]', '[/pre]', '[code]', '[/code]'), $content);
2. Get the contents from between [code]..[/code] and [pre]...[/pre] and do the html entity conversion:
   preg_replace_callback('/(.?)\[(pre|code)\b(.*?)(?:(\/))?\](?:(.+?)\[\/\2\])?(.?)/s', 'self::specialchars', $content);
   (I stole this pattern from the WordPress shortcode functions :)
3. Store the entity-converted content in a temporary array variable, and replace the one from $content with a unique ID.
4. I can now safely run my filter on $content, because there's no code in it, just the ID (this filter does a strip_tags on the entire text and converts stuff like http://blabla.com to links).
5. Replace the unique IDs from $content with the converted code blocks from the array variable.
Do you think it's ok?
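The placeholder approach described in the edit could look roughly like the sketch below. This is not the original poster's code: the function name `filter_outside_code`, the placeholder format, and the choice to emit `<code>`/`<pre>` HTML tags on restore (dropping attributes such as `lang="html"`) are assumptions made for illustration. It also uses a simpler pattern than the WordPress one quoted above.

```php
<?php
// Hypothetical helper: protects [code]/[pre] blocks, runs $filter on
// everything else, then puts the escaped blocks back.
function filter_outside_code(string $content, callable $filter): string
{
    $blocks = [];
    // Random alphanumeric prefix so ordinary text is unlikely to collide
    // with a placeholder, and so strip_tags() leaves placeholders alone.
    $prefix = 'CODEBLOCK' . bin2hex(random_bytes(8)) . 'X';

    // Swap each block for a placeholder and remember the escaped version.
    $content = preg_replace_callback(
        '/\[(pre|code)\b[^\]]*\](.*?)\[\/\1\]/s',
        function (array $m) use (&$blocks, $prefix): string {
            $id = $prefix . count($blocks) . 'X';
            // Assumption: restore as an HTML tag of the same name.
            $blocks[$id] = '<' . $m[1] . '>'
                . htmlspecialchars($m[2], ENT_QUOTES, 'UTF-8')
                . '</' . $m[1] . '>';
            return $id;
        },
        $content
    );

    // preg_replace_callback() returns null on PCRE errors, which can
    // happen on very large inputs (e.g. backtrack limit reached).
    if ($content === null) {
        throw new RuntimeException('PCRE error code ' . preg_last_error());
    }

    // Only placeholders and non-code text are left, so filtering is safe.
    $content = $filter($content);

    // Put the escaped blocks back in place of their placeholders.
    return $blocks ? strtr($content, $blocks) : $content;
}
```

Calling `filter_outside_code($content, 'strip_tags')` strips tags only outside the protected blocks. One caveat: a filter that turns URLs into links may swallow a placeholder that sits directly next to a URL, so the link pattern should stop at whitespace or the placeholder prefix.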
Comments (2)
HTML Purifier http://htmlpurifier.org/
But you are facing the same issues as in your 20 previous questions.
There's the BBCode PECL extension, but you'd need to compile it.
There's also PEAR's HTML_BBCodeParser, though I can't vouch for how effective it is.
There are also a few elsewhere, but I think they're all pretty rigid.
I don't believe either of those does what you're looking for with regard to having a callback for tag contents (and @webarto is totally correct that HTMLPurifier is the right tool to use when processing the contents). You might have to write your own here. I've previously written about my experiences doing the same, which you might find helpful.
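As a rough idea of what "writing your own" could mean for this narrow case, here is a minimal sketch that avoids placeholders by splitting the input on [code] blocks with `preg_split()`. The function name `transform_bbcode` and the `<code>` wrapper on output are assumptions for illustration, not part of any library mentioned above.

```php
<?php
// Hypothetical hand-rolled splitter: escapes [code] contents and applies
// $outside to every stretch of text between code blocks.
function transform_bbcode(string $content, callable $outside): string
{
    // With PREG_SPLIT_DELIM_CAPTURE and one capture group, the result
    // alternates: even indexes hold text outside [code] blocks, odd
    // indexes hold the captured block contents (tags already removed).
    $parts = preg_split(
        '/\[code\b[^\]]*\](.*?)\[\/code\]/s',
        $content,
        -1,
        PREG_SPLIT_DELIM_CAPTURE
    );
    if ($parts === false) {
        throw new RuntimeException('PCRE error code ' . preg_last_error());
    }

    $out = '';
    foreach ($parts as $i => $part) {
        $out .= ($i % 2 === 1)
            // Inside a code block: escape only, never filter.
            ? '<code>' . htmlspecialchars($part, ENT_QUOTES, 'UTF-8') . '</code>'
            // Outside: hand the text to the caller's callback.
            : $outside($part);
    }
    return $out;
}
```

Unlike a placeholder scheme, the callback here sees each outside segment separately, so a filter that depends on context spanning a code block (an HTML tag opened before the block and closed after it, for instance) may behave differently. An unclosed [code] tag is simply treated as ordinary outside text.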