Building a markup parser in PHP
I have created a very simple markup parser in PHP. However, it currently uses str_replace to switch between markup and HTML. How can I make a "code" box of sorts (it will eventually use GeSHi) whose contents are left untouched?
Right now, the following markup: [code][b]Some bold text[/b][/code]
winds up parsing as a code box containing <b>Some bold text</b>.
I need some advice. Which option is best?
- Have it check each word individually, and parse a word only if it is not inside a [code] box.
- Leave it as is, and accept that users cannot post markup inside [code].
- Create another type of code box specifically for HTML markup, and have [code] automatically revert any < or > to [ and ].
Is there maybe even another option? This is a bit tougher than I thought it would be...
EDIT: Is it even worth adding a code box type thing to this parser? I mean, I see how it could be useful, but it is a rather large amount of effort for a small result.
Comments (4)
Why would you reinvent the wheel?
There are plenty of markup parsers already.
Anyway, str_replace alone won't help much. You'd have to learn regular expressions and, as they say, now you've got two problems ;)
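For what it's worth, here is a minimal sketch of what the regular-expression route might look like. The function name parse_with_regex and the tag set ([b], [i], [code]) are assumptions made for illustration, not part of the asker's parser. It splits the input on [code] blocks with preg_split, escapes everything, and only rewrites paired tags outside the code boxes.

```php
<?php
// Hypothetical function name; only [b], [i] and [code] are handled.
function parse_with_regex(string $input): string
{
    // Split on [code]...[/code]. With PREG_SPLIT_DELIM_CAPTURE the captured
    // code bodies end up at the odd indexes of the resulting array.
    $parts = preg_split(
        '~\[code\](.*?)\[/code\]~s',
        $input,
        -1,
        PREG_SPLIT_DELIM_CAPTURE
    );

    $out = '';
    foreach ($parts as $i => $part) {
        // Escape raw HTML in every piece, code or not.
        $part = htmlspecialchars($part, ENT_QUOTES, 'UTF-8');

        if ($i % 2 === 1) {
            // Code body: emit it without touching the markup inside.
            $out .= '<pre>' . $part . '</pre>';
            continue;
        }

        // Ordinary text: rewrite matched tag pairs, repeating until
        // nothing changes so that nested pairs are converted too.
        do {
            $part = preg_replace(
                '~\[(b|i)\](.*?)\[/\1\]~s',
                '<$1>$2</$1>',
                $part,
                -1,
                $count
            );
        } while ($count > 0);

        $out .= $part;
    }
    return $out;
}
```

Unlike a plain str_replace table, the backreference \1 only converts a tag when its matching closing tag is present, so a stray [b] stays as literal text.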
You could break it down into multiple strings so that you can keep using str_replace. Split the string on the [code] and [/code] tags, saving each code box in a separate string. Make a note of where it sat in the original string somehow. Then use str_replace on the original string and do whatever parsing you like on the code box strings. Finally, reinsert the parsed code boxes and display the result.
Just a word of warning though: turning user input into HTML for display strikes me as inherently dangerous. I'd recommend a large amount of input sanitization and checking before converting to HTML for redisplay.
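A rough sketch of that split-and-reinsert idea, under a few assumptions: the function name render_markup, the NUL-delimited placeholder format, and the small [b]/[i] replacement table are all made up for illustration. The input is escaped first with htmlspecialchars (which leaves square brackets alone), each code box is swapped for a numbered placeholder, the str_replace table runs on what is left, and the code boxes are put back at the end.

```php
<?php
// Hypothetical helper; placeholder format and tag table are illustrative.
function render_markup(string $input): string
{
    // Drop NUL bytes so user text can never collide with a placeholder,
    // then escape raw HTML. Square brackets are not affected by escaping.
    $text = htmlspecialchars(str_replace("\0", '', $input), ENT_QUOTES, 'UTF-8');

    // 1. Move every [code] body into $codeBlocks and leave a numbered
    //    placeholder behind to remember where it was.
    $codeBlocks = [];
    $text = preg_replace_callback(
        '~\[code\](.*?)\[/code\]~s',
        function (array $m) use (&$codeBlocks): string {
            $codeBlocks[] = $m[1];
            return "\0CODE" . (count($codeBlocks) - 1) . "\0";
        },
        $text
    );

    // 2. Run the ordinary str_replace table on the remaining text.
    $text = str_replace(
        ['[b]', '[/b]', '[i]', '[/i]'],
        ['<b>', '</b>', '<i>', '</i>'],
        $text
    );

    // 3. Reinsert the code boxes, escaped but otherwise untouched.
    foreach ($codeBlocks as $i => $code) {
        $text = str_replace(
            "\0CODE{$i}\0",
            '<pre><code>' . $code . '</code></pre>',
            $text
        );
    }
    return $text;
}
```

Step 3 is also the natural place to hand each code body to a syntax highlighter instead of wrapping it in plain pre/code tags, in which case the body would need to be kept unescaped for the highlighter.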
HTML beautifier is pretty sweet: http://pear.php.net/package/PHP_Beautifier . They have a decorator class as well that would probably suit your needs.
To be clear, your problem has two parts. First, you need a lexical analyzer to break your "code" into the tokens (the keywords) of your "language." Once you have a lexical analyzer, you then need a parser. A parser is code that accepts those tokens one at a time in a logical manner (usually by recursive descent).
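To make the two parts concrete, here is a small sketch, assuming a language with only [b], [i] and [code]. The function names tokenize, parse_tokens and render_tokens are hypothetical. The lexer splits the input into tag tokens and text tokens; the parser consumes them one at a time and recurses on each opening tag, treating everything inside [code] as literal text.

```php
<?php
// Hypothetical names throughout; only [b], [i] and [code] are recognised.

// Lexer: break the input into tag tokens and plain-text tokens.
function tokenize(string $input): array
{
    return preg_split(
        '~(\[/?(?:b|i|code)\])~',
        $input,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    );
}

// Parser: recursive descent over the token list. $pos is shared by
// reference so each recursive call resumes where the last one stopped.
function parse_tokens(array $tokens, int &$pos, ?string $until = null): string
{
    $html = '';
    while ($pos < count($tokens)) {
        $tok = $tokens[$pos++];
        if ($tok === $until) {
            return $html; // found the closing tag this call was waiting for
        }
        if ($tok === '[code]') {
            // Inside a code box every token, tags included, is literal text.
            $html .= '<pre>';
            while ($pos < count($tokens) && $tokens[$pos] !== '[/code]') {
                $html .= htmlspecialchars($tokens[$pos++], ENT_QUOTES, 'UTF-8');
            }
            $pos++; // step over [/code]
            $html .= '</pre>';
        } elseif ($tok === '[b]' || $tok === '[i]') {
            $name = $tok[1]; // 'b' or 'i'
            $html .= "<$name>" . parse_tokens($tokens, $pos, "[/$name]") . "</$name>";
        } else {
            // Plain text, or a stray closing tag: escape and emit as-is.
            $html .= htmlspecialchars($tok, ENT_QUOTES, 'UTF-8');
        }
    }
    return $html; // input ended; any open tags are closed by the callers
}

function render_tokens(string $input): string
{
    $tokens = tokenize($input);
    $pos = 0;
    return parse_tokens($tokens, $pos);
}
```

Because the parser always emits a closing HTML tag for every opening tag it accepted, unbalanced input such as "[b]text" still produces well-formed output, which a flat str_replace table cannot guarantee.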