Building a markup parser in PHP

Posted on 2024-08-12 08:35:18

I have created a very simple markup parser in PHP. However, it currently uses str_replace to switch between markup and HTML. How can I make a "code" box of sorts (it will eventually use GeSHi) whose contents are left untouched?

Right now, the following markup: [code][b]Some bold text[/b][/code] ends up rendered as a code box containing <b>Some bold text</b>, with the inner tags converted to HTML instead of left as written.

I need some advice, which option is best?

  • Have it check each word individually, and parse a word only if it is not inside a [code] box
  • Leave it as is, and accept that users cannot post markup inside [code].
  • Create another type of code box specifically for HTML markup, and have [code] automatically revert any < or > to [ and ].

Is there maybe even another option? This is a bit tougher than I thought it would be...

EDIT: Is it even worth adding a code box type thing to this parser? I mean, I see how it could be useful, but it is a rather large amount of effort for a small result.

Comments (4)

飘逸的'云 2024-08-19 08:35:18

Why would you reinvent the wheel?

There are plenty of markup parsers already.

Anyway, str_replace alone won't get you far. You'd have to learn regular expressions and, as they say, now you've got two problems ;)
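
For what it's worth, a regular expression can handle the [code] case in a single pass. The following is only a minimal sketch, not a definitive implementation: the function name render_markup and the tag set ([code] and [b]) are assumptions made up for illustration. Because the [code] alternative is listed first, a code box is consumed whole and its contents never reach the [b] branch.

```php
<?php
// Hypothetical helper; the supported tags ([code], [b]) are assumed for this sketch.
function render_markup(string $input): string
{
    // Escape first so user-supplied HTML is never emitted raw.
    $escaped = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

    // [code] is the first alternative, so wherever a code box starts it is
    // matched as one unit and the markup inside it is left alone.
    $pattern = '~\[code\](.*?)\[/code\]|\[b\](.*?)\[/b\]~s';

    $result = preg_replace_callback($pattern, function (array $m): string {
        if (strpos($m[0], '[code]') === 0) {
            return '<pre><code>' . $m[1] . '</code></pre>';
        }
        return '<b>' . $m[2] . '</b>';
    }, $escaped);

    // preg_replace_callback returns null on a regex error; fall back to the escaped text.
    return $result ?? $escaped;
}

echo render_markup('[b]bold[/b] [code][b]Some bold text[/b][/code]'), "\n";
// prints: <b>bold</b> <pre><code>[b]Some bold text[/b]</code></pre>
```

One known limitation of this sketch: a [b] span that wraps a [code] box is not handled, since the lazy [b] match stops at the first [/b] it sees.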

蝶…霜飞 2024-08-19 08:35:18

You could break the input down into multiple strings so that you can keep using str_replace. Split the string on the [code] and [/code] tags, saving each code box in a separate string, and keep track of where it sat in the original string. Then use str_replace on the rest of the original string and do whatever parsing you like on the code box strings. Finally, reinsert the parsed code boxes and display the result.
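
A rough sketch of that splitting idea, under some assumptions: the TAGS map and the function name parse_with_code_boxes are hypothetical stand-ins for whatever str_replace pairs the real parser uses. Here preg_split with PREG_SPLIT_DELIM_CAPTURE does the bookkeeping, since it returns the pieces in their original order, so there is no need to record positions separately.

```php
<?php
// Hypothetical tag map; the real parser's str_replace pairs would go here.
const TAGS = [
    '[b]' => '<b>', '[/b]' => '</b>',
    '[i]' => '<i>', '[/i]' => '</i>',
];

function parse_with_code_boxes(string $input): string
{
    // With PREG_SPLIT_DELIM_CAPTURE the captured code-box contents land at
    // odd indexes; even indexes hold the text between code boxes.
    $parts = preg_split('~\[code\](.*?)\[/code\]~s', $input, -1, PREG_SPLIT_DELIM_CAPTURE);

    $out = '';
    foreach ($parts as $i => $part) {
        // Escape every piece so raw HTML from the user is never emitted.
        $safe = htmlspecialchars($part, ENT_QUOTES, 'UTF-8');
        if ($i % 2 === 1) {
            // Code box: no markup substitution at all.
            $out .= '<pre><code>' . $safe . '</code></pre>';
        } else {
            // Ordinary text: the usual str_replace pass.
            $out .= str_replace(array_keys(TAGS), array_values(TAGS), $safe);
        }
    }
    return $out;
}

echo parse_with_code_boxes('[b]hi[/b] [code][b]Some bold text[/b][/code]'), "\n";
// prints: <b>hi</b> <pre><code>[b]Some bold text[/b]</code></pre>
```

The odd-index branch is also where a syntax highlighter such as GeSHi could be plugged in later, in place of the plain escaping.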

Just a word of warning though, turning input into html for display strikes me as inherently dangerous. I'd recommend a large amount of input sanitization and checking before converting to html for redisplay.
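
As one possible baseline for that sanitization (a sketch only, with an assumed function name and an assumed pair of tags), the input can be escaped with htmlspecialchars before any markup is substituted. That way the only HTML tags in the output are the ones the parser itself introduces.

```php
<?php
// Hypothetical helper; the [b] tag pair is assumed for illustration.
function to_safe_html(string $input): string
{
    // Escape every HTML-significant character first...
    $escaped = htmlspecialchars($input, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
    // ...then introduce only the tags the parser controls.
    return str_replace(['[b]', '[/b]'], ['<b>', '</b>'], $escaped);
}

echo to_safe_html('[b]hi[/b] <script>alert(1)</script>'), "\n";
// prints: <b>hi</b> &lt;script&gt;alert(1)&lt;/script&gt;
```

Escaping alone is probably not sufficient once tags with attributes (links, images) are supported, so treat this as a starting point rather than complete protection.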

不喜欢何必死缠烂打 2024-08-19 08:35:18

HTML beautifier is pretty sweet. http://pear.php.net/package/PHP_Beautifier . They have a decorator class as well that would probably suit your needs.

混吃等死 2024-08-19 08:35:18

To be clear, your problem is in two parts. The first part is the need for a lexical analyzer to break your "code" into the keywords (tokens) of your "language." Once you have a lexical analyzer, you then need a parser. A parser is code that accepts the keywords of your language one at a time in a logical manner (usually by recursive descent).
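
To make the two parts concrete, here is a small sketch of a lexer plus a recursive-descent parser. It is written under assumptions: the function names (tokenize, parse, render) and the supported tags (b, i, code) are invented for illustration and are not taken from the original parser. The key point for the [code] question is that the parser treats every token inside a code box as literal text.

```php
<?php
// Lexer: split the input into tag tokens and text tokens.
// The supported tags (b, i, code) are an assumption for this sketch.
function tokenize(string $input): array
{
    return preg_split('~(\[/?(?:b|i|code)\])~', $input, -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
}

// Parser: consume tokens one at a time, recursing on each opening tag.
// $closing is the tag that ends the current nesting level (null at top level).
function parse(array $tokens, int &$pos, ?string $closing = null): string
{
    $html = '';
    while ($pos < count($tokens)) {
        $tok = $tokens[$pos++];
        if ($tok === $closing) {
            return $html;
        }
        if ($tok === '[code]') {
            // Inside a code box every token is literal text until [/code].
            $raw = '';
            while ($pos < count($tokens) && $tokens[$pos] !== '[/code]') {
                $raw .= $tokens[$pos++];
            }
            $pos++; // skip the [/code] token
            $html .= '<pre><code>' . htmlspecialchars($raw, ENT_QUOTES, 'UTF-8') . '</code></pre>';
        } elseif ($tok === '[b]' || $tok === '[i]') {
            $name = $tok[1]; // "b" or "i"
            $inner = parse($tokens, $pos, '[/' . $name . ']');
            $html .= '<' . $name . '>' . $inner . '</' . $name . '>';
        } else {
            // Plain text, or a stray closing tag: emit it escaped.
            $html .= htmlspecialchars($tok, ENT_QUOTES, 'UTF-8');
        }
    }
    return $html; // input ended before the closing tag was seen
}

function render(string $input): string
{
    $pos = 0;
    return parse(tokenize($input), $pos);
}

echo render('[b]bold [i]both[/i][/b][code][b]Some bold text[/b][/code]'), "\n";
// prints: <b>bold <i>both</i></b><pre><code>[b]Some bold text[/b]</code></pre>
```

Compared with a chain of str_replace calls, this handles nesting and always produces balanced output tags, at the cost of more code.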
