BBCode regex parsing problem

Posted on 2024-10-08 18:23:20

I have some JavaScript that converts BBCode to HTML. It seems to work well, but I have a problem.

Here is one of the expressions that works which I use to convert the BB tags [b] and [/b] to <b> and </b>.

str = str.replace(/\[b\]((\s|\S)*?)\[\/b\]/ig, '<b>$1</b>');

This also converts consecutive tags. For example

[b]str1[/b] [b]str2[/b]

becomes

<b>str1</b> <b>str2</b>

Which is good; that's what I want it to do. However, when I try to match quote tags like so

str = str.replace(/\[quote\]((\s|\S)*?)\[\/quote\]/ig, '<span class="quotebox">$1</span>');

where str is

[quote]Nest level 1[quote]Nest level 2[/quote][/quote]

only the first tag is matched and converted, so I'll end up getting output looking like

<span class="quotebox">Nest level 1[quote]Nest level 2</span>[/quote]

With the last quote tag outside of the quote box - it should be nested within the other one. Help?

Also, if it's relevant, the quotebox class is as follows

.quotebox{
border:1px inset black;
display:block;
margin-bottom:5px;
margin-top:5px;
padding:2px 2px 2px 4px;
}

Comments (1)

时光瘦了 2024-10-15 18:23:21

You've just been bitten by the fact that (real) regular expressions can only describe regular languages. The salient feature regular expressions cannot describe is recursion. The canonical example of this is the Dyck language, the language which consists of all strings of balanced parentheses, such as (), (())()((())), ((((())))), etc. This is non-regular, and is essentially the problem you're trying to solve: matching appropriately nested [b][/b]s, [quote][/quote]s, and the like. In other words, it's literally impossible to do what you want with a true regular expression. However, you may have noticed that I said "real". The regexes provided in languages like JavaScript aren't true regular expressions; they have extra power, mostly (entirely?) stemming from backreferences. The regex (.*)\1, for instance, describes a non-regular language. Even given this, though, I don't think you can match the Dyck language. [1]
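To make the backreference remark concrete, here is a small sketch. The variable name `doubled` and the anchors `^` and `$` are additions for illustration; they force the whole string to have the form "some text, followed by the same text again", which is a standard example of a non-regular language.

```javascript
// (.*) captures some text; \1 requires that exact text to appear again.
// The anchors make the pattern apply to the whole string.
const doubled = /^(.*)\1$/;

console.log(doubled.test('abcabc')); // true  ("abc" followed by "abc")
console.log(doubled.test('abcabd')); // false (second half differs)
```

Note that this only shows backreferences going beyond regular languages. It does not give the pattern any way to count nesting depth, which is what balanced tags need.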

So, then, what's the solution? Find a pre-existing BBCode to HTML converter written in JavaScript! This is definitely going to make your life the simplest. I don't know of one off the top of my head, unfortunately, since I don't do much JavaScript programming. This StackOverflow question indicates that such a thing might not exist, in which case your only option is to roll your own parser. More complicated, of course, but certainly doable. Off the top of my head (I am not an expert), you'd probably want to scan through the string until you find a tag. (Recognizing a tag may well be a good task for a regular expression.) If it's an opening tag, push that on a stack. If it's a closing tag, pop the stack, make sure that the closing tag matches the opening tag, and wrap the string you've seen so far in the appropriate HTML. This might not work, or it might be too complicated—it's just my 2¢ after thinking about the problem quickly.
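The stack idea described above can be sketched roughly as follows. This is a minimal illustration, not a complete BBCode converter: the `TAGS` table and the function name `bbcodeToHtml` are hypothetical, only `[b]` and `[quote]` are handled, and the input is assumed to be already HTML-escaped. It also emits the HTML as it scans rather than wrapping text after the fact, which is a slight variation on the description.

```javascript
// Hypothetical tag table: BBCode tag name -> [opening HTML, closing HTML].
const TAGS = {
  b: ['<b>', '</b>'],
  quote: ['<span class="quotebox">', '</span>'],
};

function bbcodeToHtml(input) {
  // The regex only recognizes individual tags; pairing is done by the stack.
  const tagPattern = /\[(\/?)([a-z]+)\]/gi;
  const stack = []; // names of the tags that are currently open
  let output = '';
  let lastIndex = 0;
  let match;

  while ((match = tagPattern.exec(input)) !== null) {
    const [raw, slash, rawName] = match;
    const name = rawName.toLowerCase();

    // Copy the plain text between the previous tag and this one.
    output += input.slice(lastIndex, match.index);
    lastIndex = tagPattern.lastIndex;

    if (!Object.prototype.hasOwnProperty.call(TAGS, name)) {
      output += raw; // unknown tag: leave it as literal text
    } else if (slash === '') {
      stack.push(name); // opening tag
      output += TAGS[name][0];
    } else if (stack.length > 0 && stack[stack.length - 1] === name) {
      stack.pop(); // closing tag that matches the innermost open tag
      output += TAGS[name][1];
    } else {
      output += raw; // mismatched closing tag: leave it as literal text
    }
  }

  // Copy whatever text follows the last tag.
  output += input.slice(lastIndex);

  // Close anything still open so the generated HTML stays balanced.
  while (stack.length > 0) {
    output += TAGS[stack.pop()][1];
  }
  return output;
}

console.log(bbcodeToHtml('[quote]Nest level 1[quote]Nest level 2[/quote][/quote]'));
// <span class="quotebox">Nest level 1<span class="quotebox">Nest level 2</span></span>
```

Leaving mismatched closing tags as literal text and auto-closing unclosed tags are design choices made for this sketch; a real converter might prefer to report them as errors.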


[1]: I'm not 100% sure, but the only example of a regex matching balanced parentheses I've ever seen was in Perl, and it embedded Perl code, which JavaScript can't do. Either way, it's inadvisable: you'd be using a tool that makes your task much more complicated.
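For completeness, a workaround that is sometimes used when only one kind of tag nests: apply a regex that matches only innermost pairs, and repeat until nothing changes. This is a loop around a regex rather than a single regex, so it does not contradict the point above. The function name `convertNestedQuotes` is hypothetical, and the sketch simply leaves unbalanced tags in place.

```javascript
function convertNestedQuotes(input) {
  // Matches a [quote]...[/quote] pair whose body contains no further
  // [quote] opening tag, i.e. an innermost pair.
  const innermost = /\[quote\]((?:(?!\[quote\])[\s\S])*?)\[\/quote\]/gi;
  let previous;
  let current = input;
  // Each pass converts the current innermost pairs; stop when a pass
  // changes nothing.
  do {
    previous = current;
    current = current.replace(innermost, '<span class="quotebox">$1</span>');
  } while (current !== previous);
  return current;
}

console.log(convertNestedQuotes('[quote]Nest level 1[quote]Nest level 2[/quote][/quote]'));
// <span class="quotebox">Nest level 1<span class="quotebox">Nest level 2</span></span>
```

This gets harder to reason about once several tag types can nest inside each other, which is where a small stack-based parser tends to be the clearer choice.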
