Handling JavaScript RegEx submatches

Posted on 2024-07-04 06:29:02

I am trying to write some JavaScript RegEx to replace user-entered tags with real HTML tags, so [b] becomes <b> and so forth. The RegEx I am using looks like this:

var exptags = /\[(b|u|i|s|center|code){1}]((.){1,}?)\[\/(\1){1}]/ig;

It is used with the following JavaScript:

s.replace(exptags,"<$1>$2</$1>");

This works fine for tags that are not nested, for example:

[b]hello[/b] [u]world[/u]

But if the tags are nested inside each other, it only matches the outer tags, for example:

[b]foo [u]to the[/u] bar[/b]

This only matches the b tags. How can I fix this? Should I just loop until the starting string is the same as the outcome? I also have a feeling that the ((.){1,}?) pattern is wrong.

Thanks

Comments (8)

花期渐远 2024-07-11 06:29:04

Agree with Richard Szalay, but his regex didn't get quoted right:

var exptags = /\[(b|u|i|s|center|code)](.*)\[\/\1]/ig;

is cleaner. Note that I also changed .+? to .*. There are two problems with .+?:

  1. you won't match [u][/u], since there isn't at least one character between them (+)
  2. a non-greedy match won't deal as nicely with the same tag nested inside itself (?)
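
The trade-off between the quantifiers can be checked with a small sketch. The variable names and sample strings below are made up for illustration and are not from the question; the patterns are the one from this answer plus lazy variants of it.

```javascript
// Illustrative comparison of .* (greedy), .*? (lazy) and .+? (lazy, one or more).
const greedy = /\[(b|u|i|s|center|code)](.*)\[\/\1]/ig;
const lazy = /\[(b|u|i|s|center|code)](.*?)\[\/\1]/ig;
const lazyPlus = /\[(b|u|i|s|center|code)](.+?)\[\/\1]/ig;
const html = "<$1>$2</$1>";

// Empty tag pair: .+? needs at least one character, so nothing is replaced.
const emptyPlus = "[u][/u]".replace(lazyPlus, html);   // "[u][/u]"
const emptyOut = "[u][/u]".replace(greedy, html);      // "<u></u>"

// Two sibling tags: greedy runs from the first [b] to the last [/b].
const siblings = "[b]one[/b] and [b]two[/b]";
const siblingsGreedy = siblings.replace(greedy, html); // "<b>one[/b] and [b]two</b>"
const siblingsLazy = siblings.replace(lazy, html);     // "<b>one</b> and <b>two</b>"

// Same tag nested in itself: lazy pairs the outer [b] with the inner [/b].
const sameNested = "[b]a [b]b[/b] c[/b]";
const nestedGreedy = sameNested.replace(greedy, html); // "<b>a [b]b[/b] c</b>"
const nestedLazy = sameNested.replace(lazy, html);     // "<b>a [b]b</b> c[/b]"
```

So the greedy form copes with a tag nested inside itself but merges sibling tags of the same name, while the lazy form does the opposite. Neither handles both cases in a single pass.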

黒涩兲箜 2024-07-11 06:29:04

You are right about the inner pattern being troublesome.

((.){1,}?)

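The inner (.) captures a single character, the quantifier repeats that capture one or more times, and the outer group then captures the whole run. Every character inside your tag passes through the inner group, which ends up holding only the last one.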

You are also capturing your closing element name when you don't need it, and using {1} where it is already implied. Below is a cleaned-up version:

/\[(b|u|i|s|center|code)](.+?)\[\/\1]/ig

Not sure about the other problem.
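
To see what the extra groups actually capture, the two patterns can be compared with exec. This is only a sketch; the names oldTags, cleanTags, oldMatch and cleanMatch are invented for the example, and the g flag is left off so that exec starts from the beginning of the string each time.

```javascript
// Original pattern from the question (without the g flag).
const oldTags = /\[(b|u|i|s|center|code){1}]((.){1,}?)\[\/(\1){1}]/i;
// Cleaned-up pattern from this answer.
const cleanTags = /\[(b|u|i|s|center|code)](.+?)\[\/\1]/i;

const oldMatch = oldTags.exec("[b]hello[/b]");
const cleanMatch = cleanTags.exec("[b]hello[/b]");

// oldMatch holds four groups:
//   [1] "b"      the tag name
//   [2] "hello"  the content
//   [3] "o"      last character seen by the inner (.)
//   [4] "b"      the closing tag name, captured again
console.log(Array.from(oldMatch));

// cleanMatch holds only the two groups the replacement string uses:
//   [1] "b", [2] "hello"
console.log(Array.from(cleanMatch));
```

Both patterns match the same text; the cleaned-up one just drops the two groups that "<$1>$2</$1>" never refers to.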

审判长 2024-07-11 06:29:04

How about:

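// (\/?) captures the optional slash of a closing tag; (.?)? would accept any character there, e.g. [xb]
var tagreg = /\[(\/?)(b|u|i|s|center|code)\]/gi;
"[b][i]helloworld[/i][/b]".replace(tagreg, "<$1$2>");
"[b]helloworld[/b]".replace(tagreg, "<$1$2>");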

For me the above produces:

<b><i>helloworld</i></b>
<b>helloworld</b>

This appears to do what you want, and has the advantage of needing only a single pass.

Disclaimer: I don't code often in JS, so if I made any mistakes please feel free to point them out :-)

楠木可依 2024-07-11 06:29:04

Yes, you will have to loop. Alternatively, since your tags look so much like HTML ones, you could replace [b] with <b> and [/b] with </b> separately. (.){1,}? is the same as (.*?) - that is, any symbols, least possible sequence length.

Updated: Thanks to MrP, (.){1,}? is (.)+?, my bad.
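
A rough sketch of the "replace them separately" idea follows. The TAGS list and the convertSeparately helper are hypothetical names for this example; the tag names are the ones from the question's pattern.

```javascript
// Hypothetical helper: convert every opening and closing tag on its own,
// without checking that the tags are paired.
const TAGS = ["b", "u", "i", "s", "center", "code"];

function convertSeparately(input) {
  let out = input;
  for (const tag of TAGS) {
    out = out
      .replace(new RegExp("\\[" + tag + "\\]", "gi"), "<" + tag + ">")
      .replace(new RegExp("\\[/" + tag + "\\]", "gi"), "</" + tag + ">");
  }
  return out;
}

const separate = convertSeparately("[b]foo [u]to the[/u] bar[/b]");
// separate is "<b>foo <u>to the</u> bar</b>"
```

Because nothing is paired up, nesting depth does not matter, but an unclosed [b] is converted to an unclosed <b> as well.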

So尛奶瓶 2024-07-11 06:29:03

AFAIK you can't express recursion with regular expressions.

You can however do that with .NET's System.Text.RegularExpressions using balanced matching. See more here: http://blogs.msdn.com/bclteam/archive/2005/03/15/396452.aspx

If you're using .NET you can probably implement what you need with a callback.
If not, you may have to roll your own little JavaScript parser.

Then again, if you can afford to hit the server you can use the full parser. :)

What do you need this for, anyway? If it is for anything other than a preview I highly recommend doing the processing server-side.

能否归途做我良人 2024-07-11 06:29:03

You could just repeatedly apply the regexp until it no longer matches. That would do odd things like "[b][b]foo[/b][/b]" => "<b>[b]foo</b>[/b]" => "<b><b>foo</b></b>", but as far as I can see the end result will still be a sensible string with matching (though not necessarily properly nested) tags.

Or if you want to do it 'right', just write a simple recursive descent parser. Though people might expect "[b]foo[u]bar[/b]baz[/u]" to work, which is tricky to recognise with a parser.
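
For the "do it right" route, here is one possible sketch. It uses an explicit stack in a single pass rather than a true recursive descent parser, and bbToHtml and TAG_RE are names invented for the example. The policy it follows is an assumption too: a closing tag only counts when it matches the most recently opened tag, and anything left unmatched stays in the output as literal text.

```javascript
const TAG_RE = /\[(\/?)(b|u|i|s|center|code)\]/gi;

function bbToHtml(input) {
  // Each frame remembers the tag that opened it and the output collected inside it.
  const stack = [{ name: null, raw: "", out: "" }];
  let last = 0;
  for (const m of input.matchAll(TAG_RE)) {
    const top = stack[stack.length - 1];
    top.out += input.slice(last, m.index); // plain text before this tag
    last = m.index + m[0].length;
    const name = m[2].toLowerCase();
    if (!m[1]) {
      // Opening tag: start a new frame.
      stack.push({ name: name, raw: m[0], out: "" });
    } else if (stack.length > 1 && top.name === name) {
      // Closing tag that matches the innermost open tag: emit the element.
      stack.pop();
      stack[stack.length - 1].out += "<" + name + ">" + top.out + "</" + name + ">";
    } else {
      // Stray or misnested closing tag: keep it as text.
      top.out += m[0];
    }
  }
  stack[stack.length - 1].out += input.slice(last);
  // Opening tags that were never closed go back into the output as text.
  while (stack.length > 1) {
    const frame = stack.pop();
    stack[stack.length - 1].out += frame.raw + frame.out;
  }
  return stack[0].out;
}

console.log(bbToHtml("[b]foo [u]to the[/u] bar[/b]")); // <b>foo <u>to the</u> bar</b>
console.log(bbToHtml("[b]foo[u]bar[/b]baz[/u]"));      // [b]foo<u>bar[/b]baz</u>
```

The second call shows the misnested case: under this policy only the [u] pair is converted, and the overlapping [b] pair is left untouched rather than guessed at.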

伏妖词 2024-07-11 06:29:03

The nested block doesn't get replaced because, once [b]...[/b] has matched, the search resumes after the closing [/b]. Everything that ((.){1,}?) matched is therefore never scanned for further tags.

It is possible to write a recursive parser on the server side; Perl uses qr// and Ruby probably has something similar.

You don't necessarily need true recursion, though. You can use a relatively simple loop to handle the string equivalently:

var s = '[b]hello[/b] [u]world[/u] [b]foo [u]to the[/u] bar[/b]';
var exptags = /\[(b|u|i|s|center|code){1}]((.){1,}?)\[\/(\1){1}]/ig;

while (s.match(exptags)) {
   s = s.replace(exptags, "<$1>$2</$1>");
}

document.writeln('<div>' + s + '</div>'); // after

In this case, it'll make 2 passes:

0: [b]hello[/b] [u]world[/u] [b]foo [u]to the[/u] bar[/b]
1: <b>hello</b> <u>world</u> <b>foo [u]to the[/u] bar</b>
2: <b>hello</b> <u>world</u> <b>foo <u>to the</u> bar</b>

Also, a few suggestions for cleaning up the RegEx:

var exptags = /\[(b|u|i|s|center|code)\](.+?)\[\/(\1)\]/ig;

  • {1} is assumed when no other count specifiers exist
  • {1,} can be shortened to +
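
The question's own idea, looping until the string stops changing, can be sketched with the cleaned-up pattern as follows. The function name convertNested is made up for the example, and the closing-tag group is dropped because the replacement string never uses it.

```javascript
const exptags = /\[(b|u|i|s|center|code)\](.+?)\[\/\1\]/ig;

// Re-apply the replacement until a pass changes nothing.
function convertNested(input) {
  let prev;
  let s = input;
  do {
    prev = s;
    s = s.replace(exptags, "<$1>$2</$1>");
  } while (s !== prev);
  return s;
}

const converted = convertNested(
  "[b]hello[/b] [u]world[/u] [b]foo [u]to the[/u] bar[/b]"
);
// converted is "<b>hello</b> <u>world</u> <b>foo <u>to the</u> bar</b>"
```

Comparing the string before and after each pass avoids running the pattern twice per iteration. Note that . does not match line breaks, so content spanning several lines would need something like [\s\S] in place of the dot.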

撑一把青伞 2024-07-11 06:29:02

The easiest solution would be to replace all the tags, whether they are closed or not, and let .innerHTML work out whether they are matched. It will be much more resilient that way.

// div is assumed to be an existing DOM element
var tagreg = /\[(\/?)(b|u|i|s|center|code)]/ig;
div.innerHTML = "[b][i]helloworld[/b]".replace(tagreg, "<$1$2>"); // no closing i
// div.innerHTML == "<b><i>helloworld</i></b>"