RegEx to strip BBCode tags from a string

Posted on 2024-10-06 07:28:39

I'm working on a feature that uses the jQuery MarkItUp! editor as a BBCode editor. I'm only allowing a small subset of BBCode tags, including the following:

[b]
[i]
[quote]
[quote=Mr Incredible]
[img]
[url]
[youtube]

I have a 1,500-character "Description" field that uses the editor, but I'm also planning to store a 150-character digest of the description with all of the BBCode stripped out.

I'm currently using a simple RegEx to do this in C#. It basically nukes embedded BBCodes in a string, but it leaves behind a lot of "noisy content" like the [img] URL or the [youtube] video ID that I'd also like to remove from the digest.

Here's my current RegEx:

    public static String StripBBCode(string bbCode)
    {
        // Remove every [tag], but keep whatever text sits between tags
        string r = Regex.Replace(bbCode,
            @"\[(.*?)\]",
            String.Empty, RegexOptions.IgnoreCase);

        // Finally, replace all newlines with a space
        r = Regex.Replace(r,
            @"(\r\n|\n\r|\r|\n)+",
            @" ", RegexOptions.IgnoreCase);

        return r;
    }

If I run the following string through this function, I get the result shown below:

source

This is [b]bold[/b]. This is [i]italic[/i].

Here is an image:
[img]http://www.phatmac.com/Pics/Movies/Incredibles.jpg[/img]

Here is a link to [url=http://espn.go.com]ESPN[/url].

Here is a YouTube video:

[youtube]WJ0UkZ3W4FA[/youtube]

result

This is bold. This is italic. Here is an image: http://www.phatmac.com/Pics/Movies/Incredibles.jpg Here is a link to ESPN. Here is a YouTube video: WJ0UkZ3W4FA

Here's what I want to get back

This is bold. This is italic. Here is an image: Here is a link to ESPN. Here is a YouTube video:

How can I modify my StripBBCode() function to achieve this?

EDITED

The suggestion from David below in the first answer was correct.

Here's what I'm using now:

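    // Lazy ".*?" stops at the nearest closing tag, so text between two
    // tags on the same line is kept.
    string r = Regex.Replace(s,
        @"\[youtube\].*?\[/youtube\]",
        String.Empty, RegexOptions.IgnoreCase);

    r = Regex.Replace(r,
        @"\[img\].*?\[/img\]",
        String.Empty, RegexOptions.IgnoreCase);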
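
For anyone who wants the whole thing in one place, here is a minimal sketch of how the two content-stripping replacements could be folded back into StripBBCode(). The class name BBCodeDigest, the single combined pattern with a backreference, the Singleline option, and the final Trim() are my own assumptions and not part of David's answer. The lazy quantifier is there so that two tags on the same line don't swallow the text between them.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical wrapper class, used here only to make the sketch self-contained.
public static class BBCodeDigest
{
    public static string StripBBCode(string bbCode)
    {
        // 1. Drop [img] and [youtube] blocks together with their contents.
        //    ".*?" is lazy, so it stops at the nearest closing tag;
        //    Singleline lets "." match newlines inside a block.
        string r = Regex.Replace(bbCode,
            @"\[(img|youtube)\].*?\[/\1\]",
            String.Empty,
            RegexOptions.IgnoreCase | RegexOptions.Singleline);

        // 2. Remove every remaining tag, keeping the text between tags.
        r = Regex.Replace(r, @"\[.*?\]", String.Empty);

        // 3. Collapse each run of newlines into a single space.
        r = Regex.Replace(r, @"[\r\n]+", " ");

        // Assumption: the digest should not start or end with whitespace.
        return r.Trim();
    }
}
```

Run against the sample source from the question, this should return "This is bold. This is italic. Here is an image: Here is a link to ESPN. Here is a YouTube video:".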
