Regular expression to convert Markdown to HTML

Posted 2024-07-05 14:21:56

How would you write a regular expression to convert Markdown into HTML? For example, you would type in the following:

This would be *italicized* text and this would be **bold** text

This would then need to be converted to:

This would be <em>italicized</em> text and this would be <strong>bold</strong> text

Very similar to the Markdown editor control used by Stack Overflow.

Clarification

For what it is worth, I am using C#. Also, these are the only real tags/markdown that I want to allow. The amount of text being converted would be less than 300 characters or so.
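The following is a minimal sketch for exactly this requirement: only bold and italic, short input, C#. It assumes the input is untrusted user text. For that reason it first HTML-encodes everything with WebUtility.HtmlEncode, and only then turns **...** and *...* into tags. As a result, strong and em are the only markup that can come out. The helper name ConvertEmphasis is hypothetical. The simple patterns also ignore Markdown's finer rules, such as underscores, whitespace next to the markers, and escaped asterisks.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper name; not part of any library.
static string ConvertEmphasis(string input)
{
    // Assumption: the input is untrusted, so encode any HTML it contains first.
    // HtmlEncode leaves '*' untouched, so the emphasis markers survive.
    string text = WebUtility.HtmlEncode(input);

    // **bold** is replaced before *italic*; otherwise the single-star
    // pattern would consume the first '*' of each "**" pair.
    text = Regex.Replace(text, @"\*\*(.+?)\*\*", "<strong>$1</strong>");
    text = Regex.Replace(text, @"\*(.+?)\*", "<em>$1</em>");
    return text;
}

// Prints: This would be <em>italicized</em> text and this would be <strong>bold</strong> text
Console.WriteLine(ConvertEmphasis(
    "This would be *italicized* text and this would be **bold** text"));

// Any markup typed by the user is encoded rather than passed through.
Console.WriteLine(ConvertEmphasis("<script>alert(1)</script> **ok**"));
```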

Comments (4)

余厌 2024-07-12 14:21:56

The best way is to find a port of the Markdown library for whatever language you are using (you did not specify one in your question).

Now that you have clarified that you only want STRONG and EM to be processed, and that you are using C#, I recommend looking at Markdown.NET to see how those tags are implemented. As you can see, it takes two expressions, one per tag. Here is the code:

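using System.Text.RegularExpressions;

private string DoItalicsAndBold (string text)
{
    // <strong> must go first:
    // group 1 captures the opening "**" or "__" and group 2 the enclosed text;
    // \1 requires the same delimiter to close it, and the lookarounds keep
    // the enclosed text from starting or ending with whitespace.
    text = Regex.Replace (text, @"(\*\*|__) (?=\S) (.+?[*_]*) (?<=\S) \1", 
                          new MatchEvaluator (BoldEvaluator),
                          RegexOptions.IgnorePatternWhitespace | RegexOptions.Singleline);

    // Then <em>: the same idea with a single "*" or "_".
    // IgnorePatternWhitespace makes the spaces in both patterns cosmetic;
    // Singleline lets "." match newlines, so a span may cross lines.
    text = Regex.Replace (text, @"(\*|_) (?=\S) (.+?) (?<=\S) \1",
                          new MatchEvaluator (ItalicsEvaluator),
                          RegexOptions.IgnorePatternWhitespace | RegexOptions.Singleline);
    return text;
}

// Wraps the enclosed text (group 2) of an *italic* or _italic_ match.
private string ItalicsEvaluator (Match match)
{
    return string.Format ("<em>{0}</em>", match.Groups[2].Value);
}

// Wraps the enclosed text (group 2) of a **bold** or __bold__ match.
private string BoldEvaluator (Match match)
{
    return string.Format ("<strong>{0}</strong>", match.Groups[2].Value);
}
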
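The comment "<strong> must go first" is easy to overlook. The following top-level C# sketch runs the same two Markdown.NET patterns in both orders. In place of the MatchEvaluator callbacks it uses "$2" replacement strings. That should be equivalent here, since the evaluators only wrap group 2 in a tag. The helper names Bold and Italics are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Same pattern as the <strong> step in DoItalicsAndBold; "$2" is the enclosed text.
static string Bold(string text) =>
    Regex.Replace(text, @"(\*\*|__) (?=\S) (.+?[*_]*) (?<=\S) \1",
                  "<strong>$2</strong>",
                  RegexOptions.IgnorePatternWhitespace | RegexOptions.Singleline);

// Same pattern as the <em> step in DoItalicsAndBold.
static string Italics(string text) =>
    Regex.Replace(text, @"(\*|_) (?=\S) (.+?) (?<=\S) \1",
                  "<em>$2</em>",
                  RegexOptions.IgnorePatternWhitespace | RegexOptions.Singleline);

// Correct order (bold first): prints <strong>bold</strong>
Console.WriteLine(Italics(Bold("**bold**")));

// Wrong order: the single-delimiter pattern pairs the second '*' of the
// opening "**" with the first '*' of the closing "**".
// Prints <em>*bold</em>*
Console.WriteLine(Bold(Italics("**bold**")));
```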
固执像三岁 2024-07-12 14:21:56

A single regex won't do. Every kind of text markup needs its own HTML translation. You are better off looking at how the existing converters are implemented to get an idea of how it works.

http://en.wikipedia.org/wiki/Markdown#See_also

浸婚纱 2024-07-12 14:21:56

I don't know about C# specifically, but in Perl it would be:

s/\*\*(.*?)\*\*/<strong>$1<\/strong>/g;
s/\*(.*?)\*/<em>$1<\/em>/g;

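The answer above gives only the Perl form. Below is a direct C# counterpart, sketched on the assumption that .NET's Regex.Replace is used. It replaces every match, as Perl's /g flag does, and understands $1 in the replacement string. The helper name PerlStyle is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: the two Perl s///g substitutions, translated one-to-one.
static string PerlStyle(string text)
{
    // s/\*\*(.*?)\*\*/<strong>$1<\/strong>/g
    text = Regex.Replace(text, @"\*\*(.*?)\*\*", "<strong>$1</strong>");
    // s/\*(.*?)\*/<em>$1<\/em>/g
    text = Regex.Replace(text, @"\*(.*?)\*", "<em>$1</em>");
    return text;
}

Console.WriteLine(PerlStyle(
    "This would be *italicized* text and this would be **bold** text"));
```

Because .*? also accepts an empty match, a lone "**" comes out as <em></em>. Using .+? instead, as the Markdown.NET patterns do, avoids empty tags.
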
可遇━不可求 2024-07-12 14:21:56

I came across the following post, which recommends not doing this. In my case, though, I want to keep it simple. I thought I would post it, per jop's recommendation, in case someone else wants to do the same.
