How do I fix this multiline regular expression in Ruby?

Posted on 2024-11-02 07:41:22

I have a regular expression in Ruby that isn't working properly in multiline mode.

I'm trying to convert Markdown text into the Textile-esque markup used in Redmine. The problem is in my regular expression for converting code blocks. It should find any lines that start with 4 spaces or a tab, then wrap them in pre tags.

markdownText = '# header

some text that precedes code

    var foo = 9;
    var fn = function() {}

    fn();

some post text'

puts markdownText.gsub!(/(^(?:\s{4}|\t).*?$)+/m,"<pre>\n\\1\n</pre>")

Intended result:

# header

some text that precedes code

<pre>
    var foo = 9;
    var fn = function() {}

    fn();
</pre>

some post text

The problem is that the closing pre tag is printed at the end of the document instead of after "fn();". I tried some variations of the following expression but it doesn't match:

gsub!(/(^(?:\s{4}|\t).*?$)+^(\S)/m, "<pre>\n\\1\n</pre>\\2")

How do I get the regular expression to match just the indented code block? You can test this regular expression on Rubular here.

Comments (4)

凉薄对峙 2024-11-09 07:41:22

First, note that 'm' multi-line mode in Ruby is equivalent to the 's' single-line mode of other languages. In other words, 'm' mode in Ruby means "dot matches all", including newlines. Ruby's ^ and $ always match at line boundaries, with or without 'm'.
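To make the point about 'm' concrete, here is a minimal sketch on a two-line string. The variable names and the sample text are illustrative only and do not come from the question.

```ruby
text = "first line\nsecond line"

# Without /m, "." stops at a newline, so ".*" covers only the first line.
without_m = text[/first.*/]

# With /m, "." also matches "\n", so ".*" runs to the end of the string.
with_m = text[/first.*/m]

# "^" matches at the start of every line, whether or not /m is set.
line_starts = text.scan(/^\w+/)

puts without_m.inspect
puts with_m.inspect
puts line_starts.inspect
```

This is why adding /m to a pattern that ends in ".*" or ".*?" can make a match spill across lines that were meant to stay separate.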

This regex will do a pretty good job of matching a markdown-like code section:

re = / # Match a MARKDOWN CODE section.
    (\r?\n)              # $1: CODE must be preceded by blank line
    (                    # $2: CODE contents
      (?:                # Group for multiple lines of code.
        (?:\r?\n)+       # Each line preceded by a newline,
        (?:[ ]{4}|\t).*  # and begins with four spaces or tab.
      )+                 # One or more CODE lines
      \r?\n              # Newline ending the last CODE line.
    )                    # End $2: CODE contents
    (?=\r?\n)            # CODE followed by blank line.
    /x
result = subject.gsub(re, '\1<pre>\2</pre>')

This requires a blank line before and after the code section and allows blank lines within the code section itself. It allows for either \r\n or \n line terminations. Note that this does not strip the leading 4 spaces (or tab) before each line. Doing that will require more code complexity. (I am not a Ruby guy, so I can't help out with that.)

I would recommend looking at the Markdown source itself to see how it's really being done.
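For the indentation-stripping step mentioned above, one possible approach is to pass a block to gsub and clean up each matched code section inside it. The sketch below is an assumption-laden illustration, not the answer's regex: the helper name wrap_code_blocks and its simpler pattern are hypothetical, it only handles \n line endings, and it does not require a blank line before the code.

```ruby
# Hypothetical helper: wraps each run of indented lines in <pre> tags and
# removes one level of indentation (4 spaces or one tab) from every line.
def wrap_code_blocks(markdown)
  # An indented line, followed by any number of further indented lines,
  # each preceded by one or more newlines (so blank lines may sit between).
  indented_block = /^(?:[ ]{4}|\t).*(?:\n+(?:[ ]{4}|\t).*)*/

  markdown.gsub(indented_block) do |block|
    # Strip the leading indent from each line of the matched block.
    stripped = block.gsub(/^(?:[ ]{4}|\t)/, "")
    "<pre>\n#{stripped}\n</pre>"
  end
end

sample = [
  "# header",
  "",
  "some text that precedes code",
  "",
  "    var foo = 9;",
  "    var fn = function() {}",
  "",
  "    fn();",
  "",
  "some post text"
].join("\n")

puts wrap_code_blocks(sample)
```

Using the block form keeps the per-line cleanup out of the main pattern, which stays readable. If the indentation should be kept, as in the question's intended result, the inner gsub can simply be dropped.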

空‖城人不在 2024-11-09 07:41:22

/^(\s{4}|\t)+.+\;\n$/m

This works a little better, but it still picks up a newline that we don't want.
Here it is on Rubular.

优雅的叶子 2024-11-09 07:41:22

This is working for me with your sample input.

markdownText.gsub(/\n?((\s{4}.+)+)/, "\n<pre>\\1\n</pre>")

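One caveat with double-quoted replacement strings in Ruby: "#{$1}" is interpolated before gsub is called, so it cannot see the match that gsub is about to make. A minimal sketch of the difference follows; the helper demo_replacements and its toy pattern are hypothetical and are not part of the answer.

```ruby
# Hypothetical helper comparing three ways to reference a capture group.
def demo_replacements(text)
  pattern = /(b+)/

  # "#{$1}" is evaluated once, before gsub runs. No match has happened yet
  # in this method, so $1 is nil and interpolates as an empty string.
  interpolated = text.gsub(pattern, "<#{$1}>")

  # "\\1" is a backreference that gsub resolves for each match.
  backreference = text.gsub(pattern, "<\\1>")

  # In the block form, $1 is set for the current match when the block runs.
  block_form = text.gsub(pattern) { "<#{$1}>" }

  [interpolated, backreference, block_form]
end

p demo_replacements("abbc")
```

Either the backreference or the block form is the safer choice whenever the replacement needs the captured text.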
你怎么这么可爱啊 2024-11-09 07:41:22

Here's another one that captures all the indented lines in a single block:

((?:^(?: {4}|\t)[^\n]*$\n?)+)
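As a rough check, the pattern can be run with scan against the question's sample text. This is only a sketch; the variable names are illustrative. Note that under this pattern an unindented blank line ends a block, so the sample yields two matches rather than one.

```ruby
sample = [
  "# header",
  "",
  "some text that precedes code",
  "",
  "    var foo = 9;",
  "    var fn = function() {}",
  "",
  "    fn();",
  "",
  "some post text"
].join("\n")

indented_lines = /((?:^(?: {4}|\t)[^\n]*$\n?)+)/

# scan returns one array per match, each holding the single capture group.
blocks = sample.scan(indented_lines).flatten

blocks.each { |block| puts block.inspect }
```

Each match includes the trailing newline of its last line, which matters when wrapping the block in pre tags.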