Regex to match a block of text up to the first double newline?

Posted on 2024-08-20 20:12:25

I'm making a simple Textile parser and am trying to write a regular expression for "blockquote" but am having difficulty matching multiple new lines. Example:

bq. first line of quote
second line of quote
third line of quote

not part of the quote

It will be replaced with blockquote tags via preg_replace() so basically it needs to match everything between "bq." and the first double new line it comes across. The best I can manage is to get the first line of the quote. Thanks

Comments (5)

溺孤伤于心 2024-08-27 20:12:25

Try this regex:

(?s)bq\.((?!(\r?\n){2}).)*+

meaning:

(?s)           # enable dot-all option
b              # match the character 'b'
q              # match the character 'q'
\.             # match the character '.'
(              # start capture group 1
  (?!          #   start negative look ahead
    (          #     start capture group 2
      \r?      #       match the character '\r' and match it once or none at all
      \n       #       match the character '\n'
    ){2}       #     end capture group 2 and repeat it exactly 2 times
  )            #   end negative look ahead
  .            #   match any character
)*+            # end capture group 1 and repeat it zero or more times, possessively

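\r?\n matches Windows, *nix and (newer) macOS line breaks. If you need to account for really old Mac computers, add a lone \r as an alternative: \r?\n|\r. Because that alternation is repeated by {2}, put it in an atomic group, (?>\r?\n|\r), so that a single \r\n cannot be counted as two line breaks.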
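
As a rough illustration of how this pattern behaves in PHP, the sketch below runs it against the sample from the question. One detail worth noting: in the pattern as written, capture group 1 is itself repeated, so after the match it holds only its final iteration (a single character). The variant here is an assumption about what is wanted, not the answerer's own code: it moves the repetition inside group 1 so the group holds the whole quote body. The variable names are made up for the example.

```php
<?php
// Sample input from the question; the blank line ends the quote.
$text = "bq. first line of quote\nsecond line of quote\nthird line of quote\n\nnot part of the quote";

// Pattern exactly as posted: group 1 is repeated, so it keeps only
// the last character it matched.
preg_match('/(?s)bq\.((?!(\r?\n){2}).)*+/', $text, $asPosted);
$lastCharOnly = $asPosted[1];

// Variant (assumption): the repeated part is a non-capturing group
// wrapped in group 1, so group 1 holds the whole body.
$pattern = '/bq\.((?:(?!(?:\r?\n){2}).)*+)/s';
preg_match($pattern, $text, $m);
$body = $m[1];

// Wrap the body in blockquote tags; the blank line and the rest stay put.
$html = preg_replace($pattern, '<blockquote>$1</blockquote>', $text);

echo $html, "\n";
```

The body keeps the space that follows "bq."; trimming it, or adding " ?" after the escaped dot, is left as a design choice.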

感情洁癖 2024-08-27 20:12:25

The accepted answer captured only the last character of the block for me, because a repeated capture group keeps only its final iteration. I ended up using this (Perl):

$text =~ /(?s)bq\.(.+?)\n\n/g

枕梦 2024-08-27 20:12:25

Would this work?

'/(.+)\n\n/s'

I believe the 's' modifier stands for "single line": it lets . match newline characters as well.

旧竹 2024-08-27 20:12:25

Edit: Ehr, misread the question.. "bq." was significant.

echo preg_replace('/^bq\.(.+?)\n\n/s', '<blockquote>$1</blockquote>', $str, 1);

Sometimes data entered via web forms contains \r\n instead of just \n, which would make it:

echo preg_replace('/^bq\.(.+?)\r\n\r\n/s', '<blockquote>$1</blockquote>', $str, 1);

The question mark makes the match non-greedy (lazy), so the closing blockquote tag is added at the first double line break found and any later double line breaks are left alone. If that is not what you want, remove it.
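
The two patterns above can probably be folded into one. The sketch below is one possible way to do that and rests on assumptions that are not part of the answer: PCRE's \R escape (which treats \r\n as a single line break) stands in for the hard-coded \n\n and \r\n\r\n, a lookahead leaves the blank line in place, \z lets a quote at the very end of the input close, and the m modifier lets ^ match at the start of any line. The function name blockquoteAll is hypothetical.

```php
<?php
// Hypothetical helper, not part of any library.
function blockquoteAll(string $str): string
{
    // ^bq\. ?        "bq." at the start of a line (m modifier), optional space
    // (.+?)          the quote body, as short as possible (s lets . span lines)
    // (?=\R{2}|\z)   stop before two line breaks or at end of input;
    //                \R matches \r\n, \n or \r as one unit
    $pattern = '/^bq\. ?(.+?)(?=\R{2}|\z)/ms';

    return preg_replace($pattern, '<blockquote>$1</blockquote>', $str);
}

echo blockquoteAll("bq. a\nb\n\nrest"), "\n";
```

Dropping the limit argument means every "bq." block in the input is converted; pass 1 as the fourth argument to preg_replace to keep the answer's first-match-only behaviour.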

音盲 2024-08-27 20:12:25

My instincts tell me something like...

preg_match("/^bq\. (.+?)\n\n/s", $input, $matches)

Just like the above fella says, the s flag after the closing / of the regex means that . will also match newline characters. Without it, . stops at a line break, so the pattern cannot span more than one line.

Then the question mark ? after the .+ denotes a non-greedy match, so the .+ won't match as much as it can; instead it will match the minimum possible, so that the \n\n matches the first available double line break.
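
To make the effect of the s flag and the lazy quantifier concrete, here is a small comparison. The input string and variable names are invented for the illustration; it is a sketch rather than a recommended final pattern.

```php
<?php
// Two blank lines in the input, so greedy and lazy matching differ.
$text = "bq. one\ntwo\n\nafter\n\nend";

// No s flag: . cannot cross a newline, so nothing matches (returns 0).
$withoutS = preg_match('/bq\.(.+)\n\n/', $text, $a);

// s flag, greedy .+ : runs on to the LAST double newline.
preg_match('/bq\.(.+)\n\n/s', $text, $greedy);

// s flag, lazy .+? : stops at the FIRST double newline.
preg_match('/bq\.(.+?)\n\n/s', $text, $lazy);

var_dump($withoutS, $greedy[1], $lazy[1]);
```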

To what extent are you planning on supporting features of Textile? Because your RegEx can get pretty complicated, as Textile allows things like...

bq.. This is a block quote

This is still a block quote

or...

bq(funky). This is a block quote belonging to the class funky!

bq{color:red;}. Block quote with red text!

All of which your regex-replace technique won't be able to handle, methinks.
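
For what it is worth, the class and style forms can be covered by a single pattern if the replacement is built in a callback. The sketch below is only that, under stated assumptions: textileBlockquotes is a hypothetical name, the attribute syntax is limited to the two examples shown above, the quote body is inserted as-is (as in the other answers), and the extended "bq.." form is deliberately left untouched because it needs logic a single replace does not express well.

```php
<?php
// Hypothetical helper; handles "bq.", "bq(class)." and "bq{style}." only.
function textileBlockquotes(string $text): string
{
    // Group 1: optional (class); group 2: optional {style}; group 3: body.
    // The body ends before two line breaks or at the end of the input.
    $pattern = '/^bq(?:\(([^)\r\n]+)\))?(?:\{([^}\r\n]+)\})?\. (.+?)(?=\R{2}|\z)/ms';

    return preg_replace_callback($pattern, function (array $m): string {
        $attrs = '';
        if ($m[1] !== '') {
            $attrs .= ' class="' . htmlspecialchars($m[1]) . '"';
        }
        if ($m[2] !== '') {
            $attrs .= ' style="' . htmlspecialchars($m[2]) . '"';
        }
        // Body is not escaped here; that is left to later processing.
        return '<blockquote' . $attrs . '>' . $m[3] . '</blockquote>';
    }, $text);
}

echo textileBlockquotes("bq(funky). This is a block quote belonging to the class funky!"), "\n";
```

A full Textile implementation would more likely tokenise the input block by block than pile further alternatives onto one expression.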
