Regex to match a double ##, including any repetition of # that is not a double ##

Posted on 2025-01-10 22:38:16

How can I match everything from a double hash "##" up to the next double hash "##", including any longer run of "#" characters (such as "###") that is not exactly "##"?
For instance, the example below should return two matches: one for chapter 1 together with subchapter 1.1, and a second for chapter 2.

## chapter 1

Lorem ipsum dolor sit amet, consectetur adipiscing elit. 
Suspendisse mollis magna nec felis gravida, id posuere libero molestie.

### subchapter 1.1

Sed vel ipsum eget tortor maximus ultrices vitae eget dolor.

## chapter 2

Aenean pellentesque lectus quis ex tristique ultrices. Vestibulum eget purus eu ipsum vestibulum pulvinar

So far, the best I have found is the following regex:

((?!#){2}[\s\S])+

However, it gets confused when it encounters a ### or ####, which it counts as the start of a new chapter.

Link to regex example: https://regex101.com/r/gydtq1/1


违心° 2025-01-17 22:38:16

You can use

re.findall(r'(?ms)^##(?!#).*?(?=\n##(?!#)|\Z)', text)
re.findall(r'^##(?!#).*?(?=\n##(?!#)|\Z)', text, re.M | re.S)

See the regex demo. Details:

(?ms) - the inline form of the re.DOTALL (re.S) and re.MULTILINE (re.M) flags
^ - start of a line
##(?!#) - a ## string not immediately followed by a #
.*? - zero or more characters, as few as possible
(?=\n##(?!#)|\Z) - a position immediately followed by either a newline and a ## that is not itself followed by a #, or the end of the string.
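
As a quick check of the pattern described above, here is a minimal sketch that runs it against a shortened version of the question's sample. The variable names (`text`, `CHAPTER_RE`, `chapters`) and the abbreviated filler text are assumptions made for illustration only.

```python
import re

# Shortened version of the sample from the question (filler text abbreviated).
text = """## chapter 1

Lorem ipsum dolor sit amet.

### subchapter 1.1

Sed vel ipsum eget tortor.

## chapter 2

Aenean pellentesque lectus quis ex tristique ultrices.
"""

# Same pattern as in the answer, with the flags passed separately.
CHAPTER_RE = re.compile(r'^##(?!#).*?(?=\n##(?!#)|\Z)', re.M | re.S)

chapters = CHAPTER_RE.findall(text)

# Print the heading line of each match to see where the chapters start.
for chapter in chapters:
    print(chapter.splitlines()[0])
```

The "### subchapter 1.1" heading stays inside the first match because the lookahead only stops at a ## that is not followed by a third #.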


天涯离梦残月幽梦 2025-01-17 22:38:16

Matching sometimes feels "overrated"; an alternative could be splitting instead:

re.split(r'(?m)^##(?!#)', text)

The parts of the pattern are very similar to those in the accepted answer:

(?m) - the multiline flag (it could instead be passed separately as the 4th argument, flags)
^##(?!#) - a ## string at the start of a line (^), not immediately followed by another #

Caveat: the resulting list will have an entry for everything that precedes the first ##, which is an empty string for the example. The ## delimiter itself is consumed by the split and does not appear in the results.
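
A minimal sketch of the split approach, including one way to deal with the leading entry mentioned in the caveat. The names `text`, `parts`, and `chapters`, the abbreviated sample, and the choice to strip whitespace and drop empty entries are assumptions for illustration, not part of the original answer.

```python
import re

# Shortened version of the sample from the question (filler text abbreviated).
text = """## chapter 1

Lorem ipsum dolor sit amet.

### subchapter 1.1

Sed vel ipsum eget tortor.

## chapter 2

Aenean pellentesque lectus quis ex tristique ultrices.
"""

# Split at every line-initial ## that is not followed by a third #.
# The flag is passed by keyword rather than inline as (?m).
parts = re.split(r'^##(?!#)', text, flags=re.M)

# parts[0] holds whatever precedes the first ## (empty here), so drop
# empty entries and trim surrounding whitespace from the rest.
chapters = [part.strip() for part in parts if part.strip()]

for chapter in chapters:
    print(chapter.splitlines()[0])
```

If the ## prefix is needed in the output, it has to be added back to each entry, or the findall approach can be used instead.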
