Regex to match a double ##, including any repetition of # that is not a double ##
How can I match everything from a double hash "##" up to the next double hash "##", including any longer run of the "#" character (such as "###") that is not exactly "##"?
For instance, the example below should return two matches: one for chapter 1 together with subchapter 1.1, and a second one for chapter 2.
## chapter 1
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Suspendisse mollis magna nec felis gravida, id posuere libero molestie.
### subchapter 1.1
Sed vel ipsum eget tortor maximus ultrices vitae eget dolor.
## chapter 2
Aenean pellentesque lectus quis ex tristique ultrices. Vestibulum eget purus eu ipsum vestibulum pulvinar
At the moment the best I found is the following regex:
((?!#){2}[\s\S])+
However, it stops at every single #, so a ### or #### heading is counted as a new chapter: repeating the zero-width lookahead (?!#) with {2} still only checks one character.
Link to regex example: https://regex101.com/r/gydtq1/1
Comments (2)
You can use a regex built from the following parts.
See the regex demo. Details:
(?ms) - the re.DOTALL (re.S) and re.MULTILINE (re.M) flags
^ - start of a line
##(?!#) - a ## string not immediately followed by a #
.*? - zero or more characters, as few as possible
(?=\n##(?!#)|\Z) - a position immediately followed by a newline and a ## that is not itself followed by a #, or by the end of the string.
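As a minimal sketch of how these parts fit together, the snippet below concatenates them in the order listed and runs the result with Python's re module. The assembled pattern, the shortened sample text, and the names CHAPTER_RE and chapters are assumptions made for illustration, not the answer's original code.

```python
import re

# Shortened version of the sample text from the question.
text = """## chapter 1
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
### subchapter 1.1
Sed vel ipsum eget tortor maximus ultrices vitae eget dolor.
## chapter 2
Aenean pellentesque lectus quis ex tristique ultrices.
"""

# Assumption: the pattern is the listed parts joined in order.
# (?ms) turns on MULTILINE (^ matches at line starts) and DOTALL (. matches newlines).
CHAPTER_RE = re.compile(r"(?ms)^##(?!#).*?(?=\n##(?!#)|\Z)")

# Each match runs from a line starting with exactly "##" up to the next such line.
chapters = CHAPTER_RE.findall(text)

for chapter in chapters:
    print(repr(chapter))
```

With this input, the "### subchapter 1.1" line stays inside the first match, because the lookahead only stops before a line that begins with exactly two hashes.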
Matching feels "overrated" sometimes; an alternative could be splitting on the chapter marker.
Parts are very similar to the ones in the accepted answer:
(?m) - the multiline flag (could be passed separately as a 4th argument)
^##(?!#) - a ## string at the start of a line (^), not immediately followed by another #
Caveat: the resulting list will have an entry for everything that precedes the first ##, which is an empty string for the example.
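The splitting idea can be sketched as follows, assuming the alternative refers to Python's re.split (its flags parameter is the fourth one, which matches the remark about a 4th argument). The sample text and the names parts and chapters are illustrative assumptions.

```python
import re

# Shortened version of the sample text from the question.
text = """## chapter 1
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
### subchapter 1.1
Sed vel ipsum eget tortor maximus ultrices vitae eget dolor.
## chapter 2
Aenean pellentesque lectus quis ex tristique ultrices.
"""

# Split on "##" at the start of a line when it is not followed by another "#".
# The inline (?m) flag could instead be passed as flags=re.MULTILINE.
parts = re.split(r"(?m)^##(?!#)", text)

# parts[0] holds whatever precedes the first "##" (an empty string here),
# so drop empty entries and trim surrounding whitespace.
chapters = [part.strip() for part in parts if part.strip()]

for chapter in chapters:
    print(repr(chapter))
```

Unlike the matching approach, splitting consumes the "##" delimiter, so each chapter text starts with its title rather than with the hashes.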