Regex for Markdown: matching the content between chapters
I am trying to match all h2 chapters and the text contained between them (including subchapters).
For instance:
## chapter 1
text1 text1 text1
[Link text](http://link.com)
text2 text2 text2
### subchapter 1.1
subchapter text
## chapter 2
bla bla bla
* a list 1
* a list 2
## Chapter 3
okokok
The above should return 3 matches (chapter 1, chapter 2, chapter 3), each with two groups. For chapter 1 the groups should be "chapter 1" and "text1 ... subchapter text".
I came up with the following solution, which uses a positive lookbehind and a positive lookahead:
/(?<=[#]{2}\s)([\w ]+)\n(.*?)(?=[#]{2})/gs
However, it:
- misses the last chapter (because there's no ending ##)
- cuts chapter 1 off before its subchapter, because the lookahead matches the first two of the three # in ###
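To make the two failures concrete, here is a small sketch that runs the pattern from the question, unchanged, against the sample document. It assumes a JavaScript runtime with lookbehind, the `s` flag and `String.prototype.matchAll`; the variable names are only illustrative.

```javascript
// Sample document from the question.
const markdown = [
  "## chapter 1",
  "text1 text1 text1",
  "[Link text](http://link.com)",
  "text2 text2 text2",
  "### subchapter 1.1",
  "subchapter text",
  "## chapter 2",
  "bla bla bla",
  "* a list 1",
  "* a list 2",
  "## Chapter 3",
  "okokok",
].join("\n");

// The pattern from the question, unchanged.
const original = /(?<=[#]{2}\s)([\w ]+)\n(.*?)(?=[#]{2})/gs;

const found = [...markdown.matchAll(original)].map((m) => ({
  title: m[1],
  body: m[2],
}));

// Only "chapter 1" and "chapter 2" are found: nothing follows
// "Chapter 3" that could satisfy the trailing (?=[#]{2}).
console.log(found.map((c) => c.title));

// The body of chapter 1 stops right before "### subchapter 1.1",
// because "##" also matches the first two characters of "###".
console.log(JSON.stringify(found[0].body));
```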
I think I can give you a hint in the right direction with the following:
https://regex101.com/r/93MI6A/1
Basically, if you lift the requirement about subchapters with the triple ###, you can use the pattern from the link above, which matches all chapters (including the last one). Once you have the match for each single chapter, it should be easy to strip out the title and get only the content with something like https://regex101.com/r/TuwO7R/1:
([a-z0-9 ]+[\n]{1})([\s\S]+)
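The two-step idea can be sketched as follows. The first-step pattern is not shown in the text of this answer, so the split on `/^(?=##[ \t])/m` below is an assumption of mine, not necessarily what the regex101 link contains. The second step uses the pattern above, with an `i` flag added (also an assumption) so that the capitalised "Chapter 3" title is captured whole.

```javascript
// Sample document from the question.
const markdown = [
  "## chapter 1",
  "text1 text1 text1",
  "[Link text](http://link.com)",
  "text2 text2 text2",
  "### subchapter 1.1",
  "subchapter text",
  "## chapter 2",
  "bla bla bla",
  "* a list 1",
  "* a list 2",
  "## Chapter 3",
  "okokok",
].join("\n");

// Step 1 (assumed): cut the text in front of every line that starts
// with exactly two "#" followed by a space or tab. "###" lines do not
// qualify, so subchapters stay inside their parent chunk.
const chunks = markdown
  .split(/^(?=##[ \t])/m)
  .filter((chunk) => chunk.startsWith("##"));

// Step 2: the title/content pattern from the answer, plus the "i" flag.
const titleAndContent = /([a-z0-9 ]+[\n]{1})([\s\S]+)/i;

const chapters = chunks.map((chunk) => {
  const m = chunk.match(titleAndContent);
  // Group 1 still carries the space after "##" and the newline, so trim it.
  return { title: m[1].trim(), content: m[2].trim() };
});

console.log(chapters.map((c) => c.title));
console.log(chapters[0].content);
```

Note that the character class `[a-z0-9 ]` only covers letters, digits and spaces, so a title containing punctuation would need a wider class, and a chapter with an empty body would not fit `([\s\S]+)`.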
You can use:
regex101 example:
https://regex101.com/r/pFy0I6/1
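The pattern for this answer is not reproduced in the text, so the following is only one possible single-pass pattern and is not claimed to be the one behind the link. It anchors headings at the start of a line, requires whitespace right after the two # characters so that ### is not mistaken for a chapter, and accepts the end of input as a terminator so that the last chapter is kept.

```javascript
// Sample document from the question.
const markdown = [
  "## chapter 1",
  "text1 text1 text1",
  "[Link text](http://link.com)",
  "text2 text2 text2",
  "### subchapter 1.1",
  "subchapter text",
  "## chapter 2",
  "bla bla bla",
  "* a list 1",
  "* a list 2",
  "## Chapter 3",
  "okokok",
].join("\n");

// ^##[ \t]+        an h2 heading at the start of a line ("###" fails here)
// (.+)             group 1: the title, up to the end of the line
// \n?              the line break after the title, if any
// ([\s\S]*?)       group 2: the body, as short as possible
// (?=^##[ \t]|(?![\s\S]))  stop before the next h2 or at the end of input
const h2Section = /^##[ \t]+(.+)\n?([\s\S]*?)(?=^##[ \t]|(?![\s\S]))/gm;

const sections = [...markdown.matchAll(h2Section)].map((m) => ({
  title: m[1].trim(),
  body: m[2].trim(),
}));

console.log(sections.map((s) => s.title));
console.log(sections[0].body);
```

With the sample input this yields three sections, and the body of chapter 1 runs from "text1 text1 text1" through "subchapter text". The `(?![\s\S])` alternative is used because JavaScript has no `\Z`, and `$` would match at every line end under the `m` flag.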