Regex in Python to match the lines between two specific lines
I am trying to use regex to parse out some lines from text read in from a file. I know this could be done by reading in the file, line-by-line, but I like the elegance in capturing all the relevant bits of info in a single regex match.
The example file contents:
---
title: a title
layout: page
---
here's some text
================
this will be blog post content.
I am trying to produce a regex match that will return 2 groups: the data in-between the "---" lines, and all of the data after the 2nd "---" line. Here is the regex string I have come up with, and I am having an issue with it:
re.match('---\n(.*?)\n---\n(.*)', content, re.S)
This seems to work well, except when dealing with Unix vs. Windows line endings. Is there a way to allow this regex to match a \r if it's present, too? It works with Unix line endings, which I believe are just \n.
Also, if you think this regex could be improved, I'm open to suggestions.
Comments (2)
The end-of-line markers are considered whitespace, so you can use the construct
\s+
to match the end of a line (and any other whitespace) in a platform-independent way. Be aware that \s+ also consumes any blank lines and leading indentation that follow the line break.

The sequence
(\r\n|\r|\n)
will match all 'normal' line endings (Windows, old Mac, and *nix, respectively).
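
As a rough sketch of how the second suggestion could be applied to the regex from the question: the alternation is wrapped in a non-capturing group, `(?:...)`, so the match still returns exactly two groups. The names `EOL`, `FRONT_MATTER`, and `split_front_matter` are made up for this illustration and do not come from the question or the answers.

```python
import re

# Non-capturing group: matches Windows (\r\n), old Mac (\r), or Unix (\n)
# line endings without adding an extra capture group to the result.
EOL = r'(?:\r\n|\r|\n)'

# Same shape as the regex in the question, with each \n replaced by EOL.
FRONT_MATTER = re.compile(
    r'---' + EOL + r'(.*?)' + EOL + r'---' + EOL + r'(.*)',
    re.S,  # let '.' match line breaks so both groups can span several lines
)


def split_front_matter(content):
    """Return (header, body), or None if content does not start with a '---' block."""
    m = FRONT_MATTER.match(content)
    if m is None:
        return None
    return m.group(1), m.group(2)


unix_text = (
    "---\n"
    "title: a title\n"
    "layout: page\n"
    "---\n"
    "here's some text\n"
    "================\n"
    "this will be blog post content.\n"
)
windows_text = unix_text.replace("\n", "\r\n")

for text in (unix_text, windows_text):
    header, body = split_front_matter(text)
    # splitlines() handles \n, \r\n and \r inside the captured groups
    print(header.splitlines())
    print(body.splitlines())
```

The captured groups keep whatever line endings the input used, so `str.splitlines()` is a convenient way to normalise them afterwards. If old Mac line endings are not a concern, `\r?\n` in place of `EOL` would be a shorter alternative.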