Regex to match the lines between two specific lines in Python

Posted on 2024-11-27 01:16:23


I am trying to use a regex to parse some lines out of text read from a file. I know this could be done by reading the file line by line, but I like the elegance of capturing all the relevant information in a single regex match.
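For comparison, here is a minimal sketch of the line-by-line alternative, assuming the front matter starts on the very first line and both delimiters are exactly "---". The helper name split_front_matter is hypothetical. It relies on str.splitlines(), which treats "\n", "\r\n" and "\r" all as line boundaries, so line endings are not an issue here.

```python
def split_front_matter(content):
    """Hypothetical helper: split '---'-delimited front matter from the body
    by scanning lines instead of using a regex. Returns (front, body) or None."""
    # str.splitlines() splits on '\n', '\r\n' and '\r' alike.
    lines = content.splitlines()
    if not lines or lines[0] != "---":
        return None  # document does not open with a '---' line
    for i in range(1, len(lines)):
        if lines[i] == "---":
            front = "\n".join(lines[1:i])
            body = "\n".join(lines[i + 1:])
            return front, body
    return None  # no closing '---' line


print(split_front_matter("---\r\ntitle: a title\r\nlayout: page\r\n---\r\n\r\nbody text\r\n"))
```

Note that rejoining with "\n" drops the file's trailing newline and normalizes everything to Unix line endings.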

The example file contents:

---
title: a title
layout: page
---

here's some text
================

this will be blog post content.

I am trying to write a regex match that returns two groups: the data between the "---" lines, and everything after the second "---" line. Here is the regex I have come up with, but I am having an issue with it:

re.match('---\n(.*?)\n---\n(.*)', content, re.S)
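A minimal reproduction of the problem, assuming the example file contents above are held in a string (the names unix_content and windows_content are illustrative). The pattern matches when lines end in "\n", but re.match returns None once every line ends in "\r\n", because the literal "\n" right after the opening "---" meets a "\r" instead.

```python
import re

# Example file contents from the question, with Unix ('\n') line endings.
unix_content = (
    "---\n"
    "title: a title\n"
    "layout: page\n"
    "---\n"
    "\n"
    "here's some text\n"
    "================\n"
    "\n"
    "this will be blog post content.\n"
)

# The same document with Windows ('\r\n') line endings.
windows_content = unix_content.replace("\n", "\r\n")

pattern = '---\n(.*?)\n---\n(.*)'

unix_match = re.match(pattern, unix_content, re.S)
windows_match = re.match(pattern, windows_content, re.S)

print(unix_match.group(1))  # the front matter between the '---' lines
print(windows_match)        # None: '---\n' never matches '---\r\n'
```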

This seems to work well, except when dealing with Unix vs. Windows line endings. Is there a way to let this regex also match a \r if one is present? It works with Unix line endings, which I believe are just \n.
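One way to do this, sketched on a shortened version of the sample document: replace each "\n" in the pattern with "\r?\n", so the carriage return becomes optional. The compiled name FRONT_MATTER is just an illustrative choice.

```python
import re

# '\r?' makes the carriage return optional, so each line break in the
# pattern matches either '\n' (Unix) or '\r\n' (Windows).
FRONT_MATTER = re.compile(r'---\r?\n(.*?)\r?\n---\r?\n(.*)', re.S)

unix_content = "---\ntitle: a title\nlayout: page\n---\n\nbody text\n"
windows_content = unix_content.replace("\n", "\r\n")

for content in (unix_content, windows_content):
    m = FRONT_MATTER.match(content)
    # Both line-ending styles now produce a match.
    print(repr(m.group(1)))
```

The captured groups still contain any "\r" characters that appear inside them (for example between "title: a title" and "layout: page" in the Windows version), so code that splits the front matter further may need to account for that.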

Also, if you think this regex could be improved, I'm open to suggestions.
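An alternative worth considering is to normalize the line endings before the regex ever sees them, so the original "\n"-only pattern can stay as it is. In Python 3, opening a file in text mode with the default newline=None enables universal newlines, which translates "\r\n" and "\r" to "\n" while reading. The sketch below assumes UTF-8 files; parse_front_matter is a hypothetical helper name, and the temporary file exists only for the demo.

```python
import os
import re
import tempfile

FRONT_MATTER = re.compile(r'---\n(.*?)\n---\n(.*)', re.S)


def parse_front_matter(path):
    """Hypothetical helper: return (front_matter, body), or None if no match."""
    # Text mode with the default newline=None turns '\r\n' and '\r' into '\n'
    # while reading, so the pattern only has to handle '\n'.
    with open(path, encoding="utf-8") as f:
        content = f.read()
    m = FRONT_MATTER.match(content)
    return (m.group(1), m.group(2)) if m else None


# Demo: write a file with Windows line endings in binary mode, then parse it.
with tempfile.NamedTemporaryFile("wb", suffix=".md", delete=False) as tmp:
    tmp.write(b"---\r\ntitle: a title\r\nlayout: page\r\n---\r\n\r\nbody text\r\n")
    path = tmp.name

try:
    front, body = parse_front_matter(path)
    print(repr(front))
    print(repr(body))
finally:
    os.remove(path)
```

If the text is already in memory rather than read from a file, content.replace('\r\n', '\n') achieves the same normalization before calling re.match.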
