
Python regex match python-3.x

Regular expression to match comments and multi-line comments

Posted on 2025-01-15 07:12:55

I have this text, and I need a regex pattern that matches all comments (both multi-line and single-line):

# English language file
# all entries must contain a string number, followed by a space, followed by a string, ended by a pound character
# lines beginning with pound character are a comment
# blank lines are ignored

1 null#

# generic names
1900 text1234#
1901 text1234#
1902 text1234#

I thought about this:

(?:^#\s?([^\n]+))(?:\n#\s?([^\n]+))*

But it does not group the multi-line comments correctly: https://regex101.com/r/LC1f5c/1

If, on the other hand, I repeat the second group 3 times:

(?:^#\s?([^\n]+))(?:\n#\s?([^\n]+))(?:\n#\s?([^\n]+))(?:\n#\s?([^\n]+))

it works the way I want (but only for multi-line comments): https://regex101.com/r/FMSP13/1
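
For reference, Python's re module keeps only the last repetition of a capturing group that sits inside a quantified group, which would explain the grouping seen with the first pattern. A minimal sketch of that behaviour, assuming a shortened sample in place of the full file (the names `sample`, `pattern` and `results` are illustrative only):

```python
import re

# Shortened stand-in for the language file in the question (an assumption).
sample = (
    "# first\n"
    "# second\n"
    "# third\n"
    "\n"
    "1 null#\n"
    "\n"
    "# generic names\n"
    "1900 text1234#\n"
)

# The first pattern from the question, unchanged.
pattern = re.compile(r"(?:^#\s?([^\n]+))(?:\n#\s?([^\n]+))*", flags=re.MULTILINE)

# Group 2 is inside a repeated group, so each repetition overwrites the
# previous capture; only the last repetition remains after the match.
results = [match.groups() for match in pattern.finditer(sample)]
print(results)
# [('first', 'third'), ('generic names', None)]
```

The middle line ("second") is matched but not retained, so a single pattern of this shape cannot return every line of a block as a separate group.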

Comments (1)

怂人 2025-01-22 07:12:55

Here is one way to do so:

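import re

# `data` is assumed to hold the text of the language file from the question.
comments = []
for comment in re.findall(r"(?:^# .*?\n)+", data, flags=re.MULTILINE):
    # Drop the leading "# " and the trailing "\n", then split the block into lines.
    comments.append(comment[2:-1].split("\n# "))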

The same using a list comprehension:

comments = [
    comment[2:-1].split("\n# ")
    for comment in re.findall(r"(?:^# .*?\n)+", data, flags=re.MULTILINE)
]

Output:

[
    [
        'English language file',
        'all entries must contain a string number, followed by a space, followed by a string, ended by a pound character',
        'lines beginning with pound character are a comment', 
        'blank lines are ignored'
    ],
    [
        'generic names'
    ]
]

comment[2:-1] drops the first two characters of each block (the pound character and the space after it) as well as the last character (\n).

(?:^# .*?\n)+

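(?:...)+: Non-capturing group, repeated between one and unlimited times, as many times as possible (greedy).
- ^: Start of a line (because of re.MULTILINE).
- "# ": Matches a pound character followed by a space.
- .*?: Matches any character except a newline, between zero and unlimited times, as few as possible (lazy).
- \n: Matches a newline.

Note that this pattern requires a space after each # and a newline at the end of the last comment line.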
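
If the file might contain comment lines without a space after the pound character, or might end without a trailing newline, the same idea can be loosened a little. The following is only a sketch under those assumptions, not the answer's original code; the function name `extract_comment_blocks` and the sample text are made up for illustration:

```python
import re

def extract_comment_blocks(text):
    """Return a list of comment blocks, each a list of comment lines."""
    # A block is one or more consecutive lines that start with '#'.
    # (?:\n|\Z) lets the last line of the text end without a newline.
    block_pattern = re.compile(r"(?:^#.*(?:\n|\Z))+", flags=re.MULTILINE)
    blocks = []
    for match in block_pattern.finditer(text):
        lines = match.group(0).splitlines()
        # Remove the leading '#' and any surrounding whitespace from each line.
        blocks.append([line[1:].strip() for line in lines])
    return blocks

# Hypothetical sample: one line lacks the space after '#',
# and the final comment has no trailing newline.
sample = (
    "# first\n"
    "#second\n"
    "\n"
    "1 null#\n"
    "\n"
    "# generic names\n"
    "1900 text1234#\n"
    "# last line without newline"
)

print(extract_comment_blocks(sample))
# [['first', 'second'], ['generic names'], ['last line without newline']]
```

Splitting each block with splitlines() and stripping per line avoids relying on fixed slice offsets such as [2:-1], at the cost of also trimming any trailing whitespace inside a comment.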
