Regex to match single-line and multi-line comments

Posted on 2025-01-15 07:12:55

I have this text and need a regex pattern that matches all comments (both multi-line and single-line):

# English language file
# all entries must contain a string number, followed by a space, followed by a string, ended by a pound character
# lines beginning with pound character are a comment
# blank lines are ignored

1 null#

# generic names
1900 text1234#
1901 text1234#
1902 text1234#

I came up with this:

(?:^#\s?([^\n]+))(?:\n#\s?([^\n]+))*

But it does not group the multi-line comment correctly: https://regex101.com/r/LC1f5c/1


If, on the other hand, I repeat the second group 3 times:

(?:^#\s?([^\n]+))(?:\n#\s?([^\n]+))(?:\n#\s?([^\n]+))(?:\n#\s?([^\n]+))

it works the way I want (but for multi-line comments only): https://regex101.com/r/FMSP13/1
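
A likely reason for the behaviour described above: in Python's re module, a capturing group that sits inside a repeated group keeps only the text from its last iteration, so the intermediate lines are matched but not captured. The sketch below shows this with the first pattern from the question; the three-line sample string is made up for illustration and is not the original file.

```python
import re

# Hypothetical sample text (not the original language file).
data = "# first\n# second\n# third\n\n1 null#\n"

# The first pattern from the question, unchanged.
pattern = re.compile(r"(?:^#\s?([^\n]+))(?:\n#\s?([^\n]+))*", re.MULTILINE)

m = pattern.search(data)

# The overall match spans the whole comment block...
print(repr(m.group(0)))  # '# first\n# second\n# third'

# ...but group 2 was overwritten on every repetition, so only the
# first line (group 1) and the last line (group 2) are captured.
print(m.groups())  # ('first', 'third')
```

This is why repeating the group explicitly appears to work: each written-out copy is a separate group with its own capture slot.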

Comments (1)

怂人 2025-01-22 07:12:55

Here is one way to do it:

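import re

# `data` is assumed to hold the full text of the file as a str.
comments = []
# Each match is one block of consecutive lines that start with "# ".
for comment in re.findall(r"(?:^# .*?\n)+", data, flags=re.MULTILINE):
    comments.append(comment[2:-1].split("\n# "))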

The same using a list comprehension:

comments = [comment[2:-1].split("\n# ") for comment in re.findall(r"(?:^# .*?\n)+", data, flags=re.MULTILINE)]

Output:

[
    [
        'English language file',
        'all entries must contain a string number, followed by a space, followed by a string, ended by a pound character',
        'lines beginning with pound character are a comment', 
        'blank lines are ignored'
    ],
    [
        'generic names'
    ]
]

  • comment[2:-1] drops the first two characters ("# ") and the last character (\n) of each matched block.

(?:^# .*?\n)+
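  • (?:...)+: Non-capturing group, repeated one or more times, as many times as possible (greedy).
    • ^: Start of a line (because of re.MULTILINE).
    • "# ": Matches # followed by a single space.
    • .*?: Matches any character except a newline, zero or more times, as few times as possible (lazy).
    • \n: Matches a newline, so every comment line must end with one.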
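
Because the pattern requires a space after # and a newline at the end of every comment line, a comment written as "#text" or a final comment line without a trailing newline would be skipped. The following is a minimal sketch of a more tolerant variant, not part of the original answer; the function name extract_comment_blocks and the sample text are assumptions made for illustration.

```python
import re

def extract_comment_blocks(text):
    """Return a list of comment blocks, each a list of comment lines.

    Hypothetical helper: a block is a run of consecutive lines
    that start with '#'.
    """
    blocks = []
    # Each line must start with '#' and end with a newline or the end of the string.
    for match in re.finditer(r"(?:^#.*(?:\n|\Z))+", text, flags=re.MULTILINE):
        lines = match.group(0).splitlines()
        # Strip the leading '#' and at most one following space.
        blocks.append([re.sub(r"^# ?", "", line) for line in lines])
    return blocks

# Made-up sample: the last comment has no space and no trailing newline.
sample = (
    "# English language file\n"
    "# blank lines are ignored\n"
    "\n"
    "1 null#\n"
    "\n"
    "# generic names\n"
    "1900 text1234#\n"
    "#no space, no trailing newline"
)

result = extract_comment_blocks(sample)
print(result)
```

Using splitlines() instead of split("\n# ") avoids depending on the exact separator between lines, at the cost of a second small regex per line.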