Replace exactly the grouped part of a regex match in Python

Posted 2025-01-18 01:24:28, 707 characters, 0 views, 0 comments

I have a template in which I need to replace one part using a regex in Python. Here is my template (note that there is always at least one newline between the two comments):

hello
how's everything

<!--POSTS:START-->
some text
<!--POSTS:END-->

Some code here

I want to replace everything between <!--POSTS:START--> and <!--POSTS:END--> in Python. So I wrote the pattern <!--POSTS:START-->\n([^;]*)\n<!--POSTS:END-->, but the match includes <!--POSTS:START--> and <!--POSTS:END--> too, so the markers get replaced as well.
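A minimal sketch reproducing the problem: re.sub replaces the whole match (group 0), not just the captured group, so the markers disappear together with the body. The variable names template and result are illustrative only.

```python
import re

template = """hello
how's everything

<!--POSTS:START-->
some text
<!--POSTS:END-->

Some code here"""

# The asker's pattern: ([^;]*) captures the body, but the markers are
# still part of the overall match (group 0)
pattern = r"<!--POSTS:START-->\n([^;]*)\n<!--POSTS:END-->"

# re.sub replaces group 0, i.e. the markers as well as the body
result = re.sub(pattern, "foo", template)
print(result)
```

The output keeps "hello", "how's everything" and "Some code here", but both comment markers are gone and only "foo" is left in their place.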

Here is what I want:

re.sub('...', 'foo', message)

# expected result:
hello
how's everything

<!--POSTS:START-->
foo
<!--POSTS:END-->

Some code here

Thanks.


Comments (3)

探春 2025-01-25 01:24:28

You can use capture groups for the start and end markers and reference them as \1, \2, etc. in the replacement string.

If the text contains multiple occurrences of <!--POSTS:START-->...<!--POSTS:END-->, the regexp with the non-greedy .*? replaces each of those blocks separately. If the '?' is removed from the regexp, the greedy .* matches everything from the start of the first block to the end of the last one, so all the text in between is replaced as well.
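A small sketch of the greedy versus non-greedy difference described here, using a made-up document with two marked blocks separated by a "middle" line:

```python
import re

# Two marked blocks in one document
s = """<!--POSTS:START-->
a
<!--POSTS:END-->
middle
<!--POSTS:START-->
b
<!--POSTS:END-->"""

lazy = r'(<!--POSTS:START-->\n).*?(\n<!--POSTS:END-->)'
greedy = r'(<!--POSTS:START-->\n).*(\n<!--POSTS:END-->)'

# Non-greedy: each block is replaced on its own, "middle" survives
lazy_result = re.sub(lazy, r'\1foo\2', s, flags=re.DOTALL)
print(lazy_result)

# Greedy: one match runs from the first START to the last END
greedy_result = re.sub(greedy, r'\1foo\2', s, flags=re.DOTALL)
print(greedy_result)
```

With the greedy pattern the "middle" line and the inner markers are swallowed, leaving a single START/foo/END block.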

Try this:

import re

s = '''
hello
how's everything

<!--POSTS:START-->
some text
<!--POSTS:END-->

Some code here
'''

# re.DOTALL lets '.' match newlines too, so the body can span several lines
s = re.sub(r'(<!--POSTS:START-->\n).*?(\n<!--POSTS:END-->)', r'\1foo\2', s, flags=re.DOTALL)

# equivalent: (?s) sets the DOTALL flag inline in the pattern itself
# s = re.sub(r'(?s)(<!--POSTS:START-->\n).*?(\n<!--POSTS:END-->)', r'\1foo\2', s)

print(s)

Output:

hello
how's everything

<!--POSTS:START-->
foo
<!--POSTS:END-->

Some code here
执妄 2025-01-25 01:24:28

Check this: https://docs.python.org/3/library/re.html
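The re documentation linked here also describes the \g<N> group-reference syntax that this answer's replacement string uses. A short sketch of one case where it is safer than \N: when the replacement text right after a group reference starts with a digit, \1 plus that digit is read as a different group number. The replacement text "9 posts" is made up purely for illustration.

```python
import re

s = "<!--POSTS:START-->\nsome text\n<!--POSTS:END-->"
pattern = r"(<!--POSTS:START-->\n).*(\n<!--POSTS:END-->)"

# \g<1> ends the group reference explicitly, so the "9" is literal text
ok = re.sub(pattern, r"\g<1>9 posts\g<2>", s)
print(ok)

# With \1, the following "9" becomes part of the group number (\19),
# and this pattern has no group 19, so re.sub raises re.error
try:
    re.sub(pattern, r"\19 posts\2", s)
    failed = False
except re.error as exc:
    failed = True
    print("error:", exc)
```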

import re

pattern = r"(<!--POSTS:START-->\n).*(\n<!--POSTS:END-->)"
string = """hello
how's everything

<!--POSTS:START-->
some text
<!--POSTS:END-->

Some code here"""
result = re.sub(pattern, r"\g<1>foo\g<2>", string)
print(result)

Result:

hello
how's everything

<!--POSTS:START-->
foo
<!--POSTS:END-->

Some code here
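One caveat, shown as a sketch: this pattern is used without re.DOTALL, so '.' does not match newlines and .* can only cover a single line. It works for the one-line body in the question, but if the body spans several lines (the two-line body below is invented for illustration), nothing matches and the text comes back unchanged. Adding re.DOTALL together with a non-greedy .*? handles that case.

```python
import re

pattern = r"(<!--POSTS:START-->\n).*(\n<!--POSTS:END-->)"
multi = "<!--POSTS:START-->\nline one\nline two\n<!--POSTS:END-->"

# Without DOTALL: '.' stops at '\n', so there is no match and no change
unchanged = re.sub(pattern, r"\g<1>foo\g<2>", multi)
print(unchanged == multi)  # True

# With DOTALL and a non-greedy body, the multi-line body is replaced
lazy = r"(<!--POSTS:START-->\n).*?(\n<!--POSTS:END-->)"
fixed = re.sub(lazy, r"\g<1>foo\g<2>", multi, flags=re.DOTALL)
print(fixed)
```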
注定孤独终老 2025-01-25 01:24:28

You can use the following:

import re
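
# content holds the template text (a shortened copy of the question's template)
content = "hello\n\n<!--POSTS:START-->\nsome text\n<!--POSTS:END-->\n\nSome code here"
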
new_content = re.sub(
    r'(<!--POSTS:START-->\n).*?(?=\n<!--POSTS:END-->)', r"\1foo",
    content, flags=re.DOTALL)

The DOTALL flag makes the '.' special character match any character at all, including a newline.
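A tiny sketch of what the flag changes, on a made-up two-line string:

```python
import re

text = "a\nb"

# By default '.' does not match a newline
default = re.fullmatch(r"a.b", text)
print(default)  # None

# With re.DOTALL, '.' matches the newline as well
dotall = re.fullmatch(r"a.b", text, flags=re.DOTALL)
print(dotall is not None)  # True

# The inline (?s) flag has the same effect
inline = re.fullmatch(r"(?s)a.b", text)
print(inline is not None)  # True
```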

I'm using two techniques to get what you want:

  • Lookahead (?=...): asserts that the given subpattern can be matched at this position, without consuming any characters.
  • Non-greedy quantifier (*?): matches as little as possible, so each START/END block is matched separately instead of one match spanning from the first START to the last END.
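A sketch of the "without consuming characters" part, using this answer's pattern on the question's block: the match stops right before the END marker, so re.sub leaves the marker in place.

```python
import re

s = "<!--POSTS:START-->\nsome text\n<!--POSTS:END-->"
pattern = r'(<!--POSTS:START-->\n).*?(?=\n<!--POSTS:END-->)'

m = re.search(pattern, s, flags=re.DOTALL)

# The match ends where the lookahead begins; the lookahead text is not included
print(repr(m.group(0)))   # '<!--POSTS:START-->\nsome text'

# Everything from m.end() onward is outside the match, so re.sub keeps it
print(repr(s[m.end():]))  # '\n<!--POSTS:END-->'
```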

Because of the lookahead, \n<!--POSTS:END--> is not consumed, so I only need to put the first group back and rewrite the content between the markers. That is why the replacement is \1foo and not \1foo\2.
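A sketch of both points: \1foo is enough, and \1foo\2 would not even work, because (?=...) is not a capturing group and the pattern therefore has only one group.

```python
import re

s = "<!--POSTS:START-->\nsome text\n<!--POSTS:END-->"
pattern = r'(<!--POSTS:START-->\n).*?(?=\n<!--POSTS:END-->)'

# \1 restores the START marker; the END marker was never consumed
result = re.sub(pattern, r"\1foo", s, flags=re.DOTALL)
print(result)

# The lookahead does not capture, so \2 is an invalid group reference
print(re.compile(pattern).groups)  # 1
try:
    re.sub(pattern, r"\1foo\2", s, flags=re.DOTALL)
    bad_ref_raised = False
except re.error:
    bad_ref_raised = True
print(bad_ref_raised)  # True
```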

If you need to modify only the first match, pass count=1:

re.sub(..., count=1)
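A sketch of count=1 on a made-up document with two marked blocks: only the first body is replaced and the second keeps its text.

```python
import re

s = """<!--POSTS:START-->
a
<!--POSTS:END-->
<!--POSTS:START-->
b
<!--POSTS:END-->"""

pattern = r'(<!--POSTS:START-->\n).*?(?=\n<!--POSTS:END-->)'

# count=1 stops after the first replacement; the second block keeps "b"
first_only = re.sub(pattern, r"\1foo", s, count=1, flags=re.DOTALL)
print(first_only)
```

Passing count as a keyword argument, as here, also avoids the deprecation warning that newer Python versions emit for a positional count.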

You can have anything between those two marker lines, including several lines of text, and it will work as expected.
