Can I use re.sub (or regexobject.sub) to replace the text in a subgroup?

Posted 2024-07-21 06:03:54

I need to parse a configuration file which looks like this (simplified):

<config>
<links>
<link name="Link1" id="1">
 <encapsulation>
  <mode>ipsec</mode>
 </encapsulation>
</link>
<link name="Link2" id="2">
 <encapsulation>
  <mode>udp</mode>
 </encapsulation>
</link>
</links>

My goal is to be able to change parameters specific to a particular link, but I'm having trouble getting substitution to work correctly. I have a regex that can isolate a parameter value on a specific link, where the value is contained in capture group 1:

import re

link_id = r'id="1"'
parameter = 'mode'
# Raw string keeps \w and \W intact; capture group 1 holds the parameter's value
link_regex = r'<link [\w\W]+ %s>[\w\W]*[\w\W]*<%s>([\w\W]*)</%s>[\w\W]*</link>' \
    % (link_id, parameter, parameter)

Thus,

print(re.search(link_regex, f_read).group(1))  # f_read holds the config file's contents

prints
ipsec

The examples in the Regular Expression HOWTO all seem to assume that one wants to use the capture group in the replacement, but what I need to do is replace the capture group itself (e.g. change Link1's mode from ipsec to udp).


Comments (4)

﹏半生如梦愿梦如真 2024-07-28 06:03:54

I have to give you the obligatory: "don't use regular expressions to do this."

Check out how easy this is with BeautifulSoup, for example:

>>> from BeautifulSoup import BeautifulStoneSoup
>>> html = """
... <config>
... <links>
... <link name="Link1" id="1">
...  <encapsulation>
...   <mode>ipsec</mode>
...  </encapsulation>
... </link>
... <link name="Link2" id="2">
...  <encapsulation>
...   <mode>udp</mode>
...  </encapsulation>
... </link>
... </links>
... </config>
... """
>>> soup = BeautifulStoneSoup(html)
>>> soup.find('link', id=1)
<link name="Link1" id="1">
<encapsulation>
<mode>ipsec</mode>
</encapsulation>
</link>
>>> soup.find('link', id=1).mode.contents[0].replaceWith('whatever')
>>> soup.find('link', id=1)
<link name="Link1" id="1">
<encapsulation>
<mode>whatever</mode>
</encapsulation>
</link>

Looking at your regular expression, I can't really tell whether this is exactly what you want to do, but whatever it is, using a library like BeautifulSoup is much, much better than trying to patch a regular expression together. I highly recommend going this route if possible.

北渚 2024-07-28 06:03:54

This looks like valid XML, in which case you don't need BeautifulSoup, and definitely not a regex: just load the XML with any good XML library, edit it, and print it out. Here is an approach using ElementTree:

import xml.etree.ElementTree as ET

s = """<config>
<links>
<link name="Link1" id="1">
 <encapsulation>
  <mode>ipsec</mode>
 </encapsulation>
</link>
<link name="Link2" id="2">
 <encapsulation>
  <mode>udp</mode>
 </encapsulation>
</link>
</links>
</config>
"""
configElement = ET.fromstring(s)

# Starting at <config>, the three "*" steps match links, link and encapsulation
for modeElement in configElement.findall("*/*/*/mode"):
    modeElement.text = "udp"

print(ET.tostring(configElement, encoding="unicode"))

This changes every mode element to udp; here is the output:

<config>
<links>
<link name="Link1" id="1">
 <encapsulation>
  <mode>udp</mode>
 </encapsulation>
</link>
<link name="Link2" id="2">
 <encapsulation>
  <mode>udp</mode>
 </encapsulation>
</link>
</links>
</config>

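The loop above rewrites every mode element, while the question only wants Link1 changed. Here is a minimal sketch that targets a single link instead. It relies on ElementTree's limited XPath support for attribute predicates (`[@id='1']`) and assumes the same `s` layout. The helper name `set_link_param` is hypothetical.

```python
import xml.etree.ElementTree as ET

s = """<config>
<links>
<link name="Link1" id="1">
 <encapsulation>
  <mode>ipsec</mode>
 </encapsulation>
</link>
<link name="Link2" id="2">
 <encapsulation>
  <mode>udp</mode>
 </encapsulation>
</link>
</links>
</config>
"""


def set_link_param(xml_text, link_id, parameter, value):
    """Hypothetical helper: set <parameter> inside the <link> with the given id."""
    root = ET.fromstring(xml_text)
    # ".//link[@id='1']" finds the <link> element with that id at any depth
    link = root.find(".//link[@id='%s']" % link_id)
    if link is None:
        raise KeyError("no link with id %r" % link_id)
    # ".//mode" (or any other parameter name) is searched only inside that link
    elem = link.find(".//%s" % parameter)
    if elem is None:
        raise KeyError("link %r has no <%s> element" % (link_id, parameter))
    elem.text = value
    return ET.tostring(root, encoding="unicode")


print(set_link_param(s, "1", "mode", "udp"))
```

Only Link1's mode changes here. To update several links at once, the same path syntax also works with `findall()`.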
剪不断理还乱 2024-07-28 06:03:54

Supposing that your link_regex is correct, you can add parentheses like this:

(<link [\w\W]+ %s>[\w\W]*[\w\W]*<%s>)([\w\W]*)(</%s>[\w\W]*</link>)

and then you could do:

p = re.compile(link_regex)
replacement = 'foo'
# \g<1> and \g<3> put back the text around the old value; group 2 is dropped
print(p.sub(r'\g<1>' + replacement + r'\g<3>', f_read))

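Here is a self-contained sketch of the three-group approach. It assumes the config text is already in a string, so `f_read` is filled with the sample from the question. As an extra assumption beyond the original regex, it replaces the greedy `[\w\W]*` runs with non-greedy `[\w\W]*?` and the attribute run with `[^>]*`. This keeps the match from spilling from one `<link>` into the next.

```python
import re

f_read = """<config>
<links>
<link name="Link1" id="1">
 <encapsulation>
  <mode>ipsec</mode>
 </encapsulation>
</link>
<link name="Link2" id="2">
 <encapsulation>
  <mode>udp</mode>
 </encapsulation>
</link>
</links>
</config>
"""

link_id = r'id="1"'
parameter = 'mode'
# Group 1: from <link ...> up to and including <mode>
# Group 2: the current value (the part being replaced)
# Group 3: </mode> through the end of that link
link_regex = r'(<link [^>]* %s>[\w\W]*?<%s>)([\w\W]*?)(</%s>[\w\W]*?</link>)' % (
    link_id, parameter, parameter)

p = re.compile(link_regex)
replacement = 'udp'
# Only group 2 is swapped out; count=1 stops after the first matching link
result = p.sub(r'\g<1>' + replacement + r'\g<3>', f_read, count=1)
print(result)
```

Writing `\g<1>` rather than `\1` matters when the new value starts with a digit. A value such as `3des` would otherwise turn `\1` into `\13`, a reference to group 13.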
浪菊怪哟 2024-07-28 06:03:54

Not sure I'd do it that way, but the quickest way would be to shift the captures:

([\w\W]*[\w\W]*<%s>)[\w\W]*(</%s>[\w\W]*)

and replace with group1 + mode + group2, where mode is the new value.
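The same "keep what surrounds the value" idea can be packaged so that only one chosen capture group is rewritten. A replacement function receives each match object and rebuilds the matched text from the slices before and after that group. This is a sketch: the helper name `sub_group` is hypothetical and not part of the `re` module, and the non-greedy pattern is an assumption.

```python
import re


def sub_group(pattern, new_text, string, group=1, count=0):
    """Hypothetical helper: replace only `group` inside each match of `pattern`."""
    def _replace(m):
        whole = m.group(0)
        if m.start(group) == -1:
            # The group did not take part in this match: leave the text as it was
            return whole
        # Offsets of the group, made relative to the start of the whole match
        start = m.start(group) - m.start(0)
        end = m.end(group) - m.start(0)
        return whole[:start] + new_text + whole[end:]
    return re.sub(pattern, _replace, string, count=count)


config = ('<link name="Link1" id="1"><mode>ipsec</mode></link>'
          '<link name="Link2" id="2"><mode>udp</mode></link>')
# Only the value inside Link1's <mode> element is captured, so only it changes
pattern = r'<link [^>]* id="1">[\w\W]*?<mode>([\w\W]*?)</mode>'
print(sub_group(pattern, 'udp', config))
```

A replacement function's return value is inserted literally. So `new_text` may contain backslashes or start with a digit without the escaping a `\g<1>...` template would need.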
