Using a regular expression to replace repeated uppercase letters with a single lowercase letter in Python

Posted 2024-10-01 17:51:55

I am trying to replace every uppercase letter that appears twice in a row in a string with a single lowercase instance of that letter. The following regular expression matches the repeated uppercase letters, but I am unsure how to make the replacement letter lowercase.

>>> import re
>>> s = 'start TT end'
>>> re.sub(r'([A-Z]){2}', r"\1", s)
'start T end'

How can I make the "\1" lowercase? Should I not be using a regular expression for this?


一萌ing 2024-10-08 17:51:55

Pass a function as the repl argument. The MatchObject is passed to this function and .group(1) gives the first parenthesized subgroup:

import re
s = 'start TT end'
callback = lambda pat: pat.group(1).lower()
re.sub(r'([A-Z]){2}', callback, s)

EDIT
And yes, you should use ([A-Z])\1 instead of ([A-Z]){2} in order to not match e.g. AZ. (See @bobince's answer.)

import re
s = 'start TT end'
re.sub(r'([A-Z])\1', lambda pat: pat.group(1).lower(), s) # Inline

Gives:

'start t end'
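To make the EDIT concrete, here is a small sketch comparing the two patterns. The extra 'AZ' token in the input and the helper name lower_pair are illustrative additions, not part of the answer. Because a repeated group keeps only its last iteration, ([A-Z]){2} turns 'AZ' into 'z', while ([A-Z])\1 leaves it alone.

```python
import re

def lower_pair(m):
    # group(1) is the captured capital letter
    return m.group(1).lower()

s = 'start TT AZ end'

# ([A-Z]){2} matches ANY two capitals; group(1) holds the last one matched
loose = re.sub(r'([A-Z]){2}', lower_pair, s)

# ([A-Z])\1 only matches a capital followed by the same capital
strict = re.sub(r'([A-Z])\1', lower_pair, s)

print(loose)   # start t z end
print(strict)  # start t AZ end
```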
半枫 2024-10-08 17:51:55

You can't change case in a replacement string. You would need a replacement function:

>>> def replacement(match):
...     return match.group(1).lower()
... 
>>> re.sub(r'([A-Z])\1', replacement, 'start TT end')
'start t end'
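For context, Perl-style case escapes such as \L are not available in Python's replacement templates. The sketch below assumes Python 3.7 or later, where an unknown ASCII-letter escape like \L in the template raises re.error, and contrasts that with a callable repl.

```python
import re

s = 'start TT end'

# \L is not part of Python's replacement-template syntax; on Python 3.7+
# an unknown ASCII-letter escape in the template raises re.error.
try:
    re.sub(r'([A-Z])\1', r'\L\1', s)
    template_failed = False
except re.error:
    template_failed = True

# A callable repl receives the match object, so it can lowercase freely.
result = re.sub(r'([A-Z])\1', lambda m: m.group(1).lower(), s)

print(template_failed)  # True (on Python 3.7+)
print(result)           # start t end
```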
再见回来 2024-10-08 17:51:55
import re
def replace(s):
    # note: returns only the matched letters, lowercased, not the rewritten string
    return " ".join(re.findall(r"([A-Z]){2}", s)).lower()

I guess this is what you are looking for.

亽野灬性zι浪 2024-10-08 17:51:55

You can do it with a regular expression; just pass a function as the replacement, as the docs describe. The problem is your pattern.

As it is, your pattern matches runs of any two capital letters. I'll leave the actual pattern to you, but it starts with AA|BB|CC|.
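The alternation hinted at above can be generated rather than typed out. This is a minimal sketch assuming only ASCII capitals matter; lower_first is a hypothetical helper name. The resulting pattern should behave like ([A-Z])\1 from the other answers.

```python
import re
import string

# Build the explicit alternation AA|BB|CC|...|ZZ
pattern = '|'.join(c + c for c in string.ascii_uppercase)

def lower_first(m):
    # the whole match is a doubled letter such as 'TT'; keep one, lowercased
    return m.group(0)[0].lower()

s = 'start TT AZ end'
print(re.sub(pattern, lower_first, s))  # start t AZ end
```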

千年*琉璃梦 2024-10-08 17:51:55

The 'repl' parameter that identifies the replacement can be either a string (as you have it here) or a function. This will do what you wish:

import re

def toLowercase(matchobj):
    return matchobj.group(1).lower()

s = 'start TT end'
# ([A-Z])\1 matches only a doubled letter; ([A-Z]){2} would also match e.g. 'AZ'
print(re.sub(r'([A-Z])\1', toLowercase, s))  # start t end
落花浅忆 2024-10-08 17:51:55

Try this:

import re

def tol(m):
    # replace the whole run with the lowercase form of its first letter
    return m.group(0)[0].lower()

s = 'start TTT AAA end'
re.sub(r'([A-Z]){2,}', tol, s)  # 'start t a end'

Note that this doesn't replace single uppercase letters. If you want it to, use r'([A-Z]){1,}'.
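If the goal is to collapse runs of the same letter only, a backreference followed by + may be closer to what the question asks. A sketch comparing the two patterns; the 'AB' token in the input and the helper name lower_first are illustrative additions.

```python
import re

def lower_first(m):
    # collapse the whole run to the lowercase form of its first letter
    return m.group(0)[0].lower()

s = 'start TTT AAA AB end'

# ([A-Z]){2,}: any run of two or more capitals, so 'AB' is collapsed too
any_run = re.sub(r'([A-Z]){2,}', lower_first, s)

# ([A-Z])\1+: only a capital followed by one or more copies of itself
same_run = re.sub(r'([A-Z])\1+', lower_first, s)

print(any_run)   # start t a a end
print(same_run)  # start t a AB end
```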

墨小墨 2024-10-08 17:51:55

WARNING! This answer does not use re as requested. Proceed at your own risk!

I don't know how likely the corner cases are, but this is how I would naively do it in plain Python.

import string
s = 'start TT end AAA BBBBBBB'
# Python 3: string.uppercase was removed; use string.ascii_uppercase
for c in string.ascii_uppercase:
    s = s.replace(c + c, c.lower())
print(s)
""" Output:
start t end aA bbbB
"""