Searching for and capturing a character with a regular expression in Python
While working through one of the problems in the Python Challenge, I tried to solve it as follows:
Read the input from a text file containing characters like these:
DQheAbsaMLjTmAOKmNsLziVMenFxQdATQIjItwtyCHyeMwQTNxbbLXWZnGmDqHhXnLHfEyvzxMhSXzd
BEBaxeaPgQPttvqRvxHPEOUtIsttPDeeuGFgmDkKQcEYjuSuiGROGfYpzkQgvcCDBKrcYwHFlvPzDMEk
MyuPxvGtgSvWgrybKOnbEGhqHUXHhnyjFwSfTfaiWtAOMBZEScsOSumwPssjCPlLbLsPIGffDLpZzMKz
jarrjufhgxdrzywWosrblPRasvRUpZLaUbtDHGZQtvZOvHeVSTBHpitDllUljVvWrwvhpnVzeWVYhMPs
kMVcdeHzFZxTWocGvaKhhcnozRSbWsIEhpeNfJaRjLwWCvKfTLhuVsJczIYFPCyrOJxOPkXhVuCqCUgE
luwLBCmqPwDvUPuBRrJZhfEXHXSBvljqJVVfEGRUWRSHPeKUJCpMpIsrV.......
What I need is to go through this text file and pick out every lowercase letter that is surrounded by exactly three uppercase letters on each side.
The Python script I wrote to do this is:
import re

pattern = re.compile("[a-z][A-Z]{3}([a-z])[A-Z]{3}[a-z]")
f = open('/Users/Dev/Sometext.txt', 'r')
for line in f:
    result = pattern.search(line)
    if result:
        print(result.groups())
f.close()
Instead of returning the captures (a list of lowercase characters), the script above returns every block of text that meets the regular-expression criteria, like:
aXCSdFGHj
vCDFeTYHa
nHJUiKJHo
.........
.........
Can somebody tell me what exactly I am doing wrong here? And instead of looping through the file line by line, is there an alternative way to run the regular-expression search on the entire file at once?
Thanks
Change result.groups() to result.group(1) and you will get just the single-letter match.
A second problem with your code is that it will not find multiple results on one line. So instead of re.search, you'll need re.findall or re.finditer. findall returns a list of strings (or a list of tuples of strings when the pattern has more than one group), whereas finditer returns an iterator of match objects. Here's where I approached the same problem:
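The following is a minimal sketch of these differences, not the answerer's original code. It uses a hypothetical sample line assembled from two of the blocks shown in the question:

```python
import re

# The pattern from the question: one captured lowercase letter with exactly
# three uppercase letters on each side, bounded by lowercase letters.
pattern = re.compile(r"[a-z][A-Z]{3}([a-z])[A-Z]{3}[a-z]")

# Hypothetical line containing two separate, non-overlapping hits.
line = "xaXCSdFGHjyyvCDFeTYHaz"

# re.search returns only the first match.
m = pattern.search(line)
print(m.group(0))  # aXCSdFGHj  (the whole matched block)
print(m.groups())  # ('d',)     (a tuple of all capture groups)
print(m.group(1))  # d          (just the captured letter)

# findall with a single group returns a list of that group's strings.
print(pattern.findall(line))  # ['d', 'e']

# finditer yields match objects; take group(1) from each one.
print([match.group(1) for match in pattern.finditer(line)])  # ['d', 'e']
```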
Note that re.findall and re.finditer return non-overlapping results. So when the pattern above is used with re.findall to search the string 'aBBBcDDDeFFFg', your only match will be 'c', not 'e'. The first match consumes 'aBBBcDDDe', so the search resumes at 'FFFg'. Fortunately, this Python Challenge problem contains no such examples.
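This short sketch makes the non-overlap behavior visible. It runs the question's pattern against that same string and prints the span that each match consumes:

```python
import re

pattern = re.compile(r"[a-z][A-Z]{3}([a-z])[A-Z]{3}[a-z]")
text = "aBBBcDDDeFFFg"

# Only one letter is found: the 'e' candidate would need "cDDDeFFFg",
# which starts inside the region already consumed by the first match.
print(pattern.findall(text))  # ['c']

for m in pattern.finditer(text):
    print(m.span(), m.group(0), m.group(1))  # (0, 9) aBBBcDDDe c
```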
I'd suggest using lookaround:
This will have no problem with overlapping matches.
Explanation:
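The answer's exact regex is not preserved, so the pattern below is an assumption. It is a sketch that moves the three-uppercase context on each side into a lookbehind and a lookahead. Because only the lowercase letter itself is consumed, neighboring matches can share their uppercase context. Python's re module requires a fixed-width lookbehind, and [a-z][A-Z]{3} is always four characters wide:

```python
import re

# Hypothetical lookaround version of the question's pattern.
pattern = re.compile(
    r"(?<=[a-z][A-Z]{3})"  # preceded by a lowercase letter, then exactly 3 uppercase
    r"[a-z]"               # the letter we want; it is the whole match, so no group
    r"(?=[A-Z]{3}[a-z])"   # followed by exactly 3 uppercase, then a lowercase letter
)

print(pattern.findall("aBBBcDDDeFFFg"))           # ['c', 'e']
print(pattern.findall("xaXCSdFGHjyyvCDFeTYHaz"))  # ['d', 'e']
```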
findall is maybe the most useful function in the re module: it returns all non-overlapping matches of a pattern in a string. The read() function reads the whole file into one big string. This is especially useful if you need to match a regular expression against the whole file.
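Here is a sketch of the read() approach, assuming the file is small enough to hold in memory. The helper names letters_in and letters_in_file are hypothetical. The demo writes a throwaway temporary file instead of reading the question's path:

```python
import os
import re
import tempfile

PATTERN = re.compile(r"[a-z][A-Z]{3}([a-z])[A-Z]{3}[a-z]")


def letters_in(text):
    """Return every captured lowercase letter in text, in order."""
    # "\n" matches neither [a-z] nor [A-Z], so no match can span a line break.
    return PATTERN.findall(text)


def letters_in_file(path):
    """Read the whole file into one string with read(), then search it once."""
    with open(path) as f:
        return letters_in(f.read())


if __name__ == "__main__":
    # Demo on a temporary file; point letters_in_file at your own path instead.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write("xaXCSdFGHjyy\nvCDFeTYHaz\n")
    try:
        print(letters_in_file(tmp.name))  # ['d', 'e']
    finally:
        os.remove(tmp.name)
```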
Warning: Depending on the size of the file, you may prefer iterating over the file line by line as you did in your first approach.
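For a large file, a line-by-line version avoids loading everything at once. Below is a minimal sketch with a hypothetical generator name. It uses finditer so that each line can yield several letters:

```python
import re

PATTERN = re.compile(r"[a-z][A-Z]{3}([a-z])[A-Z]{3}[a-z]")


def letters_by_line(lines):
    """Yield captured letters one line at a time; accepts any iterable of strings."""
    for line in lines:
        for match in PATTERN.finditer(line):
            yield match.group(1)


# An open file object is also an iterable of lines, for example:
#     with open('/Users/Dev/Sometext.txt') as f:
#         print("".join(letters_by_line(f)))
print(list(letters_by_line(["xaXCSdFGHjyy\n", "vCDFeTYHaz\n"])))  # ['d', 'e']
```

With this pattern, the results match the whole-file version, because a newline can never be part of a match.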