Splitting a string and capturing all instances in Python regex
Newbie here. I have been trying to learn regex for some time, but I sometimes feel I don't understand how regex handles strings: in the planning phase I seem to work it out, but the implementation doesn't behave as I expect.
Here is my little problem: I have strings that contain one or more names (team names). The problem is that when a string contains more than one name, there is no separator. The names are joined directly.
Some examples:
String -> Contains -> Names to be extracted
- 'RangersIslandersDevils' -> 3 names -> [Rangers, Islanders, Devils]
- '49ersRaiders' -> 2 names -> [49ers, Raiders]
- 'Avalanche' -> 1 name -> [Avalanche]
- 'Red Wings' -> 1 name -> [Red Wings]
I want to capture each name in each string and use them in a loop later on, but I can't seem to implement the pattern I have in mind.
The pattern in my head works like this:
- Start scanning the text, which is expected to start with a capital letter or a number.
- If you see a literal 's' followed by a capital letter (like ...s[A-Z]..), capture the text up to and including that 's'.
- Repeat step two until the (...s[A-Z]..) pattern no longer appears, then capture the rest of the string as the last name.
- Optionally, write all the names to a list.
I tried some code, in vain: step two captures only one instance, and step three gives one more.
re.findall('([A-Z0-9].*s)*([A-Z].*)+', 'RangersIslandersMolsDevil')
That returns only two names:
[('RangersIslandersMols', 'Devil')]
whereas I want four:
[Rangers, Islanders, Mols, Devil]
Comments (1)
([A-Z0-9].*s)*
will capture as many characters as it can, because .* is greedy, and that is what causes 'RangersIslandersMols' to get stuck together as one match. It sounds like the boundary between team names is a lowercase letter (not necessarily an 's', as in 'Avalanche') followed immediately by an uppercase letter or a number, so our pattern should look for:
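A minimal sketch of such a one-word pattern, not necessarily the one originally posted: it assumes a word is one or more capitals or digits followed by one or more lowercase letters. The name WORD and the exact character classes are assumptions made for illustration. The snippet first reproduces the greedy result from the question, then applies the one-word pattern to the same string.

```python
import re

text = "RangersIslandersMolsDevil"

# The attempt from the question: the greedy `.*` in group 1 runs to the
# last 's' it can reach, so three names land in a single group.
greedy = re.findall(r"([A-Z0-9].*s)*([A-Z].*)+", text)
print(greedy)

# Hypothetical one-word pattern (an assumption, not the original answer's):
# one or more capitals/digits, then one or more lowercase letters.
WORD = r"[A-Z0-9]+[a-z]+"
words = re.findall(WORD, text)
print(words)
```

Because the lowercase run stops at the next capital or digit, each match ends exactly at the boundary described above, whatever letter the name happens to end with.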
Because a team name can have multiple words, we'll also look for a space followed by the same pattern as above, for any possible number of words.
Try this pattern:
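One pattern that fits this description, offered as a sketch rather than as the exact regex from the original answer: a word is assumed to be capitals or digits followed by lowercase letters, and a name is one word plus any number of further words, each preceded by a single space. WORD, TEAM_NAME and split_team_names are hypothetical names chosen for this example.

```python
import re

# Hypothetical building block: capitals/digits followed by lowercase letters.
WORD = r"[A-Z0-9]+[a-z]+"

# A name is one word, optionally followed by more space-separated words.
# The group is non-capturing so that findall returns whole matches.
TEAM_NAME = re.compile(rf"{WORD}(?: {WORD})*")


def split_team_names(text):
    """Return every team name found in a string of concatenated names."""
    return TEAM_NAME.findall(text)


samples = ["RangersIslandersDevils", "49ersRaiders", "Avalanche", "Red Wings"]
for sample in samples:
    # Each result is a plain list, ready to be used in a loop later on.
    print(sample, "->", split_team_names(sample))
```

This only covers names shaped like the examples in the question. A name with a capital in the middle of a word would be split in two, and a word written entirely in capitals would not match, so the character classes may need adjusting for other data.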