Regex and a sequence of patterns?
Is there a way to match a pattern (e\d\d) several times, capturing each one into a group? For example, given the string..
blah.s01e24e25
..I wish to get four groups:
1 -> blah
2 -> 01
3 -> 24
4 -> 25
The obvious regex to use is (in Python regex):
import re
re.match(r"(\w+)\.s(\d+)e(\d+)e(\d+)", "blah.s01e24e25").groups()
..but I also want to match either of the following:
blah.s01e24
blah.s01e24e25e26
You can't seem to do (e\d\d)+, or rather you can, but it only captures the last occurrence:
>>> re.match("(\w+).s(\d+)(e\d\d){2}", "blah.s01e24e25e26").groups()
('blah', '01', 'e25')
>>> re.match("(\w+).s(\d+)(e\d\d){3}", "blah.s01e24e25e26").groups()
('blah', '01', 'e26')
I want to do this in a single regex because I have multiple patterns to match TV episode filenames, and do not want to duplicate each expression to handle multiple episodes:
\w+\.s(\d+)e(\d+) # matches blah.s01e01
\w+\.s(\d+)e(\d+)e(\d+) # matches blah.s01e01e02
\w+\.s(\d+)e(\d+)e(\d+)e(\d+) # matches blah.s01e01e02e03
\w+ - \d+x\d+ # matches blah - 01x01
\w+ - \d+x\d+x\d+ # matches blah - 01x01x02
\w+ - \d+x\d+x\d+x\d+ # matches blah - 01x01x02x03
..and so on for numerous other patterns.
One more complication: I wish to store these regexes in a config file, so a solution using multiple regexes and function calls is not desired. If this proves impossible, I'll just allow the user to add simple regexes.
Basically, is there a way to capture a repeating pattern using regex?
Comments (5)
Do it in two steps, one to find all the numbers, then one to split them:
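A minimal sketch of the two-step idea, assuming the blah.s01e24e25 layout from the question. The helper name split_episode_name is made up for illustration: the first regex grabs the whole run of eNN tokens as one group, and a second findall splits that run into its numbers.

```python
import re

def split_episode_name(filename):
    """Hypothetical helper: return (show, season, [episodes]) or None."""
    # Step 1: capture the entire "e24e25..." run in a single group.
    m = re.match(r"(\w+)\.s(\d+)((?:e\d+)+)", filename)
    if m is None:
        return None
    show, season, episode_run = m.groups()
    # Step 2: split the run into its individual episode numbers.
    return show, season, re.findall(r"\d+", episode_run)

print(split_episode_name("blah.s01e24e25e26"))
```

This does need a second function call, so it does not meet the "single regex in a config file" wish, but the pattern stored in the config can stay the same for any number of episodes.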
The number of captured groups equals the number of parenthesized groups in the pattern. Look at findall or finditer to solve your problem.
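To illustrate the findall/finditer suggestion, here is a rough sketch (the names REPEATED, EPISODE and episode_numbers are my own, not from the thread). It shows that a repeated group still counts as one group, and then uses finditer to walk over every eNN token after the season marker.

```python
import re

# A quantified group is still a single group, however often it repeats.
REPEATED = re.compile(r"(e\d\d)+")
print(REPEATED.groups)  # number of capturing groups in the pattern

EPISODE = re.compile(r"e(\d+)")

def episode_numbers(filename):
    """Hypothetical helper: list every episode number after the .sNN marker."""
    season = re.search(r"\.s(\d+)", filename)
    if season is None:
        return []
    # Start scanning right after the season so the show name is skipped.
    # Note: any later "e<digits>" in the name would also be picked up.
    return [m.group(1) for m in EPISODE.finditer(filename, season.end())]

print(episode_numbers("blah.s01e24e25e26"))
```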
Non-capturing parentheses:
(?:asdfasdg)
which can be made optional, so they do not have to appear:
(?:adsfasdf)?
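As a sketch of how optional non-capturing groups could be nested for this case: the pattern below is my own guess at the shape, capped at three episodes as an arbitrary assumption. Each extra episode sits in its own optional (?:...)? wrapper, so unmatched ones simply come back as None.

```python
import re

PATTERN = re.compile(
    r"""(\w+)\.s(\d+)      # show name and season
        (?:e(\d+)          # first episode, optional
           (?:e(\d+)       # second episode, optional
              (?:e(\d+))?  # third episode, optional
           )?
        )?
    """,
    re.VERBOSE,
)

def groups_present(filename):
    """Hypothetical helper: return only the groups that actually matched."""
    m = PATTERN.match(filename)
    if m is None:
        return None
    return tuple(g for g in m.groups() if g is not None)

print(groups_present("blah.s01e24"))
print(groups_present("blah.s01e24e25e26"))
```

This stays a single regex, so it could live in a config file, but it only handles as many episodes as the nesting depth allows; a fourth eNN would be silently ignored.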
After thinking about the problem, I think I have a simpler solution, using named groups.
The simplest regex a user (or I) could use is:
The filename parsing class will take the first group as the show name, second as season number, third as episode number. This covers a majority of files.
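A minimal sketch of that positional convention, assuming the blah.s01e24 layout from the question. SIMPLE and parse_simple are illustrative names, not the actual parsing class.

```python
import re

# Assumed example of a "simple" user-supplied pattern with three groups.
SIMPLE = r"(\w+)\.s(\d+)e(\d+)"

def parse_simple(filename, pattern=SIMPLE):
    """Hypothetical parser: group 1 = show, group 2 = season, group 3 = episode."""
    m = re.match(pattern, filename)
    if m is None:
        return None
    show, season, episode = m.groups()
    return {"show": show, "season": int(season), "episode": int(episode)}

print(parse_simple("blah.s01e24"))
```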
I'll allow a few different named groups for these:
To support multiple episodes, I'll support two named groups, something like startingepisodenumber and endingepisodenumber, to support things like showname.s01e01-03:
And finally, allow named groups with names matching episodenumber\d+ (episodenumber1, episodenumber2, etc.):
It still possibly requires duplicating the patterns for different numbers of e01s, but there will never be a file with two non-consecutive episodes (like show.s01e01e03e04), so using the starting/endingepisodenumber groups should solve this, and for weird cases users come across, they can use the episodenumber\d+ group names.
This doesn't really answer the sequence-of-patterns question, but it solves the problem that led me to ask it! (I'll still accept another answer that shows how to match s01e23e24...e27 in one regex - if someone works this out!)
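A rough sketch of how the named-group scheme above might be consumed. The group names startingepisodenumber, endingepisodenumber and episodenumber\d+ come from the description; showname, seasonnumber, the example patterns and the parse_named helper are my own assumptions for illustration.

```python
import re

# Assumed example patterns, as they might appear in a config file.
RANGE = (r"(?P<showname>\w+)\.s(?P<seasonnumber>\d+)"
         r"e(?P<startingepisodenumber>\d+)-(?P<endingepisodenumber>\d+)")
NUMBERED = (r"(?P<showname>\w+)\.s(?P<seasonnumber>\d+)"
            r"e(?P<episodenumber1>\d+)e(?P<episodenumber2>\d+)")

def parse_named(pattern, filename):
    """Hypothetical parser: return (show, season, [episodes]) or None."""
    m = re.match(pattern, filename)
    if m is None:
        return None
    named = m.groupdict()
    if named.get("startingepisodenumber") is not None:
        # A start/end pair expands to the full consecutive range.
        start = int(named["startingepisodenumber"])
        end = int(named.get("endingepisodenumber") or start)
        episodes = list(range(start, end + 1))
    else:
        # Collect episodenumber, episodenumber1, episodenumber2, ... in order.
        keyed = []
        for name, value in named.items():
            suffix = re.fullmatch(r"episodenumber(\d*)", name)
            if suffix and value is not None:
                keyed.append((int(suffix.group(1) or 0), int(value)))
        episodes = [number for _, number in sorted(keyed)]
    season = named.get("seasonnumber")
    return named.get("showname"), int(season) if season else None, episodes

print(parse_named(RANGE, "showname.s01e01-03"))
print(parse_named(NUMBERED, "show.s01e01e03"))
```

The parsing code only looks at group names, so each config entry remains a single regex and users can mix the two styles as needed.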
Perhaps something like that?