Regex and a sequence of patterns?

Posted 2024-07-25 19:38:25

Is there a way to match a pattern (e\d\d) several times, capturing each one into a group? For example, given the string..

blah.s01e24e25

..I wish to get four groups:

1 -> blah
2 -> 01
3 -> 24
4 -> 25

The obvious regex to use is (in Python regex):

import re
re.match(r"(\w+)\.s(\d+)e(\d+)e(\d+)", "blah.s01e24e25").groups()

..but I also want to match either of the following:

blah.s01e24
blah.s01e24e25e26

You can't seem to do (e\d\d)+, or rather you can, but a repeated group only captures its last occurrence:

>>> re.match(r"(\w+)\.s(\d+)(e\d\d){2}", "blah.s01e24e25e26").groups()
('blah', '01', 'e25')
>>> re.match(r"(\w+)\.s(\d+)(e\d\d){3}", "blah.s01e24e25e26").groups()
('blah', '01', 'e26')

I want to do this in a single regex because I have multiple patterns to match TV episode filenames, and do not want to duplicate each expression to handle multiple episodes:

\w+\.s(\d+)e(\d+) # matches blah.s01e01
\w+\.s(\d+)e(\d+)e(\d+) # matches blah.s01e01e02
\w+\.s(\d+)e(\d+)e(\d+)e(\d+) # matches blah.s01e01e02e03

\w+ - \d+x\d+ # matches blah - 01x01
\w+ - \d+x\d+x\d+ # matches blah - 01x01x02
\w+ - \d+x\d+x\d+x\d+ # matches blah - 01x01x02x03

..and so on for numerous other patterns.

To complicate matters, I want to store these regexes in a config file, so a solution using multiple regexes and function calls is not desired. If this proves impossible, I'll just allow the user to add simple regexes.

Basically, is there a way to capture a repeating pattern using regex?

彼岸花ソ最美的依靠 2024-08-01 19:38:25

Do it in two steps: one to match the whole season/episode run, then one to pull the numbers out of it:

import re

def get_pieces(s):
    # Error checking omitted: re.search returns None when nothing matches.
    whole_match = re.search(r'\w+\.(s\d+(?:e\d+)+)', s)
    return re.findall(r'\d+', whole_match.group(1))

print(get_pieces("blah.s01e01"))
print(get_pieces("blah.s01e01e02"))
print(get_pieces("blah.s01e01e02e03"))

# prints:
# ['01', '01']
# ['01', '01', '02']
# ['01', '01', '02', '03']

可爱暴击 2024-08-01 19:38:25

The number of capture groups is equal to the number of parenthesized groups in the pattern. Look at findall or finditer to solve your problem.
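A rough sketch of that suggestion, assuming filenames shaped like blah.s01e24e25 (the helper name episode_numbers is illustrative and not from the answer): match the show and season once, then let re.finditer walk the repeated e<digits> pieces in the remainder.

```python
import re

def episode_numbers(filename):
    """Return (season, [episodes]) or None. Hypothetical helper for illustration."""
    head = re.match(r"\w+\.s(\d+)", filename)
    if head is None:
        return None
    # Scan only the text after the season number for e<digits> pieces.
    rest = filename[head.end():]
    episodes = [m.group(1) for m in re.finditer(r"e(\d+)", rest)]
    return head.group(1), episodes

print(episode_numbers("blah.s01e24"))
print(episode_numbers("blah.s01e24e25e26"))
```

Each finditer match has its own group 1, so the number of episodes is no longer tied to the number of parentheses in the pattern.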

会发光的星星闪亮亮i 2024-08-01 19:38:25

Non-capturing parentheses:
(?:asdfasdg)

which can be made optional:
(?:adsfasdf)?

c = re.compile(r"""(\w+)\.s(\d+)
                       (?:
                            e(\d+)
                            (?:
                                  e(\d+)
                            )?
                        )?
               """, re.X)

or, on one line:

c = re.compile(r"""(\w+)\.s(\d+)(?:e(\d+)(?:e(\d+))?)?""", re.X)
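To see what the nested optional groups return, here is a small sketch (the parse helper is a made-up name, and the dot before s is escaped here). Optional groups that did not take part in the match come back as None, and the pattern as written captures at most two episodes.

```python
import re

pattern = re.compile(r"(\w+)\.s(\d+)(?:e(\d+)(?:e(\d+))?)?")

def parse(filename):
    """Return (show, season, [episodes]) or None. Hypothetical helper."""
    m = pattern.match(filename)
    if m is None:
        return None
    show, season, *episodes = m.groups()
    # Drop the None placeholders left by optional groups that did not match.
    return show, season, [e for e in episodes if e is not None]

print(parse("blah.s01e24"))        # ('blah', '01', ['24'])
print(parse("blah.s01e24e25"))     # ('blah', '01', ['24', '25'])
print(parse("blah.s01e24e25e26"))  # ('blah', '01', ['24', '25'])
```

The last call shows the limit of this approach: a third episode is silently ignored unless another nested optional group is added to the pattern.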

属性 2024-08-01 19:38:25

After thinking about the problem, I think I have a simpler solution, using named groups.

The simplest regex a user (or I) could use is:

(\w+)\.s(\d+)e(\d+)

The filename parsing class will take the first group as the show name, the second as the season number, and the third as the episode number. This covers the majority of files.

I'll allow a few different named groups for these:

(?P<showname>\w+)\.s(?P<seasonnumber>\d+)e(?P<episodenumber>\d+)

To support multiple episodes, I'll support two named groups, something like startingepisodenumber and endingepisodenumber, to handle things like showname.s01e01-03:

(?P<showname>\w+)\.s(?P<seasonnumber>\d+)e(?P<startingepisodenumber>\d+)-(?P<endingepisodenumber>\d+)

And finally, allow named groups with names matching episodenumber\d+ (episodenumber1, episodenumber2 etc):

(?P<showname>\w+)\.
s(?P<seasonnumber>\d+)
e(?P<episodenumber1>\d+)
e(?P<episodenumber2>\d+)
e(?P<episodenumber3>\d+)

This may still require duplicating patterns for different numbers of e01s, but there will never be a file with two non-consecutive episodes (like show.s01e01e03e04), so the starting/endingepisodenumber groups should cover that. For weird cases users come across, they can use the episodenumber\d+ group names.

This doesn't really answer the sequence-of-patterns question, but it solves the problem that led me to ask it! (I'll still accept another answer that shows how to match s01e23e24...e27 in one regex - if someone works this out!)
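As a sketch of how a parser might consume these named groups (parse_episodes and the two example patterns are illustrative assumptions, written for filenames like show.s01e01 with no dot before the e), groupdict() exposes every named group, so the episode list can be built from whichever convention the pattern uses:

```python
import re

def parse_episodes(pattern, filename):
    """Return (show, season, [episodes]) or None. Hypothetical helper."""
    m = re.match(pattern, filename)
    if m is None:
        return None
    groups = m.groupdict()
    if "startingepisodenumber" in groups:
        # Assumes endingepisodenumber is defined alongside startingepisodenumber.
        start = int(groups["startingepisodenumber"])
        end = int(groups["endingepisodenumber"])
        episodes = list(range(start, end + 1))
    else:
        # Collect episodenumber, episodenumber1, episodenumber2, ... in numeric order.
        prefix = "episodenumber"
        numbered = [
            (int(name[len(prefix):] or 0), value)
            for name, value in groups.items()
            if re.fullmatch(prefix + r"\d*", name) and value is not None
        ]
        episodes = [int(value) for _, value in sorted(numbered)]
    return groups["showname"], int(groups["seasonnumber"]), episodes

range_pattern = (r"(?P<showname>\w+)\.s(?P<seasonnumber>\d+)"
                 r"e(?P<startingepisodenumber>\d+)-(?P<endingepisodenumber>\d+)")
multi_pattern = (r"(?P<showname>\w+)\.s(?P<seasonnumber>\d+)"
                 r"e(?P<episodenumber1>\d+)e(?P<episodenumber2>\d+)")

print(parse_episodes(range_pattern, "showname.s01e01-03"))  # ('showname', 1, [1, 2, 3])
print(parse_episodes(multi_pattern, "blah.s01e24e25"))      # ('blah', 1, [24, 25])
```

The patterns themselves stay plain strings, so they can live in a config file while the group-name handling stays in one place in the code.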

帅哥哥的热头脑 2024-08-01 19:38:25

Perhaps something like this?

import re

def episode_matcher(filename):
    # Capture the whole run of e<digits> pieces as group 3, then split it.
    m1 = re.match(r"(?i)(.*?)\.s(\d+)((?:e\d+)+)", filename)
    if m1:
        m2 = re.findall(r"\d+", m1.group(3))
        return m1.group(1), m1.group(2), m2
    # falls through and returns None when there is no match

>>> episode_matcher("blah.s01e02")
('blah', '01', ['02'])
>>> episode_matcher("blah.S01e02E03")
('blah', '01', ['02', '03'])
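One way this idea might fit the config-file requirement from the question, as a sketch only: every configured pattern captures the whole episode run in a single named group, and one generic step splits it. The group names show, season and episodes, the CONFIG_PATTERNS list, and parse_filename are all assumptions made for illustration.

```python
import re

# Hypothetical config entries: each pattern names the show, the season, and
# the whole run of episode markers as one group called "episodes".
CONFIG_PATTERNS = [
    r"(?i)(?P<show>\w+)\.s(?P<season>\d+)(?P<episodes>(?:e\d+)+)",
    r"(?i)(?P<show>\w+) - (?P<season>\d+)(?P<episodes>(?:x\d+)+)",
]

def parse_filename(filename, patterns=CONFIG_PATTERNS):
    """Try each configured pattern in order; return None if none match."""
    for pattern in patterns:
        m = re.match(pattern, filename)
        if m:
            # The same post-processing step works for every pattern.
            episodes = re.findall(r"\d+", m.group("episodes"))
            return m.group("show"), m.group("season"), episodes
    return None

print(parse_filename("blah.s01e24e25e26"))  # ('blah', '01', ['24', '25', '26'])
print(parse_filename("blah - 01x01x02"))    # ('blah', '01', ['01', '02'])
```

Users still write one regex per naming scheme, and the second findall call is fixed code rather than something they have to configure.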
