Python pattern matching. Match 'c[any number of consecutive a's, b's, or c's or b's, c's, a's etc.]t'

Posted on 2024-11-19 15:19:59

Sorry about the title, I couldn't come up with a clean way to ask my question.

In Python I would like to match an expression 'c[some stuff]t', where [some stuff] could be any number of consecutive a's, b's, or c's and in any order.

For example, these work:
'ct', 'cat', 'cbbt', 'caaabbct', 'cbbccaat'

but these don't:
'cbcbbaat', 'caaccbabbt'

Edit: a's, b's, and c's are just an example but I would really like to be able to extend this to more letters. I'm interested in regex and non-regex solutions.

Comments (5)

薄荷港 2024-11-26 15:19:59

Not thoroughly tested, but I think this should work:

import re

words = ['ct', 'cat', 'cbbt', 'caaabbct', 'cbbccaat', 'cbcbbaat', 'caaccbabbt']
pat = re.compile(r'^c(?:([abc])\1*(?!.*\1))*t$')
for w in words:
    print(w, "matches" if pat.match(w) else "doesn't match")

#ct matches
#cat matches
#cbbt matches
#caaabbct matches
#cbbccaat matches
#cbcbbaat doesn't match
#caaccbabbt doesn't match

This matches runs of a, b or c (that's the ([abc])\1* part), while the negative lookahead (?!.*\1) makes sure no other instance of that character is present after the run.

(edit: fixed a typo in the explanation)
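Since the question asks about extending this to more letters, the same run-plus-lookahead pattern can be assembled from an arbitrary set of letters. The following is only a sketch, assuming Python 3; the helper name `build_run_pattern` and its `first`/`last` parameters are made up for illustration, and it assumes the closing letter is not one of the middle letters (otherwise the lookahead would always see it).

```python
import re

def build_run_pattern(letters, first='c', last='t'):
    """Compile the run/lookahead pattern for an arbitrary set of letters.

    Hypothetical helper for illustration; `letters` must be non-empty and
    must not contain `last`, because the lookahead scans to the end of
    the string and would otherwise reject every run of that letter.
    """
    if not letters or last in letters:
        raise ValueError("letters must be non-empty and must not contain last")
    # re.escape keeps characters such as '-' or ']' literal inside the class.
    char_class = '[' + ''.join(re.escape(ch) for ch in letters) + ']'
    return re.compile(
        '^' + re.escape(first)
        + '(?:(' + char_class + r')\1*(?!.*\1))*'
        + re.escape(last) + '$'
    )

pat = build_run_pattern('abc')
words = ['ct', 'cat', 'cbbt', 'caaabbct', 'cbbccaat', 'cbcbbaat', 'caaccbabbt']
results = {w: bool(pat.match(w)) for w in words}
for w, ok in results.items():
    print(w, "matches" if ok else "doesn't match")
```

Adding a letter is then a matter of passing a longer string, for example `build_run_pattern('abcd')`.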

愿得七秒忆 2024-11-26 15:19:59

Not sure how attached you are to regex, but here is a solution using a different method:

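from itertools import groupby

words = ['ct', 'cat', 'cbbt', 'caaabbct', 'cbbccaat', 'cbcbbaat', 'caaccbabbt']
for w in words:
    match = False
    if w.startswith('c') and w.endswith('t'):
        temp = w[1:-1]   # the characters between the leading 'c' and trailing 't'
        s = set(temp)    # distinct characters used in the middle
        # every character is allowed, and each one forms exactly one run
        match = s <= set('abc') and len(s) == len(list(groupby(temp)))
    print(w, "matches" if match else "doesn't match")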

The string matches if a set of the middle characters is a subset of set('abc') and the number of groups returned by groupby() is the same as the number of elements in the set.
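The same check can be wrapped in a function so that the allowed letters become a parameter. This is a sketch only, assuming Python 3; the name `is_single_run_word` and its parameters are hypothetical, and `first` and `last` are assumed to be single characters.

```python
from itertools import groupby

def is_single_run_word(word, letters='abc', first='c', last='t'):
    """Return True if `word` is `first`, then at most one run of each
    allowed letter in any order, then `last`.

    Hypothetical helper for illustration; `first` and `last` are assumed
    to be single characters.
    """
    # The length check prevents one character from serving as both ends.
    if len(word) < 2 or not (word.startswith(first) and word.endswith(last)):
        return False
    middle = word[1:-1]
    distinct = set(middle)
    # groupby yields one group per run of identical adjacent characters.
    runs = sum(1 for _ in groupby(middle))
    return distinct <= set(letters) and len(distinct) == runs

words = ['ct', 'cat', 'cbbt', 'caaabbct', 'cbbccaat', 'cbcbbaat', 'caaccbabbt']
for w in words:
    print(w, "matches" if is_single_run_word(w) else "doesn't match")
```

If a letter appeared in two separate runs, `groupby` would report more groups than there are distinct letters, which is what the final comparison catches.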

简单 2024-11-26 15:19:59

I believe you need to explicitly encode all possible permutations of a's, b's and c's:

c(a*b*c*|b*a*c*|b*c*a*|c*b*a*|c*a*b*|a*c*b*)t

Note that this is an extremely inefficient pattern which may backtrack a lot.
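Writing the orderings out by hand stops being practical once more letters are involved, so the alternation could be generated instead. A rough sketch, assuming Python 3 and distinct letters; `build_permutation_pattern` is a hypothetical name, not something from the answer above. The number of alternatives is n! for n letters, so this only suits small alphabets.

```python
import re
from itertools import permutations

def build_permutation_pattern(letters, first='c', last='t'):
    """Build a pattern with one alternative per ordering of `letters`.

    Hypothetical helper for illustration; assumes the letters are distinct.
    """
    alternatives = (
        ''.join(re.escape(ch) + '*' for ch in order)
        for order in permutations(letters)
    )
    return re.escape(first) + '(?:' + '|'.join(alternatives) + ')' + re.escape(last)

pattern = build_permutation_pattern('abc')
print(pattern)

words = ['ct', 'cat', 'cbbt', 'caaabbct', 'cbbccaat', 'cbcbbaat', 'caaccbabbt']
for w in words:
    # fullmatch anchors the pattern at both ends of the string
    print(w, "matches" if re.fullmatch(pattern, w) else "doesn't match")
```

For three letters this produces six alternatives; four letters would already need 24.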

叹倦 2024-11-26 15:19:59

I don't know the Python regex engine, but it sounds like you just want to write out the 6 different possible orderings directly.

/c(a*b*c*|a*c*b*|b*a*c*|b*c*a*|c*a*b*|c*b*a*)t/

为你鎻心 2024-11-26 15:19:59

AFAIK there's no "compact" way of doing this...

c(a*(b*c*|c*b*)|b*(a*c*|c*a*)|c*(a*b*|b*a*))t
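The factored form above follows a recursive shape: pick the first run's letter, then repeat the same construction for the remaining letters. A sketch of generating it, assuming Python 3 and distinct letters; `build_factored_alternation` is a hypothetical name, and non-capturing groups are used in place of the plain parentheses above. The result still grows roughly factorially as letters are added.

```python
import re

def build_factored_alternation(letters):
    """Recursively build the factored alternation for `letters`.

    Hypothetical helper for illustration; assumes the letters are distinct.
    """
    letters = list(letters)
    if len(letters) == 1:
        return re.escape(letters[0]) + '*'
    branches = []
    for i, ch in enumerate(letters):
        rest = letters[:i] + letters[i + 1:]
        tail = build_factored_alternation(rest)
        # Only a multi-branch tail needs grouping.
        if len(rest) > 1:
            tail = '(?:' + tail + ')'
        branches.append(re.escape(ch) + '*' + tail)
    return '|'.join(branches)

pattern = 'c(?:' + build_factored_alternation('abc') + ')t'
print(pattern)

words = ['ct', 'cat', 'cbbt', 'caaabbct', 'cbbccaat', 'cbcbbaat', 'caaccbabbt']
for w in words:
    print(w, "matches" if re.fullmatch(pattern, w) else "doesn't match")
```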