Regex: meet the length first, then check other patterns?

Posted on 2025-02-09 06:10:51, 2880 words, 1 view, 0 comments

I don't know how to phrase this question exactly in an English title. There are 2 rules:

  1. First, match the given length at the head and the tail, reading as much as possible.

  2. Then, match the other pattern.

For example:

  1. Read 2~3 chars before the number and 2~4 chars after it if the string is long enough; if the string is not long enough, read as many as possible.
  2. Check that the char before the number is not a, and the char after the number is not z.
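To pin down what the two rules mean, here is a minimal regex-free sketch (not the asker's code). It assumes that "the number" is the first run of digits, that up to 3 characters are read before it and up to 4 after it (fewer when the string is shorter), and that a number at the very start or end of the string counts as no match. The helper name `extract` is hypothetical.

```python
import re


def extract(text: str) -> str:
    """Hypothetical helper: apply rule 1, then rule 2, without a single regex."""
    number = re.search(r"\d+", text)  # assumption: the first run of digits is "the number"
    # Assumption: at least one character must exist on each side of the number.
    if number is None or number.start() == 0 or number.end() == len(text):
        return ""
    # Rule 1: read as much as possible, up to 3 chars before and 4 chars after.
    head = text[max(0, number.start() - 3):number.start()]
    tail = text[number.end():number.end() + 4]
    # Rule 2: the char right before the number must not be 'a' and the char
    # right after it must not be 'z' (case-insensitive, like re.IGNORECASE).
    if head[-1].lower() == "a" or tail[0].lower() == "z":
        return ""
    return head + number.group() + tail


for text in ["babc123defg", "aba123defg", "c123d"]:
    print(text, "->", repr(extract(text)))
# babc123defg -> 'abc123defg'
# aba123defg -> ''
# c123d -> 'c123d'
```

Doing the length step and the character check separately makes the order of the two rules explicit, which is what a single regex has to emulate.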

--- edit on 20220620 ---
The code below expresses exactly what the table further down tried to express:

import re

# Each key is an input string; each value is the expected match ('""' means no match).
lst = {
'abc123defg':'abc123defg',
'babc123defg':'abc123defg',
'aba123defg':'""',
'abc123zefg':'""',
'bc123def':'bc123def',
'c123def':'c123def',
'c123zef':'""',
'c123d':'c123d'
}

# Candidate patterns from the answers below. The second assignment overrides
# the first; comment one out to try the other.
reStr = r".{1,2}[^a\d]\d+[^z\d].{1,3}"
reStr = r"^[A-Za-z]{1,2}[B-Zb-z]\d+[A-Ya-y][A-Za-z]{1,3}"

for key, value in lst.items():
    match = re.match(reStr, key, re.IGNORECASE | re.VERBOSE)
    if match:
        print(f'{key:15s} expected to be: {value:15s}, really get: {match.group():15s}')
    else:
        print(f'{key:15s} expected to be: {value:15s}, really get: ""')
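A variation of the first pattern above, sketched under two assumptions that go beyond the question: `re.search` is used instead of `re.match`, so the match may start after the first character (as 'babc123defg' requires), and the lower bounds are relaxed to `{0,2}` and `{0,3}`, so strings with only one character before or after the number ('c123def', 'c123d') still match. Under those assumptions it reproduces every row of the table; it has not been checked against inputs outside the table.

```python
import re

# Hypothetical variation: lower bounds relaxed to 0 so short strings still match,
# and \d kept out of the neighbouring classes so they sit right next to the digits.
PATTERN = re.compile(r".{0,2}[^a\d]\d+[^z\d].{0,3}")

table = {
    'abc123defg': 'abc123defg',
    'babc123defg': 'abc123defg',
    'aba123defg': '',
    'abc123zefg': '',
    'bc123def': 'bc123def',
    'c123def': 'c123def',
    'c123zef': '',
    'c123d': 'c123d',
}

for key, expected in table.items():
    m = PATTERN.search(key)  # search, not match: the match may start mid-string
    got = m.group() if m else ''
    print(f'{key:15s} expected: {expected!r:15s} got: {got!r:15s}')
```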

--- the following description is the old one, which I have not edited ---

text | expected match | explanation
--- | --- | ---
abc123defg | abc123defg | First read 'abc123defg', in which c does not break [^a] and d does not break [^z], so 'abc123defg' is matched.
babc123defg | abc123defg | First read 'abc123defg', in which c does not break [^a] and d does not break [^z], so 'abc123defg' is matched.
aba123defg | nothing | First read 'aba123defg', in which a breaks [^a] (d does not break [^z]), so nothing ('') is matched.
abc123zefg | nothing | First read 'abc123zefg', in which c does not break [^a] but z breaks [^z], so nothing ('') is matched.
bc123def | bc123def | First read 'bc123def', in which c does not break [^a] and d does not break [^z], so 'bc123def' is matched.
c123def | c123def | First read 'c123def', in which c does not break [^a] and d does not break [^z], so 'c123def' is matched.
c123zef | nothing | First read 'c123zef', in which c does not break [^a] but z breaks [^z], so nothing ('') is matched.
c123d | c123d | First read 'c123d', in which c does not break [^a] and d does not break [^z], so 'c123d' is matched.

So I wrote this regular expression in Python:

import re
lst = ['abc123defg', 'aba123defg', 'abc123zefg', 'bc123def']

for text in lst:
    print(text, ' -> ', re.match(r".{1,2}[^a]\d*[^z].{1,3}", text, re.IGNORECASE | re.VERBOSE).group())

But, of course, the result is not what I expected:

abc123defg  ->  abc123defg
aba123defg  ->  aba123
abc123zefg  ->  abc123zef
bc123def  ->  bc123def

So is there a way to meet these expectations with just a regular expression? Thanks


Comments (2)

滥情稳全场 2025-02-16 06:10:51

Based on your description, I changed the regexp to:

r".{1,2}[^a\d]\d+[^z\d].{1,3}"

The point is to match the number correctly:

  • at least one digit: \d+ instead of \d*
  • To know exactly what comes before and after the number, the number must be matched only by the \d+ above. That's why I "added" \d to the negated classes before and after it: [^a\d] and [^z\d].

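The effect of both changes can be seen by running the question's pattern and this answer's pattern on the two strings the question got wrong. With \d*, the [^a] class may land on the 'b' of 'aba123defg' while \d* matches zero digits; with \d+ and \d excluded from the neighbouring classes, those classes can only match the characters immediately before and after the digit run. This is a small illustrative check, not part of the original answer.

```python
import re

OLD = r".{1,2}[^a]\d*[^z].{1,3}"      # pattern from the question
NEW = r".{1,2}[^a\d]\d+[^z\d].{1,3}"  # pattern from this answer

for text in ["aba123defg", "abc123zefg"]:
    old = re.match(OLD, text)
    new = re.match(NEW, text)
    print(text, "| old:", old.group() if old else None, "| new:", new.group() if new else None)
# aba123defg | old: aba123 | new: None
# abc123zefg | old: abc123zef | new: None
```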
七色彩虹 2025-02-16 06:10:51

The 2 rules can be encompassed in a single regex pattern. Try with this expression:

regex = re.compile(r"^[A-Za-z]{1,2}[B-Zb-z]\d+[A-Ya-y][A-Za-z]{1,3}")

Simple and self-explanatory:

  • Find 1 to 2 alphabetic characters
  • Find one more alphabetic character, excluding chars A | a
  • Continue with a sequence of one or more digits
  • After that, find another alphabetic character, excluding chars Z | z
  • Find 1 to 3 more alphabetic characters
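The same pattern can be written with re.VERBOSE so that each piece carries its bullet as a comment. This is only a re-layout of the expression above and should behave identically; the name `PATTERN` is illustrative.

```python
import re

# The answer's pattern, laid out with re.VERBOSE so each piece has a comment.
PATTERN = re.compile(r"""
    ^
    [A-Za-z]{1,2}   # 1 to 2 alphabetic characters
    [B-Zb-z]        # one more alphabetic character, excluding A/a
    \d+             # a sequence of one or more digits
    [A-Ya-y]        # another alphabetic character, excluding Z/z
    [A-Za-z]{1,3}   # 1 to 3 more alphabetic characters
""", re.VERBOSE)

for text in ["abc123defg", "aba123defg", "bc123def"]:
    m = PATTERN.match(text)
    print(text, "->", m.group() if m else "No pattern found")
# abc123defg -> abc123defg
# aba123defg -> No pattern found
# bc123def -> bc123def
```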

Then you will be able to reproduce your code with the expected results.

A try clause is needed to avoid None-related AttributeErrors in the cases where the pattern does not match:

import re

for text in lst:
    try:
        # re.match returns None when there is no match, so .group() raises AttributeError
        print(text, ' -> ', re.match(r"^[A-Za-z]{1,2}[B-Zb-z]\d+[A-Ya-y][A-Za-z]{1,3}", text).group())
    except AttributeError:
        print(text, ' -> No pattern found')