Regular expression to match up to a given length first, then check another pattern?

Posted on 2025-02-09 06:10:51

I'm not sure how to phrase this question precisely in an English title. There are two rules:

First, match as many characters as the given length allows at the head and the tail.
Then check the other pattern.

For example:

Read 2~3 characters before the number and 2~4 characters after it if the string is long enough; if it is not long enough, read as many as are available.
Check that the character just before the number is not a, and that the character just after the number is not z.

--- edit on 20220620 ----
The code below is exactly what the following table tries to express.

import re

# Input text -> expected match ('""' stands for "no match")
lst = {
    'abc123defg': 'abc123defg',
    'babc123defg': 'abc123defg',
    'aba123defg': '""',
    'abc123zefg': '""',
    'bc123def': 'bc123def',
    'c123def': 'c123def',
    'c123zef': '""',
    'c123d': 'c123d'
}

# Two candidate patterns; the second assignment overrides the first,
# so comment out one of them to try the other.
reStr = r".{1,2}[^a\d]\d+[^z\d].{1,3}"
reStr = r"^[A-Za-z]{1,2}[B-Zb-z]\d+[A-Ya-y][A-Za-z]{1,3}"

for key, value in lst.items():
    match = re.match(reStr, key, re.IGNORECASE | re.VERBOSE)
    if match:
        print(f'{key:15s} expected to be: {value:15s}, really get: {match.group():15s}')
    else:
        print(f'{key:15s} expected to be: {value:15s}, really get: ""')

--- The following description is the old one, which I have not edited ---

text	expected find	explanation
abc123defg	abc123defg	first read in 'abc123defg', in which `c` does not break `[^a]` and `d` does not break `[^z]`, so 'abc123defg' is matched
babc123defg	abc123defg	first read in 'abc123defg', in which `c` does not break `[^a]` and `d` does not break `[^z]`, so 'abc123defg' is matched
aba123defg	nothing	first read in 'aba123defg', in which `a` breaks `[^a]`, although `d` does not break `[^z]`, so `''` is matched
abc123zefg	nothing	first read in 'abc123zefg', in which `c` does not break `[^a]`, but `z` breaks `[^z]`, so `''` is matched
bc123def	bc123def	first read in 'bc123def', in which `c` does not break `[^a]` and `d` does not break `[^z]`, so 'bc123def' is matched
c123def	c123def	first read in 'c123def', in which `c` does not break `[^a]` and `d` does not break `[^z]`, so 'c123def' is matched
c123zef	nothing	first read in 'c123zef', in which `c` does not break `[^a]`, but `z` breaks `[^z]`, so `''` is matched
c123d	c123d	first read in 'c123d', in which `c` does not break `[^a]` and `d` does not break `[^z]`, so 'c123d' is matched

So I wrote this regular expression in Python:

import re
lst = ['abc123defg', 'aba123defg', 'abc123zefg', 'bc123def']

for text in lst:
    print(text, ' -> ', re.match(r".{1,2}[^a]\d*[^z].{1,3}", text, re.IGNORECASE | re.VERBOSE).group())

But, of course, the output is not what I expected:

abc123defg  ->  abc123defg
aba123defg  ->  aba123
abc123zefg  ->  abc123zef
bc123def  ->  bc123def

So is there a way to meet the expectation with just a regular expression? Thanks.

滥情稳全场 2025-02-16 06:10:51

Based on your description, I changed the regexp to:

r".{1,2}[^a\d]\d+[^z\d].{1,3}"

The point is to match the number correctly:

Require at least one digit: \d+ instead of \d*.
To know exactly what comes before and after the number, the digits must be matched only by the \d+ mentioned above. That is why I added \d to the negated classes before and after it, giving [^a\d] and [^z\d].
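
The pattern above still needs at least two characters before the number and two after it, so short inputs from the question's table such as 'c123def' or 'c123d' do not match. One possible adjustment, as a minimal sketch: make the outer runs optional with `{0,2}` and `{0,3}`, and use `re.search` so the match can start after the first character. This assumes the characters around the number are never digits, as in the question's examples. The names `PATTERN`, `EXPECTED`, and `find` are illustrative and not from the original post.

```python
import re

# Assumption: the characters around the number are non-digits, as in the
# question's examples. Up to 3 characters are read before the digits and up
# to 4 after; the one touching the digits must not be a/A (before) or
# z/Z (after).
PATTERN = re.compile(r"[^\d]{0,2}[^aA\d]\d+[^zZ\d][^\d]{0,3}")

# Expectation table copied from the question ("" means no match).
EXPECTED = {
    "abc123defg": "abc123defg",
    "babc123defg": "abc123defg",
    "aba123defg": "",
    "abc123zefg": "",
    "bc123def": "bc123def",
    "c123def": "c123def",
    "c123zef": "",
    "c123d": "c123d",
}


def find(text):
    """Return the matched substring, or "" when the pattern is not found."""
    match = PATTERN.search(text)  # search, not match: the head may be skipped
    return match.group() if match else ""


for text, expected in EXPECTED.items():
    print(f"{text:15s} expected: {expected!r:15s} got: {find(text)!r}")
```

Because `re.search` returns the leftmost match and the leading run is greedy, the head takes as many characters as the 3-character limit allows before falling back to shorter heads.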

七色彩虹 2025-02-16 06:10:51

The two rules can be combined in a single regex pattern. Try this expression:

regex = re.compile(r"^[A-Za-z]{1,2}[B-Zb-z]\d+[A-Ya-y][A-Za-z]{1,3}")

Simple and self-explanatory:

Find 1 to 2 alphabetic characters
Find one more alphabetic character, excluding chars A | a
Continue with any digits sequence
After that, find another alphabetic character, excluding chars Z | z
Find 1 to 3 more alphabetic characters

Then you will be able to reproduce your code with the expected results.

A try clause is needed to avoid the AttributeError raised by calling .group() on None in the cases where the pattern is not matched:

for text in lst:
    try:
        print(text, ' -> ', re.match(r"^[A-Za-z]{1,2}[B-Zb-z]\d+[A-Ya-y][A-Za-z]{1,3}", text).group())
    except AttributeError:
        print(text, ' -> No pattern found')
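
As an alternative to the try clause, the match object can be compared with `None` directly, since that is what `re.match` returns when nothing matches. A minimal sketch using the same pattern; the helper name `first_match` and the constant `REGEX` are illustrative, and `lst` is the sample list from the question.

```python
import re

# Pattern from this answer, written as a raw string so that "\d" reaches
# the regex engine unchanged.
REGEX = re.compile(r"^[A-Za-z]{1,2}[B-Zb-z]\d+[A-Ya-y][A-Za-z]{1,3}")

lst = ["abc123defg", "aba123defg", "abc123zefg", "bc123def"]


def first_match(text):
    """Return the matched text, or None when the pattern does not match."""
    match = REGEX.match(text)
    return match.group() if match is not None else None


for text in lst:
    result = first_match(text)
    print(text, " -> ", result if result is not None else "No pattern found")
```

Note that with the leading `^` and the mandatory `{1,2}` run, inputs from the question's table such as 'babc123defg' or 'c123def' are not matched by this pattern.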
