Regular expression to match as much of a given length as possible, then check other patterns?
I don't know how to express this question precisely in an English title. There are 2 rules:
- first, match up to the given length at the head and at the tail, taking as many characters as are available
- then check the other patterns

For example:
- read 2~3 chars before the number and 2~4 chars after the number if the string is long enough; if the string is not long enough, read as many as possible
- check that the char immediately before the number is not `a`, and the char immediately after the number is not `z`
--- edit on 20220620 ----
The code below is exactly what the following table tries to express:

    import re

    # input text -> expected match ('""' means no match is expected)
    lst = {
        'abc123defg': 'abc123defg',
        'babc123defg': 'abc123defg',
        'aba123defg': '""',
        'abc123zefg': '""',
        'bc123def': 'bc123def',
        'c123def': 'c123def',
        'c123zef': '""',
        'c123d': 'c123d'
    }
    reStr = r".{1,2}[^a\d]\d+[^z\d].{1,3}"
    # this second assignment overrides the first; comment it out to test the pattern above
    reStr = r"^[A-Za-z]{1,2}[B-Zb-z]\d+[A-Ya-y][A-Za-z]{1,3}"
    for key, value in lst.items():
        match = re.match(reStr, key, re.IGNORECASE | re.VERBOSE)
        if match:
            print(f'{key:15s} expected to be: {value:15s}, really get: {match.group():15s}')
        else:
            print(f'{key:15s} expected to be: {value:15s}, really get: ""')
--- the following description is the old one, which I have left unedited
text | expected match | explanation |
---|---|---|
abc123defg | abc123defg | first read in 'abc123defg', in which c does not break `[^a]`, and d does not break `[^z]`, so 'abc123defg' is matched |
babc123defg | abc123defg | first read in 'abc123defg', in which c does not break `[^a]`, and d does not break `[^z]`, so 'abc123defg' is matched |
aba123defg | nothing | first read in 'aba123defg', in which a breaks `[^a]`, and d does not break `[^z]`, so nothing is matched |
abc123zefg | nothing | first read in 'abc123zefg', in which c does not break `[^a]`, but z does break `[^z]`, so nothing is matched |
bc123def | bc123def | first read in 'bc123def', in which c does not break `[^a]`, and d does not break `[^z]`, so 'bc123def' is matched |
c123def | c123def | first read in 'c123def', in which c does not break `[^a]`, and d does not break `[^z]`, so 'c123def' is matched |
c123zef | nothing | first read in 'c123zef', in which c does not break `[^a]`, but z does break `[^z]`, so nothing is matched |
c123d | c123d | first read in 'c123d', in which c does not break `[^a]`, and d does not break `[^z]`, so 'c123d' is matched |
So I wrote this regular expression in Python:

    import re

    lst = ['abc123defg', 'aba123defg', 'abc123zefg', 'bc123def']
    for text in lst:
        # re.match anchors at the start of the string; .group() returns the matched text
        print(text, ' -> ', re.match(r".{1,2}[^a]\d*[^z].{1,3}", text, re.IGNORECASE | re.VERBOSE).group())

But, of course, the output is not what I expected:

    abc123defg -> abc123defg
    aba123defg -> aba123
    abc123zefg -> abc123zef
    bc123def -> bc123def

So is there a way to meet the expectation with just a regular expression? Thanks
Comments (2)
Based on your description, I changed the regexp to:

The point is to match the number correctly: `\d+` instead of `\d*`. `\d+` matches the digits. That's why I "added" `[^\d]` before and after.

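
The point about `\d+` and `[^\d]` can be illustrated with a small comparison. This is only a sketch, not the answerer's original snippet: `LOOSE` is the pattern from the question's first attempt, `STRICT` is the first pattern that appears in the question's edited code (presumably the one this answer refers to), and the helper `first_match` is a hypothetical name introduced here.

```python
import re

# Pattern from the question's first attempt: \d* may match zero digits,
# and [^a] / [^z] are free to match digits themselves.
LOOSE = r".{1,2}[^a]\d*[^z].{1,3}"
# First pattern in the question's edited code: \d+ needs at least one digit,
# and the characters around the number must be non-digits.
STRICT = r".{1,2}[^a\d]\d+[^z\d].{1,3}"


def first_match(pattern, text):
    """Return the text matched at the start of `text`, or '' if there is no match."""
    m = re.match(pattern, text, re.IGNORECASE)
    return m.group() if m else ""


for text in ["abc123defg", "aba123defg", "abc123zefg", "bc123def"]:
    loose = first_match(LOOSE, text)
    strict = first_match(STRICT, text)
    print(f"{text:12s} loose={loose!r:14s} strict={strict!r}")
```

With `LOOSE`, backtracking lets `[^z]` swallow a digit or lets `\d*` match nothing, which is how `aba123` and `abc123zef` slip through. `STRICT` rejects those two inputs, but because it still asks for at least two characters before the number and is used with `re.match`, it does not by itself cover rows such as `c123def` or `babc123defg`.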
The 2 rules can be encompassed in a single regex pattern. Try with this expression:

Simple and self-explanatory:

Then you will be able to reproduce your code with the expected results.

A `try` clause will be needed in order to avoid `None`-related `AttributeError`s for the cases where the pattern is not matched:
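
The answer's own snippets are not preserved here, so the following is only a sketch of how a single pattern plus a `try` clause might look. The pattern is an assumption, loosely modelled on the second pattern in the question's edited code: the lower bounds are relaxed to 0 so that short strings such as `c123d` can match, `re.search` is used instead of `re.match` so that `babc123defg` can start matching at its second character, and only ASCII letters are allowed around the number. The names `PATTERN` and `extract` are hypothetical.

```python
import re

# Assumed pattern (not taken from the answer):
#   [a-z]{0,2}  up to 2 leading letters
#   [b-z]       the letter right before the number, anything but 'a'
#   \d+         the number itself
#   [a-y]       the letter right after the number, anything but 'z'
#   [a-z]{0,3}  up to 3 trailing letters
PATTERN = re.compile(r"[a-z]{0,2}[b-z]\d+[a-y][a-z]{0,3}", re.IGNORECASE)


def extract(text):
    """Return the matched substring, or '' when the pattern does not match."""
    try:
        return PATTERN.search(text).group()
    except AttributeError:  # search() returned None, so there is no .group()
        return ""


expected = {
    "abc123defg": "abc123defg",
    "babc123defg": "abc123defg",
    "aba123defg": "",
    "abc123zefg": "",
    "bc123def": "bc123def",
    "c123def": "c123def",
    "c123zef": "",
    "c123d": "c123d",
}

for text, want in expected.items():
    got = extract(text)
    print(f"{text:15s} expected: {want!r:14s} got: {got!r}")
```

An `if match is not None` check would work just as well as the `try` clause; the `try` form is kept here only because it is what the answer describes. If characters other than letters may surround the number, the letter classes would need to be widened accordingly.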