Need help finding the right regex pattern match
I can't find a working regex in Python to split these strings:
CAT One | desired: CAT
DOG SILVER FOX Two | desired: DOG SILVER FOX
KING KONG | desired: KING KONG
P'OT THEN Mark First | desired: P'OT THEN
These are just silly examples, but I need to separate words that are fully uppercase from words that are only capitalized.
There can be {1,n} uppercase words followed by {0,n} capitalized words.
My regexes were too weird: I either catch the whole string or only one uppercase word.
Comments (4)
Output:
Explanation:
\b[A-Z]+ means: match one or more capital letters, but only at the start of a word. This will match "YELLOW", but not the "E" in "horsE".
\W*[A-Z]+ means: match zero or more non-word characters, followed by one or more capital letters. This will match "'OT" or "-BAR" or " KONG".
(?:\W*[A-Z]+)*\b means: make a (non-capturing) group which matches zero or more times, but only at the end of a word. This will match " SILVER FOX", but not the " T" which follows it.
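The pieces explained in this answer can be combined into one pattern. What follows is only a minimal sketch: it assumes the full regex is the first piece followed by the last one (the middle piece being the inside of the group), and the helper name leading_caps is made up for illustration.

```python
import re

# Assumed full pattern: "\b[A-Z]+" followed by "(?:\W*[A-Z]+)*\b".
PATTERN = re.compile(r"\b[A-Z]+(?:\W*[A-Z]+)*\b")

def leading_caps(text):
    """Return the first run of all-uppercase words in text, or None."""
    match = PATTERN.search(text)
    return match.group(0) if match else None

samples = ["CAT One", "DOG SILVER FOX Two", "KING KONG", "P'OT THEN Mark First"]
for sample in samples:
    print(repr(sample), "->", repr(leading_caps(sample)))
```

The trailing \b is what forces the engine to back off from " T" in "DOG SILVER FOX Two": there is no word boundary between "T" and "w", so the last repetition of the group is given up.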
A non-regex solution:
Gives:
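One way to do this without a regex is to walk the words from the left and stop at the first one that contains a lowercase letter. This is a sketch under that assumption, not necessarily the code this answer used; the function name uppercase_prefix is hypothetical.

```python
def uppercase_prefix(text):
    """Return the leading words of text that contain no lowercase letters."""
    kept = []
    for word in text.split():
        # A capitalized word such as "One" contains lowercase letters,
        # so it ends the run of fully uppercase words.
        if any(ch.islower() for ch in word):
            break
        kept.append(word)
    return " ".join(kept)

for sample in ["CAT One", "DOG SILVER FOX Two", "KING KONG", "P'OT THEN Mark First"]:
    print(repr(sample), "->", repr(uppercase_prefix(sample)))
```

Checking for the absence of lowercase letters, rather than calling str.isupper(), keeps tokens made only of digits or punctuation in the result, which may or may not be what you want.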
A character that is not a lowercase letter or a space, and is not followed by a lowercase letter (so uppercase letters plus digits plus symbols)
OR
a space not followed by an optional uppercase letter and then a lowercase letter.
It isn't clear whether you should prepend a ^, because the uppercase words are always first. This ignores the case of a space as the first character, so no (space)KING KONG; if you want to include it, put a ^ after the |, like ^ (?![A-Z]?[a-z]).
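The description above can be turned into a pattern in more than one way. The sketch below is one reading of it, not the answer's own regex: the two alternatives are written as [^a-z ](?![a-z]) and a space followed by (?![A-Z]?[a-z]), repeated one or more times and anchored at the start of the string with re.match. The function name caps_run is hypothetical.

```python
import re

# Assumed reading of the description:
#   [^a-z ](?![a-z])     a non-lowercase, non-space char not followed by lowercase
#   " "(?![A-Z]?[a-z])   a space not followed by (optional uppercase +) lowercase
PATTERN = re.compile(r"(?:[^a-z ](?![a-z])| (?![A-Z]?[a-z]))+")

def caps_run(text):
    """Return the uppercase run at the start of text, or None."""
    match = PATTERN.match(text)  # re.match only tries position 0
    return match.group(0) if match else None

for sample in ["CAT One", "DOG SILVER FOX Two", "KING KONG", "P'OT THEN Mark First"]:
    print(repr(sample), "->", repr(caps_run(sample)))
```

The second alternative is what keeps the space before "SILVER" but rejects the space before "Two": in the latter case the lookahead sees an uppercase letter followed by a lowercase one.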
You should be able to sort this out with a negative lookahead: you scan for uppercase NOT followed by a lowercase letter.
[A-Z'] is the range of characters you are matching; if you need more punctuation than just ', simply add it to this range.
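A minimal sketch of the negative lookahead idea, assuming the pattern is the [A-Z'] range repeated and followed by (?![a-z]). The exact regex is an assumption, as is the helper name uppercase_words; rejoining the words with a single space also assumes they were separated by single spaces.

```python
import re

# Assumed pattern: a run of characters from [A-Z'] that is
# not immediately followed by a lowercase letter.
PATTERN = re.compile(r"[A-Z']+(?![a-z])")

def uppercase_words(text):
    """Collect the all-uppercase words of text and rejoin them with spaces."""
    return " ".join(PATTERN.findall(text))

for sample in ["CAT One", "DOG SILVER FOX Two", "KING KONG", "P'OT THEN Mark First"]:
    print(repr(sample), "->", repr(uppercase_words(sample)))
```

In a capitalized word such as "Mark", the run [A-Z']+ can only match "M", which is followed by a lowercase letter, so the lookahead rejects it and the word is skipped.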