Regex to split on upper-case letters
I would like to replace strings like 'HDMWhoSomeThing' with 'HDM Who Some Thing' using a regex.

So I would like to extract words that start with an upper-case letter or consist of upper-case letters only. Notice that in the string 'HDMWho' the last upper-case letter is in fact the first letter of the word Who, and should not be included in the word HDM.

What is the correct regex to achieve this goal? I have tried many regexes similar to [A-Z][a-z]+, but without success. [A-Z][a-z]+ gives me 'Who Some Thing', without 'HDM' of course.

Any ideas?
Thanks,
Rukki
Comments (5)
Output:
Judging by lines of code, this task is a much more natural fit with re.findall:
Output:
Try to split with this regular expression:
And if your regular expression engine does not support splitting on empty matches, try this regular expression to put spaces between the words:
Replace it with " $1" (a space plus the match of the first group). Then you can split at the space.
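The regular expressions for this answer did not survive in the post, so the following is only a sketch of the two approaches it describes, written in Python. The patterns BOUNDARY and WORD_START and both helper names are assumptions, not the answerer's originals. Note also that Python writes the group reference as \1 rather than $1, and that re.split only splits on empty matches from Python 3.7 onward.

```python
import re

# Assumed pattern (not from the original answer): a zero-width boundary either
# between a lower-case and an upper-case letter, or before the last capital of
# an all-caps run that is followed by a lower-case letter.
BOUNDARY = re.compile(r"(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])")

# Assumed pattern for the substitution variant: capture the start of every
# word that is preceded by another letter (so not the first word).
WORD_START = re.compile(r"(?<=[A-Za-z])([A-Z][a-z]|(?<=[a-z])[A-Z])")


def split_words(text):
    """Split directly on the empty matches (requires Python 3.7+)."""
    return BOUNDARY.split(text)


def space_words(text):
    """Insert a space in front of each captured word start."""
    return WORD_START.sub(r" \1", text)


print(split_words("HDMWhoSomeThing"))  # ['HDM', 'Who', 'Some', 'Thing']
print(space_words("HDMWhoSomeThing"))  # HDM Who Some Thing
```

The substitution variant is the fallback for engines that cannot split on empty matches: once the spaces are in place, an ordinary split on " " gives the same list.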
One-liner:
' '.join(a or b for a, b in re.findall('([A-Z][a-z]+)|(?:([A-Z]*)(?=[A-Z]))', s))
using the regexp
([A-Z][a-z]+)|(?:([A-Z]*)(?=[A-Z]))
So 'words' in this case are:
so try:
([A-Z]+(?![a-z])|[A-Z][a-z]*)
The first alternative includes a negative lookahead, (?![a-z]), which handles the boundary between an all-caps word and an initial-caps word.
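As a rough illustration of how the lookahead behaves, here is a small Python sketch that applies the pattern above with re.findall. The function name split_caps is made up for the example; only the regex comes from the answer.

```python
import re

# Pattern taken from the answer: an all-caps run that is NOT followed by a
# lower-case letter, or a single capital followed by lower-case letters.
WORD = re.compile(r"([A-Z]+(?![a-z])|[A-Z][a-z]*)")


def split_caps(text):
    """Return the list of words found in a CamelCase-style string."""
    # With exactly one capturing group, findall returns that group's text.
    return WORD.findall(text)


# On 'HDMWho' the run [A-Z]+ first grabs 'HDMW', the lookahead rejects it
# because 'h' follows, and backtracking settles on 'HDM'.
print(" ".join(split_caps("HDMWhoSomeThing")))  # HDM Who Some Thing
```

The backtracking step is what keeps the W out of HDM: the engine gives back one capital at a time until the next character is no longer a lower-case letter.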
Maybe '[A-Z]*?[A-Z][a-z]+'?
Edit: This seems to work: [A-Z]{2,}(?![a-z])|[A-Z][a-z]+
Prints out:
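The printed output is not included in the post. As a sketch of what the two patterns from this answer do in Python (the variable names are chosen only for this example), the first guess glues the acronym onto the following word, while the edited pattern separates them:

```python
import re

# Both patterns are quoted from the answer; the names are illustrative.
FIRST_GUESS = r"[A-Z]*?[A-Z][a-z]+"
EDITED = r"[A-Z]{2,}(?![a-z])|[A-Z][a-z]+"

text = "HDMWhoSomeThing"

# The lazy [A-Z]*? still has to expand until a capital followed by
# lower-case letters is found, so 'HDM' ends up inside the first match.
print(re.findall(FIRST_GUESS, text))  # ['HDMWho', 'Some', 'Thing']

# The lookahead stops the all-caps run before the capital that starts 'Who'.
print(re.findall(EDITED, text))       # ['HDM', 'Who', 'Some', 'Thing']
```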