Regex: split a string at upper-case letters

Posted on 2024-08-21 06:48:12

I would like to turn strings like 'HDMWhoSomeThing' into 'HDM Who Some Thing' with a regex.

So I would like to extract words which start with an upper-case letter or consist of upper-case letters only. Notice that in the string 'HDMWho' the last upper-case letter is in fact the first letter of the word Who, and should not be included in the word HDM.

What is the correct regex to achieve this goal? I have tried many regexes similar to [A-Z][a-z]+ but without success. [A-Z][a-z]+ gives me 'Who Some Thing', without 'HDM' of course.
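
To make the failure concrete, here is a minimal sketch in Python (the language is an assumption; the question does not name one). Every match of [A-Z][a-z]+ needs at least one lower-case letter after the capital, so the letters of an all-caps run such as HDM can never start a match:

```python
import re

text = 'HDMWhoSomeThing'

# One upper-case letter followed by one or more lower-case letters.
# H, D and M are each followed by an upper-case letter, so they are skipped.
attempt = re.findall(r'[A-Z][a-z]+', text)

print(' '.join(attempt))
```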

Any ideas?
Thanks,
Rukki

Comments (5)

绝情姑娘 2024-08-28 06:48:12
#! /usr/bin/env python

import re
from collections import deque

# Split on all-caps runs and on the single capital that starts a word;
# the capturing group keeps the delimiters in the result list.
pattern = r'([A-Z]{2,}(?=[A-Z]|$)|[A-Z](?=[a-z]|$))'
chunks = deque(re.split(pattern, 'HDMWhoSomeMONKEYThingXYZ'))

result = []
while len(chunks):
  buf = chunks.popleft()
  if len(buf) == 0:
    continue
  # A lone capital is the first letter of a word: glue the rest back on.
  if re.match(r'^[A-Z]$', buf) and len(chunks):
    buf += chunks.popleft()
  result.append(buf)

print(' '.join(result))

Output:

HDM Who Some MONKEY Thing XYZ

Judging by lines of code, this task is a much more natural fit with re.findall:

pattern = r'([A-Z]{2,}(?=[A-Z]|$)|[A-Z][a-z]*)'
print(' '.join(re.findall(pattern, 'HDMWhoSomeMONKEYThingX')))

Output:

HDM Who Some MONKEY Thing X
神仙妹妹 2024-08-28 06:48:12

Try to split with this regular expression:

/(?=[A-Z][a-z])/

And if your regular expression engine does not support splitting empty matches, try this regular expression to put spaces between the words:

/([A-Z])(?![A-Z])/

Replace it with " $1" (space plus match of the first group). Then you can split at the space.
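
Both suggestions can be sketched in Python (an assumption; the answer uses generic /.../ notation). Python's re.split only splits on empty matches from version 3.7 on, and its replacement syntax is \1 rather than $1. The two function names are made up for illustration.

```python
import re

def split_before_capitalized(text):
    # Zero-width split before every upper-case letter that is followed by a
    # lower-case one. Needs Python 3.7+ for splitting on empty matches.
    # Empty pieces (e.g. when the text starts with such a letter) are dropped.
    return [part for part in re.split(r'(?=[A-Z][a-z])', text) if part]

def split_via_spaces(text):
    # Put a space before each upper-case letter that is not followed by
    # another upper-case letter, then split on whitespace.
    spaced = re.sub(r'([A-Z])(?![A-Z])', r' \1', text)
    return spaced.split()

print(split_before_capitalized('HDMWhoSomeThing'))
print(split_via_spaces('HDMWhoSomeThing'))
```

In this sketch neither variant splits between a lower-case letter and a following all-caps run: 'SomeHDMWho' yields ['SomeHDM', 'Who'] with both functions.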

情深缘浅 2024-08-28 06:48:12

One-liner:

' '.join(a or b for a,b in re.findall('([A-Z][a-z]+)|(?:([A-Z]*)(?=[A-Z]))',s))

using the regexp

([A-Z][a-z]+)|(?:([A-Z]*)(?=[A-Z]))
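
A runnable version of the one-liner, as a sketch; the wrapper name split_words and the constant PATTERN are hypothetical. Because the pattern has two capturing groups, re.findall returns one (a, b) tuple per match, and a or b picks whichever group actually matched.

```python
import re

PATTERN = r'([A-Z][a-z]+)|(?:([A-Z]*)(?=[A-Z]))'

def split_words(s):
    # Group 1 captures a capitalised word, group 2 an all-caps run that is
    # followed by another upper-case letter; at most one is non-empty per match.
    return ' '.join(a or b for a, b in re.findall(PATTERN, s))

print(split_words('HDMWhoSomeThing'))
```

Since the lookahead (?=[A-Z]) demands a following upper-case letter, an all-caps run at the very end of the string is not captured in full by this pattern.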

划一舟意中人 2024-08-28 06:48:12

So 'words' in this case are:

  1. Any number of uppercase letters - unless the last uppercase letter is followed by a lowercase letter.
  2. One uppercase letter followed by any number of lowercase letters.

So try:

([A-Z]+(?![a-z])|[A-Z][a-z]*)

The first alternative includes a negative lookahead (?![a-z]), which handles the boundary between an all-caps word and an initial-caps word: the run of capitals backs off by one letter whenever a lower-case letter follows.
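
As a quick check, the pattern can be dropped into re.findall. This is a sketch assuming Python; WORD and split_caps are illustrative names, not part of the answer.

```python
import re

# Alternative 1: a run of capitals that is not followed by a lower-case letter.
# Alternative 2: one capital followed by any number of lower-case letters.
WORD = re.compile(r'([A-Z]+(?![a-z])|[A-Z][a-z]*)')

def split_caps(text):
    # With a single capturing group, findall returns the matched words in order.
    return ' '.join(WORD.findall(text))

print(split_caps('HDMWhoSomeThing'))
print(split_caps('HDMWhoSomeMONKEYThingXYZ'))
```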

临走之时 2024-08-28 06:48:12

Maybe '[A-Z]*?[A-Z][a-z]+'?

Edit: This seems to work: [A-Z]{2,}(?![a-z])|[A-Z][a-z]+

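import re

def find_stuff(text):
  # An all-caps run of two or more letters that is not followed by a
  # lower-case letter, or a capitalised word.
  p = re.compile(r'[A-Z]{2,}(?![a-z])|[A-Z][a-z]+')
  # findall returns the matches in order; join them with single spaces.
  print(' '.join(p.findall(text)))

find_stuff('HDMWhoSomeThing')
find_stuff('SomeHDMWhoThing')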

Prints out:

HDM Who Some Thing

Some HDM Who Thing
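
One edge case worth checking, sketched here under the assumption that single-letter words matter for your input: with {2,} and [a-z]+, a lone capital such as the X in 'ThingX' matches neither alternative and is dropped. The [A-Z]+(?![a-z])|[A-Z][a-z]* pattern from another reply in this thread keeps it. The names strict and loose are only for illustration.

```python
import re

# Pattern from this answer: all-caps runs need 2+ letters, words need 1+ lower-case.
strict = re.compile(r'[A-Z]{2,}(?![a-z])|[A-Z][a-z]+')
# Pattern from another reply: a single capital can stand on its own.
loose = re.compile(r'[A-Z]+(?![a-z])|[A-Z][a-z]*')

for sample in ('HDMWhoSomeThing', 'ThingX'):
    print(sample, strict.findall(sample), loose.findall(sample))
```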
