How can I tokenize these strings with a regular expression?

Posted 2024-08-04 05:59:35

Suppose I have strings like the following:

OneTwo
ThreeFour
AnotherString
DVDPlayer
CDPlayer

I know how to tokenize the camel-case ones, except for "DVDPlayer" and "CDPlayer". I know I could tokenize them manually, but maybe you can show me a regex that handles all the cases?

EDIT: The expected tokens are:

OneTwo -> One Two
...
CDPlayer -> CD Player
DVDPlayer -> DVD Player

Comments (5)

无妨 2024-08-11 05:59:35

See my answer to the question ".NET - How can you split a “caps” delimited string into an array?".

The regex looks like this:

/([A-Z]+(?=$|[A-Z][a-z])|[A-Z]?[a-z]+)/g

It can be modified slightly to allow searching for camel-cased tokens, by replacing the $ with \b:

/([A-Z]+(?=\b|[A-Z][a-z])|[A-Z]?[a-z]+)/g
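
As a rough illustration, both patterns can be tried in Python with `re.findall`. The pattern strings are copied from above without the `/.../g` delimiters; the helper names `tokenize_whole` and `tokenize_in_text` are made up for this sketch and are not part of the original answer.

```python
import re

# Patterns copied from the answer above, minus the /.../g delimiters.
# The helper names are hypothetical, chosen only for this sketch.
END_ANCHORED = re.compile(r"([A-Z]+(?=$|[A-Z][a-z])|[A-Z]?[a-z]+)")
WORD_BOUNDED = re.compile(r"([A-Z]+(?=\b|[A-Z][a-z])|[A-Z]?[a-z]+)")

def tokenize_whole(name):
    """Split a single identifier such as 'DVDPlayer' into its words."""
    return END_ANCHORED.findall(name)

def tokenize_in_text(text):
    """Find tokens inside longer text, where an acronym may end mid-string."""
    return WORD_BOUNDED.findall(text)

for s in ["OneTwo", "ThreeFour", "AnotherString", "DVDPlayer", "CDPlayer"]:
    print(s, "->", " ".join(tokenize_whole(s)))

print(tokenize_in_text("my DVD player"))
```

With `$`, a run of capitals is accepted only at the very end of the input or right before a capitalized word, so the "DVD" in "my DVD player" is skipped. With `\b`, it is accepted at any word boundary.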

梦初启 2024-08-11 05:59:35

Try this regular expression:

[A-Z](?:[a-z]+|[A-Z]*?(?=[A-Z][a-z]|\b))
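
A quick way to check this pattern against the sample strings, sketched in Python. The helper name `split_caps` is hypothetical and used only for illustration.

```python
import re

# Pattern copied from the answer above; split_caps is a hypothetical helper name.
PATTERN = re.compile(r"[A-Z](?:[a-z]+|[A-Z]*?(?=[A-Z][a-z]|\b))")

def split_caps(name):
    """Return the capitalized words and acronym runs found in name."""
    return PATTERN.findall(name)

for s in ["OneTwo", "DVDPlayer", "CDPlayer", "camelCase"]:
    print(s, "->", split_caps(s))
```

Every token here has to start with an uppercase letter, so a lowercase first word such as "camel" in "camelCase" is not returned.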

椵侞 2024-08-11 05:59:35

The regex

([A-Z]+[a-z]*)([A-Z][a-z]*)

would do what you want, assuming that all your strings are two words long and the second word is not an acronym like DVD.

I.e. it would work for your examples but maybe not for what you are actually trying to do.
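
The two-word assumption can be made visible with a small Python sketch. Using `fullmatch` is a choice made for this example, so that inputs outside the assumption return `None` instead of a partial match. The helper name `split_two_words` is hypothetical.

```python
import re

# Pattern copied from the answer above; split_two_words is a hypothetical helper name.
TWO_WORDS = re.compile(r"([A-Z]+[a-z]*)([A-Z][a-z]*)")

def split_two_words(name):
    """Return (first, second) if the whole string is exactly two words, else None."""
    m = TWO_WORDS.fullmatch(name)
    return m.groups() if m else None

for s in ["OneTwo", "DVDPlayer", "OneTwoThree", "PlayerDVD"]:
    print(s, "->", split_two_words(s))
```

The last two inputs break the stated assumptions (three words, and an acronym as the second word), so they yield `None`.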

ま柒月 2024-08-11 05:59:35

Here's my attempt:

([A-Z][a-z]+)|([A-Z]+(?=[A-Z][a-z]+))
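
A minimal Python sketch for trying this attempt. The helper name `tokens_with_lookahead` is hypothetical.

```python
import re

# Pattern copied from the answer above; tokens_with_lookahead is a hypothetical name.
ATTEMPT = re.compile(r"([A-Z][a-z]+)|([A-Z]+(?=[A-Z][a-z]+))")

def tokens_with_lookahead(name):
    # The pattern has two capturing groups, so findall would return tuples;
    # finditer with group(0) yields the full text of each match instead.
    return [m.group(0) for m in ATTEMPT.finditer(name)]

for s in ["OneTwo", "DVDPlayer", "CDPlayer", "PlayerDVD"]:
    print(s, "->", tokens_with_lookahead(s))
```

Because an acronym is only matched when a capitalized word follows it, a trailing acronym such as the "DVD" in "PlayerDVD" is not returned.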

长不大的小祸害 2024-08-11 05:59:35

Try a non-greedy lookahead. A token is one or more uppercase characters followed by zero or more lowercase characters. The token ends when the next two characters are an uppercase letter followed by a lowercase letter; this is the part where non-greedy matching can be used. This approach has limitations, but it should work for the examples you provided.
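
The answer gives no concrete pattern, so the following Python sketch is only one possible reading of the description. The regex, the `|$` alternative in the lookahead, and the helper name `lazy_tokens` are all assumptions made for illustration.

```python
import re

# One possible reading of the description above, not the answerer's own regex:
#   [A-Z]+?            one or more uppercase letters, as few as possible
#   [a-z]*             zero or more lowercase letters
#   (?=[A-Z][a-z]|$)   stop before an upper+lower pair; "|$" is an added
#                      assumption so the last token can end at end of input
LAZY = re.compile(r"[A-Z]+?[a-z]*(?=[A-Z][a-z]|$)")

def lazy_tokens(name):
    """Return the tokens found by the lazy-quantifier pattern."""
    return LAZY.findall(name)

for s in ["OneTwo", "ThreeFour", "AnotherString", "DVDPlayer", "CDPlayer"]:
    print(s, "->", " ".join(lazy_tokens(s)))
```

Under this reading, the limitation shows up with a trailing acronym: in "PlayerDVD" the word "Player" is not followed by an uppercase-plus-lowercase pair, so only "DVD" is matched.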
