How do I tokenize this using a regex?
Suppose I have strings like the following:
OneTwo
ThreeFour
AnotherString
DVDPlayer
CDPlayer
I know how to tokenize the camel-case ones, but not "DVDPlayer" and "CDPlayer". I know I could tokenize them manually, but maybe you can show me a regex that handles all the cases?
EDIT: the expected tokens are:
OneTwo -> One Two
...
CDPlayer -> CD Player
DVDPlayer -> DVD Player
Comments (5)
Look at my answer to the question ".NET - How can you split a "caps" delimited string into an array?".
The regex looks like this:
It can be modified slightly to allow searching for camel-cased tokens by replacing the $ with \b:
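As a rough illustration of the `$` to `\b` idea, here is a minimal Python sketch. The pattern below is an assumption written for this example and is not copied from the linked answer; `TOKEN_RE` and `tokenize` are hypothetical names.

```python
import re

# Assumed pattern for illustration; it is not copied from the linked answer.
# Alternative 1: a run of capitals that ends at a word boundary or just
#                before a capital that starts a new word (capital + lowercase).
# Alternative 2: an optional capital followed by lowercase letters.
TOKEN_RE = re.compile(r"[A-Z]+(?=\b|[A-Z][a-z])|[A-Z]?[a-z]+")


def tokenize(text):
    """Return the camel-case tokens found anywhere in `text`."""
    return TOKEN_RE.findall(text)


for s in ["OneTwo", "ThreeFour", "AnotherString", "DVDPlayer", "CDPlayer"]:
    print(s, "->", " ".join(tokenize(s)))
```

Because the lookahead uses `\b` rather than `$`, the same pattern can also pick tokens out of a longer piece of text, not only a string that ends right after the acronym.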
Try this regular expression:
The regex
would do what you want assuming that all your strings are 2 words long and the second word is not like DVD.
I.e. it would work for your examples but maybe not for what you are actually trying to do.
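To make the two-word assumption concrete, here is one pattern that is consistent with it, sketched in Python. It is a guess for illustration only, not necessarily the regex this answer had in mind; `TWO_WORDS_RE` and `split_two_words` are hypothetical names.

```python
import re

# Hypothetical pattern: a lazy first word, then a final word made of one
# capital followed by lowercase letters. fullmatch() anchors it to the
# whole string.
TWO_WORDS_RE = re.compile(r"(.+?)([A-Z][a-z]+)")


def split_two_words(s):
    """Split `s` into (first, second), or return None if it does not fit."""
    m = TWO_WORDS_RE.fullmatch(s)
    return m.groups() if m else None


for s in ["OneTwo", "DVDPlayer", "CDPlayer", "OneTwoThree", "PlayerDVD"]:
    print(s, "->", split_two_words(s))
```

The last two inputs show the stated limits: a three-word string is still cut into only two pieces, and a string whose second word is an acronym like DVD does not match at all.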
Here's my attempt:
Try a non-greedy look-ahead. A token is one or more uppercase characters followed by zero or more lowercase characters. The token terminates when the next two characters are an uppercase letter followed by a lowercase letter; this is the part where non-greedy matching can be used. This approach has limitations, but it should work for the examples you provided.
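The description above can be sketched in Python roughly as follows. The exact pattern is an assumption: both quantifiers are made lazy, and an end-of-string alternative is added to the look-ahead so the last token can terminate. `LAZY_TOKEN_RE` and `tokenize_lazy` are hypothetical names.

```python
import re

# Assumed pattern for illustration:
#   [A-Z]+?            one or more capitals, as few as possible
#   [a-z]*?            zero or more lowercase letters, as few as possible
#   (?=[A-Z][a-z]|$)   stop right before "capital + lowercase" or at the end
LAZY_TOKEN_RE = re.compile(r"[A-Z]+?[a-z]*?(?=[A-Z][a-z]|$)")


def tokenize_lazy(s):
    """Return the tokens of a single camel-case string `s`."""
    return LAZY_TOKEN_RE.findall(s)


for s in ["OneTwo", "ThreeFour", "AnotherString", "DVDPlayer", "CDPlayer"]:
    print(s, "->", " ".join(tokenize_lazy(s)))
```

One limitation of this sketch: when an acronym comes last, as in "PlayerDVD", the look-ahead never fires after "Player", so that word is lost and only "DVD" is returned.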