Regular expression for checking whether capital letters occur consecutively in a string
I want to know the regexp for the following case:
The string should contain only alphabetic letters. It must start with a capital letter followed by a small letter. After that it can contain small letters or capital letters.
^[A-Z][a-z][A-Za-z]*$
But the string must also not contain any consecutive capital letters. How do I add that logic to the regexp?
That is, HttpHandler is correct, but HTTPHandler is wrong.
This looks for sequences of an uppercase letter followed by one or more lowercase letters. Consecutive uppercase letters will not match, as only one is allowed at a time, and it must be followed by a lowercase one.
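The pattern this answer describes is not shown here. A minimal sketch of one pattern that fits the description, assuming ASCII letters only, is `^([A-Z][a-z]+)+$`; both the pattern and the helper name `is_capitalized_words` are assumptions made for illustration, not text recovered from the answer.

```python
import re

# Assumed pattern, reconstructed from the description: one or more groups,
# each made of exactly one uppercase letter and one or more lowercase letters.
PATTERN = re.compile(r"^([A-Z][a-z]+)+$")


def is_capitalized_words(name: str) -> bool:
    """Return True if the whole string is a run of Upper+lower(s) groups."""
    return PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("HttpHandler", "HTTPHandler", "RightHerE"):
        print(candidate, is_capitalized_words(candidate))
```

Because every uppercase letter must be followed by at least one lowercase letter, a string that ends in a capital, such as RightHerE, is rejected by this sketch.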
Aside from tchrist's excellent post concerning Unicode, I think you don't need the complex solution with a negative lookahead...
Your definition requires an uppercase letter followed by at least one group of (a lowercase letter, optionally followed by an uppercase letter):
It is just a bit more compact and easier to read, I think...
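The pattern itself is missing from this answer as shown. Read literally, the sentence above corresponds to something like `^[A-Z]([a-z][A-Z]?)+$`; the sketch below assumes that reading, and the helper name `matches_compact` is hypothetical.

```python
import re

# Assumed pattern: an uppercase letter, then at least one group of
# (a lowercase letter optionally followed by one uppercase letter).
COMPACT = re.compile(r"^[A-Z]([a-z][A-Z]?)+$")


def matches_compact(name: str) -> bool:
    """Return True if name fits the 'no two capitals in a row' shape."""
    return COMPACT.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("HttpHandler", "HTTPHandler", "RightHerE", "AbCD"):
        print(candidate, matches_compact(candidate))
```

An uppercase letter can only appear directly after a lowercase one, so two capitals can never be adjacent, while a trailing capital (RightHerE) is still accepted.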
If you want to get all employee names in MySQL which have at least one uppercase letter then apply this query:
Whenever one writes [A-Z] or [a-z], one explicitly commits to processing nothing but 7-bit ASCII data from the 1960s. If that’s really ok, then fine. But if it’s not ok, then Unicode character properties exist to help you with handling modern character data.
There are three cases in Unicode, not two. Furthermore, you also have noncased letters. Letters in general are specified by the \pL property, and each of these also belongs to exactly one of five subcategories:
uppercase letters, specified with \p{Lu}; eg: AÇDZÞΣSSὩΙST
titlecase letters, specified with \p{Lt}; eg: ǈǲSsᾨSt (actually Ss and St are an upper- and then a lowercase letter, but they are what you get if you ask for the titlecase of ß and ſt, respectively)
lowercase letters, specified with \p{Ll}; eg: aαçdzςσþßᾡſt
modifier letters, specified with \p{Lm}; eg: ʰʲᴴᴭʺˈˠᵠꜞ
other letters, specified with \p{Lo}; eg: ƻאᎯᚦ京
You can take the complement of any of these, but do be careful, because something like \P{Lu} does not mean a letter that isn’t uppercase! It means any character that isn’t an uppercase letter.
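To make the five subcategories and the complement caveat concrete, here is a small sketch using Python's standard `unicodedata` module, which reports the same two-letter general category codes. The helper name `is_not_uppercase_letter` is hypothetical and simply models what \P{Lu} means.

```python
import unicodedata


def category_of(ch: str) -> str:
    """Return the Unicode general category code (e.g. 'Lu') of one character."""
    return unicodedata.category(ch)


def is_not_uppercase_letter(ch: str) -> bool:
    """Model of \\P{Lu}: true for ANY character outside Lu, letters or not."""
    return unicodedata.category(ch) != "Lu"


if __name__ == "__main__":
    # One sample per letter subcategory, then a digit and a space.
    for ch in ("A", "\u01c5", "a", "\u02b0", "\u4eac", "1", " "):
        print(repr(ch), category_of(ch), is_not_uppercase_letter(ch))
```

The digit and the space both satisfy the complement test, which is why \P{Lu} should not be read as "a letter that is not uppercase".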
For a letter that’s either of uppercase or titlecase, use [\p{Lu}\p{Lt}]. So you could use for your pattern:
If you don’t mean to limit the letters following the first to the “casing” letters alone, then you might prefer:
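The two patterns referred to here are not shown. Shapes consistent with the text would be `^([\p{Lu}\p{Lt}]\p{Ll}+)+$` and `^([\p{Lu}\p{Lt}][\p{Ll}\p{Lm}\p{Lo}]+)+$`; both are assumptions. Python's built-in `re` module does not support `\p{...}`, so this sketch first translates each character into a one-letter code for its general category and then matches on the codes. The helper names are hypothetical.

```python
import re
import unicodedata

# One-letter codes: U = uppercase or titlecase, l = lowercase,
# m = modifier letter, o = other letter. Anything else becomes "x".
_CODES = {"Lu": "U", "Lt": "U", "Ll": "l", "Lm": "m", "Lo": "o"}


def encode(text: str) -> str:
    """Replace every character by the code of its Unicode general category."""
    return "".join(_CODES.get(unicodedata.category(ch), "x") for ch in text)


# Stands in for ^([\p{Lu}\p{Lt}]\p{Ll}+)+$ (assumed pattern).
CASED_ONLY = re.compile(r"(Ul+)+")
# Stands in for ^([\p{Lu}\p{Lt}][\p{Ll}\p{Lm}\p{Lo}]+)+$ (assumed pattern).
ANY_LETTER_AFTER = re.compile(r"(U[lmo]+)+")


def matches_cased_only(text: str) -> bool:
    return CASED_ONLY.fullmatch(encode(text)) is not None


def matches_any_letter_after(text: str) -> bool:
    return ANY_LETTER_AFTER.fullmatch(encode(text)) is not None


if __name__ == "__main__":
    for word in ("HttpHandler", "HTTPHandler", "\u00c9clair", "A\u4eac"):
        print(word, matches_cased_only(word), matches_any_letter_after(word))
```

Since each code point maps to exactly one code character, matching on the encoded string behaves like matching the property classes directly. An engine with native `\p{...}` support could use the patterns as written.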
If you’re trying to match so-called “CamelCase” identifiers, then the actual rules depend on the programming language, but usually include the underscore character and the decimal numbers (\p{Nd}), and may also include a literal dollar sign and other language-dependent characters. If so, you may wish to add some of these to one or the other of the two character classes provided above.
For example, you may wish to add underscore to both but digits only to the second, leaving you with:
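The resulting pattern is not shown. One reading of the sentence is `^([_\p{Lu}\p{Lt}][_\p{Nd}\p{Ll}\p{Lm}\p{Lo}]+)+$`, which is an assumption. The sketch below models it with the standard library only, by encoding each character as a category code because `re` lacks `\p{...}`; all names in it are hypothetical.

```python
import re
import unicodedata

# U = uppercase/titlecase letter, l/m/o = lowercase/modifier/other letter,
# d = decimal digit (Nd). The underscore keeps its own code "_".
_CODES = {"Lu": "U", "Lt": "U", "Ll": "l", "Lm": "m", "Lo": "o", "Nd": "d"}


def encode(text: str) -> str:
    """Map each character to a one-letter code; unknown categories become 'x'."""
    return "".join(
        "_" if ch == "_" else _CODES.get(unicodedata.category(ch), "x")
        for ch in text
    )


# Underscore allowed in both classes, digits only in the second one.
IDENTIFIER = re.compile(r"([_U][_dlmo]+)+")


def is_camel_identifier(text: str) -> bool:
    return IDENTIFIER.fullmatch(encode(text)) is not None


if __name__ == "__main__":
    for name in ("Http2Handler", "_private", "2Fast", "HTTPHandler"):
        print(name, is_camel_identifier(name))
```

A digit cannot start a group in this sketch, so 2Fast is rejected, while Http2Handler and _private are accepted.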
If, though, you are dealing with certain “words” from various RFCs and ISO standards, these are often specified as containing ASCII only. If so, you can get by with the literal [A-Z] idea. It’s just not kind to impose that restriction if it doesn’t actually exist.
Take a look at tchrist's answer, especially if you develop for the web or something more "international".
Oren Trutner's answer isn't quite right (see sample input of "RightHerE" which must be matched, but isn't).
Here is the correct solution:
Explained:
/edit
The key for the solution is a negative lookahead. See: Lookahead and Lookbehind Zero-Length Assertions
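The answer's own pattern and its explanation are not shown. As an illustration of the negative-lookahead idea only, the sketch below prepends `(?!.*[A-Z]{2})` to the regex from the question. This combination is an assumption, not necessarily the pattern the answer used, and `is_valid` is a hypothetical helper name.

```python
import re

# Assumed pattern: the question's regex plus a negative lookahead that
# fails as soon as two uppercase letters appear next to each other anywhere.
NO_DOUBLE_CAPS = re.compile(r"^(?!.*[A-Z]{2})[A-Z][a-z][A-Za-z]*$")


def is_valid(name: str) -> bool:
    """Capital first, lowercase second, letters only, no adjacent capitals."""
    return NO_DOUBLE_CAPS.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("HttpHandler", "HTTPHandler", "RightHerE"):
        print(candidate, is_valid(candidate))
```

The lookahead consumes no characters; it only vetoes the match, so the rest of the pattern stays exactly as the question wrote it and RightHerE is accepted.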