Regular expression to check whether a string contains consecutive uppercase letters

Posted on 2024-09-29 21:47:52

I want to know the regexp for the following case:

The string should contain only alphabetic letters. It must start with a capital letter followed by a small letter. After that, it can contain small letters or capital letters.

^[A-Z][a-z][A-Za-z]*$

But the string must also not contain any consecutive capital letters. How do I add that logic to the regexp?

That is, HttpHandler is correct, but HTTPHandler is wrong.

Comments (5)

情丝乱 2024-10-06 21:47:53
^([A-Z][a-z]+)+$

This looks for sequences of an uppercase letter followed by one or more lowercase letters. Consecutive uppercase letters will not match, as only one is allowed at a time, and it must be followed by a lowercase one.
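
A quick way to check this pattern is the minimal Python sketch below. The function name `is_valid` and the sample strings are only illustrative (the samples come from elsewhere in this thread); `fullmatch` is used so that a trailing newline is not silently accepted.

```python
import re

# Pattern from the answer above: one uppercase letter followed by one or
# more lowercase letters, with that unit repeated until the end.
PATTERN = re.compile(r"^([A-Z][a-z]+)+$")

def is_valid(name: str) -> bool:
    """Return True if the whole string matches the pattern."""
    return PATTERN.fullmatch(name) is not None

for sample in ["HttpHandler", "HTTPHandler", "RightHerE"]:
    print(sample, is_valid(sample))
```

Note that this pattern also rejects a string that ends in an uppercase letter, such as "RightHerE", because every uppercase letter must be followed by at least one lowercase letter.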

乱世争霸 2024-10-06 21:47:53

Aside from tchrist's excellent post concerning Unicode, I think you don't need the complex solution with a negative lookahead...
Your definition requires an uppercase-letter followed by at least one group of (a lowercase letter optionally followed by an uppercase-letter):

^
[A-Z]    // Start with an uppercase Letter
(        // A Group of:
  [a-z]  // mandatory lowercase letter
  [A-Z]? // an optional Uppercase Letter at the end
         // or in between lowercase letters
)+       // This group at least one time
$

It is just a bit more compact and easier to read, I think...
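
For anyone who wants to run the commented form as written, here is a rough Python sketch. It assumes Python's `re.VERBOSE` flag, where comments start with `#` rather than `//`; the function name `no_consecutive_caps` is made up for this example.

```python
import re

# Same pattern as above, written in free-spacing (verbose) mode.
CAMEL = re.compile(
    r"""
    ^
    [A-Z]      # start with an uppercase letter
    (          # a group of:
      [a-z]    #   a mandatory lowercase letter
      [A-Z]?   #   an optional uppercase letter at the end
               #   or in between lowercase letters
    )+         # this group at least one time
    $
    """,
    re.VERBOSE,
)

def no_consecutive_caps(text: str) -> bool:
    """Return True if text starts with a capital and has no adjacent capitals."""
    return CAMEL.fullmatch(text) is not None

for sample in ["HttpHandler", "HTTPHandler", "RightHerE", "H"]:
    print(sample, no_consecutive_caps(sample))
```

Unlike a pattern that requires a lowercase letter after every capital, this one accepts a trailing capital such as the final "E" in "RightHerE".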

止于盛夏 2024-10-06 21:47:53

If you want to get all employee names in MySQL which have at least one uppercase letter then apply this query:

SELECT * FROM registration WHERE `name` REGEXP BINARY '[A-Z]';

再浓的妆也掩不了殇 2024-10-06 21:47:52

Whenever one writes [A-Z] or [a-z], one explicitly commits to
processing nothing but 7-bit ASCII data from the 1960s. If that’s
really ok, then fine. But if it’s not ok, then Unicode character
properties exist to help you with handling modern character data.

There are three cases in Unicode, not two. Furthermore, you also have
noncased letters. Letters in general are specified by the \pL property,
and each of these also belongs to exactly one of five subcategories:

  1. uppercase letters, specified with \p{Lu}; eg: AÇǱÞΣSSὩΙST
  2. titlecase letters, specified with \p{Lt}; eg: ǈǲSsᾨSt
    (actually Ss and St are an upper- and then a lowercase letter,
    but they are what you get if you ask for the titlecase of ß and
    ﬅ, respectively)
  3. lowercase letters, specified with \p{Ll}; eg: aαçǳςσþßᾡﬅ
  4. modifier letters, specified with \p{Lm}; eg: ʰʲᴴᴭʺˈˠᵠꜞ
  5. other letters, specified with \p{Lo}; eg: ƻאᎯᚦ京

You can take the complement of any of these, but do be careful, because
something like \P{Lu} does not mean a letter that isn’t uppercase!
It means any character that isn’t an uppercase letter.
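
To see these categories concretely, here is a small Python sketch. Python's built-in `re` module does not support `\p{...}`, so this uses `unicodedata.category` from the standard library as a stand-in; the helper names are invented for the example.

```python
import unicodedata

def general_category(ch: str) -> str:
    """Return the two-letter Unicode general category of one character."""
    return unicodedata.category(ch)

def is_not_uppercase_letter(ch: str) -> bool:
    """Rough equivalent of the complement class: anything that is not Lu."""
    return unicodedata.category(ch) != "Lu"

# One sample per letter subcategory, plus a digit and an underscore.
# "\u01C5" is a titlecase digraph, "\u02B0" a modifier letter, "\u4EAC" is 京.
for ch in ["A", "\u01C5", "a", "\u02B0", "\u4EAC", "7", "_"]:
    print(repr(ch), general_category(ch), is_not_uppercase_letter(ch))
```

The digit and the underscore both satisfy the complement test even though neither is a letter at all, which is the trap described above.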

For a letter that’s either of uppercase or titlecase, use
[\p{Lu}\p{Lt}]. So you could use for your pattern:

 ^([\p{Lu}\p{Lt}]\p{Ll}+)+$

If you don’t mean to limit the letters following the first to the
“casing” letters alone, then you might prefer:

 ^([\p{Lu}\p{Lt}][\p{Ll}\p{Lm}\p{Lo}]+)+$
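
If your regex engine lacks `\p{...}` support (Python's built-in `re` is one such engine), the same rule can be approximated as sketched below. This is not the pattern itself but a substitute technique: each character is first reduced to a one-letter code based on `unicodedata.category`, and a plain regex is then run over the codes. The names `case_signature` and `matches_unicode_camel` are hypothetical.

```python
import re
import unicodedata

def case_signature(text: str) -> str:
    """Map each character to U (Lu/Lt), l (Ll/Lm/Lo), or x (anything else)."""
    codes = []
    for ch in text:
        cat = unicodedata.category(ch)
        if cat in ("Lu", "Lt"):
            codes.append("U")
        elif cat in ("Ll", "Lm", "Lo"):
            codes.append("l")
        else:
            codes.append("x")
    return "".join(codes)

# Mirrors ^([\p{Lu}\p{Lt}][\p{Ll}\p{Lm}\p{Lo}]+)+$ on the signature string.
SIGNATURE = re.compile(r"(Ul+)+")

def matches_unicode_camel(text: str) -> bool:
    """Return True if every upper/titlecase letter is followed by other letters."""
    return SIGNATURE.fullmatch(case_signature(text)) is not None

for sample in ["HttpHandler", "HTTPHandler", "\u00C7a\u00DEing", "Http2"]:
    print(sample, matches_unicode_camel(sample))
```

This works per code point, so text containing combining marks would need to be normalized to a composed form first; that step is left out here.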

If you’re trying to match so-called “CamelCase” identifiers, then
the actual rules depend on the programming language, but usually include
the underscore character and the decimal numbers (\p{Nd}), and may also
include a literal dollar sign and other language-dependent characters.
If so, you may wish to add some of these to one or the other of the two
character classes provided above.

For example, you may wish to add underscore to both but digits only to
the second, leaving you with:

 ^([_\p{Lu}\p{Lt}][_\p{Nd}\p{Ll}\p{Lm}\p{Lo}]+)+$

If, though, you are dealing with certain “words” from various RFCs and ISO
standards, these are often specified as containing ASCII only. If so,
you can get by with the literal [A-Z] idea. It’s just not kind to
impose that restriction if it doesn’t actually exist.

这样的小城市 2024-10-06 21:47:52

Take a look at tchrist's answer, especially if you develop for the web or something more "international".

Oren Trutner's answer isn't quite right (see sample input of "RightHerE" which must be matched, but isn't).

Here is the correct solution:

(?!^.*[A-Z]{2,}.*$)^[A-Za-z]*$

Explained:

(?!^.*[A-Z]{2,}.*$)  // don't match the whole expression if there are two or more consecutive uppercase letters
^[A-Za-z]*$          // match uppercase and lowercase letters

/edit

The key for the solution is a negative lookahead. See: Lookahead and Lookbehind Zero-Length Assertions
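
Here is a minimal Python sketch of the lookahead approach. `LOOKAHEAD` is the pattern exactly as given above. `STRICT` is not from the answer: it is an assumed variation that keeps the `[A-Z][a-z]` prefix from the question, since the pattern as given also accepts an empty string and strings that start with a lowercase letter.

```python
import re

# Pattern as posted: reject two or more adjacent capitals anywhere.
LOOKAHEAD = re.compile(r"(?!^.*[A-Z]{2,}.*$)^[A-Za-z]*$")

# Assumed variation: same lookahead idea, plus the question's requirement
# that the string starts with a capital followed by a lowercase letter.
STRICT = re.compile(r"^(?!.*[A-Z]{2})[A-Z][a-z][A-Za-z]*$")

def accepts(pattern: re.Pattern, text: str) -> bool:
    """Return True if the whole string matches the given pattern."""
    return pattern.fullmatch(text) is not None

for sample in ["HttpHandler", "HTTPHandler", "RightHerE", "handler", ""]:
    print(repr(sample), accepts(LOOKAHEAD, sample), accepts(STRICT, sample))
```

The two patterns agree on the first three samples and differ only on "handler" and the empty string, which `LOOKAHEAD` accepts and `STRICT` rejects.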
