Regex for names with special characters (Unicode)
Okay, I have read about regex all day now, and I still don't understand it properly. I'm trying to validate a name, but the functions I can find for this on the internet only use [a-zA-Z], which leaves out characters that I need to accept too.
I basically need a regex that checks that the name is at least two words and does not contain numbers or special characters like !"#¤%&/()=..., although the words can contain characters like æ, é, Â and so on.
Examples of accepted names: "John Elkjærd" or "André Svenson"
Examples of rejected names: "Hans", "H4nn3 Andersen" or "Martin Henriksen!"
If it matters, I use the JavaScript .match() function client side and want to use PHP's preg_replace() server side, but only "in negative" (removing non-matching characters).
Any help would be much appreciated.
Update:
Okay, thanks to Alix Axel's answer I have the important part down: the server side.
But as the page from LightWing's answer suggests, I'm unable to find anything about Unicode support in JavaScript, so I ended up with half a solution for the client side, just checking for at least two words and a minimum of 5 characters, like this:
var words = name.match(/\S+/g) || []; // match() returns null when nothing matches
if (words.length >= minWords && name.length >= 5) {
    // valid
}
An alternative would be to list all the Unicode characters explicitly, as suggested in shifty's answer. I might end up doing something like that along with the solution above, but it is a bit impractical.
Try the following regular expression:
In PHP this translates to:
You should read it like this:
I honestly don't know how to port this to JavaScript, and I'm not even sure JavaScript supports Unicode properties, but in PHP PCRE this seems to work flawlessly on IDEOne.com:
I'm sorry I can't help with the JavaScript part, but probably someone here will.
Validates:
Invalidates:
To replace invalid characters, though I'm not sure why you need this, you just need to change it slightly:
Examples:
Note that you always need to use the u modifier.
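The snippets this answer refers to are not preserved in this copy. As a rough sketch of the same idea in JavaScript (not the answer's original PHP pattern), the following assumes an engine with ES2018 Unicode property escapes, where `\p{L}` works once the `u` flag is set. The character class (letters, combining marks, dashes, apostrophes) and the names `isValidName` and `stripInvalid` are illustrative assumptions.

```javascript
// Assumed character class: letters, combining marks, dash punctuation,
// plus the ASCII and typographic apostrophes.
// The "u" flag is required for \p{...} to be recognised.
const VALID_NAME = /^[\p{L}\p{Mn}\p{Pd}'\u2019]+(?:\s[\p{L}\p{Mn}\p{Pd}'\u2019]+)+$/u;

// Anything that is neither an allowed name character nor whitespace.
const INVALID_CHARS = /[^\p{L}\p{Mn}\p{Pd}'\u2019\s]+/gu;

// True when the input consists of two or more words made of allowed characters.
function isValidName(name) {
  return VALID_NAME.test(name);
}

// The "in negative" use: remove every character that is not allowed.
function stripInvalid(name) {
  return name.replace(INVALID_CHARS, "");
}
```

With this sketch, the accepted and rejected examples from the question behave as requested, and `stripInvalid("Martin Henriksen!")` drops the trailing exclamation mark.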
JavaScript is trickier, since JavaScript regex syntax doesn't support Unicode character properties (engines only gained \p{...} escapes, used with the u flag, in ES2018). A pragmatic solution would be to match letters like this:
This allows letters in all languages and excludes numbers and all the special (non-letter) characters commonly found on keyboards. It is imperfect because it also allows Unicode symbols that are not letters, e.g. emoticons, the snowman and so on. However, since these symbols are typically not available on keyboards, I don't think they will be entered by accident. So, depending on your requirements, it may be an acceptable solution.
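The pattern itself is missing from this copy. A minimal sketch of the approach described above, with the range `\xC0-\uFFFF` as an assumption chosen to fit the description (ASCII letters plus everything from U+00C0 upward), and `isLooseName` as a hypothetical helper name:

```javascript
// ASCII letters plus every UTF-16 code unit from U+00C0 upward.
// This deliberately over-accepts: non-letter symbols such as the
// snowman (U+2603) also fall inside the range.
const LOOSE_NAME = /^[a-zA-Z\xC0-\uFFFF]+(?: [a-zA-Z\xC0-\uFFFF]+)+$/;

// True when the input is two or more space-separated "letter-ish" words.
function isLooseName(name) {
  return LOOSE_NAME.test(name);
}
```

Digits and ASCII punctuation are rejected because they sit below U+00C0 and outside a-z/A-Z, which is the trade-off the answer describes.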
See this page: Unicode Characters in Regular Expression
Here's an optimization over the fantastic answer by @Alix above. It removes the need to define the character class twice, and allows for easier definition of any number of required words.
It can be broken down as follows:
Essentially, it says to find a word as defined by the character class, followed by either one or more whitespace characters or the end of a line. The {2,} at the end tells it that at least two words must be found for a match to succeed. This ensures the OP's "Hans" example will not match.
Lastly, since I found this question while looking for a similar solution for Ruby, here is the regular expression as it can be used in Ruby 1.9+:
The primary changes are using \A and \Z for the beginning and end of the string (instead of the line) and Ruby's Unicode character notation.
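The patterns from this answer are not preserved here either. The structure it describes (one character class, a "whitespace or end" alternative, and a `{2,}` quantifier) can be sketched in JavaScript as follows, again assuming ES2018 property escapes. The character class and the helper name `atLeastWords` are assumptions for illustration.

```javascript
// Builds a pattern that requires at least `minWords` words.
// Each word is one or more allowed characters followed by either
// whitespace or the end of the input, so the class is written only once.
function atLeastWords(minWords) {
  const cls = "[\\p{L}\\p{Mn}\\p{Pd}'\\u2019]"; // assumed set of name characters
  return new RegExp("^(?:" + cls + "+(?:\\s+|$)){" + minWords + ",}$", "u");
}

const twoWords = atLeastWords(2);
const threeWords = atLeastWords(3);
```

Changing the required word count then only means passing a different number, which is the convenience the answer points out.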
You can add the allowed special characters to the regex.
Example:
EDIT:
Not the best solution, but this would give a result if there are at least two words.
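The example from this answer is missing in this copy. A small sketch of the explicit-list approach might look like the following. The list of extra characters is a made-up sample rather than a complete alphabet, and `isListedName` is a hypothetical name.

```javascript
// Sample of extra letters to allow; a real list would need to be much longer.
const EXTRA_LETTERS = "æøåÆØÅéÉâÂ";
const WORD = "[a-zA-Z" + EXTRA_LETTERS + "]+";

// Two or more space-separated words built only from the listed letters.
const LISTED_NAME = new RegExp("^" + WORD + "(?: " + WORD + ")+$");

function isListedName(name) {
  // Normalise to NFC so a letter typed as base + combining accent
  // compares equal to its precomposed form in the list.
  return LISTED_NAME.test(name.normalize("NFC"));
}
```

Any letter missing from the list (for example ë) is rejected, which illustrates why the question calls this approach impractical.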
When checking your input string you could
However, I'm not sure whether the \w shorthand includes accented characters, though they should fall into the "word characters" category.
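On the open question about `\w`: in JavaScript specifically, `\w` is equivalent to `[A-Za-z0-9_]`, so it excludes accented letters while still accepting digits and underscores. Other regex engines may behave differently. A quick check, with `isWordChars` as an illustrative helper name:

```javascript
// \w in JavaScript covers ASCII letters, digits and underscore only.
const WORD_ONLY = /^\w+$/;

function isWordChars(text) {
  return WORD_ONLY.test(text);
}
```

So for this use case `\w` is both too narrow (it rejects "André") and too wide (it accepts "H4nn3").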
This is the JS regex that I use for fancy names composed of at most 3 words (1 to 60 characters), separated by a space, single quote or minus sign:
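The regex itself is not included in this copy. One way to express the stated constraints, as a sketch that assumes ES2018 property escapes rather than whatever the original pattern used, with `isFancyName` as a hypothetical name:

```javascript
// (?=.{1,60}$)        total length between 1 and 60 code points
// \p{L}+              first word, letters only
// (?:[ '-]\p{L}+){0,2}  up to two more words, each preceded by a
//                     space, single quote or minus sign
const FANCY_NAME = /^(?=.{1,60}$)\p{L}+(?:[ '-]\p{L}+){0,2}$/u;

function isFancyName(name) {
  return FANCY_NAME.test(name);
}
```

Under these rules "Jean-Luc Picard" counts as three words and passes, while a four-word name or one longer than 60 characters is rejected.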