Test if a string contains only letters (a-z + é ü ö ê å ø etc.)
I want to match a string to make sure it contains only letters.
I've got this and it works just fine:
var onlyLetters = /^[a-zA-Z]*$/.test(myString);
BUT
Since I speak another language too, I need to allow all letters, not just A-Z, for example:
é ü ö ê å ø
Does anyone know if there is a global 'alpha' term that includes all letters to use with RegExp? Or even better, does anyone have some kind of solution?
Thanks a lot
EDIT:
Just realized that you might also want to allow '-' and ' ' in case of a double name like 'Mary-Ann' or 'Mary Ann'.
I don’t know the actual reason for doing this, but if you want to use it as a pre-check for, say, login names or user nicknames, I’d suggest you enter the characters yourself and don’t use the whole set of ‘alpha’ characters you’ll find in Unicode, because you probably won’t see any visual difference between the following letters:
In such cases it’s better to specify the allowed letters manually if you want to minimise account faking and such.
Addition
Well, if it’s for a field which is supposed to be non-unique, I would allow Greek as well. I wouldn’t feel comfortable forcing users to change their name to a latinised version.
But for unique fields like nicknames, you need to give the other visitors of the site some assurance that it’s really the nickname they think it is. It’s bad enough that people already fake accounts by interchanging I and l. Of course, it depends on your users; but to be safe, I think it’s better to allow basic Latin + diacritics only. (Maybe have a look at this list: Latin-derived_alphabet)
As an untested suggestion (with ‘-’, ‘_’ and ‘ ’):
Another edit:
I have added the apostrophe for people with names like O’Neill or O’Reilly. (And the straight and the reversed apostrophe for people who can’t enter the curly one correctly.)
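This answer’s regex isn’t preserved here. As a rough illustration of the same whitelist idea (not the author’s original pattern), the hypothetical NICKNAME_RE below accepts ASCII letters, the letters of the Latin-1 Supplement block, and the separators mentioned above. Which character the author meant by the “reversed” apostrophe isn’t stated; U+2018 is an assumption.

```javascript
// Sketch of a manual nickname whitelist (not the answer's original regex).
// Letters: ASCII a-z/A-Z plus Latin-1 Supplement letters U+00C0 to U+00FF,
// skipping the two symbols in that block: × (U+00D7) and ÷ (U+00F7).
// Separators: space, '_', '-', straight apostrophe U+0027, curly U+2019,
// and U+2018, assumed here to be the "reversed" apostrophe.
// The first character must be a letter.
const NICKNAME_RE =
  /^[a-zA-Z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF][a-zA-Z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF _'\u2018\u2019-]*$/;

function isAllowedNickname(name) {
  return NICKNAME_RE.test(name);
}

console.log(isAllowedNickname("O'Neill"));    // true
console.log(isAllowedNickname("Mary-Ann"));   // true
console.log(isAllowedNickname("Ærøskøbing")); // true
console.log(isAllowedNickname("2Pac"));       // false
console.log(isAllowedNickname("\u0391lice")); // false: starts with Greek Alpha
```

Greek and Cyrillic look-alikes such as Α (U+0391) are rejected, which is the point of the answer. Letters from Latin Extended-A such as ł or ő are not covered; appending the range \u0100-\u017F to both classes would add them. The sketch also still allows trailing or repeated separators.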
You can't do this in JS. It has very limited regex and normalizer support. You would need to construct a lengthy and unmaintainable character array with all possible Latin characters with diacritical marks (I guess there are around 500 different ones). Rather delegate the validation task to the server side, which uses another language with more regex capabilities, if necessary with the help of Ajax.
In a full-fledged regex environment you could just test if the string matches \p{L}+. Here's a Java example:
Alternatively, you could also normalize the text to get rid of the diacritical marks and check if it contains [A-Za-z]+ only. Here's again a Java example:
PHP supports similar functions.
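Engines implementing ES2018 or later do support Unicode property escapes such as \p{L} when the regex has the u flag, and ES2015 added String.prototype.normalize. A minimal sketch of the \p{L}+ test in JavaScript, assuming such an engine (the function names are made up for illustration):

```javascript
// Requires ES2018 Unicode property escapes (the `u` flag).
// \p{L} matches any character in the Unicode "Letter" category.
function isOnlyLetters(s) {
  return /^\p{L}+$/u.test(s);
}

// Same idea, also allowing a single '-' or ' ' between name parts
// ("Mary-Ann", "Mary Ann"), as asked in the question's edit.
function isLetterName(s) {
  return /^\p{L}+(?:[- ]\p{L}+)*$/u.test(s);
}

console.log(isOnlyLetters("été"));      // true
console.log(isOnlyLetters("abc1"));     // false
console.log(isLetterName("Mary-Ann"));  // true
console.log(isLetterName("Mary--Ann")); // false
```

\p{L} does not include combining marks (\p{M}), so a decomposed "e" + U+0301 fails; calling s.normalize("NFC") first recombines it into "é" where a precomposed form exists. Note that this accepts Greek, Cyrillic and every other script, so the look-alike concern raised in the previous answer applies to unique fields.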
When I tried to implement @Debilski's solution, JavaScript didn't like the extended Latin characters, so I had to code them as JavaScript escapes:
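The escaped pattern from this answer isn't preserved. As an illustration of the technique, a hypothetical helper toJsEscapes can be run once (for example in Node or a browser console) to turn a list of characters into \uXXXX escapes that can then be pasted into an ASCII-only source file:

```javascript
// Escapes every UTF-16 code unit of `chars` as \uXXXX.
// For characters above U+FFFF this yields a surrogate pair, which is fine
// in a string but won't act as one character inside a regex class
// unless you switch to the `u` flag and \u{...} syntax.
function toJsEscapes(chars) {
  let out = "";
  for (let i = 0; i < chars.length; i++) {
    out += "\\u" + chars.charCodeAt(i).toString(16).padStart(4, "0");
  }
  return out;
}

const escaped = toJsEscapes("éüöêåø");
console.log(escaped); // \u00e9\u00fc\u00f6\u00ea\u00e5\u00f8

// The escaped class matches exactly what the literal class would.
const escapedRe = new RegExp("^[a-zA-Z" + escaped + "]*$");
console.log(escapedRe.test("\u00e5se")); // true
```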
There should be, but the regex will be localization dependent. Thus
é ü ö ê å ø
won't be filtered if you're on a US localization, for example. To ensure your web site does what you want across all localizations, you should explicitly write out the characters in a form similar to what you are already doing.
The only standard one I am aware of, though, is \w, which would match all alphanumeric characters (plus the underscore). You could do it the "standard" way by running two regexes, one to verify that \w matches and another to verify that \d (all digits) does not match, which would result in an alpha-only string (underscores aside). Again, I'd strongly urge you not to use this technique, as there's no guarantee what \w will represent in a given localization, but this does answer your question.
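For what it's worth, in JavaScript's own regex flavor a flag-less \w is exactly [A-Za-z0-9_] regardless of locale. A small sketch of the two-regex check (the function name is hypothetical) shows what that means in practice:

```javascript
// The two-regex check: every character must match \w,
// and no character may match \d.
function isAlphaByTwoRegexes(s) {
  return /^\w+$/.test(s) && !/\d/.test(s);
}

console.log(isAlphaByTwoRegexes("Mary"));     // true
console.log(isAlphaByTwoRegexes("Mary2"));    // false
console.log(isAlphaByTwoRegexes("Zoë"));      // false: ë is not in \w
console.log(isAlphaByTwoRegexes("Mary_Ann")); // true: _ is in \w
```

So in JavaScript this check neither admits the accented letters the question asks about nor fully excludes non-letters.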
This can be tricky; unfortunately JavaScript has pretty poor support for internationalization. To do this check you'll have to create your own character class. This is because, for instance, \w is the same as [0-9A-Z_a-z], which won't help you much, and there isn't anything like [[:alpha:]] in JavaScript. But since it sounds like you're only going to use one other language, you can probably just add those other characters to your character class.
By the way, I think you'll need a * (or +) in your regexp there if myString can be longer than one character.
The full example:
/^[a-zA-Zéüöêåø]*$/.test(myString);
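The class above only lists the lowercase accented letters, so "É" or "Ø" would be rejected. A small extension (a sketch, not part of the original answer; the names are hypothetical) uses the i flag to accept the uppercase forms and adds the '-' and ' ' separators from the question's edit:

```javascript
// Whitelist from the answer above. The `i` flag makes the class also accept
// the uppercase forms É Ü Ö Ê Å Ø (and A-Z), which [a-zéüöêåø] alone rejects.
// Name parts may be joined by a single '-' or ' ' ("Mary-Ann", "Mary Ann").
const WHITELIST_NAME_RE = /^[a-zéüöêåø]+(?:[- ][a-zéüöêåø]+)*$/i;

function isWhitelistedName(s) {
  return WHITELIST_NAME_RE.test(s);
}

console.log(isWhitelistedName("JØRGEN"));    // true
console.log(isWhitelistedName("Mary Ann"));  // true
console.log(isWhitelistedName("Mary--Ann")); // false
console.log(isWhitelistedName("Mary2"));     // false
```

Using + instead of * also rejects the empty string, which is usually what a name field wants.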
I don't know anything about JavaScript, but if it has proper Unicode support, convert your string to a decomposed form, then remove the diacritics from it ([\u0300-\u036f\u1dc0-\u1dff]). Then your letters will only be ASCII ones.
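String.prototype.normalize (ES2015) provides the decomposition step in JavaScript. A minimal sketch of this approach (the function name is hypothetical):

```javascript
// Decompose (NFD) so that e.g. "é" becomes "e" + U+0301, strip the combining
// marks in the two ranges named above, then require ASCII letters only.
function isLatinAfterStrippingDiacritics(s) {
  const stripped = s
    .normalize("NFD")
    .replace(/[\u0300-\u036f\u1dc0-\u1dff]/g, "");
  return /^[A-Za-z]+$/.test(stripped);
}

console.log(isLatinAfterStrippingDiacritics("Zoë"));   // true
console.log(isLatinAfterStrippingDiacritics("Åse"));   // true
console.log(isLatinAfterStrippingDiacritics("Søren")); // false: ø has no decomposition
```

The last line shows the limit of this approach: letters without a canonical decomposition, such as ø, æ, ß or ł, are left untouched, so they still fail the ASCII check even though ø is one of the letters in the question.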
You could always use a blacklist instead of a whitelist. That way you only remove the characters you do not need.
You could use a blacklist - a list of characters to exclude.
Also, it is important to verify the input on server-side, not only on client-side! Client-side can be bypassed easily.
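A minimal sketch of the blacklist idea. Which characters to exclude is an assumption made here for illustration; none of the answers specify a list, and BLACKLIST_RE and passesBlacklist are hypothetical names. As noted above, the same check must be repeated on the server.

```javascript
// Hypothetical blacklist: ASCII digits and most ASCII punctuation/symbols.
// '-', ' ' and the straight apostrophe are deliberately NOT listed, so names
// like "Mary-Ann" and "O'Neill" pass. Everything not listed is allowed.
const BLACKLIST_RE = /[0-9!"#$%&()*+,.\/:;<=>?@[\\\]^_`{|}~]/;

function passesBlacklist(s) {
  return s.length > 0 && !BLACKLIST_RE.test(s);
}

console.log(passesBlacklist("Zoë"));      // true
console.log(passesBlacklist("Mary-Ann")); // true
console.log(passesBlacklist("R2D2"));     // false
console.log(passesBlacklist("\u263A"));   // true: ☺ was never blacklisted
```

The last line is the trade-off of a blacklist: it accepts every accented letter without listing them, but it also lets through anything you forgot to exclude.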
There are some shortcuts to achieve this in other regular expression dialects; see this page. But I don't believe there are any standardised ones in JavaScript, certainly not any that would be supported by all browsers.
I'm using a converter before checking, but it's still not friendly to all languages.
I'm not sure that's possible.