Regex: match strings containing numbers and letters, but not strings containing only numbers
Question
I would like to be able to use a single regex (if possible) to require that a string fits [A-Za-z0-9_] but doesn't allow:
- Strings containing just numbers and/or symbols.
- Strings starting or ending with symbols.
- Multiple symbols next to each other.
Valid
test_0123
t0e1s2t3
0123_test
te0_s1t23
t_t
Invalid
t__t
____
01230123
_0123
_test
_test123
test_
test123_
Reasons for the Rules
The purpose of this is to filter usernames for a website I'm working on. I've arrived at the rules for specific reasons.
- Usernames with only numbers and/or symbols could cause problems with routing and database lookups. The route for /users/#{id} allows id to be either the user's id or the user's name, so names and ids shouldn't be able to collide.
- _test looks weird, and I don't believe it's a valid subdomain, i.e. _test.example.com
- I don't like the look of t__t as a subdomain, i.e. t__t.example.com
Comments (9)
The question asks for a single regexp, and implies that it should be a regexp that matches, which is fine, and answered by others. For interest, though, I note that these rules are rather easier to state directly as a regexp that should not match. I.e.:
You can't use it this way in a Rails validates_format_of, but you could put it in a validate method for the class, and I think you'd have a much better chance of still being able to make sense of what you meant a month or a year from now.
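As a minimal sketch of the "regexp that should not match" idea, the rules from the question can be written as one alternation of forbidden shapes, paired with a plain character-set check. This is not necessarily the exact pattern the answer had in mind; the names ALLOWED_CHARS, REJECT and valid_username? are hypothetical, and the method is plain Ruby rather than a Rails validation.

```ruby
# Hypothetical names throughout; a sketch, not a definitive implementation.

# Only characters from the allowed set, and at least one of them.
ALLOWED_CHARS = /\A[A-Za-z0-9_]+\z/

# Matches names that should be rejected:
#   \A[0-9_]+\z  only digits and/or underscores
#   \A_          leading underscore
#   _\z          trailing underscore
#   __           two underscores in a row
REJECT = /\A[0-9_]+\z|\A_|_\z|__/

def valid_username?(name)
  ALLOWED_CHARS.match?(name) && !REJECT.match?(name)
end

puts valid_username?("test_0123")
puts valid_username?("t__t")
```

Each alternative in REJECT maps onto one rule, which is what makes the negative form easy to reread later.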
This matches exactly what you want:
It requires at least one letter (the [a-z] in the middle) and no underscore at either end (the (?!_) and (?<!_) at the beginning and end).
Edit: In fact, you probably don't even need the lookahead/lookbehinds due to how the rest of the regex works: the first ?: parenthetical won't allow an underscore until after an alphanumeric, and the second ?: parenthetical won't allow an underscore unless it's before an alphanumeric.
Should work fine.
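One way the description above could be realized is sketched below. Both patterns are assumptions reconstructed from the prose (a required [a-z] in the middle, two ?: groups around it, optional lookarounds at the ends), and the constant names are hypothetical. The loop prints whether each sample from the question matches with and without the lookarounds.

```ruby
# Sketch only: patterns inferred from the description, not quoted from the answer.

# Version with the lookahead/lookbehind guarding both ends.
WITH_LOOKAROUNDS = /\A(?!_)(?:[a-z0-9]_?)*[a-z](?:_?[a-z0-9])*(?<!_)\z/i

# Version without them: the first group only allows "_" after an alphanumeric,
# the second only allows "_" directly before an alphanumeric.
WITHOUT_LOOKAROUNDS = /\A(?:[a-z0-9]_?)*[a-z](?:_?[a-z0-9])*\z/i

SAMPLES = %w[test_0123 t0e1s2t3 0123_test te0_s1t23 t_t
             t__t ____ 01230123 _0123 _test _test123 test_ test123_].freeze

SAMPLES.each do |name|
  a = WITH_LOOKAROUNDS.match?(name)
  b = WITHOUT_LOOKAROUNDS.match?(name)
  puts format("%-10s %-5s %-5s", name, a, b)
end
```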
I'm sure that you could put all this into one regular expression, but it won't be simple, and I'm not sure why you insist on it being one regex. Why not use multiple passes during validation? If the validation checks are done when users create a new account, there really isn't any reason to try to cram it into one regex. (That is, you will only be dealing with one item at a time, not hundreds or thousands or more. A few passes over a normal-sized username should take very little time, I would think.)
First reject if the name doesn't contain at least one number; then reject if the name doesn't contain at least one letter; then check that the start and end are correct; etc. Each of those passes could be a simple-to-read and easy-to-maintain regular expression.
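The multi-pass approach might look like the sketch below: one small regex per rule, each with its own message. The helper name username_problems and the message wording are hypothetical. The passes follow the rules as stated in the question, so no digit is required (t_t is listed there as valid).

```ruby
# Hypothetical helper: returns a list of problems, empty when the name is acceptable.
def username_problems(name)
  problems = []
  # Pass 1: only the allowed character set.
  problems << "contains characters outside A-Z, a-z, 0-9 and _" unless name.match?(/\A[A-Za-z0-9_]+\z/)
  # Pass 2: at least one letter, so a name can never look like a numeric id.
  problems << "needs at least one letter" unless name.match?(/[A-Za-z]/)
  # Pass 3: start and end.
  problems << "must not start with an underscore" if name.match?(/\A_/)
  problems << "must not end with an underscore" if name.match?(/_\z/)
  # Pass 4: no adjacent underscores.
  problems << "must not contain consecutive underscores" if name.match?(/__/)
  problems
end

p username_problems("t_t")
p username_problems("t__t")
```

Returning messages instead of a boolean is a design choice for this sketch; it lets a signup form tell the user which rule was broken.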
What about:
It doesn't use a back reference.
Edit:
Succeeds for all your test cases, and it is Ruby compatible.
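A claim like "succeeds for all your test cases" is easy to check mechanically. The sketch below is an assumption-laden illustration: failures is a hypothetical helper, and CANDIDATE is one possible pattern that uses neither lookaround nor a back reference, not the pattern from this answer.

```ruby
# Test cases copied from the question.
VALID   = %w[test_0123 t0e1s2t3 0123_test te0_s1t23 t_t].freeze
INVALID = %w[t__t ____ 01230123 _0123 _test _test123 test_ test123_].freeze

# Hypothetical helper: the names a pattern gets wrong
# (valid names it rejects, followed by invalid names it accepts).
def failures(pattern)
  VALID.reject { |n| pattern.match?(n) } + INVALID.select { |n| pattern.match?(n) }
end

# Hypothetical candidate: alphanumeric runs separated by single underscores,
# where one run is forced to contain a letter.
CANDIDATE = /\A(?:[A-Za-z0-9]+_)*[A-Za-z0-9]*[A-Za-z][A-Za-z0-9]*(?:_[A-Za-z0-9]+)*\z/

p failures(CANDIDATE)
# For comparison, a character-set-only pattern accepts every invalid name.
p failures(/\A[A-Za-z0-9_]+\z/)
```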
This doesn't block "__", but it does get the rest:
And here's the longer form that gets all your rules:
Dang, that's ugly. I'll agree with Telemachus that you probably shouldn't do this with one regex, even though it's technically possible. Regex is often a pain for maintenance.
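To illustrate the difference between a short form that misses "__" and a longer form that covers every rule, here is a sketch with two assumed patterns (SHORT and LONG are hypothetical names, and neither is claimed to be the pattern from this answer). The only change is an extra negative lookahead for two adjacent underscores.

```ruby
# Short form: allowed characters, at least one letter,
# alphanumeric first and last character. It still lets "t__t" through.
SHORT = /\A(?=.*[A-Za-z])[A-Za-z0-9](?:[A-Za-z0-9_]*[A-Za-z0-9])?\z/

# Longer form: the same, plus (?!.*__) to reject "__" anywhere in the name.
LONG = /\A(?=.*[A-Za-z])(?!.*__)[A-Za-z0-9](?:[A-Za-z0-9_]*[A-Za-z0-9])?\z/

%w[t_t t__t test_ 01230123].each do |name|
  puts format("%-10s short=%-5s long=%-5s", name, SHORT.match?(name), LONG.match?(name))
end
```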
Here you go:
If you want to restrict the symbols you accept, simply replace every [^a-zA-Z0-9] with a [] class containing all the allowed symbols.
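The idea of swapping in a fixed set of allowed symbols can be sketched by building the pattern from a parameter. The function username_pattern and its symbols argument are hypothetical, and the surrounding pattern is an assumption rather than the one from this answer. Regexp.union is used so that symbols with a special meaning in a regex, such as "." or "-", are escaped.

```ruby
# Hypothetical builder: `symbols` is a string of the separator characters to allow.
def username_pattern(symbols = "_")
  # Regexp.union escapes each character and joins them with "|".
  separator = Regexp.union(symbols.chars).source
  alnum = "[a-zA-Z0-9]"
  # At least one letter; alphanumeric runs separated by exactly one allowed symbol.
  /\A(?=.*[a-zA-Z])#{alnum}+(?:(?:#{separator})#{alnum}+)*\z/
end

puts username_pattern.match?("t_t")
puts username_pattern.match?("t-t")
puts username_pattern("_-").match?("t-t")
```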
This one works.
Look ahead to make sure there's at least one letter in the string, then start consuming input. Every time there is an underscore, there must be a number or a letter before the next underscore.
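A minimal sketch of that description, assuming Ruby's \A and \z anchors: a positive lookahead skips digits and underscores until it finds a letter, and the body then consumes alphanumeric runs joined by single underscores. LETTER_AHEAD is a hypothetical name and the pattern is an illustration, not a quote from this answer.

```ruby
# (?=[0-9_]*[A-Za-z])          a letter exists somewhere after leading digits/underscores
# [A-Za-z0-9]+                 first run of letters/digits
# (?:_[A-Za-z0-9]+)*           each "_" must be followed by at least one letter/digit
LETTER_AHEAD = /\A(?=[0-9_]*[A-Za-z])[A-Za-z0-9]+(?:_[A-Za-z0-9]+)*\z/

names = %w[test_0123 0123_test t_t t__t 01230123 _test test_]
# Array#grep keeps the elements the pattern matches.
p names.grep(LETTER_AHEAD)
```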
Your question is essentially the same as this one, with the added requirement that at least one of the characters has to be a letter. The negative lookahead, (?![\d_]+$), takes care of that part, and is much easier (both to read and write) than incorporating it into the basic regex as some others have tried to do.
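To show what the negative lookahead contributes, the sketch below puts it in front of an assumed base pattern (alphanumeric runs joined by single underscores). BASE, BASE_ONLY and USERNAME are hypothetical names, and the base pattern is an assumption since the linked question is not reproduced here. In Ruby, $ matches at the end of a line, so the sketch uses \z in its place.

```ruby
# Assumed base pattern: no leading, trailing, or doubled underscores.
BASE = "[A-Za-z0-9]+(?:_[A-Za-z0-9]+)*"

BASE_ONLY = /\A#{BASE}\z/
# Same base, with the lookahead rejecting names made only of digits and underscores.
USERNAME  = /\A(?![\d_]+\z)#{BASE}\z/

%w[0123_test 01230123 0123_456].each do |name|
  puts format("%-10s base=%-5s with_lookahead=%-5s", name, BASE_ONLY.match?(name), USERNAME.match?(name))
end
```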
That would work for your first two rules (since it requires a letter at the beginning and end for the second rule, it automatically requires letters).
I'm not sure the third rule is possible using regexes.