Regular expression: match strings that contain digits and letters, but not strings that contain only digits

Posted on 2024-07-30 03:50:11

Question

I would like to be able to use a single regex (if possible) to require that a string fits [A-Za-z0-9_] but doesn't allow:

  • Strings containing only numbers and/or symbols
  • Strings starting or ending with symbols
  • Multiple symbols next to each other

Valid

  • test_0123
  • t0e1s2t3
  • 0123_test
  • te0_s1t23
  • t_t

Invalid

  • t__t
  • ____
  • 01230123
  • _0123
  • _test
  • _test123
  • test_
  • test123_

Reasons for the Rules

The purpose of this is to filter usernames for a website I'm working on. I've arrived at the rules for specific reasons.

  • Usernames with only numbers and/or symbols could cause problems with routing and database lookups. The route for /users/#{id} allows id to be either the user's id or user's name. So names and ids shouldn't be able to collide.

  • _test looks weird, and I don't believe it's a valid subdomain, i.e. _test.example.com

  • I don't like the look of t__t as a subdomain. i.e. t__t.example.com


Comments (9)

滥情哥ㄟ 2024-08-06 03:50:11

The question asks for a single regexp, and implies that it should be a regexp that matches, which is fine, and answered by others. For interest, though, I note that these rules are rather easier to state directly as a regexp that should not match. I.e.:

x !~ /[^A-Za-z0-9_]|^_|_$|__|^\d+$/
  • no other characters than letters, numbers and _
  • can't start with a _
  • can't end with a _
  • can't have two _s in a row
  • can't be all digits

You can't use it this way in a Rails validates_format_of, but you could put it in a validate method for the class, and I think you'd have a much better chance of still understanding what you meant a month or a year from now.
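A minimal sketch of that idea in plain Ruby, outside of Rails. The names INVALID_USERNAME and valid_username? are hypothetical and do not come from the answer. The empty-string check is also an added assumption: an empty name matches none of the forbidden alternatives, so the pattern alone would let it through.

```ruby
# Hypothetical names: INVALID_USERNAME and valid_username? are not from the answer.
INVALID_USERNAME = /[^A-Za-z0-9_]|^_|_$|__|^\d+$/

# Returns true when none of the forbidden patterns is found.
# The empty-string check is an added assumption: an empty name
# matches none of the alternatives and would otherwise pass.
def valid_username?(name)
  !name.empty? && name !~ INVALID_USERNAME
end

%w[test_0123 t__t 01230123 _test test_].each do |name|
  puts "#{name}: #{valid_username?(name)}"
end
```

In a Rails model, the same predicate could presumably be called from a custom validation method that adds an error when it returns false.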

陌上芳菲 2024-08-06 03:50:11

This matches exactly what you want:

/\A(?!_)(?:[a-z0-9]_?)*[a-z](?:_?[a-z0-9])*(?<!_)\z/i
  1. At least one alphabetic character (the [a-z] in the middle).
  2. Does not begin or end with an underscore (the (?!_) and (?<!_) at the beginning and end).
  3. May have any number of numbers, letters, or underscores before and after the alphabetic character, but every underscore must be separated by at least one number or letter (the rest).

Edit: In fact, you probably don't even need the lookahead/lookbehinds due to how the rest of the regex works - the first ?: parenthetical won't allow an underscore until after an alphanumeric, and the second ?: parenthetical won't allow an underscore unless it's before an alphanumeric:

/\A(?:[a-z0-9]_?)*[a-z](?:_?[a-z0-9])*\z/i

Should work fine.
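One way to check that claim is to run the simplified pattern against the examples from the question. This is only a sketch; the constant names are illustrative and not part of the answer.

```ruby
# The simplified pattern from the edit above, without lookarounds.
USERNAME = /\A(?:[a-z0-9]_?)*[a-z](?:_?[a-z0-9])*\z/i

# Example lists copied from the question.
VALID   = %w[test_0123 t0e1s2t3 0123_test te0_s1t23 t_t]
INVALID = %w[t__t ____ 01230123 _0123 _test _test123 test_ test123_]

# Collect every example whose result disagrees with the expected one.
failures = VALID.reject { |name| USERNAME.match?(name) } +
           INVALID.select { |name| USERNAME.match?(name) }

puts failures.empty? ? "all examples behave as expected" : "unexpected: #{failures.inspect}"
```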

作业与我同在 2024-08-06 03:50:11

I'm sure that you could put all this into one regular expression, but it won't be simple, and I'm not sure why you insist on it being one regex. Why not use multiple passes during validation? If the validation checks are done when users create a new account, there really isn't any reason to try to cram it into one regex. (That is, you will only be dealing with one item at a time, not hundreds or thousands or more. A few passes over a normal-sized username should take very little time, I would think.)

First reject if the name doesn't contain at least one number; then reject if the name doesn't contain at least one letter; then check that the start and end are correct; etc. Each of those passes could be a simple to read and easy to maintain regular expression.
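The multi-pass approach could look roughly like the sketch below. The method name username_errors and the message strings are hypothetical. The passes follow the rules stated in the question, so there is no "at least one number" pass: the question lists t_t, which contains no digit, as valid.

```ruby
# Hypothetical helper: runs one small regex per rule and returns
# a list of human-readable problems (empty when the name is fine).
def username_errors(name)
  errors = []
  errors << "may only contain letters, digits, and _" unless name.match?(/\A[A-Za-z0-9_]+\z/)
  errors << "must contain at least one letter" unless name.match?(/[A-Za-z]/)
  errors << "must not start or end with _" if name.match?(/\A_|_\z/)
  errors << "must not contain two underscores in a row" if name.match?(/__/)
  errors
end

%w[te0_s1t23 t__t 01230123 _test].each do |name|
  puts "#{name}: #{username_errors(name).inspect}"
end
```

Each pass stays readable on its own, and the caller learns which rule failed rather than just that the name was rejected.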

懒猫 2024-08-06 03:50:11

What about:

/^(?=[^_])([A-Za-z0-9]+_?)*[A-Za-z](_?[A-Za-z0-9]+)*$/

It doesn't use a back reference.

Edit:

It succeeds for all your test cases and is Ruby-compatible.
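One caveat when using this in Ruby, offered as an aside rather than as part of the answer: ^ and $ anchor at line boundaries, not at the start and end of the whole string, so a multi-line input can match on its first line alone. The sketch below compares the pattern as written with a variant that uses \A and \z; the constant names are illustrative.

```ruby
# The pattern as posted, anchored per line.
LINE_ANCHORED   = /^(?=[^_])([A-Za-z0-9]+_?)*[A-Za-z](_?[A-Za-z0-9]+)*$/
# Same body, anchored to the whole string (an assumed variation).
STRING_ANCHORED = /\A(?=[^_])([A-Za-z0-9]+_?)*[A-Za-z](_?[A-Za-z0-9]+)*\z/

input = "abc\n<script>"

# The first line "abc" satisfies the line-anchored pattern on its own.
puts LINE_ANCHORED.match?(input)
# The string-anchored pattern has to account for every character.
puts STRING_ANCHORED.match?(input)
```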

凉栀 2024-08-06 03:50:11

This doesn't block "__", but it does get the rest:

([A-Za-z]|[0-9][0-9_]*)([A-Za-z0-9]|_[A-Za-z0-9])*

And here's the longer form that gets all your rules:

([A-Za-z]|([0-9]+(_[0-9]+)*([A-Za-z]|_[A-Za-z])))([A-Za-z0-9]|_[A-Za-z0-9])*

Dang, that's ugly. I agree with Telemachus that you probably shouldn't do this with one regex, even though it's technically possible. Regexes are often a pain to maintain.

池予 2024-08-06 03:50:11

Here you go:

^(([a-zA-Z]([^a-zA-Z0-9]?[a-zA-Z0-9])*)|([0-9]([^a-zA-Z0-9]?[a-zA-Z0-9])*[a-zA-Z]+([^a-zA-Z0-9]?[a-zA-Z0-9])*))$

If you want to restrict which symbols are accepted, replace every [^a-zA-Z0-9] with a character class that lists only the allowed symbols.

从来不烧饼 2024-08-06 03:50:11

(?=.*[a-zA-Z].*)^[A-Za-z0-9](_?[A-Za-z0-9]+)*$

This one works.

Look ahead to make sure there's at least one letter in the string, then start consuming input. Every time there is an underscore, there must be a number or a letter before the next underscore.

℉服软 2024-08-06 03:50:11

/^(?![\d_]+$)[A-Za-z0-9]+(?:_[A-Za-z0-9]+)*$/

Your question is essentially the same as this one, with the added requirement that at least one of the characters has to be a letter. The negative lookahead - (?![\d_]+$) - takes care of that part, and is much easier (both to read and write) than incorporating it into the basic regex as some others have tried to do.
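For readability, the same idea can be written with Ruby's extended (/x) mode, which ignores whitespace in the pattern and allows comments. This is a sketch, not the answer's exact regex: it uses \A and \z instead of ^ and $ so the whole string must match, and the constant name is illustrative.

```ruby
# Extended-mode rewrite of the lookahead approach (assumed variation:
# string anchors \A and \z replace the line anchors ^ and $).
COMMENTED_USERNAME = /
  \A                    # start of the string
  (?![\d_]+\z)          # reject names made only of digits and underscores
  [A-Za-z0-9]+          # first run of letters or digits
  (?:_[A-Za-z0-9]+)*    # further runs, each introduced by a single underscore
  \z                    # end of the string
/x

%w[0123_test t_t 01230123 _0123 t__t].each do |name|
  puts "#{name}: #{COMMENTED_USERNAME.match?(name)}"
end
```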

埋葬我深情 2024-08-06 03:50:11

[A-Za-z][A-Za-z0-9_]*[A-Za-z]

That would work for your first two rules (since it requires a letter at the beginning and end for the second rule, it automatically requires letters).

I'm not sure the third rule is possible using regexes.
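The third rule can be expressed with a negative lookahead, as other answers in this thread do. The sketch below adds one to the pattern above and anchors it with \A and \z; both additions are assumptions, and the constant name is illustrative. The base pattern still requires a letter at each end and at least two characters, so it rejects names such as test_0123 that the question lists as valid.

```ruby
# (?!.*__) fails the match if two underscores appear in a row anywhere.
# The rest is the pattern above, anchored to the whole string.
NO_DOUBLE_UNDERSCORE = /\A(?!.*__)[A-Za-z][A-Za-z0-9_]*[A-Za-z]\z/

%w[t_t t__t test_0123].each do |name|
  puts "#{name}: #{NO_DOUBLE_UNDERSCORE.match?(name)}"
end
```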
