Regular expression for a valid subdomain in Ruby

Posted on 2024-10-20 11:56:20

I'm attempting to validate a string of user input that will be used as a subdomain. The rules are as follows:

  1. Between 1 and 63 characters in length (I take 63 from the number of characters Google Chrome appears to allow in a subdomain, not sure if it's actually a server directive. If you have better advice on valid max length, I'm interested in hearing it)
  2. May contain a-zA-Z0-9, hyphen, underscore
  3. May not begin or end with a hyphen or underscore

EDIT: From input below, I've added the following:
4. Should not contain consecutive hyphens or underscores.

Examples:

a => valid
0 => valid
- => not valid
_ => not valid
a- => not valid
-a => not valid
a_ => not valid
_a => not valid
aa => valid
aaa => valid
a-a-a => valid
0-a => valid
a&a => not valid
a-_0 => not valid
a--a => not valid
aaa- => not valid

My issue is I'm not sure how to specify with a RegEx that the string is allowed to be only one character, while also specifying that it may not begin or end with a hyphen or underscore.

Thanks!

Comments (5)

┈┾☆殇 2024-10-27 11:56:20

You can't have underscores in proper subdomains, but do you need them? After trimming your input, do a simple string length check, then test with this:

/^[a-z\d]+(-[a-z\d]+)*$/i

With the above, you won't get consecutive - characters, e.g. a-bbb-ccc passes and a--d fails.

/^[a-z\d]+([-_][a-z\d]+)*$/i

Will allow non-consecutive underscores as well.


Update: you'll find that, in practice, underscores are disallowed and all subdomains must start with a letter. The solution above does not allow internationalised subdomains (punycode). You're better off using this:

/\A([a-z][a-z\d]*(-[a-z\d]+)*|xn--[\-a-z\d]+)\z/i
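
The trim, length check, and pattern test described in this answer could be combined roughly as follows. This is a minimal sketch, not the answerer's code: the constant and method names are hypothetical, and the 63-character limit is taken from the question.

```ruby
# Pattern from the update above: a letter-first label, or a punycode label.
SUBDOMAIN_PATTERN = /\A([a-z][a-z\d]*(-[a-z\d]+)*|xn--[\-a-z\d]+)\z/i

# Hypothetical helper: trim, check the length, then test the pattern.
def valid_subdomain?(input)
  label = input.to_s.strip
  return false unless (1..63).cover?(label.length)

  SUBDOMAIN_PATTERN.match?(label)
end

p valid_subdomain?("a-bbb-ccc")     # => true
p valid_subdomain?("a--d")          # => false
p valid_subdomain?("xn--bcher-kva") # => true
```

Note that under this pattern a label such as 0-a is rejected, because the first character must be a letter.
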
风铃鹿 2024-10-27 11:56:20

I'm not familiar with Ruby regex syntax, but I'll assume it's like, say, Perl. Sounds like you want:

/^(?![-_])[-a-z\d_]{1,63}(?<![-_])$/i

Or if Ruby doesn't use the i flag, just replace [-a-z\d_] with [-a-zA-Z\d_].

The reason I'm using [-a-zA-Z\d_] instead of the shorter [-\w] is that, while nearly equivalent, \w will allow special characters such as ä rather than just ASCII-type characters. That behavior can be optionally turned off in most languages, or you can allow it if you like.
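
For what it's worth, Ruby does support the i flag as well as lookahead and lookbehind, so the pattern can be tried directly. The sketch below makes one change that is my assumption rather than part of the answer: \A and \z replace ^ and $, because in Ruby ^ and $ match at line boundaries inside a multi-line string. The constant name is hypothetical.

```ruby
# Lookahead rejects a leading - or _, lookbehind rejects a trailing one,
# and {1,63} enforces the length in the same pattern.
LOOKAROUND_PATTERN = /\A(?![-_])[-a-z\d_]{1,63}(?<![-_])\z/i

%w[a a- -a a_b a--a a&a].each do |candidate|
  puts "#{candidate} => #{LOOKAROUND_PATTERN.match?(candidate)}"
end
```

This version covers rules 1 to 3 from the question only; a--a still passes, since nothing here forbids consecutive hyphens or underscores.
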

Some more information on character classes, quantifiers, and lookarounds

茶花眉 2024-10-27 11:56:20

/^([a-z0-9][a-z0-9\-\_]{0,61}[a-z0-9]|[a-z0-9])$/i

I took it as a challenge to create a regex that matches only strings with non-repeating hyphens or underscores and also checks the proper length for you:

/^([a-z0-9]([_\-](?![_\-])|[a-z0-9]){0,61}[a-z0-9]|[a-z0-9])$/i

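The middle part uses a negative lookahead to verify that: a hyphen or underscore is accepted only when the next character is not another hyphen or underscore.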
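
One way to check this pattern is to run it over the example list from the question. The sketch below is an illustration only: the constant names are hypothetical, and \A and \z are swapped in for ^ and $ on the assumption that the whole string, not just one line of it, should be validated.

```ruby
STRICT_PATTERN = /\A([a-z0-9]([_\-](?![_\-])|[a-z0-9]){0,61}[a-z0-9]|[a-z0-9])\z/i

# Inputs and expected results copied from the question's examples.
EXAMPLES = {
  "a" => true, "0" => true, "-" => false, "_" => false,
  "a-" => false, "-a" => false, "a_" => false, "_a" => false,
  "aa" => true, "aaa" => true, "a-a-a" => true, "0-a" => true,
  "a&a" => false, "a-_0" => false, "a--a" => false, "aaa-" => false
}

EXAMPLES.each do |input, expected|
  actual = STRICT_PATTERN.match?(input)
  status = actual == expected ? "ok" : "MISMATCH"
  puts format("%-6s expected=%-5s actual=%-5s %s", input, expected, actual, status)
end
```

The length limit comes from the quantifiers: one leading character, up to 61 middle characters, and one trailing character add up to at most 63.
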

不忘初心 2024-10-27 11:56:20

^[a-zA-Z]([-a-zA-Z\d]*[a-zA-Z\d])?$

This simply enforces the standard in an efficient way without backtracking. It does not check the length, but regex is inefficient at things like that. Just check the string length separately (1 to 63 chars, the maximum length of a DNS label).
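
A rough sketch of keeping the length check outside the regex, as suggested here. The names are hypothetical, \A and \z stand in for ^ and $ (my assumption, since Ruby treats ^ and $ as line anchors), and the upper bound of 63 is the figure from the question.

```ruby
LABEL_PATTERN = /\A[a-zA-Z]([-a-zA-Z\d]*[a-zA-Z\d])?\z/

# Hypothetical helper: the length is checked in plain Ruby, the shape by the regex.
def valid_label?(input)
  input.length.between?(1, 63) && LABEL_PATTERN.match?(input)
end

p valid_label?("a")      # => true
p valid_label?("a-")     # => false
p valid_label?("0-a")    # => false
p valid_label?("a" * 64) # => false
```

Compared with the question's rules, this pattern is stricter in one place (a leading digit is rejected) and looser in another (a--a is accepted).
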

此岸叶落 2024-10-27 11:56:20

/[^\W\_](.+?)[^\W\_]$/i should work for ya (try out http://rubular.com/ to test regular expressions)

EDIT: actually, this doesn't handle inputs that are only one or two letters/numbers long. Try /([^\W\_](.+?)[^\W\_])|([a-z0-9]{1,2})/i instead, and tinker with it in rubular until you get exactly what ya want (if this doesn't take care of it already).
