Check for a valid domain name in a string?

Posted on 2024-09-03 08:50:26

I am using Python and would like a simple API or regex to check a domain name's validity. By validity I mean syntactic validity, not whether the domain name actually exists on the Internet.

Comments (5)

原谅过去的我 2024-09-10 08:50:27

A domain name is (syntactically) valid if it is a dot-separated list of labels, each no longer than 63 characters and made up of letters, digits, and dashes (no underscores).

So:

r'[a-zA-Z\d-]{1,63}(\.[a-zA-Z\d-]{1,63})*'

would be a start. (The lower bound of 1 keeps empty labels out; to validate a whole string, anchor the pattern or use re.fullmatch.) Of course, these days some non-ASCII characters may be allowed (a very recent development), which changes the parameters a lot. Do you need to deal with that?
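
As a rough sketch of how the rule above could be applied to a whole string in Python 3: the helper name `looks_like_domain` is made up for illustration, and the non-ASCII handling assumes that Python's built-in `idna` codec (which follows the older IDNA 2003 rules) is good enough for your purposes.

```python
import re

# One label: 1 to 63 letters, digits, or dashes (no underscores).
LABEL = r'[a-zA-Z0-9-]{1,63}'
# A name: one label, then zero or more ".label" groups.
DOMAIN_RE = re.compile(rf'{LABEL}(?:\.{LABEL})*')


def looks_like_domain(name):
    """Hypothetical helper: True if name fits the label rule described above."""
    try:
        # Convert non-ASCII labels to their ASCII (punycode) form first.
        ascii_name = name.encode('idna').decode('ascii')
    except UnicodeError:
        # Raised for empty or over-long labels and for unencodable characters.
        return False
    # fullmatch anchors the pattern at both ends of the string.
    return DOMAIN_RE.fullmatch(ascii_name) is not None


print(looks_like_domain('example.com'))   # True
print(looks_like_domain('exa_mple.com'))  # False (underscore)
```

This only checks the character set and label lengths; it does not restrict where dashes may appear or limit the total length of the name.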

胡大本事 2024-09-10 08:50:27
r'^(?=.{4,255}$)([a-zA-Z0-9]([a-zA-Z0-9-]{,61}[a-zA-Z0-9])?\.)+[a-zA-Z0-9]{2,5}$'
  • The lookahead makes sure the whole name is between 4 characters (e.g. a.in) and 255 characters long.
  • One or more labels (each followed by a period) of 1 to 63 characters, starting and ending with an alphanumeric character, with alphanumeric characters and hyphens allowed in the middle.
  • Followed by a top-level domain of 2 to 5 alphanumeric characters. Longer TLDs such as museum (6 characters) need a larger upper bound.
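
The three rules in the list above can be sketched as a compiled pattern, split into commented pieces. This is an illustration under assumptions rather than a definitive validator: the name `is_hostname` is hypothetical, and the optional inner group is what lets a single-character label such as `a` through.

```python
import re

HOSTNAME_RE = re.compile(
    r'^(?=.{4,255}$)'                                        # total length: 4 to 255
    r'(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+'  # labels, each ending in "."
    r'[a-zA-Z0-9]{2,5}$'                                     # TLD: 2 to 5 characters
)


def is_hostname(name):
    """Hypothetical helper: True if name satisfies the three rules."""
    return HOSTNAME_RE.match(name) is not None


for candidate in ('a.in', 'www.example.com', '-bad.com', 'example.museum'):
    print(candidate, is_hostname(candidate))
```

With the TLD bound left at 5, `example.museum` is rejected; raising the upper bound in the last piece of the pattern changes that.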
只有影子陪我不离不弃 2024-09-10 08:50:27

Note that while you can do something with regular expressions, the most reliable way to test for valid domain names is to actually try to resolve the name (with socket.getaddrinfo):

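from socket import getaddrinfo

# Resolve the name; getaddrinfo raises socket.gaierror if resolution fails.
result = getaddrinfo("www.google.com", None)
print(result[0][4])  # the address tuple of the first result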

Note that technically this can leave you open to DoS (if someone submits thousands of invalid domain names, resolving them can take a while), but you could simply rate-limit anyone who tries this.

The advantage of this is that it will catch "hotmail.con" (a typo for "hotmail.com", say) as invalid, whereas a regex would say "hotmail.con" is valid.
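
A minimal sketch of turning that lookup into a yes/no check, assuming Python 3. The helper name `resolves` and its `resolver` parameter are made up for illustration; the parameter exists only so the function can be exercised without touching the network.

```python
import socket


def resolves(name, resolver=socket.getaddrinfo):
    """Hypothetical helper: True if name resolves to at least one address."""
    try:
        return len(resolver(name, None)) > 0
    except (socket.gaierror, UnicodeError):
        # gaierror: the lookup failed. UnicodeError: the name could not be
        # encoded for the lookup (for example, an empty or over-long label).
        return False
```

Calling `resolves("hotmail.con")` with the default resolver performs a real DNS lookup, so the result depends on the network and on the resolver in use, and the rate-limiting concern above still applies.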

小矜持 2024-09-10 08:50:27

I've been using this:

(r'(\.|\/)(([A-Za-z\d]+|[A-Za-z\d][-])+[A-Za-z\d]+){1,63}\.([A-Za-z]{2,3}\.[A-Za-z]{2}|[A-Za-z]{2,6})')

This ensures that the match follows either a dot (as in www.) or a slash (as in http://), that dashes occur only inside the name, and that suffixes such as gov.uk match too.
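
If the goal is to pull a host name out of a URL-like string and then check it, one alternative to a single large regex is to let the standard library do the splitting. The sketch below is not the pattern above; it swaps in `urllib.parse.urlsplit` and a per-label check, and the helper names `host_from_url` and `has_valid_host` are hypothetical.

```python
import re
from urllib.parse import urlsplit

# One label: starts and ends with a letter or digit, dashes only inside.
LABEL_RE = re.compile(r'[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?')


def host_from_url(url):
    """Return the host part of a URL-like string, or None if there is none."""
    if '//' not in url:
        # Lets urlsplit treat a bare "www.example.com/x" as host plus path.
        url = '//' + url
    return urlsplit(url).hostname


def has_valid_host(url):
    """True if the host has at least two labels and every label is well formed."""
    host = host_from_url(url)
    if not host or len(host) > 253:
        return False
    labels = host.split('.')
    return len(labels) >= 2 and all(LABEL_RE.fullmatch(label) for label in labels)


print(host_from_url('http://www.bbc.co.uk/news'))  # www.bbc.co.uk
print(has_valid_host('http://-bad-.example.com/'))  # False
```

Requiring at least two labels mirrors the suffix requirement in the regex above; drop that condition if single-label hosts should pass.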

仅一夜美梦 2024-09-10 08:50:27

The existing answers are all fairly outdated relative to the spec at this point. I believe the pattern below matches the current spec correctly:

r'^(?=.{1,253}$)(?!.*\.\..*)(?!\..*)([a-zA-Z0-9-]{,63}\.){,127}[a-zA-Z0-9-]{1,63}$'
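
To make the pieces of this pattern easier to follow, here is a sketch that compiles it in commented segments and tries a few inputs. It assumes the pattern ends with `$` so that it covers the whole string, and the name `is_valid_hostname` is hypothetical.

```python
import re

SPEC_RE = re.compile(
    r'^(?=.{1,253}$)'               # whole name: 1 to 253 characters
    r'(?!.*\.\..*)'                 # no two consecutive dots (no empty label)
    r'(?!\..*)'                     # must not start with a dot
    r'([a-zA-Z0-9-]{,63}\.){,127}'  # up to 127 dot-terminated labels
    r'[a-zA-Z0-9-]{1,63}$'          # final label: 1 to 63 characters
)


def is_valid_hostname(name):
    """Hypothetical helper: True if name matches the pattern above."""
    return SPEC_RE.match(name) is not None


for candidate in ('example.com', 'localhost', 'a..b', '.com', '-a.com'):
    print(candidate, is_valid_hostname(candidate))
```

Note that this pattern places no restriction on where dashes appear, so a name such as `-a.com` passes; combine it with a start-and-end check on each label if that matters.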
