Check for a valid domain name in a string?
I am using Python and would like a simple API or regex to check a domain name's validity. By validity I mean syntactic validity, not whether the domain name actually exists on the Internet.
A domain name is (syntactically) valid if it is a dot-separated list of identifiers, each no longer than 63 characters and made up only of letters, digits and dashes (no underscores).
A check that enforces these rules would be a start. Of course, these days some non-ASCII characters may be allowed (a fairly recent development), which changes the parameters a lot. Do you need to deal with that?
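The answer's own snippet did not survive on this page, so here is a minimal sketch of the rules as stated above, not the answerer's original code. The names `ALLOWED` and `is_syntactically_valid` are made up for illustration, and the sketch deliberately ignores non-ASCII names.

```python
import string

# Characters the answer allows inside an identifier: letters, digits, dashes.
ALLOWED = set(string.ascii_letters + string.digits + "-")


def is_syntactically_valid(domain):
    """Return True if every dot-separated identifier is 1..63 allowed characters."""
    labels = domain.split(".")
    return all(
        0 < len(label) <= 63 and set(label) <= ALLOWED
        for label in labels
    )


print(is_syntactically_valid("example.com"))          # True
print(is_syntactically_valid("my_host.example.com"))  # False (underscore)
```

Rejecting empty identifiers is an extra assumption on top of the stated rules; it is there so that input such as "a..b" does not pass.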
Comment: [...] (a.in) and a maximum of 255 characters.
Note that while you can do something with regular expressions, the most reliable way to test for a valid domain name is to actually try to resolve it (with socket.getaddrinfo).
Technically this can leave you open to a DoS attack (if someone submits thousands of invalid domain names, resolving them can take a while), but you could simply rate-limit anyone who tries this.
The advantage of this approach is that it catches "hotmail.con" as invalid (a likely typo for "hotmail.com"), whereas a regex would say "hotmail.con" is valid.
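The resolution approach could look roughly like the sketch below. It is not the answerer's original snippet. The function name `resolves` and its `resolver` parameter are assumptions, added so the lookup can be swapped out when testing without network access.

```python
import socket


def resolves(name, resolver=socket.getaddrinfo):
    """Return True if the resolver finds at least one address for name."""
    try:
        # port=None asks only for address resolution, not for a service.
        return len(resolver(name, None)) > 0
    except (socket.gaierror, UnicodeError):
        # gaierror: the lookup failed; UnicodeError: a label is empty or too long.
        return False
```

Calling `resolves("hotmail.com")` performs a real DNS lookup, so the result depends on the network and the local resolver; this goes beyond the purely syntactic check the question asks for.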
I've been using a pattern that ensures the name comes right after a dot (www.) or a slash (http://), allows a dash only inside the name, and also matches suffixes such as gov.uk.
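The pattern itself is missing from this page. The following is a hypothetical regex built only from the description above, so it is a sketch and not the answerer's expression; `DOMAIN_IN_TEXT` and `find_domain` are illustrative names.

```python
import re

DOMAIN_IN_TEXT = re.compile(
    r"(?<=[./])"                         # must directly follow a dot or a slash
    r"[a-z0-9](?:[a-z0-9-]*[a-z0-9])?"   # name: a dash may appear only inside
    r"(?:\.[a-z]{2,})+",                 # one or more suffix labels (.com, .gov.uk)
    re.IGNORECASE,
)


def find_domain(text):
    """Return the first domain-like match in text, or None."""
    match = DOMAIN_IN_TEXT.search(text)
    return match.group(0) if match else None


print(find_domain("http://my-site.com/x"))  # my-site.com
print(find_domain("http://-bad-.com"))      # None
```

Because of the lookbehind, a bare "example.com" with nothing in front of it is not matched; that follows the description, which requires a preceding dot or slash.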
The other answers are all fairly outdated relative to the current spec, so a check should be written against the spec as it stands today.
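The answer's code is not preserved here, and it does not say which spec it means. As a sketch under stated assumptions, the version below applies the usual hostname rules (labels of 1 to 63 letters, digits or dashes, no leading or trailing dash, at most 253 characters in total) after converting non-ASCII names with Python's built-in `idna` codec. The names `LABEL` and `is_valid_domain` are illustrative.

```python
import re

# One label: 1..63 letters, digits or dashes, not starting or ending with a dash.
LABEL = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)$")


def is_valid_domain(name):
    """Return True if name is a syntactically valid (possibly non-ASCII) domain."""
    if name.endswith("."):
        name = name[:-1]  # allow one trailing dot (fully qualified form)
    try:
        # Convert non-ASCII labels to their ASCII ("xn--") form.
        ascii_name = name.encode("idna").decode("ascii")
    except UnicodeError:
        return False  # empty or over-long label, or not encodable
    if not 1 <= len(ascii_name) <= 253:
        return False
    return all(LABEL.match(label) for label in ascii_name.split("."))


print(is_valid_domain("example.com"))      # True
print(is_valid_domain("my_host.example"))  # False (underscore)
```

The built-in codec implements the older IDNA 2003 rules; whether that is close enough to the spec the answer refers to is an open assumption.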