Validating a hostname string
Following up on "Regular expression to match hostname or IP Address?" and using "Restrictions on valid host names" as a reference, what is the most readable, concise way to match/validate a hostname/FQDN (fully qualified domain name) in Python? I've answered with my attempt below; improvements welcome.
The approach here ensures that each segment of the hostname is valid on its own.
It also avoids double negatives (`not disallowed`), and if `hostname` ends in a `.`, that's OK, too. It will (and should) fail if `hostname` ends in more than one dot.
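A minimal sketch of a check with the behaviour described above, assuming the commonly cited label rules (1 to 63 characters; letters, digits and hyphens; no leading or trailing hyphen). The name `is_valid_hostname` and the exact pattern are illustrative assumptions, not necessarily the code this answer originally contained.

```python
import re

# One label: 1-63 letters, digits or hyphens, not starting or ending with "-".
# The rule is stated positively ("allowed"), so no double negative is needed.
ALLOWED_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def is_valid_hostname(hostname: str) -> bool:
    # Strip exactly one trailing dot, if present. "example.com.." keeps one
    # dot and then fails because its last label is empty.
    if hostname.endswith("."):
        hostname = hostname[:-1]
    return all(ALLOWED_LABEL.fullmatch(label) for label in hostname.split("."))
```

With this sketch, `is_valid_hostname("example.com.")` is accepted while `is_valid_hostname("example.com..")` is rejected. `fullmatch` is used rather than a trailing `$` because `$` would also match just before a final newline.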
Here's a slightly stricter version of Tim Pietzcker's answer. Among its improvements, it restricts digits to ASCII (`[0-9]` instead of `\d`).
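To illustrate the `[0-9]` point: in Python 3, `\d` in a `str` pattern matches any Unicode decimal digit unless `re.ASCII` is set, so it accepts characters that are not valid in a hostname. A small demonstration, where the sample string is made up for illustration:

```python
import re

arabic_indic = "\u0663\u0664"  # ARABIC-INDIC DIGIT THREE and DIGIT FOUR

# \d is Unicode-aware for str patterns, so these non-ASCII digits match.
unicode_digits_match = bool(re.fullmatch(r"\d+", arabic_indic))

# An explicit ASCII range rejects them.
ascii_range_match = bool(re.fullmatch(r"[0-9]+", arabic_indic))

# re.ASCII restricts \d to [0-9] as well.
ascii_flag_match = bool(re.fullmatch(r"\d+", arabic_indic, re.ASCII))

print(unicode_digits_match, ascii_range_match, ascii_flag_match)
```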
Per The Old New Thing, the maximum length of a DNS name is 253 characters. (One is allowed up to 255 octets, but 2 of those are consumed by the encoding.)
One could argue for accepting empty domain names, or not, depending on one's purpose.
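A sketch of that length rule, assuming the 253-character limit applies to the textual name after one optional trailing dot is removed. `within_dns_length` and its `allow_empty` flag are hypothetical names chosen for illustration; the flag reflects the open question of whether empty names should pass.

```python
MAX_DNS_NAME_LENGTH = 253  # text form; the wire format allows 255 octets


def within_dns_length(name: str, allow_empty: bool = False) -> bool:
    # A single trailing dot marks a fully qualified name and is not counted.
    if name.endswith("."):
        name = name[:-1]
    if not name:
        return allow_empty
    return len(name) <= MAX_DNS_NAME_LENGTH
```

This only checks the overall length; per-label rules and the character set would still need their own checks.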
I like the thoroughness of Tim Pietzcker's answer, but I prefer to offload some of the logic from regular expressions for readability. Honestly, I had to look up the meaning of those `(?` "extension notation" parts. Additionally, I feel the "double-negative" approach is more obvious in that it limits the responsibility of the regular expression to just finding any invalid character. I do like that re.IGNORECASE allows the regex to be shortened.
So here's another shot; it's longer, but it reads kind of like prose. I suppose "readable" is somewhat at odds with "concise". I believe all of the validation constraints mentioned in the thread so far are covered.
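A minimal sketch of what such a prose-like, double-negative check could look like, assuming the constraints mentioned in the thread (253 characters overall, 1 to 63 per label, no leading or trailing hyphen). The names `DISALLOWED` and `hostname_is_valid` are illustrative, and the character class is spelled out explicitly here instead of relying on re.IGNORECASE.

```python
import re

# The regex has one job: find any character that is not allowed in a label.
DISALLOWED = re.compile(r"[^A-Za-z0-9-]")


def hostname_is_valid(hostname: str) -> bool:
    if hostname.endswith("."):          # a single trailing dot is legal
        hostname = hostname[:-1]
    if not 1 <= len(hostname) <= 253:   # overall length limit
        return False
    for label in hostname.split("."):
        if not 1 <= len(label) <= 63:   # each label: 1 to 63 characters
            return False
        if label.startswith("-") or label.endswith("-"):
            return False
        if DISALLOWED.search(label):    # valid means "not disallowed"
            return False
    return True
```

Each rule gets its own line, so a failing name can be traced to a specific constraint, at the cost of more code than a single pattern.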
Complementary to @TimPietzcker's answer.
Underscores do occur in DNS names (for example in service labels such as _dmarc), although RFC 1123 does not allow them in hostnames proper. A double dash is commonly found in IDN punycode domains (e.g. xn--). Any port number should be stripped. This is a cleanup of the code.
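A sketch of that cleanup under a few assumptions: the input may carry a trailing `:port`, IPv6 literals are out of scope, and underscores are accepted alongside letters, digits and hyphens. Double hyphens need no special handling because only leading and trailing hyphens are rejected. `is_valid_host` and `LABEL` are hypothetical names.

```python
import re

# Letters, digits, hyphen and underscore; no leading or trailing hyphen.
LABEL = re.compile(r"(?!-)[A-Za-z0-9_-]{1,63}(?<!-)")


def is_valid_host(host: str) -> bool:
    # Drop a trailing ":port" if one is present (assumes no IPv6 literals).
    name, sep, port = host.rpartition(":")
    if sep:
        if not re.fullmatch(r"[0-9]+", port):
            return False
        host = name
    if host.endswith("."):
        host = host[:-1]
    if len(host) > 253:
        return False
    return all(LABEL.fullmatch(label) for label in host.split("."))
```

For example, `is_valid_host("xn--nxasmq6b.example:443")` and `is_valid_host("_dmarc.example.com")` both pass, while `is_valid_host("example.com:http")` does not.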
I think this regex might help in Python:
'^([a-zA-Z0-9]+(\.|\-))*[a-zA-Z0-9]+$'
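Trying the pattern on a few made-up names shows what it accepts. This is only a quick probe of the regex as written; the candidate names are illustrative.

```python
import re

PATTERN = re.compile(r'^([a-zA-Z0-9]+(\.|\-))*[a-zA-Z0-9]+$')

candidates = (
    "example.com",
    "my-host.example.com",
    "-bad.example.com",   # leading hyphen
    "example.com.",       # trailing dot
    "xn--nxasmq6b.com",   # consecutive hyphens (punycode prefix)
)

for candidate in candidates:
    print(candidate, bool(PATTERN.match(candidate)))
```

The first two names match and the last three do not. The pattern puts no limit on label or overall length, and because it requires alphanumerics between separators it also rejects a trailing dot and punycode names containing `--`.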
If you're looking to validate the name of an existing host, the best way is to try to resolve it. You'll never write a regular expression that provides that level of validation.
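A minimal sketch of the resolve-it approach using the standard library's `socket.getaddrinfo`; the helper name `resolves` is an assumption for illustration.

```python
import socket


def resolves(hostname: str) -> bool:
    """Return True if the system resolver finds an address for hostname."""
    try:
        socket.getaddrinfo(hostname, None)
    except (socket.gaierror, UnicodeError):
        # gaierror: the lookup failed. UnicodeError: the name could not be
        # IDNA-encoded (for example, because a label is too long).
        return False
    return True
```

The result depends on the local resolver and network, so a `False` may reflect a temporary outage and not an invalid name; this answers "does the host exist right now", not "is the string well-formed".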
Process each DNS label individually by excluding invalid characters and ensuring nonzero length.
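That idea can be sketched without any regular expression, assuming the valid characters are ASCII letters, digits and the hyphen; `labels_ok` and `VALID_CHARS` are hypothetical names.

```python
import string

VALID_CHARS = frozenset(string.ascii_letters + string.digits + "-")


def labels_ok(hostname: str) -> bool:
    for label in hostname.split("."):
        if not label:                 # zero-length label
            return False
        if set(label) - VALID_CHARS:  # at least one invalid character
            return False
    return True
```

This covers only the two checks named above; hyphen placement and length limits would need additional tests.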