Validate a hostname string
Following up on "Regular expression to match hostname or IP Address?" and using "Restrictions on valid host names" as a reference, what is the most readable, concise way to match/validate a hostname/FQDN (fully qualified domain name) in Python? I've answered with my attempt below; improvements welcome.
The check ensures that each segment of the hostname is valid on its own. It also avoids double negatives ("not disallowed"), and if `hostname` ends in a `.`, that's OK, too. It will (and should) fail if `hostname` ends in more than one dot.
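For illustration, here is a minimal sketch of a per-segment check with that trailing-dot behaviour. The label rule it assumes (1 to 63 letters, digits, or hyphens, with no hyphen at either end) is the usual hostname restriction and is my assumption here, as are the names `LABEL` and `is_valid_hostname`; this is not necessarily the code the answer originally posted.

```python
import re

# Assumed label rule: 1 to 63 letters, digits, or hyphens,
# with no hyphen at either end. re.ASCII keeps IGNORECASE from
# matching the few non-ASCII letters that case-fold to ASCII.
LABEL = re.compile(r"(?!-)[A-Z0-9-]{1,63}(?<!-)", re.IGNORECASE | re.ASCII)

def is_valid_hostname(hostname):
    if hostname.endswith("."):
        hostname = hostname[:-1]  # tolerate exactly one trailing dot
    # A second trailing dot leaves an empty last label, which fails.
    return all(LABEL.fullmatch(label) for label in hostname.split("."))
```

The pattern states what a label must look like instead of what it must not contain, which is one way to read "avoids double negatives".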
Here's a slightly stricter version of Tim Pietzcker's answer. Among its improvements, it restricts digits to ASCII by using `[0-9]` instead of `\d`.
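A quick way to see why `[0-9]` is stricter than `\d`: in Python 3, `\d` in a `str` pattern matches any Unicode decimal digit unless the `re.ASCII` flag is set. The sample string below is just an illustration I picked.

```python
import re

arabic_indic = "\u0661\u0662\u0663"  # the digits one, two, three in Arabic-Indic script

# \d accepts any Unicode decimal digit in a str pattern.
print(bool(re.fullmatch(r"\d+", arabic_indic)))
# [0-9] accepts ASCII digits only.
print(bool(re.fullmatch(r"[0-9]+", arabic_indic)))
# re.ASCII narrows \d to ASCII digits as well.
print(bool(re.fullmatch(r"\d+", arabic_indic, re.ASCII)))
```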
Per The Old New Thing, the maximum length of a DNS name is 253 characters. (One is allowed up to 255 octets, but 2 of those are consumed by the encoding.)
One could argue for accepting empty domain names, or not, depending on one's purpose.
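A rough sketch of just the length rule, under two assumptions of mine: a single trailing dot is not counted toward the 253 characters, and the name is plain ASCII so that characters and octets line up. `has_valid_length` and its `allow_empty` parameter are hypothetical names, the latter standing in for the "accept empty names or not" decision.

```python
MAX_NAME_LENGTH = 253  # the limit quoted above

def has_valid_length(name, allow_empty=False):
    # `allow_empty` is a hypothetical switch for the policy question above.
    if name.endswith("."):
        name = name[:-1]  # assumption: one trailing dot is not counted
    if not name:
        return allow_empty
    return len(name) <= MAX_NAME_LENGTH
```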
I like the thoroughness of Tim Pietzcker's answer, but I prefer to offload some of the logic from regular expressions for readability. Honestly, I had to look up the meaning of those `(?` "extension notation" parts. Additionally, I feel the "double-negative" approach is more obvious in that it limits the responsibility of the regular expression to just finding any invalid character. I do like that `re.IGNORECASE` allows the regex to be shortened.
So here's another shot; it's longer, but it reads kind of like prose. I suppose "readable" is somewhat at odds with "concise". I believe it covers all of the validation constraints mentioned in the thread so far.
Complementary to the @TimPietzcker answer.
An underscore is a valid hostname character (but not for a domain name), and a double dash is commonly found in IDN punycode domains (e.g. `xn--`). The port number should be stripped. This is a cleanup of the code.
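Something along these lines would cover the three points (underscores, double dashes, port stripping). This is only a sketch: `strip_port`, `LABEL`, and `is_valid_host` are hypothetical names, and the port handling assumes a plain `host:port` string, not a bracketed IPv6 literal.

```python
import re

def strip_port(host):
    # Drop a trailing ":<digits>" if present; IPv6 literals are not handled.
    return re.sub(r":[0-9]+$", "", host)

# Assumed label pattern: letters, digits, hyphens, and underscores,
# 1 to 63 characters, no hyphen at either end. Inner "--" is allowed.
LABEL = re.compile(r"(?!-)[A-Za-z0-9_-]{1,63}(?<!-)")

def is_valid_host(host):
    host = strip_port(host)
    return all(LABEL.fullmatch(label) for label in host.split("."))
```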
I think this regex might help in Python:
'^([a-zA-Z0-9]+(\.|\-))*[a-zA-Z0-9]+$'
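To get a feel for what that pattern accepts, you can run it over a few candidates (the sample names are mine):

```python
import re

PATTERN = re.compile(r'^([a-zA-Z0-9]+(\.|\-))*[a-zA-Z0-9]+$')

candidates = ["example.com", "my-host.example.com", "a..b", "-a.com", "xn--nxasmq6b.example"]
for candidate in candidates:
    # Print each candidate with whether the pattern matches it.
    print(candidate, bool(PATTERN.match(candidate)))
```

Because every separator has to be followed by at least one letter or digit, the pattern rejects consecutive separators, and that includes the double dash in punycode names such as `xn--...`. It also puts no limit on label or overall length.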
If you're looking to validate the name of an existing host, the best way is to try to resolve it. You'll never write a regular expression that provides that level of validation.
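A minimal sketch of the "just resolve it" approach using the standard library's `socket.getaddrinfo`. The function name `resolves` is made up, and the result depends on the resolver configuration of the machine running it, so treat a `False` as "could not resolve from here" and not as "syntactically invalid".

```python
import socket

def resolves(hostname):
    try:
        socket.getaddrinfo(hostname, None)
    except (OSError, UnicodeError):
        # OSError covers socket.gaierror (the lookup failed);
        # UnicodeError means the name could not be encoded for lookup.
        return False
    return True
```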
Process each DNS label individually by excluding invalid characters and ensuring nonzero length.
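One regex-free way to sketch that idea is with a set difference per label. The allowed character set (ASCII letters, digits, hyphen) is my assumption, and `invalid_labels` is a hypothetical helper that reports the offending labels so you can see why a name failed.

```python
import string

# Assumed set of characters allowed inside a label.
ALLOWED = set(string.ascii_letters + string.digits + "-")

def invalid_labels(hostname):
    """Return the labels that are empty or contain a character outside ALLOWED."""
    return [label for label in hostname.split(".")
            if not label or set(label) - ALLOWED]
```

An empty result means every label passed, so `not invalid_labels(name)` works as the yes/no check.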