Validating a hostname string

Posted on 2024-08-27 00:59:34

Following up on "Regular expression to match hostname or IP Address?" and using "Restrictions on valid host names" as a reference, what is the most readable, concise way to match/validate a hostname/FQDN (fully qualified domain name) in Python? I've answered with my attempt below; improvements are welcome.

Comments (9)

玉环 2024-09-03 00:59:34
import re

def is_valid_hostname(hostname):
    if len(hostname) > 255:
        return False
    # endswith() also copes with an empty string, where hostname[-1] would raise IndexError
    if hostname.endswith("."):
        hostname = hostname[:-1] # strip exactly one dot from the right, if present
    # raw string so that \d reaches the regex engine unchanged
    allowed = re.compile(r"(?!-)[A-Z\d-]{1,63}(?<!-)$", re.IGNORECASE)
    return all(allowed.match(x) for x in hostname.split("."))

This ensures that each segment

  • contains at least one character and a maximum of 63 characters
  • consists only of allowed characters
  • doesn't begin or end with a hyphen.

It also avoids double negatives (not disallowed), and if hostname ends in a ., that's OK, too. It will (and should) fail if hostname ends in more than one dot.
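
For readers unfamiliar with the lookaround syntax, the following minimal sketch applies the same label pattern to a handful of individual labels. The name LABEL and the sample labels are illustrative assumptions, not part of the original answer.

```python
import re

# Same label pattern as in the answer above, written as a raw string.
# (?!-)   : the label must not start with a hyphen
# (?<!-)$ : the character just before the end must not be a hyphen
LABEL = re.compile(r"(?!-)[A-Z\d-]{1,63}(?<!-)$", re.IGNORECASE)

# Illustrative sample labels (assumptions, chosen only for demonstration)
samples = ["example", "my-host", "-bad", "bad-", "", "a" * 63, "a" * 64]

results = {label: bool(LABEL.match(label)) for label in samples}
for label, ok in results.items():
    print(f"{label[:20]!r:24} {ok}")
```

Only "example", "my-host", and the 63-character label are accepted. One caveat: $ also matches just before a trailing newline, so a label such as "example\n" passes; ending the pattern with \Z instead is a stricter alternative.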

转身泪倾城 2024-09-03 00:59:34

Here's a slightly stricter version of Tim Pietzcker's answer with the following improvements:

  • Limit the length of the hostname to 253 characters (after stripping the optional trailing dot).
  • Limit the character set to ASCII (i.e. use [0-9] instead of \d).
  • Check that the TLD is not all-numeric.

import re

def is_valid_hostname(hostname):
    # endswith() avoids the IndexError that hostname[-1] raises on an empty string
    if hostname.endswith("."):
        # strip exactly one dot from the right, if present
        hostname = hostname[:-1]
    if len(hostname) > 253:
        return False

    labels = hostname.split(".")

    # the TLD must not be all-numeric
    if re.match(r"[0-9]+$", labels[-1]):
        return False

    allowed = re.compile(r"(?!-)[a-z0-9-]{1,63}(?<!-)$", re.IGNORECASE)
    return all(allowed.match(label) for label in labels)

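The point about using [0-9] instead of \d can be checked with a short sketch. In Python 3, \d in a str pattern matches any Unicode decimal digit unless re.ASCII is set. The sample character below is an illustrative choice.

```python
import re

# ARABIC-INDIC DIGIT THREE: a decimal digit, but not an ASCII one
arabic_three = "\u0663"

matches_backslash_d = bool(re.fullmatch(r"\d", arabic_three))
matches_ascii_class = bool(re.fullmatch(r"[0-9]", arabic_three))
matches_with_ascii_flag = bool(re.fullmatch(r"\d", arabic_three, re.ASCII))

print(matches_backslash_d)      # True
print(matches_ascii_class)      # False
print(matches_with_ascii_flag)  # False
```

So a validator built on \d would accept non-ASCII digits in a label, while [0-9] (or \d with re.ASCII) would not.
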
青柠芒果 2024-09-03 00:59:34

Per The Old New Thing, the maximum length of a DNS name is 253 characters. (One is allowed up to 255 octets, but 2 of those are consumed by the encoding.)

import re

def validate_fqdn(dn):
    if dn.endswith('.'):
        dn = dn[:-1]
    if len(dn) < 1 or len(dn) > 253:
        return False
    ldh_re = re.compile('^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$',
                        re.IGNORECASE)
    return all(ldh_re.match(x) for x in dn.split('.'))

One could argue for accepting empty domain names, or not, depending on one's purpose.
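
The 253-character figure follows from the wire format: every label is preceded by one length octet, and the name ends with a zero octet for the root label. Here is a minimal sketch of that arithmetic, assuming an ASCII name; wire_length is a hypothetical helper, not part of any library.

```python
def wire_length(name: str) -> int:
    """Octets an ASCII DNS name occupies in wire format (hypothetical helper)."""
    if name.endswith("."):
        name = name[:-1]  # drop the optional trailing dot
    labels = name.split(".")
    # Each label costs one length octet plus its characters;
    # the terminating root label adds one final zero octet.
    return sum(1 + len(label) for label in labels) + 1

# Illustrative name: three 63-character labels and one 61-character label
longest = ".".join(["a" * 63, "b" * 63, "c" * 63, "d" * 61])
print(len(longest), wire_length(longest))  # 253 255
```

The wire form is always two octets longer than the dotted text without the trailing dot, which is why a 255-octet limit leaves 253 characters.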

客…行舟 2024-09-03 00:59:34

I like the thoroughness of Tim Pietzcker's answer, but I prefer to offload some of the logic from regular expressions for readability. Honestly, I had to look up the meaning of those (? "extension notation" parts. Additionally, I feel the "double-negative" approach is more obvious in that it limits the responsibility of the regular expression to just finding any invalid character. I do like that re.IGNORECASE allows the regex to be shortened.

So here's another shot; it's longer but it reads kind of like prose. I suppose "readable" is somewhat at odds with "concise". I believe all of the validation constraints mentioned in the thread so far are covered:


import re

def isValidHostname(hostname):
    if len(hostname) > 255:
        return False
    if hostname.endswith("."): # A single trailing dot is legal
        hostname = hostname[:-1] # strip exactly one dot from the right, if present
    disallowed = re.compile(r"[^A-Z\d-]", re.IGNORECASE)
    return all( # Split by labels and verify individually
        (label and len(label) <= 63 # length is within proper range
         and not label.startswith("-") and not label.endswith("-") # no bordering hyphens
         and not disallowed.search(label)) # contains only legal characters
        for label in hostname.split("."))

我ぃ本無心為│何有愛 2024-09-03 00:59:34
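def is_valid_host(host):
    '''IDN compatible domain validator'''
    try:
        # encode('idna') returns bytes in Python 3; decode so the str regex can match
        host = host.encode('idna').decode('ascii').lower()
    except UnicodeError:
        # raised for empty or over-long labels
        return False
    if not hasattr(is_valid_host, '_re'):
        import re
        is_valid_host._re = re.compile(r'^([0-9a-z][-\w]*[0-9a-z]\.)+[a-z0-9\-]{2,15}$')
    return bool(is_valid_host._re.match(host))
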
煮酒 2024-09-03 00:59:34

Complementary to @TimPietzcker's answer.
Underscores do occur in DNS names (strictly speaking they are valid in domain names, such as SRV records, rather than in hostnames), so this version accepts them. A double dash is common in IDN punycode domains (e.g. xn--). The port number should be stripped. Here is the cleaned-up code.

import re

def is_valid_hostname(hostname):
    if len(hostname) > 255:
        return False
    hostname = hostname.rstrip(".")  # note: strips every trailing dot, not just one
    allowed = re.compile(r"(?!-)[A-Z\d_-]{1,63}(?<!-)$", re.IGNORECASE)
    return all(allowed.match(x) for x in hostname.split("."))

# convert your unicode hostname to punycode (Python 3)
# and remove the port number from the hostname
hostname = "example.com:8080"  # illustrative input, not from the original answer
normalise_host = hostname.encode("idna").decode().split(":")[0]
is_valid_hostname(normalise_host)

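As a sketch of the normalisation step described above, the port can also be removed with the standard library's urlsplit before IDNA-encoding the host. The helper name and the sample input are assumptions made for illustration.

```python
from urllib.parse import urlsplit

def normalise_host(netloc: str) -> str:
    """Drop any port, then IDNA-encode the host (hypothetical helper)."""
    # The "//" prefix makes urlsplit treat the input as a network location.
    host = urlsplit("//" + netloc).hostname or ""
    return host.encode("idna").decode("ascii")

print(normalise_host("bücher.example:8080"))  # xn--bcher-kva.example
```

The result contains the double dash mentioned above, so any validator applied afterwards has to allow consecutive hyphens inside a label.
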
林空鹿饮溪 2024-09-03 00:59:34

I think this regex might help in Python:
'^([a-zA-Z0-9]+(\.|\-))*[a-zA-Z0-9]+$'
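
To see what this pattern accepts, here is a small sketch that runs it over a few inputs. The sample strings are illustrative assumptions.

```python
import re

# The pattern from the answer above, as a raw string.
PATTERN = re.compile(r'^([a-zA-Z0-9]+(\.|\-))*[a-zA-Z0-9]+$')

# Illustrative inputs (assumptions, chosen only for demonstration)
samples = ["example.com", "my-host.example.com", "-bad.example",
           "xn--bcher-kva.example", "a" * 64 + ".example"]

results = {s: bool(PATTERN.match(s)) for s in samples}
for s, ok in results.items():
    print(f"{s[:30]!r:34} {ok}")
```

It handles ordinary names and rejects a leading hyphen, but it also rejects punycode labels (which contain a double hyphen) and places no limit on label length, so the 64-character label is accepted.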

牵强ㄟ 2024-09-03 00:59:34

If you're looking to validate the name of an existing host, the best way is to try to resolve it. You'll never write a regular expression to provide that level of validation.
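
A minimal sketch of that approach, using the standard library's socket.getaddrinfo. The function name and the injectable resolver parameter are assumptions added so the logic can be exercised without network access.

```python
import socket

def resolves(hostname, resolver=socket.getaddrinfo):
    """Return True if the name resolves (hypothetical helper).

    `resolver` is injectable so the function can be tried without
    network access; by default the system resolver is used.
    """
    try:
        resolver(hostname, None)
    except (socket.gaierror, UnicodeError):
        # gaierror: lookup failed; UnicodeError: a label could not be IDNA-encoded
        return False
    return True
```

Calling resolves("example.com") performs a real lookup. Keep in mind that a failure can also mean a temporary network or DNS problem rather than an invalid name.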

万人眼中万个我 2024-09-03 00:59:34

Process each DNS label individually by excluding invalid characters and ensuring nonzero length.

import re

def isValidHostname(hostname):
    disallowed = re.compile(r"[^a-zA-Z\d\-]")
    return all(map(lambda x: len(x) and not disallowed.search(x), hostname.split(".")))
