PHP DNS

Going where PHP parse_url() doesn't - Parsing only the domain

Posted on 2024-07-10 18:17:21

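PHP's parse_url() has a host field, which contains the full host. I'm looking for the most reliable (and least costly) way to return only the domain and the TLD.

Given the examples:

http://www.google.com/foo, parse_url() returns www.google.com for host
http://www.google.co.uk/foo, parse_url() returns www.google.co.uk for host

I am looking for only google.com or google.co.uk. I have contemplated a table of valid TLDs/suffixes, allowing only those plus one word. Would you do it any other way? Does anyone know of a pre-canned valid regex for this sort of thing?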
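
To make the starting point concrete, here is a minimal sketch of what parse_url() hands back for the two example URLs. The helper name hostOf() is made up for illustration; PHP_URL_HOST is the standard component flag of parse_url().

```php
<?php
// hostOf() is a hypothetical helper name, not part of PHP.
function hostOf(string $url): ?string {
    // With PHP_URL_HOST, parse_url() returns only the host component
    // (null if the URL has no host, false if it cannot be parsed at all).
    $host = parse_url($url, PHP_URL_HOST);
    return is_string($host) ? $host : null;
}

echo hostOf('http://www.google.com/foo'), "\n";   // www.google.com
echo hostOf('http://www.google.co.uk/foo'), "\n"; // www.google.co.uk
```

In both cases the subdomain is still attached, which is the part the question wants to strip.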

提笔落墨 2024-07-17 18:17:21

How about something like that?

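function getDomain($url) {
  // parse_url() gives the full host, e.g. "www.google.co.uk"
  $pieces = parse_url($url);
  $domain = isset($pieces['host']) ? $pieces['host'] : '';
  // Keep one label followed by a trailing run of 2 to 6 letters and dots
  if (preg_match('/(?P<domain>[a-z0-9][a-z0-9\-]{1,63}\.[a-z\.]{2,6})$/i', $domain, $regs)) {
    return $regs['domain'];
  }
  return false;
}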

This extracts the host with the classic parse_url() and then looks for a valid domain without any subdomain (www being a subdomain). It won't work on things like 'localhost', and it returns false if nothing matches.

// Edit:

Try it out with:

echo getDomain('http://www.google.com/test.html') . '<br/>';
echo getDomain('https://news.google.co.uk/?id=12345') . '<br/>';
echo getDomain('http://my.subdomain.google.com/directory1/page.php?id=abc') . '<br/>';
echo getDomain('https://testing.multiple.subdomain.google.co.uk/') . '<br/>';
echo getDomain('http://nothingelsethan.com') . '<br/>';

And it should return:

google.com
google.co.uk
google.com
google.co.uk
nothingelsethan.com

Of course, it will return false if the URL doesn't get through parse_url, so make sure it's a well-formed URL.

// Addendum:

Alnitak is right. The solution presented above will work in most cases but not necessarily all, and it needs to be maintained to make sure, for example, that there aren't new TLDs like .morethan6characters and so on. The only reliable way of extracting the domain is to use a maintained list such as http://publicsuffix.org/. It's more painful at first but easier and more robust in the long term. You need to make sure you understand the pros and cons of each method and how it fits with your project.
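
To see the maintenance problem from the addendum in action, the sketch below reuses the getDomain() regex from this answer on a few hosts. The test hosts are made up for illustration; the expected values follow from the 2 to 6 character limit at the end of the pattern.

```php
<?php
// Same regex as in the answer above, repeated so the snippet runs on its own.
function getDomain($url) {
  $pieces = parse_url($url);
  $domain = isset($pieces['host']) ? $pieces['host'] : '';
  if (preg_match('/(?P<domain>[a-z0-9][a-z0-9\-]{1,63}\.[a-z\.]{2,6})$/i', $domain, $regs)) {
    return $regs['domain'];
  }
  return false;
}

// Works: "google.com" is too long for the trailing group, so "www." is skipped.
var_export(getDomain('http://www.google.com/test.html'));  // 'google.com'
echo "\n";
// Fails: a suffix longer than 6 characters never matches.
var_export(getDomain('http://www.example.photography/'));  // false
echo "\n";
// Fails: "ab.com" fits into the 6-character group, so "www" is kept.
var_export(getDomain('http://www.ab.com/'));               // 'www.ab.com'
echo "\n";
```

The last two cases are the kind of thing a maintained suffix list avoids.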

记忆消瘦 2024-07-17 18:17:21

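Currently the only "right" way to do this is to use a list such as the one maintained at http://publicsuffix.org/

By the way, this question is also pretty much a duplicate of existing ones.

There are standardisation efforts at the IETF looking at DNS methods for declaring whether a particular node in the DNS tree is used for "public" registrations, but they are at an early stage of development. All of the popular non-IE browsers use the publicsuffix.org list.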
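
As a rough sketch of how a list-based lookup might work, the snippet below finds the longest known suffix and keeps one label to its left. The helper name registrableDomain() and the SAMPLE_SUFFIXES table are assumptions made for illustration: the table is a tiny hand-picked subset, and the wildcard and exception rules that the real list contains are ignored here.

```php
<?php
// Tiny hand-picked subset for illustration only; a real implementation
// would load the complete, current list published at publicsuffix.org.
const SAMPLE_SUFFIXES = ['com' => true, 'uk' => true, 'co.uk' => true, 'museum' => true];

// registrableDomain() is a hypothetical helper name.
function registrableDomain(string $url, array $suffixes = SAMPLE_SUFFIXES) {
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return false;
    }
    $labels = explode('.', strtolower($host));
    // Try the longest candidate suffix first, always leaving at least
    // one label to the left of it.
    for ($i = 1; $i < count($labels); $i++) {
        $suffix = implode('.', array_slice($labels, $i));
        if (isset($suffixes[$suffix])) {
            return $labels[$i - 1] . '.' . $suffix;
        }
    }
    return false; // no known suffix, e.g. "localhost"
}

echo registrableDomain('http://www.google.com/foo'), "\n";                        // google.com
echo registrableDomain('https://testing.multiple.subdomain.google.co.uk/'), "\n"; // google.co.uk
```

Checking the longest suffix first matters: with "uk" and "co.uk" both in the table, a shortest-first search would stop at "uk" and return co.uk.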

做个ˇ局外人 2024-07-17 18:17:21

There is also a very nice port of Python's tldextract module: http://w-shadow.com/blog/2012/08/28/tldextract. It goes beyond parse_url and lets you actually get the domain and TLD out, without the subdomain.

From the module website:

$components = tldextract('http://www.bbc.co.uk');
echo $components->subdomain; // www
echo $components->domain;    // bbc
echo $components->tld;       // co.uk

云淡风轻 2024-07-17 18:17:21

Dug this up from a related post, for the idea of keeping a table: http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/src/effective_tld_names.dat?raw=1

I'd rather not do that, though.

才能让你更想念 2024-07-17 18:17:21

You need a package that uses the Public Suffix List. Only this way can you correctly extract domains with two- and three-level TLDs (co.uk, a.bg, b.bg, etc.) and multilevel subdomains. Regex, parse_url() or string functions will never produce an absolutely correct result.

I recommend using TLD Extract. Here is a code example:

$extract = new LayerShifter\TLDExtract\Extract();

$result = $extract->parse('http://www.google.co.uk/foo');
$result->getSubdomain(); // will return (string) 'www'
$result->getHostname(); // will return (string) 'google'
$result->getSuffix(); // will return (string) 'co.uk'
$result->getRegistrableDomain(); // will return (string) 'google.co.uk'

对你的占有欲 2024-07-17 18:17:21

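It obviously depends on your use case, but generally speaking I wouldn't use a table lookup for TLDs. New TLDs come up, and you usually don't want to maintain them anywhere. Just ask how often my [email protected] has been rejected due to short-sightedness.

I guess I could be of more help if I knew why you don't want the www. Do you need it for emails? In that case you could query for the MX record to verify that it (ultimately) accepts mail.

You may also find help in the PHP functions that deal with DNS records; see http://php.net/dns_get_record for example.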
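
The MX idea could look roughly like the sketch below. acceptsMail() is a hypothetical helper name; by default it calls PHP's built-in checkdnsrr(), and the optional $lookup parameter is an assumption added only so the logic can be tried with a stub instead of a live DNS query.

```php
<?php
// acceptsMail() is a hypothetical helper, not a PHP built-in.
// $lookup defaults to checkdnsrr(); pass a stub to avoid real DNS traffic.
function acceptsMail(string $domain, ?callable $lookup = null): bool {
    $lookup = $lookup ?? 'checkdnsrr';
    // Ask whether the domain publishes at least one MX record.
    return (bool) $lookup($domain, 'MX');
}

// Stub resolver used for illustration: only example.com "has" an MX record.
$fakeLookup = function (string $host, string $type): bool {
    return $host === 'example.com' && $type === 'MX';
};

var_dump(acceptsMail('example.com', $fakeLookup));     // bool(true)
var_dump(acceptsMail('www.example.com', $fakeLookup)); // bool(false)
```

Calling acceptsMail('example.com') without the second argument performs a real DNS lookup, so the result then depends on the network.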

故人的歌 2024-07-17 18:17:21

Just a proof of concept, assuming the allowed TLDs are stored in a hash ($tldSuffixesAllowed) and the URL is in $theUrl.
The code can be shortened a lot.

<?php
    $urlComponents = parse_url($theUrl);
    $chunk = explode('.', $urlComponents['host']);

    $maxTldLen = 2; // assuming a tld can be in the form .com or .co.uk
    $domain = false;
    // Try the longest candidate suffix first so that "co.uk" wins over "uk",
    // and always leave at least one label to the left of the suffix.
    for ($len = min($maxTldLen, count($chunk) - 1); $len >= 1; $len--) {
      $tld = implode('.', array_slice($chunk, -$len));
      if (isset($tldSuffixesAllowed[$tld])) {
        // domain = recognized suffix plus the label just before it
        $domain = implode('.', array_slice($chunk, -($len + 1)));
        break;
      }
    }
    if ($domain === false) {
       // domain not recognized, do whatever you want
    }
?>

辞别 2024-07-17 18:17:21

There is a really easy solution to this:

function get_domain($url) {
  $pieces = parse_url($url);
  // Split off the first label; array_pop() needs a variable, not an expression
  $parts = explode('.', $pieces['host'], 2);
  return array_pop($parts);
}

Surely this will work?
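
A quick check suggests this only holds when the host has exactly one label in front of the registered domain. The sketch below repeats the same idea under a made-up name, dropFirstLabel(), so it can run on its own; the sample URLs are chosen for illustration.

```php
<?php
// dropFirstLabel() mirrors the get_domain() idea above under a hypothetical name.
function dropFirstLabel(string $url) {
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return false;
    }
    // Split into "first label" and "everything else", keep the latter.
    $parts = explode('.', $host, 2);
    return array_pop($parts);
}

echo dropFirstLabel('http://www.google.co.uk/foo'), "\n";      // google.co.uk
echo dropFirstLabel('http://google.co.uk/foo'), "\n";          // co.uk
echo dropFirstLabel('http://my.subdomain.google.com/x'), "\n"; // subdomain.google.com
```

Without a subdomain the domain itself is cut off, and with several subdomains only the first one is removed.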

About the author

纵山崖
