How do I isolate the domain name from a URL?

Posted on 2024-07-07 07:56:30

I'm looking for a method (or function) to strip out the example.ext part of any URL that's fed into the function. The domain extension can be anything (.com, .co.uk, .nl, .whatever), and the URL that's fed into it can be anything from http://www.example.com to www.example.com/path/script.php?=whatever

What's the best way to go about doing this?

I'd like example.com.

Comments (9)

ζ澈沫 2024-07-14 07:56:30

parse_url turns a URL into an associative array:

php > $foo = "http://www.example.com/foo/bar?hat=bowler&accessory=cane";
php > $blah = parse_url($foo);
php > print_r($blah);
Array
(
    [scheme] => http
    [host] => www.example.com
    [path] => /foo/bar
    [query] => hat=bowler&accessory=cane
)
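
One caveat for the inputs in the question: when the URL has no scheme, as in www.example.com/path/script.php?=whatever, parse_url() treats the whole string as a path and returns no host entry. A minimal sketch of a workaround, assuming PHP 7.1 or later; hostFromUrl is a hypothetical helper name, not part of the answer above:

```php
<?php
// Hypothetical helper (not from the original answer): returns the host part
// of a URL, tolerating input that has no scheme.
function hostFromUrl(string $url): ?string
{
    // Without a scheme, parse_url() treats "www.example.com/..." as a path,
    // so prepend a placeholder scheme when none is present.
    if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
        $url = 'http://' . ltrim($url, '/');
    }

    $host = parse_url($url, PHP_URL_HOST);

    // parse_url() gives null when no host is found and false on malformed input.
    return is_string($host) ? $host : null;
}

echo hostFromUrl('http://www.example.com'), "\n";                    // www.example.com
echo hostFromUrl('www.example.com/path/script.php?=whatever'), "\n"; // www.example.com
```

The placeholder scheme is only there to make parse_url() recognize the host; it is discarded with the rest of the parsed result.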
锦爱 2024-07-14 07:56:30

You can also write a regular expression to get exactly what you want.

Here is my attempt at it:

$pattern = '/\w+\..{2,3}(?:\..{2,3})?(?:$|(?=\/))/i';
$url = 'http://www.example.com/foo/bar?hat=bowler&accessory=cane';
if (preg_match($pattern, $url, $matches) === 1) {
    echo $matches[0];
}

The output is:

example.com

This pattern also takes into consideration domains such as 'example.com.au'.

Note: I have not consulted the relevant RFC.

讽刺将军 2024-07-14 07:56:30

You can use parse_url() to do this:

$url = 'http://www.example.com';
$domain = parse_url($url, PHP_URL_HOST);
$domain = str_replace('www.','',$domain);

In this example, $domain should contain example.com, irrespective of it having www or not. It also works for a domain such as .co.uk
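
Note that str_replace() removes every occurrence of 'www.', not only a leading one, and that other subdomains are left untouched. A possible variant that strips only a leading www. label could look like this; stripLeadingWww is a hypothetical name used for illustration, assuming PHP 7 or later:

```php
<?php
// Hypothetical helper: strips a single leading "www." label from a host name.
function stripLeadingWww(string $host): string
{
    // The ^ anchor limits the replacement to the start of the string.
    // Fall back to the input if preg_replace() reports an error (null).
    return preg_replace('/^www\./i', '', $host) ?? $host;
}

echo stripLeadingWww('www.example.com'), "\n";          // example.com
echo stripLeadingWww('awww.example.com'), "\n";         // awww.example.com
echo str_replace('www.', '', 'awww.example.com'), "\n"; // aexample.com
```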

若言繁花未落 2024-07-14 07:56:30

The following code trims the protocol, domain, and port from an absolute URL:

$urlWithoutDomain = preg_replace('#^.+://[^/]+#', '', $url);
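
Because .+ is greedy, the pattern above can overshoot when a second :// appears later in the URL, for example inside a query string. A sketch of a stricter variant, under the assumption that the scheme consists of a letter followed by letters, digits, '+', '.' or '-'; stripOrigin is a hypothetical name:

```php
<?php
// Hypothetical helper: removes scheme, host and port from an absolute URL.
function stripOrigin(string $url): string
{
    // Match only a leading scheme, then everything up to the first /, ? or #.
    return preg_replace('~^[a-z][a-z0-9+.-]*://[^/?#]*~i', '', $url) ?? $url;
}

echo stripOrigin('http://www.example.com:8080/foo/bar?hat=bowler'), "\n"; // /foo/bar?hat=bowler

$nested = 'http://a.example/redirect?to=http://b.example/x';
echo stripOrigin($nested), "\n";                       // /redirect?to=http://b.example/x
echo preg_replace('#^.+://[^/]+#', '', $nested), "\n"; // /x
```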
隐诗 2024-07-14 07:56:30

Here are a couple of simple functions to get the root domain (example.com) from a normal or long domain (test.sub.domain.com) or a URL (http://www.example.com).

/**
 * Get root domain from full domain
 * @param string $domain
 */
public function getRootDomain($domain)
{
    $domain = explode('.', $domain);

    $tld = array_pop($domain);
    $name = array_pop($domain);

    $domain = "$name.$tld";

    return $domain;
}

/**
 * Get domain name from url
 * @param string $url
 */
public function getDomainFromUrl($url)
{
    $domain = parse_url($url, PHP_URL_HOST);
    $domain = $this->getRootDomain($domain);

    return $domain;
}
(り薆情海 2024-07-14 07:56:30

Solved this...

Say we're calling dev.mysite.com and we want to extract 'mysite.com'

$requestedServerName = $_SERVER['SERVER_NAME']; // = dev.mysite.com

$thisSite = explode('.', $requestedServerName); // site name now an array

array_shift($thisSite); //chop off the first array entry eg 'dev'

$thisSite = join('.', $thisSite); //join it back together with dots ;)

echo $thisSite; //outputs 'mysite.com'

Works with mysite.co.uk too so should work everywhere :)

拔了角的鹿 2024-07-14 07:56:30

I spent some time thinking about whether it makes sense to use a regular expression for this, but in the end I think not.

firstresponder's regexp came close to convincing me it was the best way, but it didn't work on anything missing a trailing slash (http://example.com, for instance). I fixed that with the following: '/\w+\..{2,3}(?:\..{2,3})?(?=[\/\W])/i', but then I realized that it matches twice for URLs like 'http://example.com/index.htm'. Oops. That wouldn't be so bad (just use the first match), but it also matches twice on something like 'http://abc.ed.fg.hij.kl.mn/', and the first match isn't the right one. :(

A co-worker suggested just getting the host (via parse_url()) and then taking the last two or three array elements after splitting on '.' (for example with explode()). Whether to take two or three would be based on a list of suffixes, like 'co.uk', etc. Building that list becomes the hard part.
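
The co-worker's idea could be sketched roughly as follows. The suffix list here is a tiny hand-written sample for illustration only, and both MULTI_LABEL_SUFFIXES and registrableDomain are hypothetical names; a real list would need to be far more complete. The sketch assumes PHP 7 or later:

```php
<?php
// Illustrative only: a tiny hand-written list of two-label suffixes.
const MULTI_LABEL_SUFFIXES = ['co.uk', 'com.au'];

// Hypothetical helper: returns the last two or three labels of a host name.
function registrableDomain(string $host): string
{
    $labels = explode('.', strtolower($host));
    $count = count($labels);

    // Keep three labels when the last two form a known two-label suffix.
    $lastTwo = implode('.', array_slice($labels, -2));
    $keep = in_array($lastTwo, MULTI_LABEL_SUFFIXES, true) ? 3 : 2;

    // Never ask for more labels than the host actually has.
    return implode('.', array_slice($labels, -min($keep, $count)));
}

$host = parse_url('http://www.example.co.uk/index.htm', PHP_URL_HOST);
echo registrableDomain($host), "\n";                  // example.co.uk
echo registrableDomain('test.sub.domain.com'), "\n";  // domain.com
echo registrableDomain('abc.ed.fg.hij.kl.mn'), "\n";  // kl.mn
```

Any suffix missing from the list falls back to the last two labels, which is why the completeness of the list matters so much.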

This function should work:

function Delete_Domain_From_Url($Url = false)
{
    if($Url)
    {
        $Url_Parts = parse_url($Url);
        $Url = isset($Url_Parts['path']) ? $Url_Parts['path'] : '';
        $Url .= isset($Url_Parts['query']) ? "?".$Url_Parts['query'] : '';
    }

    return $Url;
}

To use it:

$Url = "https://stackoverflow.com/questions/176284/how-do-you-strip-out-the-domain-name-from-a-url-in-php";
echo Delete_Domain_From_Url($Url);

# Output: 
#/questions/176284/how-do-you-strip-out-the-domain-name-from-a-url-in-php
黑凤梨 2024-07-14 07:56:30

There is only one correct way to extract the parts of a domain: use the Public Suffix List (a database of TLDs). I recommend the TLDExtract package; here is sample code:

$extract = new LayerShifter\TLDExtract\Extract();

$result = $extract->parse('www.domain.com/path/script.php?=whatever');
$result->getSubdomain(); // will return (string) 'www'
$result->getHostname(); // will return (string) 'domain'
$result->getSuffix(); // will return (string) 'com'