How to get the final, redirected, canonical URL of a website using PHP?

Posted on 2024-12-19 02:51:06

In the days of link shorteners and Ajax, there can be many links that ultimately point to the same content. I was wondering what the best way is to get the final, best link for a web site in PHP, hopefully with a library. I was unable to find anything on Google or GitHub.

I have seen this example code, but it doesn't handle things like rel="canonical" link tags or default SSL ports: http://w-shadow.com/blog/2008/07/05/how-to-get-redirect-url-in-php/
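For reference, the default-port problem mentioned above comes down to a normalization step along the lines of this minimal sketch. normalize_url() is a hypothetical helper, not taken from the linked post or from any library. It lowercases the scheme and host, drops :80 for http and :443 for https, and turns an empty path into "/". It deliberately keeps the fragment, because hashbang URLs such as the Twitter link below carry the real path there.

```php
<?php
/**
 * Hypothetical helper: normalize a URL so trivially different spellings of
 * the same address compare equal. Lowercases scheme and host, drops the
 * default port (80 for http, 443 for https) and turns an empty path into "/".
 * Query and fragment are kept verbatim; user info is not handled.
 */
function normalize_url(string $url): ?string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return null; // not an absolute URL
    }
    $scheme = strtolower($parts['scheme']);
    $defaultPorts = ['http' => 80, 'https' => 443];

    $port = '';
    if (isset($parts['port']) && $parts['port'] !== ($defaultPorts[$scheme] ?? null)) {
        $port = ':' . $parts['port'];
    }
    $path = ($parts['path'] ?? '') === '' ? '/' : $parts['path'];
    $query = isset($parts['query']) ? '?' . $parts['query'] : '';
    $fragment = isset($parts['fragment']) ? '#' . $parts['fragment'] : '';

    return $scheme . '://' . strtolower($parts['host']) . $port . $path . $query . $fragment;
}
```

With this, https://example.com:443/ and https://EXAMPLE.com/ both normalize to https://example.com/.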

Facebook seems to handle this pretty well: you can see how it follows 301s, rel="canonical", etc. To see examples of how Facebook handles it, use their Open Graph tool:

https://developers.facebook.com/tools/debug

and enter these links:

http://dlvr.it/xxb0W
https://twitter.com/#!/twitter/statuses/136946408275193856

Is there a PHP library out there that already has this pre-built: one that checks these headers, resolves 301 redirects, parses rel="canonical", detects redirect loops and simply returns the best resulting URL to use?

As an alternative, I am open to APIs that can be used, but would prefer something that runs on my own server.

Answers (3)

酒解孤独 2024-12-26 02:51:06

Since I wasn't able to find any libraries that really did what I was looking for, and I was hoping to do more than just follow HTTP redirects, I have gone ahead and created a library that accomplishes the goals and released it under the MIT license. You can get it here:

https://github.com/mattwright/URLResolver.php

URLResolver.php is a PHP class that attempts to resolve URLs to a final, canonical link:

  • Follows 301 and 302 redirects found in HTTP headers
  • Follows Open Graph URL <meta> tags found in web page <head>
  • Follows Canonical URL <link> tags found in web page <head>
  • Aborts download quickly if content type is not an HTML page
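As a rough illustration of the two HTML-based checks in this list (not URLResolver.php's actual code), the sketch below uses PHP's built-in DOM extension rather than Simple HTML DOM. The helper names is_html_content_type() and find_canonical_url() are hypothetical, and preferring rel="canonical" over og:url is an assumption of this sketch.

```php
<?php
/**
 * Hypothetical helper: decide from a Content-Type header value whether the
 * response is worth parsing as HTML (e.g. "text/html; charset=UTF-8").
 */
function is_html_content_type(?string $contentType): bool
{
    if ($contentType === null) {
        return false;
    }
    $mime = strtolower(trim(explode(';', $contentType)[0]));
    return $mime === 'text/html' || $mime === 'application/xhtml+xml';
}

/**
 * Hypothetical helper: return the canonical URL declared in an HTML page.
 * Checks <link rel="canonical" href="..."> first, then
 * <meta property="og:url" content="...">; returns null if neither is found.
 * The value may be relative and still needs resolving against the page URL.
 */
function find_canonical_url(string $html): ?string
{
    if (trim($html) === '') {
        return null; // DOMDocument::loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    // Real-world markup is often invalid; keep libxml warnings out of the output.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $queries = [
        '//link[@rel="canonical"]/@href',
        '//meta[@property="og:url"]/@content',
    ];
    foreach ($queries as $query) {
        $nodes = $xpath->query($query);
        if ($nodes !== false && $nodes->length > 0) {
            $value = trim($nodes->item(0)->nodeValue);
            if ($value !== '') {
                return $value;
            }
        }
    }
    return null;
}
```

A canonical URL found this way can itself redirect, so a resolver would typically feed it back through the redirect-following step until the result stops changing.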

I am certainly not an expert on the rules of HTTP redirection, so if anyone has suggestions on how to improve this library, they would be greatly appreciated. I have tested it on thousands of URLs and it seems to do pretty well. I followed Mario's advice and used the PHP Simple HTML DOM Parser library where needed.

听你说爱我 2024-12-26 02:51:06

Using Guzzle (a well-known and robust HTTP client), you can do it like this (the code uses the Guzzle 3 API, Composer package guzzle/guzzle):

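<?php
// Guzzle 3 API (Composer package guzzle/guzzle); newer Guzzle versions use a different API.
require 'vendor/autoload.php';

use Guzzle\Http\Client as GuzzleClient;
use Guzzle\Plugin\History\HistoryPlugin;

function resolveUrl($url)
{
    $client   = new GuzzleClient($url);
    // Keeps a log of the requests sent; handy for inspecting them, not required below.
    $history  = new HistoryPlugin();
    $client->addSubscriber($history);

    // HEAD avoids downloading the body; the client follows redirects by default.
    // Note: send() typically throws a BadResponseException for 4xx/5xx responses.
    $response = $client->head($url)->send();

    if (!$response->isSuccessful()) {
        throw new \Exception(sprintf("Url %s is not a valid URL or website is down.", $url));
    }

    // URL of the last request after redirects (HTTP redirects only; rel="canonical" is not checked).
    return $response->getEffectiveUrl();
}
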
似狗非友 2024-12-26 02:51:06

I wrote you a little function to do it. It's simple, but it may be a starting point for you. Note: the http://dlvr.it/xxb0W URL returns an invalid URL in its Location response header.
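The answer does not say exactly what was wrong with dlvr.it's header. One frequent cause is a relative Location value (for example /path or ../page), which the current HTTP specification allows but a strict URL validator may reject. A hypothetical resolve_location() helper, sketching a simplified form of RFC 3986 reference resolution, could make such values absolute before they are followed:

```php
<?php
/**
 * Hypothetical helper: make a Location header value absolute by resolving it
 * against the URL that returned it. A simplified take on RFC 3986 section 5.2:
 * user info in the base URL is ignored and the input is not validated.
 */
function resolve_location(string $base, string $location): string
{
    $location = trim($location);
    if ($location === '') {
        return $base;
    }
    // Already absolute: it starts with a scheme such as "http:" or "https:".
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $location)) {
        return $location;
    }
    $b = parse_url($base);
    $scheme = $b['scheme'] ?? 'http';
    if (strncmp($location, '//', 2) === 0) {
        return $scheme . ':' . $location; // scheme-relative: "//host/path"
    }
    $prefix = $scheme . '://' . ($b['host'] ?? '') . (isset($b['port']) ? ':' . $b['port'] : '');
    $basePath = $b['path'] ?? '/';
    if ($location[0] === '?') {
        return $prefix . $basePath . $location; // query-only reference
    }
    if ($location[0] === '#') {
        return $prefix . $basePath . (isset($b['query']) ? '?' . $b['query'] : '') . $location;
    }
    if ($location[0] === '/') {
        $target = $location; // absolute path on the same host
    } else {
        // Relative path: append to the "directory" part of the base path.
        $target = substr($basePath, 0, strrpos($basePath, '/') + 1) . $location;
    }
    // Only the path gets dot-segment removal; any ?query or #fragment is kept as-is.
    $cut = strcspn($target, '?#');
    return $prefix . remove_dot_segments(substr($target, 0, $cut)) . substr($target, $cut);
}

/** Collapse "." and ".." segments in an absolute path ("/a/b/../c" becomes "/a/c"). */
function remove_dot_segments(string $path): string
{
    $segments = explode('/', $path);
    $last = count($segments) - 1;
    $out = [];
    foreach ($segments as $i => $segment) {
        if ($segment === '.' || $segment === '..') {
            if ($segment === '..' && count($out) > 1) {
                array_pop($out); // never pop the leading "" that represents the root
            }
            if ($i === $last) {
                $out[] = ''; // a trailing "." or ".." still ends in "/"
            }
            continue;
        }
        $out[] = $segment;
    }
    return implode('/', $out);
}
```

In get_final_url() below, that would mean assigning resolve_location( $url_string, $location ) to $url_string instead of the raw $location.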

You'll need the Altumo PHP library for it to work. It's a library I wrote; it's MIT-licensed, as is this function.

See: https://github.com/homer6/altumo

Also, you'll have to wrap the function in a try/catch.

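/**
* Gets the final URL of a URL that will be redirected.
*
* @param string $url_string
* @param int $max_redirects    maximum number of redirects to follow (guards against redirect loops)
* @throws \Exception           on error, or when more than $max_redirects redirects are found
* @return string
*/
function get_final_url( $url_string, $max_redirects = 10 ){

    for( $i = 0; $i <= $max_redirects; $i++ ){

        //validate URL (expected to throw on an invalid URL, hence the try/catch)
            $url = new \Altumo\String\Url( $url_string );

        //get the Location response header of the URL
            $client = new \Altumo\Http\OutgoingHttpRequest( $url_string );
            $response = $client->sendAndGetResponseMessage();
            $location = $response->getHeader( 'Location' );

        //return the URL if no Location header was found, else follow it
            if( is_null($location) ){
                return $url_string;
            }
            $url_string = $location;

    }

    throw new \Exception( "More than $max_redirects redirects; possible redirect loop." );

}

try{
    echo get_final_url( 'your url here' );
}catch( \Exception $e ){
    echo 'Could not resolve the URL: ' . $e->getMessage();
}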

Please let me know if you'd like further modifications or help getting it going.
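Since the question also asks for redirect-loop detection, here is a library-independent sketch of the same idea. follow_redirects() and curl_location_fetcher() are hypothetical names: the first remembers every URL it has visited and fails on a repeat or after too many hops, the second is one possible fetcher built on PHP's curl extension. Splitting them is an assumption made so the loop logic can be tried without network access.

```php
<?php
/**
 * Sketch: follow a redirect chain by hand, with a hop limit and loop detection.
 * $fetchLocation is a caller-supplied callable (an assumption of this sketch):
 * given a URL it returns the absolute URL to go to next, or null if the
 * response is not a redirect.
 */
function follow_redirects(string $url, callable $fetchLocation, int $maxHops = 10): string
{
    $seen = [];
    for ($hop = 0; $hop <= $maxHops; $hop++) {
        if (isset($seen[$url])) {
            throw new RuntimeException("Redirect loop detected at $url");
        }
        $seen[$url] = true;

        $next = $fetchLocation($url);
        if ($next === null || $next === '') {
            return $url; // no further redirect: this is the final URL
        }
        $url = $next;
    }
    throw new RuntimeException("Too many redirects (limit: $maxHops)");
}

/**
 * Hypothetical fetcher using the curl extension. It sends a HEAD request
 * without following redirects and asks libcurl for CURLINFO_REDIRECT_URL,
 * the URL a redirect would take you to if redirects were being followed.
 */
function curl_location_fetcher(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_NOBODY         => true,  // HEAD request; some servers answer HEAD differently from GET
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => false, // follow_redirects() does the following instead
        CURLOPT_TIMEOUT        => 10,
    ]);
    if (curl_exec($ch) === false) {
        throw new RuntimeException('Request failed: ' . curl_error($ch));
    }
    $next = curl_getinfo($ch, CURLINFO_REDIRECT_URL);
    return is_string($next) && $next !== '' ? $next : null;
}
```

Usage would look like echo follow_redirects( 'http://dlvr.it/xxb0W', 'curl_location_fetcher' ); checking rel="canonical" on the final page is a separate step this sketch does not cover.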
