How can I get the final, redirected, canonical URL of a website using PHP?
In the days of link shorteners and Ajax, there can be many links that ultimately point to the same content. I was wondering what the best way is to get the final, best link for a web site in PHP, hopefully with a library. I was unable to find anything on Google or GitHub.
I have seen this example code, but it doesn't handle things like rel="canonical" link tags or default SSL ports: http://w-shadow.com/blog/2008/07/05/how-to-get-redirect-url-in-php/
Facebook seems to handle this pretty well: you can see how they follow 301 redirects, rel="canonical", and so on. To see examples of how Facebook handles it, use their Open Graph tool:
https://developers.facebook.com/tools/debug
and enter these links:
http://dlvr.it/xxb0W
https://twitter.com/#!/twitter/statuses/136946408275193856
Is there a PHP library out there that already does this: one that checks these headers, resolves 301 redirects, parses rel="canonical", detects redirect loops, and reliably returns the best resulting URL to use?
As an alternative, I am open to APIs that can be used, but would prefer something that runs on my own server.
Since I wasn't able to find any libraries that really did what I was looking for, and I was hoping to do more than just follow HTTP redirects, I have gone ahead and created a library that accomplishes the goals and released it under the MIT license. You can get it here:
https://github.com/mattwright/URLResolver.php
URLResolver.php is a PHP class that attempts to resolve URLs to a final, canonical link:
I am certainly not an expert on the rules of HTTP redirection, so if anyone has suggestions on how to improve this library, they would be greatly appreciated. I have tested it on thousands of URLs and it seems to do pretty well. I followed Mario's advice and used the PHP Simple HTML DOM Parser library where needed.
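URLResolver.php's own code isn't reproduced here. As a rough illustration of two steps the question asks about, here is a hypothetical sketch (the function names are illustrative, and PHP's built-in DOMDocument stands in for the HTML parser library mentioned above): reading a rel="canonical" link from fetched HTML, and dropping default ports so equivalent URLs compare equal.

```php
<?php
// Hypothetical helpers, not URLResolver.php's API. Canonical extraction uses
// PHP's built-in DOM extension instead of a third-party HTML parser.

/**
 * Return the href of the first <link rel="canonical"> in $html, or null.
 * A relative href still has to be resolved against the page URL.
 */
function findCanonicalHref(string $html): ?string
{
    if (trim($html) === '') {
        return null; // loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    // Real-world markup is rarely valid; keep libxml warnings quiet.
    $previous = libxml_use_internal_errors(true);
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return null;
    }
    foreach ($doc->getElementsByTagName('link') as $link) {
        // rel is a space-separated, case-insensitive list of tokens.
        $tokens = preg_split('/\s+/', strtolower(trim($link->getAttribute('rel'))));
        $href = trim($link->getAttribute('href'));
        if (in_array('canonical', $tokens, true) && $href !== '') {
            return $href;
        }
    }
    return null;
}

/**
 * Lowercase scheme and host and drop a port equal to the scheme's default,
 * so https://Example.com:443/x and https://example.com/x compare equal.
 */
function stripDefaultPort(string $url): string
{
    $p = parse_url($url);
    if ($p === false || !isset($p['scheme'], $p['host'])) {
        return $url; // leave anything we can't parse untouched
    }
    $scheme = strtolower($p['scheme']);
    $defaults = ['http' => 80, 'https' => 443];
    $port = $p['port'] ?? null;
    if ($port !== null && ($defaults[$scheme] ?? null) === $port) {
        $port = null;
    }
    $out = $scheme . '://';
    if (isset($p['user'])) {
        $out .= $p['user'] . (isset($p['pass']) ? ':' . $p['pass'] : '') . '@';
    }
    $out .= strtolower($p['host']) . ($port !== null ? ':' . $port : '');
    $out .= $p['path'] ?? '/';
    $out .= isset($p['query']) ? '?' . $p['query'] : '';
    // Keep the fragment: URLs like https://twitter.com/#!/... rely on it.
    $out .= isset($p['fragment']) ? '#' . $p['fragment'] : '';
    return $out;
}
```

Running the canonical href through stripDefaultPort() before comparing is one way to tell whether two redirect chains ended on the same page.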
Using Guzzle (a well-known and robust HTTP client), you can do it like this:
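The Guzzle snippet for this answer was not preserved, so its exact calls are not reproduced here. As a dependency-free sketch of the same idea (not Guzzle's API), the hypothetical followRedirects() below follows redirects one hop at a time, which makes loop detection and a hop limit explicit. The fetcher is injected so it can be swapped out; curlFetch() is an assumed example that requires ext-curl.

```php
<?php
// Hypothetical sketch, not Guzzle's API: follow redirects one hop at a time so
// that loops and overly long chains can be detected explicitly.

final class RedirectLoopException extends RuntimeException {}

/**
 * $fetch must request $url WITHOUT following redirects and return
 * ['status' => int, 'location' => ?string].
 */
function followRedirects(string $url, callable $fetch, int $maxHops = 10): string
{
    $seen = [];
    for ($hop = 0; $hop <= $maxHops; $hop++) {
        if (isset($seen[$url])) {
            throw new RedirectLoopException("Redirect loop detected at $url");
        }
        $seen[$url] = true;
        $response = $fetch($url);
        $location = $response['location'] ?? null;
        $isRedirect = in_array($response['status'], [301, 302, 303, 307, 308], true);
        if (!$isRedirect || $location === null || $location === '') {
            return $url; // not a redirect: this is the final URL
        }
        $url = absolutize($url, $location);
    }
    throw new RedirectLoopException("More than $maxHops redirects");
}

/**
 * Minimal resolution of a Location value against the URL that returned it.
 * Handles absolute, scheme-relative, root-relative, query-only, fragment-only
 * and path-relative values; it ignores user info and does not remove "." or
 * ".." segments.
 */
function absolutize(string $base, string $location): string
{
    if ($location === '') {
        return $base;
    }
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $location) === 1) {
        return $location; // already absolute
    }
    $b = parse_url($base);
    $origin = $b['scheme'] . '://' . $b['host'] . (isset($b['port']) ? ':' . $b['port'] : '');
    $path = $b['path'] ?? '/';
    if (strncmp($location, '//', 2) === 0) {
        return $b['scheme'] . ':' . $location;
    }
    if ($location[0] === '/') {
        return $origin . $location;
    }
    if ($location[0] === '?') {
        return $origin . $path . $location;
    }
    if ($location[0] === '#') {
        return $origin . $path . (isset($b['query']) ? '?' . $b['query'] : '') . $location;
    }
    // Path-relative: replace everything after the last "/" of the base path.
    return $origin . substr($path, 0, strrpos($path, '/') + 1) . $location;
}

// Example fetcher for real requests (requires ext-curl). curl reports the
// redirect target already resolved to an absolute URL.
function curlFetch(string $url): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_FOLLOWLOCATION => false, // followRedirects() does the hopping
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT => 10,
    ]);
    curl_exec($ch);
    return [
        'status' => (int) curl_getinfo($ch, CURLINFO_RESPONSE_CODE),
        'location' => curl_getinfo($ch, CURLINFO_REDIRECT_URL) ?: null,
    ];
}
```

A typical call would be followRedirects('http://dlvr.it/xxb0W', 'curlFetch') inside a try/catch for RedirectLoopException; checking the final page for rel="canonical" would be a separate step.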
I wrote you a little function to do it. It's simple, but it may be a starting point for you. Note: the http://dlvr.it/xxb0W URL returns an invalid URL in its Location response header.
You'll need the Altumo PHP library for it to work. It's a library that I wrote, but it's MIT-licensed, as is this function.
See: https://github.com/homer6/altumo
Also, you'll have to wrap the function in a try/catch.
Please let me know if you'd like further modifications or help getting it going.
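The answer doesn't say exactly what was wrong with dlvr.it's Location header, and the Altumo-based function itself isn't shown. Assuming the problem is unencoded characters such as spaces or non-ASCII bytes, a hypothetical clean-up step (cleanLocation() and InvalidLocationException are illustrative names) could repair what it can and throw on anything else, which is where the try/catch comes in. It assumes relative Location values have already been resolved against the request URL.

```php
<?php
// Hypothetical clean-up step for a raw Location header value. Assumes the value
// is already absolute (resolve relative values against the request URL first).

final class InvalidLocationException extends InvalidArgumentException {}

function cleanLocation(string $raw): string
{
    $value = trim($raw);
    // Percent-encode bytes that cannot appear literally in a URL: spaces,
    // control characters and non-ASCII (UTF-8) bytes. Existing %XX escapes
    // are printable ASCII and are left alone.
    $value = preg_replace_callback(
        '/[^\x21-\x7E]/',
        function (array $m): string {
            return rawurlencode($m[0]);
        },
        $value
    );
    $parts = parse_url($value);
    $scheme = is_array($parts) ? strtolower($parts['scheme'] ?? '') : '';
    // Accept only absolute http(s) URLs that PHP's URL validator also accepts.
    if (!in_array($scheme, ['http', 'https'], true)
        || !isset($parts['host'])
        || filter_var($value, FILTER_VALIDATE_URL) === false) {
        throw new InvalidLocationException("Unusable Location header: $raw");
    }
    return $value;
}
```

A caller would wrap cleanLocation() in try/catch and, for example, treat the current URL as final when InvalidLocationException is thrown.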