如何使用 php 获取网站的 rss feed url?

发布于 2024-11-28 00:59:07 字数 58 浏览 0 评论 0 原文

我需要以编程方式查找网站的 rss feed url。

[使用php或jquery]

I need to find the rss feed url of a website programmatically.

[Either using php or jquery]

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

画▽骨i 2024-12-05 00:59:07

一般流程已经得到解答(DOOManiac),所以一些代码(演示):

<?php

$location = 'http://hakre.wordpress.com/';
$html = file_get_contents($location);
echo getRSSLocation($html, $location); # http://hakre.wordpress.com/feed/

/**
 * @link http://keithdevens.com/weblog/archive/2002/Jun/03/RSSAuto-DiscoveryPHP
 */
function getRSSLocation($html, $location){
    if(!$html or !$location){
        return false;
    }else{
        #search through the HTML, save all <link> tags
        # and store each link's attributes in an associative array
        preg_match_all('/<link\s+(.*?)\s*\/?>/si', $html, $matches);
        $links = $matches[1];
        $final_links = array();
        $link_count = count($links);
        for($n=0; $n<$link_count; $n++){
            $attributes = preg_split('/\s+/s', $links[$n]);
            foreach($attributes as $attribute){
                $att = preg_split('/\s*=\s*/s', $attribute, 2);
                if(isset($att[1])){
                    $att[1] = preg_replace('/([\'"]?)(.*)\1/', '$2', $att[1]);
                    $final_link[strtolower($att[0])] = $att[1];
                }
            }
            $final_links[$n] = $final_link;
        }
        #now figure out which one points to the RSS file
        for($n=0; $n<$link_count; $n++){
            if(strtolower($final_links[$n]['rel']) == 'alternate'){
                if(strtolower($final_links[$n]['type']) == 'application/rss+xml'){
                    $href = $final_links[$n]['href'];
                }
                if(!$href and strtolower($final_links[$n]['type']) == 'text/xml'){
                    #kludge to make the first version of this still work
                    $href = $final_links[$n]['href'];
                }
                if($href){
                    if(strstr($href, "http://") !== false){ #if it's absolute
                        $full_url = $href;
                    }else{ #otherwise, 'absolutize' it
                        $url_parts = parse_url($location);
                        #only made it work for http:// links. Any problem with this?
                        $full_url = "http://$url_parts[host]";
                        if(isset($url_parts['port'])){
                            $full_url .= ":$url_parts[port]";
                        }
                        if($href{0} != '/'){ #it's a relative link on the domain
                            $full_url .= dirname($url_parts['path']);
                            if(substr($full_url, -1) != '/'){
                                #if the last character isn't a '/', add it
                                $full_url .= '/';
                            }
                        }
                        $full_url .= $href;
                    }
                    return $full_url;
                }
            }
        }
        return false;
    }
}

请参阅:使用 PHP 自动发现 RSS(存档副本)

The general process has already been answered (Quentin, DOOManiac), so some code (Demo):

<?php

$location = 'http://hakre.wordpress.com/';
$html = file_get_contents($location);
echo getRSSLocation($html, $location); # http://hakre.wordpress.com/feed/

/**
 * @link http://keithdevens.com/weblog/archive/2002/Jun/03/RSSAuto-DiscoveryPHP
 */
function getRSSLocation($html, $location){
    if(!$html or !$location){
        return false;
    }else{
        #search through the HTML, save all <link> tags
        # and store each link's attributes in an associative array
        preg_match_all('/<link\s+(.*?)\s*\/?>/si', $html, $matches);
        $links = $matches[1];
        $final_links = array();
        $link_count = count($links);
        for($n=0; $n<$link_count; $n++){
            $attributes = preg_split('/\s+/s', $links[$n]);
            foreach($attributes as $attribute){
                $att = preg_split('/\s*=\s*/s', $attribute, 2);
                if(isset($att[1])){
                    $att[1] = preg_replace('/([\'"]?)(.*)\1/', '$2', $att[1]);
                    $final_link[strtolower($att[0])] = $att[1];
                }
            }
            $final_links[$n] = $final_link;
        }
        #now figure out which one points to the RSS file
        for($n=0; $n<$link_count; $n++){
            if(strtolower($final_links[$n]['rel']) == 'alternate'){
                if(strtolower($final_links[$n]['type']) == 'application/rss+xml'){
                    $href = $final_links[$n]['href'];
                }
                if(!$href and strtolower($final_links[$n]['type']) == 'text/xml'){
                    #kludge to make the first version of this still work
                    $href = $final_links[$n]['href'];
                }
                if($href){
                    if(strstr($href, "http://") !== false){ #if it's absolute
                        $full_url = $href;
                    }else{ #otherwise, 'absolutize' it
                        $url_parts = parse_url($location);
                        #only made it work for http:// links. Any problem with this?
                        $full_url = "http://$url_parts[host]";
                        if(isset($url_parts['port'])){
                            $full_url .= ":$url_parts[port]";
                        }
                        if($href{0} != '/'){ #it's a relative link on the domain
                            $full_url .= dirname($url_parts['path']);
                            if(substr($full_url, -1) != '/'){
                                #if the last character isn't a '/', add it
                                $full_url .= '/';
                            }
                        }
                        $full_url .= $href;
                    }
                    return $full_url;
                }
            }
        }
        return false;
    }
}

See: RSS auto-discovery with PHP (archived copy).

以歌曲疗慰 2024-12-05 00:59:07

这比仅仅在此处粘贴一些代码要复杂得多。但我可以为你指明你需要做什么的正确方向。

  1. 首先,您需要获取页面,
  2. 解析返回的字符串,查找 RSS 自动发现元标记。您可以将整个文档映射为 XML 并使用 DOM 遍历,但我只使用正则表达式。
  3. 提取标签的 href 部分,您现在就拥有了 RSS 源的 URL。

This is something a lot more involved than just pasting some code here. But I can point you in the right direction for what you need to do.

  1. First you need to fetch the page
  2. Parse the string you get back looking for the RSS Autodiscovery Meta tag. You can either map the whole document out as XML and use DOM traversal, but I would just use a regular expression.
  3. Extract the href portion of the tag and you now have the URL to the RSS feed.
梦在深巷 2024-12-05 00:59:07

使 RSS 可发现的规则有相当详细的记录。您只需要解析 HTML 并查找所描述的元素即可。

The rules for making RSS discoverable are fairly well documented. You just need to parse the HTML and look for the elements described.

Hello爱情风 2024-12-05 00:59:07

一个稍小的函数,它将获取第一个可用的提要,无论是 rss 还是 Atom(大多数博客有两个选项 - 这会获取第一个偏好)。

public function getFeedUrl($url){
        if(@file_get_contents($url)){
            preg_match_all('/<link\srel\=\"alternate\"\stype\=\"application\/(?:rss|atom)\+xml\"\stitle\=\".*href\=\"(.*)\"\s\/\>/', file_get_contents($url), $matches);
            return $matches[1][0];
        }
        return false;
    }

A slightly smaller function that will grab the first available feed, whether it is rss or atom (most blogs have two options - this grabs the first preference).

public function getFeedUrl($url){
        if(@file_get_contents($url)){
            preg_match_all('/<link\srel\=\"alternate\"\stype\=\"application\/(?:rss|atom)\+xml\"\stitle\=\".*href\=\"(.*)\"\s\/\>/', file_get_contents($url), $matches);
            return $matches[1][0];
        }
        return false;
    }
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文