Discovering a website's feed URL
How can I discover a website's feed URL?
When I grab Microsoft's blog HTML, I can see this:
<link rel="alternate" type="application/rss+xml" title="Site Home (RSS 2.0)" href="http://blogs.technet.com/rss.aspx" />
<link rel="alternate" type="application/rss+xml" title="B1ackD0g's Comments (RSS 2.0)" href="/members/B1ackD0g/comments/rss.aspx" />
<link rel="alternate" type="application/rss+xml" title="B1ackD0g's Activities (RSS 2.0)" href="/members/B1ackD0g/activities/rss.aspx" />
<link rel="alternate" type="application/rss+xml" title="Activities of People B1ackD0g Follows (RSS 2.0)" href="/members/B1ackD0g/activities/followersrss.aspx" />
<link rel="alternate" type="application/rss+xml" title="B1ackD0g's Groups Activities (RSS 2.0)" href="/members/B1ackD0g/activities/groupsrss.aspx" />
<link rel="alternate" type="application/rss+xml" title="The Official Microsoft Blog – News and Perspectives from Microsoft (RSS 2.0)" href="http://blogs.technet.com/b/microsoft_blog/rss.aspx" />
<link rel="alternate" type="application/atom+xml" title="The Official Microsoft Blog – News and Perspectives from Microsoft (Atom 1.0)" href="http://blogs.technet.com/b/microsoft_blog/atom.aspx" />
What I think I can assume here is that I can look for link tags whose href starts with "http://blogs.technet.com/b/microsoft_blog/".
Is this safe to assume?
What I need to do is basically get a URL and return its feed URL.
Comments (1)
There is no safe way to assume what a website's feed URL is without knowing it. In this example the value of the type attribute seems to be enough to identify the feeds, but that attribute is not guaranteed to be set on pages outside of your example. You can try to guess by searching the markup for links containing "RSS", or even by testing against a service like FeedBurner (http://feeds.feedburner.com/somedomain), but you still cannot be sure.
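As a rough illustration of the type-attribute approach described above, here is a minimal Python sketch that uses only the standard library. The names FeedLinkParser and discover_feeds are made up for this example, and it assumes the page actually declares its feeds with link rel="alternate" tags and a feed MIME type, which, as noted, is not guaranteed. It returns candidate feeds rather than a single definitive answer, and resolves relative hrefs against the page URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types conventionally used to advertise feeds (an assumption:
# sites may use other types or omit the attribute entirely).
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkParser(HTMLParser):
    """Hypothetical helper: collects <link rel="alternate"> feed candidates."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []  # list of (absolute_url, title) tuples

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <link ... />.
        if tag != "link":
            return
        a = {name: (value or "") for name, value in attrs}
        rels = a.get("rel", "").lower().split()
        mime = a.get("type", "").strip().lower()
        href = a.get("href", "").strip()
        if "alternate" in rels and mime in FEED_TYPES and href:
            # Relative hrefs like "/members/.../rss.aspx" need the page URL.
            self.feeds.append((urljoin(self.base_url, href), a.get("title", "")))


def discover_feeds(html, base_url):
    """Return candidate (url, title) pairs found in the given HTML text."""
    parser = FeedLinkParser(base_url)
    parser.feed(html)
    return parser.feeds


if __name__ == "__main__":
    sample = (
        '<link rel="alternate" type="application/rss+xml" '
        'title="Site Home (RSS 2.0)" href="http://blogs.technet.com/rss.aspx" />'
        '<link rel="alternate" type="application/rss+xml" '
        'title="Comments" href="/members/B1ackD0g/comments/rss.aspx" />'
        '<link rel="stylesheet" href="/style.css" />'
    )
    for url, title in discover_feeds(sample, "http://blogs.technet.com/b/microsoft_blog/"):
        print(title, url)
```

Because a page can advertise several feeds (site-wide, comments, per-user activity), the caller still has to pick one, for example by comparing titles or by preferring hrefs under the same path as the page. Filtering on a hard-coded href prefix would only work for this particular blog.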