Getting RSS where there is no RSS

Posted 2024-09-24 02:55:21

Sorry for the long and perhaps confusing title. I'm asking for advice or guidance on how to get an RSS feed from a page that does not have RSS enabled by default. But that is not the whole problem: the page also asks me for a username and password before it shows anything.

PROBLEM:

Get an RSS feed for a forum that does not have RSS enabled, and where we need to be logged in to see the 'news'.

POSSIBLE SOLUTIONS that come to mind:

  1. Several websites offer services that generate an RSS feed for pages that don't have one. That's fine, but they don't offer an option to log in with a username and password to the page I want to get the info from, so these sites are ruled out.
  2. I tried logging in via the URL: I gave the sites from item 1 a forum URL with the username and password variables directly in the query string: www.forosinrss/login.php?usuario = me & password = your. The forum bounces me, telling me the data is not correct. Another problem is that the password is MD5-hashed, so I can't log in with the URL (T_T).
  3. I tried "SELECT * FROM DB Internet", in other words YQL. The result was much the same: I found no way to supply the username and password to log in, nor to generate a cookie for the forum, so I'm not happy with that option either.
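
For reference, here is a minimal sketch of what a scripted login might look like. It assumes the forum's login form accepts a POST with the `usuario` and `password` fields seen in the URL above and answers with a session cookie; neither is confirmed for this forum. The `forum.example` URLs, the function names, and the `md5_password` switch (for forums that expect the MD5 hex digest rather than the plain password) are hypothetical.

```python
import hashlib
import http.cookiejar
import urllib.parse
import urllib.request


def make_session():
    """Return (opener, jar); the opener stores cookies and resends them."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(login_url, username, password, md5_password=False):
    """Build a POST request carrying the credentials as form fields.

    The field names 'usuario' and 'password' are taken from the URL in the
    question; the real form may use different names (assumption).
    """
    if md5_password:
        # Assumption: the forum compares against the MD5 hex digest.
        password = hashlib.md5(password.encode("utf-8")).hexdigest()
    form = urllib.parse.urlencode({"usuario": username, "password": password})
    return urllib.request.Request(login_url, data=form.encode("ascii"), method="POST")


# Usage (not executed here; the URLs are placeholders):
#   opener, jar = make_session()
#   opener.open(build_login_request("http://forum.example/login.php", "me", "secret"))
#   html = opener.open("http://forum.example/news.php").read()
```

The point of the cookie jar is that the second request reuses whatever session cookie the login response set, which a one-shot URL handed to a third-party feed service cannot do.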

I need suggestions, recommendations, tips or complaints.

Comments (2)

东风软 2024-10-01 02:55:22

Once upon a time I wrote an app in PHP to do this with ok-ish results:

  • use curl to get the page and keep a copy
  • run a custom regular-expression filter to select the bit of the page that actually matters (some sites have dynamic text such as ads or the current date and time)
  • after a timeout, use curl to get the page again and run the same filter on it
  • run diff on old_page and new_page and pipe the result into an RSS template
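
The steps above can be sketched roughly as follows. This is not the original PHP app, only an illustration in Python under some assumptions: the interesting region can be captured by one regular expression with a single group, and "new" means lines that appear in the second snapshot but not the first. All function names and the sample pattern are made up for the example.

```python
import difflib
import re
import urllib.request
import xml.etree.ElementTree as ET


def fetch(url):
    """Download a page as text (the 'use curl' step)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def filter_page(html, pattern):
    """Keep only group 1 of `pattern`, as a list of stripped, non-empty lines."""
    match = re.search(pattern, html, flags=re.DOTALL)
    region = match.group(1) if match else ""
    return [line.strip() for line in region.splitlines() if line.strip()]


def added_lines(old, new):
    """Return the lines that were inserted or changed in `new` relative to `old`."""
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    added = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag in ("insert", "replace"):
            added.extend(new[j1:j2])
    return added


def build_rss(title, link, items):
    """Wrap each changed line in an RSS 2.0 <item> and return the feed as a string."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = "Changes detected on " + link
    for text in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = text[:80]
        ET.SubElement(item, "description").text = text
    return ET.tostring(rss, encoding="unicode")
```

A run would call `fetch` twice with a pause in between, pass both results through `filter_page` with the same pattern, and hand `added_lines(old, new)` to `build_rss`.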

The system worked OK, but filtering each page down to the content I wanted in the feed was fiddly, and it broke a lot: these kinds of sites are often hand-edited, so you can't guarantee any consistency.

无可置疑 2024-10-01 02:55:21

Download the page using something like cURL or fsockopen if you're feeling brave, then transform the page from HTML to RSS using an XSLT stylesheet.
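
To illustrate the HTML-to-RSS step: the sketch below does not use XSLT. Python's standard library has no XSLT processor, so it swaps in `html.parser` plus `xml.etree.ElementTree` to do a comparable transformation. It assumes every link on the downloaded page should become a feed item, which a real forum page would need to narrow down; the class and function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
import xml.etree.ElementTree as ET


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a> element that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace in the link text.
            title = " ".join("".join(self._text).split())
            self.links.append((self._href, title))
            self._href = None


def html_to_rss(html, feed_title, base_url):
    """Turn every link in `html` into an RSS 2.0 item; relative links are resolved."""
    collector = LinkCollector()
    collector.feed(html)
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = base_url
    ET.SubElement(channel, "description").text = "Links scraped from " + base_url
    for href, title in collector.links:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title or href
        ET.SubElement(item, "link").text = urljoin(base_url, href)
    return ET.tostring(rss, encoding="unicode")
```

An XSLT stylesheet would express the same mapping declaratively, but it needs well-formed XML input, which hand-edited HTML often is not.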
