How do I fetch content from a website using Ruby / Rails?

Posted on 2024-10-20 23:51:05

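I want to copy some specific content from a website using Ruby/Rails. The content I need is inside a marquee HTML tag, separated by divs. How can I access this content using Ruby? More precisely, I would like to use some kind of Ruby GUI (preferably Shoes). How do I do that?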

在你怀里撒娇 2024-10-27 23:51:05

This isn't really a Rails question. It's something you'd do using Ruby, then possibly display using Rails, or Sinatra or Padrino - pick your poison.

There are several different HTTP clients you can use:

Open-URI comes with Ruby and is the easiest.

Net::HTTP comes with Ruby and is the standard toolbox, but it's lower-level, so you'd have to do more work.

HTTPClient and Typhoeus+Hydra are capable of threading and have both high-level and low-level interfaces.
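To give a feel for the extra work Net::HTTP involves, here is a minimal sketch using only the standard library. The helper names `build_request` and `fetch_body` and the User-Agent string are made up for illustration; they are not part of any library.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: builds a GET request for the given URL without sending it.
def build_request(url)
  uri = URI.parse(url)
  request = Net::HTTP::Get.new(uri)
  # Placeholder User-Agent; some sites reject requests that do not send one.
  request['User-Agent'] = 'example-fetcher/0.1'
  [uri, request]
end

# Hypothetical helper: sends the request and returns the body, or nil on a non-2xx response.
def fetch_body(url)
  uri, request = build_request(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    response = http.request(request)
    response.is_a?(Net::HTTPSuccess) ? response.body : nil
  end
end

# Usage (performs a real network request):
#   html = fetch_body('http://www.example.com/')
```

Compared with Open-URI, you handle the connection, the headers, and the response status yourself, which is more code but also more control.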

I recommend using Nokogiri to parse the returned HTML. It's very full-featured and robust.

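require 'nokogiri'
require 'open-uri'

# Use URI.open: since Ruby 3.0, Kernel#open no longer accepts URLs.
doc = Nokogiri::HTML(URI.open('http://www.example.com'))

# Print the parsed document back out as HTML.
puts doc.to_html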

If you need to navigate through login screens or fill in forms before you get to the page you need to parse, then I'd recommend looking at Mechanize. It relies on Nokogiri internally so you can ask it for a Nokogiri document and parse away once Mechanize retrieves the desired URL.

If you need to deal with dynamic HTML (content that JavaScript builds or changes after the page loads), then look into the various Watir tools. They drive real web browsers and then let you access the content as seen by the browser.

Once you have the content or data you want, you can "repurpose" it into text inside a Rails page.
