Retrieve contents of URL as string

Posted 2024-09-08 10:58:09

For tedious reasons to do with Hpricot, I need to write a function that is passed a URL, and returns the whole contents of the page as a single string.

I'm close. I know I need to use OpenURI, and it should look something like this:

require 'open-uri'
open(url) {
  # do something mysterious here to get page_string
}
puts page_string

Can anyone suggest what I need to add?

Comments (8)

美人骨 2024-09-15 10:58:09

You can do the same without OpenURI:

require 'net/http'
require 'uri'

# Named fetch_page rather than open, so that Kernel#open is not overridden.
def fetch_page(url)
  Net::HTTP.get(URI.parse(url))
end

page_content = fetch_page('http://www.google.com')
puts page_content

Or, more succinctly:

Net::HTTP.get(URI.parse('http://www.google.com'))
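
Net::HTTP.get returns the body whatever the status code was. As a minimal sketch of a more defensive variant, the helper below (fetch_body is a hypothetical name, not part of Net::HTTP) uses Net::HTTP.get_response and checks the status before returning the body:

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper (not part of Net::HTTP): returns the response body
# as a String, or raises if the server did not answer with a 2xx status.
def fetch_body(url)
  response = Net::HTTP.get_response(URI.parse(url))
  unless response.is_a?(Net::HTTPSuccess)
    raise "Request to #{url} failed: #{response.code} #{response.message}"
  end
  response.body
end
```

Calling fetch_body('http://www.google.com') would then return the page as a string. Redirects (3xx) are not followed by this sketch and would raise.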

伪装你 2024-09-15 10:58:09

The open method passes an IO representation of the resource to your block when it yields. You can read from it using the IO#read method:

open([mode [, perm]] [, options]) [{|io| ... }] 
open(path) { |io| data = io.read }
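
To see what read does without touching the network, here is a small sketch in which a StringIO stands in for the IO-like object that the block receives (that substitution, and the sample HTML, are assumptions made for illustration):

```ruby
require 'stringio'

# StringIO stands in for the IO-like object yielded to the block;
# this is an assumption made so the example runs without a network.
io = StringIO.new("<html>\n<body>Hello</body>\n</html>\n")

data = io.read   # the entire remaining content as one String
rest = io.read   # the stream is now at EOF, so this is ""

puts data.class       # String
puts data.lines.size  # 3
puts rest.empty?      # true
```

So a single read call with no arguments is enough to get the whole page as one string.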

动次打次papapa 2024-09-15 10:58:09

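require 'open-uri'
# Assign the block's result: a variable first set inside the block
# would not be visible after the block ends.
page_string = open(url) do |f|
  f.read
end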

See also the documentation of the IO class.

梦在夏天 2024-09-15 10:58:09

I was also unsure which to use for better performance, so I benchmarked both to make it clearer:

require 'benchmark'
require 'net/http'
require "uri"
require 'open-uri'

url = "http://www.google.com"
Benchmark.bm do |x|
  x.report("net-http:")   { content = Net::HTTP.get_response(URI.parse(url)).body if url }
  x.report("open-uri:")   { open(url){|f| content =  f.read } if url }
end

The result was:

              user     system      total        real
net-http:  0.000000   0.000000   0.000000 (  0.097779)
open-uri:  0.030000   0.010000   0.040000 (  0.864526)

I'd say it depends on your requirements and on how you want to process the response.
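
Each figure above comes from a single request, so network latency can easily dominate it. As a rough sketch, the hypothetical helper below (average_realtime is not part of the Benchmark library) averages wall-clock time over several runs using Benchmark.realtime:

```ruby
require 'benchmark'

# Hypothetical helper: runs the given block `runs` times and returns
# the mean wall-clock time per run, in seconds.
def average_realtime(runs)
  total = 0.0
  runs.times { total += Benchmark.realtime { yield } }
  total / runs
end
```

It could be used as average_realtime(5) { Net::HTTP.get(URI.parse(url)) } and likewise for the open-uri variant, which gives a steadier comparison than one run each.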

梦断已成空 2024-09-15 10:58:09

To make the code a little clearer: the OpenURI open method returns the value returned by the block, so you can assign open's return value directly to your variable. For example:

xml_text = open(url) { |io| io.read }
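
The same "open returns the block's value" behaviour can be tried offline. In this sketch a temporary file stands in for the URL and File.open stands in for OpenURI's open (both substitutions are assumptions for illustration only):

```ruby
require 'tempfile'

# A temporary file stands in for the remote page (illustration only).
tmp = Tempfile.new('page')
tmp.write('<p>hello</p>')
tmp.close

# File.open returns whatever the block returns, here the file contents.
text = File.open(tmp.path) { |io| io.read }
same = File.open(tmp.path, &:read)  # shorthand for the same block

puts text          # <p>hello</p>
puts text == same  # true

tmp.unlink
```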

哆啦不做梦 2024-09-15 10:58:09

Starting with Ruby 3.0, calling URI.open via Kernel#open has been removed, so instead call URI.open directly:

require 'open-uri'
page_string = URI.open(url, &:read)
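
Wrapped up as the function the question asks for, this might look like the sketch below. The name page_string_for and the choice to return nil on an HTTP error are assumptions, not anything open-uri prescribes:

```ruby
require 'open-uri'

# Hypothetical helper: returns the page body as a single String.
# open-uri raises OpenURI::HTTPError for 4xx/5xx responses; this sketch
# logs the error to stderr and returns nil instead.
def page_string_for(url)
  URI.open(url, &:read)
rescue OpenURI::HTTPError => e
  warn "Could not fetch #{url}: #{e.message}"
  nil
end
```

Other failures, such as a DNS or connection error, are not rescued here and would still propagate to the caller.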

棒棒糖 2024-09-15 10:58:09

Try the following instead:

require 'open-uri' 
content = URI(your_url).read

云胡 2024-09-15 10:58:09

require 'open-uri'
# Assign the block's result so that str is available after the block.
str = open(url) { |f|  # url must specify the protocol
  f.read
}