hpricot: get an image from a URL and parse the element

Posted on 2024-11-11 15:31:47

I am trying to get the exact URL of an image inside a page and then download it. I haven't gotten to the download step yet, because I am still trying to isolate the image's URL. Here is the code:

#!/usr/bin/ruby -w

require 'rubygems'
require 'hpricot'
require 'open-uri'

raw = Hpricot(open("http://www.amazon.com/Weezer/dp/B000003TAW/"))
ele = raw.search("img[@src*=jpg]").first
img = ele.match("(\")(.*?)(\")").captures
puts img[1]

When I run it as is, I get:

undefined method `match' for #<Hpricot::Elem:0xb731948c> (NoMethodError)

If I comment out the last two lines and add

puts ele

I get:

<img src="http://ecx.images-amazon.com/images/I/51rpVNqXmYL._SL500_AA240_.jpg" style="display:none;" />

which is the correct portion of the page I want to parse. However, the error occurs when I try to get just the "http://ecx.images-amazon.com/images/I/51rpVNqXmYL._SL500_AA240_.jpg" style="display:none;" part.

I am not totally sure why it can't perform a match. As I understand it, the search I am running should get an array of the image elements and return the first one. I assumed I could not run the match on the entire array, so I tried

img = ele[1].match("(\")(.*?)(\")").captures
puts img

and that returns

undefined method `match' for nil:NilClass (NoMethodError)

I am lost. Please excuse my ignorance, as I am just beginning to learn Ruby. Any help is appreciated.

Comments (1)

白龙吟 2024-11-18 15:31:47

Change this line:

img = ele.match("(\")(.*?)(\")").captures

To:

img = ele[:src]
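
The question also hit nil with ele[1]. That would be consistent with [] on an element looking up an attribute by name rather than indexing into an array, though that reading is an assumption drawn from the behaviour described here. The sketch below uses AttrElem, a hypothetical stand-in class (not Hpricot's own), so it runs without the gem:

```ruby
# AttrElem is a hypothetical stand-in, not Hpricot's class: it only mimics
# looking up an attribute by name through [].
class AttrElem
  def initialize(attributes)
    @attributes = attributes
  end

  # Accept symbols or strings; unknown names give nil.
  def [](name)
    @attributes[name.to_s]
  end
end

ele = AttrElem.new(
  "src"   => "http://ecx.images-amazon.com/images/I/51rpVNqXmYL._SL500_AA240_.jpg",
  "style" => "display:none;"
)

img     = ele[:src]  # the attribute value, already a plain String
missing = ele[1]     # no attribute named "1", so nil

puts img
puts missing.inspect
```

Under that assumption, calling match on the result of ele[1] fails for the same reason as in the question: there is nothing to call it on.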

The reason for the errors is that Hpricot::Elem isn't a String. Try:

ele.respond_to? :match

and you get false.
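
The same check can be tried without Hpricot installed. This is a minimal sketch, assuming a hypothetical FakeElem class that stands in for the element; all it shares with the real thing is that it is not a String but can turn itself into one:

```ruby
# FakeElem is a hypothetical stand-in for Hpricot::Elem, used only so this
# runs without the hpricot gem installed.
class FakeElem
  def initialize(html)
    @html = html
  end

  # Like an element, it can render itself as markup.
  def to_s
    @html
  end
end

ele = FakeElem.new('<img src="http://example.com/cover.jpg" />')

elem_has_match   = ele.respond_to?(:match)       # plain objects do not define match
string_has_match = ele.to_s.respond_to?(:match)  # String does define match

puts elem_has_match
puts string_has_match
```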

However, you could do:

ele.to_s.match("(\")(.*?)(\")").captures[1]

The secret is in the to_s.
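
To see what that pattern actually captures, here is a small sketch that runs it on the markup from the question, held as a plain String. The second pattern, anchored on src=, is my own suggestion rather than part of the answer above:

```ruby
# The markup printed by `puts ele` in the question, as a plain String.
html = '<img src="http://ecx.images-amazon.com/images/I/51rpVNqXmYL._SL500_AA240_.jpg" style="display:none;" />'

# Same pattern as above: the captures are [quote, contents, quote],
# so index 1 holds the contents of the first quoted value.
captures = html.match("(\")(.*?)(\")").captures
first_quoted = captures[1]

# Narrower alternative (an assumption, not from the answer): anchor on src=
# so the result does not depend on src being the first attribute.
src = html[/src="([^"]*)"/, 1]

puts first_quoted
puts src
```

The original pattern only works because src happens to be the first quoted attribute in this tag; ele[:src] avoids that dependency entirely.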
