hpricot: get an image from a URL and parse the element
I am trying to get the exact URL of an image inside a page and then download it. I haven't reached the download step yet, because I am still trying to isolate the image's URL. Here is the code:
#!/usr/bin/ruby -w
require 'rubygems'
require 'hpricot'
require 'open-uri'
raw = Hpricot(open("http://www.amazon.com/Weezer/dp/B000003TAW/"))
ele = raw.search("img[@src*=jpg]").first
img = ele.match("(\")(.*?)(\")").captures
puts img[1]
When I run it as is, I receive:
undefined method `match' for #<Hpricot::Elem:0xb731948c> (NoMethodError)
If I comment out the last two lines and add
puts ele
I get:
<img src="http://ecx.images-amazon.com/images/I/51rpVNqXmYL._SL500_AA240_.jpg" style="display:none;" />
which is the correct portion of the page I want to parse. However, the error occurs when I try to extract just the "http://ecx.images-amazon.com/images/I/51rpVNqXmYL._SL500_AA240_.jpg" style="display:none;" part.
I am not sure why it can't perform a match. As I understand it, the search I am running should get an array of the image elements and return the first one. I assumed that I could not run the match on the entire array, so I tried
img = ele[1].match("(\")(.*?)(\")").captures
puts img
and that returns
undefined method `match' for nil:NilClass (NoMethodError)
I am lost. Please excuse my ignorance, as I am just beginning to learn Ruby. Any help is appreciated.
Comments (1)
Change the line that calls match on the element.
The reason for the errors is that Hpricot::Elem isn't a String, so it has no match method. If you check whether ele is a String, you get false.
However, you can convert the element to a String first and run the match on that. The secret is in the to_s.
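To make the to_s point concrete, here is a minimal sketch that needs only plain Ruby. It assumes that ele.to_s returns the same markup that puts ele printed in the question (puts calls to_s on the objects it prints), so the sketch starts from that string rather than loading Hpricot. The helper name extract_src is hypothetical and only for illustration.

```ruby
# Markup that `puts ele` printed in the question. With Hpricot you would get
# this string from ele.to_s (assumption: same text that puts displayed).
markup = '<img src="http://ecx.images-amazon.com/images/I/51rpVNqXmYL._SL500_AA240_.jpg" style="display:none;" />'

# The question's original pattern works once it is applied to a String:
# captures[0] is the opening quote, captures[1] the first quoted value.
captures = markup.match("(\")(.*?)(\")").captures
puts captures[1]

# Hypothetical helper: a slightly stricter variant that anchors on src=
# and returns nil when no src attribute is present.
def extract_src(markup)
  m = markup.match(/src="(.*?)"/)
  m && m[1]
end

puts extract_src(markup)
```

In the original script this would correspond to calling match on ele.to_s instead of on ele. Anchoring the pattern on src= is a design choice so that the result does not depend on src being the first attribute in the tag.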