Ruby Mechanize not getting the full content

Posted 2025-01-07 08:19:58

I'm using Mechanize and Nokogiri to parse some lotto results from these two sites (they're very similar):
http://www1.caixa.gov.br/loterias/loterias/lotofacil/lotofacil_resultado.asp
http://lotofacil.resultadoloteria.org/

Here's my code:

require 'nokogiri'
require 'mechanize'

agent = Mechanize.new
agent.user_agent_alias = 'Mac Safari'
page = agent.get('http://lotofacil.resultadoloteria.org/')
doc = Nokogiri::HTML(page.body)
doc.xpath('//table[@class="tabela_jogo"]//span').each { |value| puts value }

The second site works fine. Result:

<span id="lfacil1">01</span>
<span id="lfacil2">03</span>
<span id="lfacil3">05</span>
<span id="lfacil4">08</span>
<span id="lfacil5">10</span>
<span id="lfacil6">11</span>
<span id="lfacil7">13</span>
<span id="lfacil8">14</span>
<span id="lfacil9">15</span>
<span id="lfacil10">18</span>
<span id="lfacil11">20</span>
<span id="lfacil12">22</span>
<span id="lfacil13">23</span>
<span id="lfacil14">24</span>
<span id="lfacil15">25</span>

But I can't get the lotto numbers from the first site; every span comes back empty:

<span id="lfacil1"></span>
<span id="lfacil2"></span>
<span id="lfacil3"></span>
<span id="lfacil4"></span>
<span id="lfacil5"></span>
<span id="lfacil6"></span>
<span id="lfacil7"></span>
<span id="lfacil8"></span>
<span id="lfacil9"></span>
<span id="lfacil10"></span>
<span id="lfacil11"></span>
<span id="lfacil12"></span>
<span id="lfacil13"></span>
<span id="lfacil14"></span>
<span id="lfacil15"></span>
<span id="lfacil1_2"></span>
<span id="lfacil2_2"></span>
<span id="lfacil3_2"></span>
<span id="lfacil4_2"></span>
<span id="lfacil5_2"></span>
<span id="lfacil6_2"></span>
<span id="lfacil7_2"></span>
<span id="lfacil8_2"></span>
<span id="lfacil9_2"></span>
<span id="lfacil10_2"></span>
<span id="lfacil11_2"></span>
<span id="lfacil12_2"></span>
<span id="lfacil13_2"></span>
<span id="lfacil14_2"></span>
<span id="lfacil15_2"></span>

I think it's something with Mechanize, because p page.body also returns the content without the lotto numbers. Any ideas?

Thanks. :)

Comments (1)

澉约 2025-01-14 08:19:58

That's because they aren't in that page's HTML. I found them for you, though, at a separate URL that returns them as pipe-delimited text:

page = agent.get('http://www1.caixa.gov.br/loterias/loterias/lotofacil/lotofacil_pesquisa_new.asp')
numbers = page.body.split('|')[3..17]

Also, instead of this:

doc = Nokogiri::HTML(page.body)

Mechanize has already taken care of that for you:

doc = page.parser
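
To make the split('|')[3..17] step above a bit more defensive, here is a minimal sketch that works on a plain string, so it runs without any network access. The sample body, the helper name extract_numbers, and the assumption that the 15 numbers always sit at fields 3 through 17 are all illustrative; the real response layout is not documented here and may differ or change.

```ruby
# Hypothetical sample of a pipe-delimited response body. The leading and
# trailing fields are made up; only the 15 numbers mirror the question.
SAMPLE_BODY = (
  %w[1000 x y] +
  %w[01 03 05 08 10 11 13 14 15 18 20 22 23 24 25] +
  %w[tail]
).join('|')

# Returns `count` fields starting at index `first` (3 and 15 by default,
# i.e. the same slice as [3..17]), and raises if the body is too short.
def extract_numbers(body, first: 3, count: 15)
  fields = body.split('|')
  numbers = fields[first, count] || []
  unless numbers.length == count
    raise ArgumentError, "expected #{count} fields, got #{numbers.length}"
  end
  numbers.map(&:strip)
end

puts extract_numbers(SAMPLE_BODY).join(' ')
```

With Mechanize you would pass page.body from the answer's agent.get call into extract_numbers. Raising on a short body means a changed response format fails loudly instead of silently returning a partial list.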