Ruby Mechanize table scraping does not capture the whole row
I am trying to scrape a table on a website with Mechanize.
I want to scrape the second row.
When I run:
agent.page.search('table.ea').search('tr')[-2].search('td').map{ |n| n.text }
I would expect it to scrape the whole row, but instead it only returns: ["2011-02-17", "0,00"]
Why does it return only the first and the last column instead of every column in the row?
XPath:
/html/body/center/table/tbody/tr[2]/td[2]/table/tbody/tr[3]/td/table/tbody/tr[2]/td/table/tbody/tr[2]
CSS path:
html body center table tbody tr td table tbody tr td table tbody tr td table.ea tbody tr td.total
The page is similar to this:
<table><table><table>
<table width="100%" border="0" cellpadding="0" cellspacing="1" class="ea">
<tr>
<th><a href="#">Date</a></th>
<th><a href="#">One</a></th>
<th><a href="#">Two</a></th>
<th><a href="#">Three</a></th>
<th><a href="#">Four</a></th>
<th><a href="#">Five</a></th>
<th><a href="#">Six</a></th>
<th><a href="#">Seven</a></th>
<th><a href="#">Eight</a></th>
</tr>
<tr>
<td><a href="#">2011-02-17</a></td>
<td align="right">0</td>
<td align="right">0</td>
<td align="right">0,00</td>
<td align="right">0</td>
<td align="right">0</td>
<td align="right">0</td>
<td align="right">0</td>
<td align="right">387</td>
<td align="right">0,00</td> <!-- FOV -->
<td align="right">0,00</td>
</tr>
<tr>
<td class="total">Ialt</td>
<td class="total" align="right">0</td>
<td class="total" align="right">40</td>
<td class="total" align="right">0,46</td>
<td class="total" align="right">2</td>
<td class="total" align="right">0</td>
<td class="total" align="right">0</td>
<td class="total" align="right">0</td>
<td class="total" align="right">3.060</td>
<td class="total" align="right">0,00</td>
<td class="total" align="right">18,58</td>
</tr>
</table>
</table></table></table>
Comments (2)
Using the following Ruby code (https://gist.github.com/835603):
I get the following output:
I would recommend saving Mechanize for harder jobs than scraping a single page.
For this, Nokogiri is much simpler to use than Mechanize (although of course Mechanize can do it too), since you can just query the page (http://nokogiri.org/tutorials/searching_a_xml_html_document.html).
Try it out!
Here is a link to an answer regarding Nokogiri.
Personally, I have used Mechanize when I needed to submit forms and things like that, although it has plenty of other uses.