Ruby Mechanize screen scraping help
I am trying to scrape one row from a table by its date. I only want the third row, which has today's date.
This is my Mechanize code. I am trying to select the row that has today's date, along with its columns:
agent.page.search("//td").map(&:text).map(&:strip)
Output:
"11-02-2011", "1", "1", "1", "1", "0", "0,00 DKK", "0,00", "0,00 DKK",
"12-02-2011", "5", "5", "1", "4", "0", "0,00 DKK", "0,00", "0,00 DKK",
"14-02-2011", "1", "3", "1", "1", "0", "0,00 DKK", ",00", "0,00 DKK",
"7", "9", "3", "6", "0", "0,00 DKK", "0,00", "0,00 DKK"
I want to scrape only the third row, the one with today's date.
Comments (1)
Rather than looping over every <td> tag with '//td', search for the <tr> tags, grab only the third one, then loop over that row's 'td' cells.
Mechanize uses Nokogiri internally, so in Nokogiri-ese that means appending
.search('//tr')[2].search('td').map{ |n| n.text }
to Mechanize's agent.page.
It's been a while since I played with Mechanize, so it might also be
agent.page.parser...
EDIT:
It's important to put that information into your original question. The more accurate your question, the more accurate our answers.
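If the position of today's row is not fixed, one alternative is to pick the row by its date instead of its index. A minimal sketch using only the standard library, assuming the flat cell list from the question, nine cells per dated row, and the DD-MM-YYYY format seen in the output. The helper name row_for_date is hypothetical.

```ruby
require 'date'

# Hypothetical helper: group the flat cell list into rows and return the
# row whose first cell equals the given date (nil if there is none).
def row_for_date(cells, date, cells_per_row: 9)
  wanted = date.strftime('%d-%m-%Y')
  cells.each_slice(cells_per_row).find { |row| row.first == wanted }
end

# Cell texts copied from the question's output (first and third rows).
cells = [
  "11-02-2011", "1", "1", "1", "1", "0", "0,00 DKK", "0,00", "0,00 DKK",
  "14-02-2011", "1", "3", "1", "1", "0", "0,00 DKK", ",00", "0,00 DKK"
]

# In real use this would be Date.today; a fixed date keeps the example repeatable.
today = Date.new(2011, 2, 14)
p row_for_date(cells, today)
```

Matching on the date cell means the lookup keeps working when rows are added above or below, as long as the number of cells per row stays the same.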