Parsing 3 table columns with hpricot
I have an HTML document containing a fairly simple table like this:
<table>
<tr><th>Country</th><th>Date</th></tr>
<tr>
<td><b><a href="/calendar/?region=BE">Belgium</a></b></td>
<td align="right"><a href="/date/04-20/">20 April</a> <a href="/year/2001/">2001</a></td>
<td>(original release)</td>
</tr>
<tr>
<td><b><a href="/calendar/?region=BE">Belgium</a></b></td>
<td align="right"><a href="/date/04-25/">25 April</a> <a href="/year/2001/">2001</a></td>
<td></td>
</tr>
<tr>
<td><b><a href="/calendar/?region=FR">France</a></b></td>
<td align="right"><a href="/date/04-27/">27 April</a> <a href="/year/2001/">2001</a></td>
<td></td>
</tr>
<tr>
<td><b><a href="/calendar/?region=CH">Switzerland</a></b></td>
<td align="right"><a href="/date/05-25/">25 May</a> <a href="/year/2001/">2001</a></td>
<td>(French speaking region)</td>
</tr>
<tr>
<td><b><a href="/calendar/?region=CZ">Czech Republic</a></b></td>
<td align="right"><a href="/date/07-06/">6 July</a> <a href="/year/2001/">2001</a></td>
<td>(International Film Festival)</td>
</tr>
</table>
The first two columns are easy to parse:
document.search("a[@href*=calendar]").each { |country| countries << country.inner_text }
document.search("td[@align*=right]").each { |date| dates << date.inner_text }
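But I am having trouble getting the values from the third column. Its cells have no attribute or link to select on, and I need all of them in an array, including the blank ones, so that they stay aligned with the countries and dates. How can I do this?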
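One way to keep the three columns aligned is to walk the table row by row and pick each cell by position, rather than running one document-wide selector per column. The following is a minimal sketch of that idea, not a definitive solution. To stay runnable without extra gems it uses REXML from Ruby's standard distribution instead of Hpricot, which only works here because the sample markup happens to be well-formed XML. The `cell_text` helper and the `notes` array are names made up for this illustration, and the table is a trimmed copy of the one above. With Hpricot the same loop would presumably be written with `document.search("tr")`, `row.search("td")` and `inner_text`.

```ruby
require "rexml/document"

# Trimmed copy of the table from the question (three data rows).
html = <<~HTML
  <table>
  <tr><th>Country</th><th>Date</th></tr>
  <tr>
  <td><b><a href="/calendar/?region=BE">Belgium</a></b></td>
  <td align="right"><a href="/date/04-20/">20 April</a> <a href="/year/2001/">2001</a></td>
  <td>(original release)</td>
  </tr>
  <tr>
  <td><b><a href="/calendar/?region=BE">Belgium</a></b></td>
  <td align="right"><a href="/date/04-25/">25 April</a> <a href="/year/2001/">2001</a></td>
  <td></td>
  </tr>
  <tr>
  <td><b><a href="/calendar/?region=CH">Switzerland</a></b></td>
  <td align="right"><a href="/date/05-25/">25 May</a> <a href="/year/2001/">2001</a></td>
  <td>(French speaking region)</td>
  </tr>
  </table>
HTML

# Hypothetical helper: concatenate all text below a node, in document order.
def cell_text(node)
  case node
  when REXML::Text    then node.value
  when REXML::Element then node.children.map { |child| cell_text(child) }.join
  else ""
  end
end

document = REXML::Document.new(html)
countries, dates, notes = [], [], []

# The root element is <table>; visit its <tr> children one at a time.
document.root.get_elements("tr").each do |row|
  cells = row.get_elements("td")
  next if cells.empty? # the header row only contains <th> cells

  countries << cell_text(cells[0]).strip
  dates     << cell_text(cells[1]).strip
  # A missing or empty third cell becomes "", so the indexes stay aligned.
  notes     << (cells[2] ? cell_text(cells[2]).strip : "")
end

p notes # → ["(original release)", "", "(French speaking region)"]
```

Because every row contributes exactly one entry to each array, `countries[i]`, `dates[i]` and `notes[i]` always describe the same row, even when the third cell is blank.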
Answering my own question: