Beautiful Soup and tables
Hi, I'm trying to parse an HTML table using Beautiful Soup.
The table looks something like this:
```html
<table width=100% border=1 cellpadding=0 cellspacing=0 bgcolor=#e0e0cc>
<tr>
<td width=12% height=1 align=center valign=middle bgcolor=#e0e0cc bordercolorlight=#000000 bordercolordark=white> <b><font face="Verdana" size=1><a href="http://www.dailystocks.com/" alt="DailyStocks.com" title="Home">Home</a></font></b></td>
</tr>
</table>
<table width="100%" border="0" cellpadding="1" cellspacing="1">
<tr class="odd">
  <td class="left"><a href="whatever">ABX</a></td>
  <td class="left">Barrick Gold Corp.</td>
  <td>55.95</td>
  <td>55.18</td>
  <td class="up">+0.70</td>
  <td>11040601</td>
  <td>70.28%</td>
  <td><center> <a href="whatever" class="bcQLink"> Q </a> <a href="chart.asp?sym=ABX&code=XDAILY" class="bcQLink"> C </a> <a href="texpert.asp?sym=ABX&code=XDAILY" class="bcQLink"> O </a> </center></td>
</tr>
</table>
```
I would like to get the information from the second table, and so far I tried this code:
```python
html = file("whatever.html")
soup = BeautifulSoup(html)
t = soup.find(id='table')
dat = [ map(str, row.findAll("td")) for row in t.findAll("tr") ]
```
That doesn't seem to work. Any help would be much appreciated.
Thanks
The first problem is this statement: `t = soup.find(id='table')`. There is no element with an id of `table`. I think what you mean is `t = soup.find('table')`, which finds a table by its tag name. Unfortunately, it only finds the first table.
You could do `t = soup.findAll('table')[1]`, but this would be quite brittle: it breaks as soon as a table is added or removed before the one you want.
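A quick check of both points, as a sketch. It assumes the current `bs4` package (where `findAll` is still accepted but the documented spelling is `find_all`) and Python's built-in `"html.parser"`; the markup is a trimmed copy of the two tables from the question.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the two tables in the question (the "Home" table comes first).
HTML = """
<table width=100% border=1 cellpadding=0 cellspacing=0 bgcolor=#e0e0cc>
<tr><td><b><a href="http://www.dailystocks.com/" title="Home">Home</a></b></td></tr>
</table>
<table width="100%" border="0" cellpadding="1" cellspacing="1">
<tr class="odd"><td class="left"><a href="whatever">ABX</a></td><td class="left">Barrick Gold Corp.</td><td>55.95</td><td>55.18</td><td class="up">+0.70</td><td>11040601</td><td>70.28%</td></tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")

# find(id="table") looks for an element whose id attribute is "table"; there is none.
by_id = soup.find(id="table")

# find("table") matches by tag name, but returns only the first <table>.
first = soup.find("table")

# Indexing find_all works for this page, but depends on the tables' order.
second = soup.find_all("table")[1]

print(by_id)                                   # None
print(first.get_text(strip=True))              # Home
print(second.find("td").get_text(strip=True))  # ABX
```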
I would suggest something like the following:
The resulting dat variable is:
Edit: wrong array index
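One way to avoid depending on table position is to locate the table by something it contains, then build `dat` the same way the question's list comprehension does. This is a hedged sketch, not necessarily what the answer above had in mind: it assumes `bs4` with `"html.parser"`, and it assumes (based only on the sample markup) that the data table is the one holding a `<tr class="odd">` row. `extract_rows` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's markup; the data table is the second one.
HTML = """
<table width=100% border=1 cellpadding=0 cellspacing=0 bgcolor=#e0e0cc>
<tr><td><b><a href="http://www.dailystocks.com/" title="Home">Home</a></b></td></tr>
</table>
<table width="100%" border="0" cellpadding="1" cellspacing="1">
<tr class="odd"><td class="left"><a href="whatever">ABX</a></td><td class="left">Barrick Gold Corp.</td><td>55.95</td><td>55.18</td><td class="up">+0.70</td><td>11040601</td><td>70.28%</td><td><center> <a href="whatever" class="bcQLink"> Q </a> <a href="whatever" class="bcQLink"> C </a> <a href="whatever" class="bcQLink"> O </a> </center></td></tr>
</table>
"""


def extract_rows(html):
    """Return the cell text of every row in the data table.

    Assumption (hypothetical): the data table is whichever <table>
    contains a <tr class="odd"> row.
    """
    soup = BeautifulSoup(html, "html.parser")
    marker = soup.find("tr", class_="odd")
    if marker is None:
        return []  # no data table found
    table = marker.find_parent("table")
    # Same shape as the question's comprehension, but cell text instead of raw HTML.
    return [
        [td.get_text(" ", strip=True) for td in row.find_all("td")]
        for row in table.find_all("tr")
    ]


dat = extract_rows(HTML)
print(dat)
# [['ABX', 'Barrick Gold Corp.', '55.95', '55.18', '+0.70', '11040601', '70.28%', 'Q C O']]
```

Using `get_text` rather than `str` gives plain cell values; switch back to `str` if you want the raw HTML of each cell, as the original comprehension produced.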