使用 Python 和 BeautifulSoup 解析表
我正在尝试使用 Python 和 BeautifulSoup 访问某些 td 标签中的内容。我可以获取第一个符合条件的 td 标签(使用 find),也可以获取所有符合条件的 td 标签(使用 findAll)。
现在,我可以使用 findAll,获取所有内容,并从中获取我想要的内容,但这似乎效率很低(即使我对搜索设置了限制)。有没有办法转到某个符合我想要的标准的 td 标签?说第三个,还是第十个?
到目前为止,这是我的代码:
from __future__ import division
from __future__ import unicode_literals
from __future__ import print_function
from mechanize import Browser
from BeautifulSoup import BeautifulSoup
br = Browser()
url = "http://finance.yahoo.com/q/ks?s=goog+Key+Statistics"
page = br.open(url)
html = page.read()
soup = BeautifulSoup(html)
td = soup.findAll("td", {'class': 'yfnc_tablehead1'})
for x in range(len(td)):
var1 = td[x]
var2 = var1.contents[0]
print(var2)
I am trying to access content in certain td tags with Python and BeautifulSoup. I can either get the first td tag meeting the criteria (with find), or all of them (with findAll).
Now, I could just use findAll, get them all, and get the content I want out of them, but that seems like it is inefficient (even if I put limits on the search). Is there anyway to go to a certain td tag meeting the criteria I want? Say the third, or the 10th?
Here's my code so far:
from __future__ import division
from __future__ import unicode_literals
from __future__ import print_function
from mechanize import Browser
from BeautifulSoup import BeautifulSoup
br = Browser()
url = "http://finance.yahoo.com/q/ks?s=goog+Key+Statistics"
page = br.open(url)
html = page.read()
soup = BeautifulSoup(html)
td = soup.findAll("td", {'class': 'yfnc_tablehead1'})
for x in range(len(td)):
var1 = td[x]
var2 = var1.contents[0]
print(var2)
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
嗯
……没有别的办法了。
Well...
...there is no other way..
find
和findAll
非常灵活,BeautifulSoup.findAll文档说find
andfindAll
are very flexible, the BeautifulSoup.findAll docs say