Parsing a table with Python and BeautifulSoup

Posted 2024-11-16 12:25:19

I am trying to access content in certain td tags with Python and BeautifulSoup. I can either get the first td tag meeting the criteria (with find), or all of them (with findAll).

Now, I could just use findAll, get them all, and pull the content I want out of the result, but that seems inefficient (even if I put limits on the search). Is there any way to go to a certain td tag meeting the criteria I want? Say the third, or the 10th?

Here's my code so far:

from __future__ import division
from __future__ import unicode_literals
from __future__ import print_function
from mechanize import Browser
from BeautifulSoup import BeautifulSoup

br = Browser()
url = "http://finance.yahoo.com/q/ks?s=goog+Key+Statistics"
page = br.open(url)
html = page.read()
soup = BeautifulSoup(html)
td = soup.findAll("td", {'class': 'yfnc_tablehead1'})

# Print the first child (the label text) of every matching cell
for cell in td:
    print(cell.contents[0])

成熟的代价 2024-11-23 12:25:19

Is there any way to go to a certain td tag meeting the criteria I want? Say the third, or the 10th?

Well...

all_tds = soup.findAll("td", {'class': 'yfnc_tablehead1'})

print(all_tds[2])  # third match; list indices start at 0

...there is no other way.
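The indexing approach above can be combined with the limit argument so that the search stops as soon as the n-th match has been found. What follows is only a minimal sketch under some assumptions: it uses bs4 (the successor of the BeautifulSoup 3 module imported in the question), the helper name nth_td is hypothetical, and the HTML is a made-up stand-in for the Yahoo Finance page.

```python
from bs4 import BeautifulSoup  # assumption: bs4 instead of the old BeautifulSoup 3 module

# Made-up stand-in for the real page; only the class name comes from the question.
HTML = """
<table>
  <tr><td class="yfnc_tablehead1">Market Cap</td><td>1</td></tr>
  <tr><td class="yfnc_tablehead1">Enterprise Value</td><td>2</td></tr>
  <tr><td class="yfnc_tablehead1">Trailing P/E</td><td>3</td></tr>
  <tr><td class="yfnc_tablehead1">Forward P/E</td><td>4</td></tr>
</table>
"""


def nth_td(soup, n, css_class="yfnc_tablehead1"):
    """Return the n-th (1-based) matching td, or None if there are fewer than n.

    Hypothetical helper: limit=n makes find_all stop after n matches,
    so the rest of the document is not searched.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    matches = soup.find_all("td", {"class": css_class}, limit=n)
    return matches[n - 1] if len(matches) >= n else None


soup = BeautifulSoup(HTML, "html.parser")
third = nth_td(soup, 3)
print(third.get_text())
```

This still goes through find_all, so it does not contradict the answer; it only avoids collecting matches beyond the one that is wanted.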

阳光①夏 2024-11-23 12:25:19

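find and findAll are very flexible. The BeautifulSoup.findAll documentation says:

5. You can pass in a callable object which takes a Tag object as its only argument, and returns a boolean. Every Tag object that findAll encounters will be passed into this object, and if the call returns True then the tag is considered to match.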
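As a rough illustration of the callable form quoted above, the sketch below selects a cell by its content rather than by its position. It assumes bs4 rather than the BeautifulSoup 3 module from the question; the predicate name is_forward_pe_header, the yfnc_tabledata1 class, and the HTML with its values are all invented for the example.

```python
from bs4 import BeautifulSoup  # assumption: bs4 instead of the old BeautifulSoup 3 module

# Made-up stand-in for the real page; the numbers are placeholders, not real data.
HTML = """
<table>
  <tr><td class="yfnc_tablehead1">Market Cap</td><td class="yfnc_tabledata1">100</td></tr>
  <tr><td class="yfnc_tablehead1">Trailing P/E</td><td class="yfnc_tabledata1">30.1</td></tr>
  <tr><td class="yfnc_tablehead1">Forward P/E</td><td class="yfnc_tabledata1">21.5</td></tr>
</table>
"""


def is_forward_pe_header(tag):
    """Return True for a header cell whose text starts with 'Forward P/E'."""
    return (
        tag.name == "td"
        and "yfnc_tablehead1" in tag.get("class", [])
        and tag.get_text(strip=True).startswith("Forward P/E")
    )


soup = BeautifulSoup(HTML, "html.parser")

# find() passes each tag to the predicate and stops at the first one that matches.
header = soup.find(is_forward_pe_header)

# The value sits in the next td of the same row in this made-up layout.
value = header.find_next_sibling("td")
print(header.get_text(strip=True), "=", value.get_text(strip=True))
```

Matching on the label text tends to be more robust than relying on "the third" or "the 10th" cell, since the position can shift when rows are added to the page.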
