Problem parsing with BeautifulSoup
I'm trying to parse the web page at the URL in the code below:
import urllib2
import sys
from BeautifulSoup import BeautifulSoup
url = 'http://www.etsy.com/teams/list'
source = urllib2.urlopen(url)
soup = BeautifulSoup(source)
print soup.prettify()
print len(soup('h3'))  # print the number of occurrences of h3
h3s = soup.findAll('h3')  # same search as above, via findAll
print len(h3s)
The problem is that it prints 1, while the web page contains at least 10 'h3' elements. I can't figure out where the problem lies.
I am using Python 2.7 and BeautifulSoup 3.0.7.
Comments (1)
I'd recommend using lxml instead:
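A minimal sketch of that approach, assuming Python 3 with lxml installed (`pip install lxml`). The helper names `count_tags` and `count_tags_at` are hypothetical and not part of lxml. The sketch parses HTML with `lxml.html` and counts the matching elements; an inline sample stands in for the live page so it runs without network access.

```python
# Sketch only: assumes Python 3 and that lxml is installed (pip install lxml).
from urllib.request import urlopen

import lxml.html


def count_tags(html, tag):
    """Return how many <tag> elements lxml finds in an HTML string or bytes."""
    root = lxml.html.fromstring(html)
    # iter(tag) walks the whole tree, including the root element itself.
    return sum(1 for _ in root.iter(tag))


def count_tags_at(url, tag):
    """Fetch url and count <tag> elements (hypothetical helper; needs network access)."""
    with urlopen(url) as response:
        return count_tags(response.read(), tag)


if __name__ == "__main__":
    # Small inline sample used in place of the live page.
    sample = "<html><body><h3>One</h3><p>text</p><h3>Two</h3></body></html>"
    print(count_tags(sample, "h3"))
    # For the page in the question (requires network access):
    # print(count_tags_at("http://www.etsy.com/teams/list", "h3"))
```

If the count from lxml matches what you see in the browser while BeautifulSoup 3 still reports 1, that points to the old parser giving up on the page's markup rather than to your search calls.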