Getting the number of comments on an online news article with Python BeautifulSoup
I'm using the following code:
import urllib
from BeautifulSoup import BeautifulSoup
import re
comment_url = 'http://community.nytimes.com/comments/www.nytimes.com/2011/08/24/world/africa/24libya.html'
response_new = urllib.urlopen(comment_url)
html_new = response_new.read()
soup_new = BeautifulSoup(html_new)
tags = soup_new.findAll('h3', {'class': 'share'})
for tag in tags:
    a = tag.renderContents()
    print a
print "done!"
I am trying to get the number of comments readers have made on a particular New York Times article by using the BeautifulSoup parser to look inside specific tags. On a standard NYTimes article community page, the count appears in markup like this:
<p>Share your thoughts.</p>
</div>
<div id="commentsWell">
<div id="readerComments">
<div class="header clearfix">
<h3 class="share">185
Readers' Comments</h3>
However, when I run the code, I simply get the word "done!". It is apparent that my code isn't picking up any tags that I have specified. My question is - am I using BeautifulSoup incorrectly? If so, how would you suggest that I amend my code so as to get the desired information?
Thanks
Sneha
Comments (1)
Use the attrs keyword parameter explicitly:
tags = soup_new.findAll('h3', attrs={'class': 'share'})
The call signature for findAll takes name and attrs as separate parameters, so when you omit attrs=, you are assigning the second argument, {'class': 'share'}, to name rather than attrs.
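As a rough illustration of the explicit attrs= form suggested above, here is a minimal sketch that runs against a static copy of the fragment quoted in the question, so nothing is fetched from the network. It assumes the bs4 package (BeautifulSoup 4) on Python 3 rather than the BeautifulSoup 3 import used in the question. The function name comment_count and the constant SAMPLE_HTML are hypothetical, chosen only for this example.

```python
import re

# Assumption: bs4 (BeautifulSoup 4) is installed; the question itself uses
# the older "from BeautifulSoup import BeautifulSoup" (version 3) import.
from bs4 import BeautifulSoup

# Static copy of the markup quoted in the question (closing tags added).
SAMPLE_HTML = """
<div id="commentsWell">
<div id="readerComments">
<div class="header clearfix">
<h3 class="share">185
Readers' Comments</h3>
</div>
</div>
</div>
"""


def comment_count(html):
    """Return the first integer inside <h3 class="share">, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    # attrs is passed by keyword, as the answer recommends.
    tag = soup.find("h3", attrs={"class": "share"})
    if tag is None:
        return None
    # The tag text looks like "185\nReaders' Comments"; keep only the digits.
    match = re.search(r"\d+", tag.get_text())
    return int(match.group()) if match else None


print(comment_count(SAMPLE_HTML))
```

If comment_count returns None for a page you downloaded yourself, the h3 tag was not present in the HTML that was actually fetched, which is worth checking by printing the raw response before parsing it.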