Getting the number of comments on an online news article with Python BeautifulSoup

Posted on 2024-12-01 06:26:49


I'm using the following code:

import urllib
from BeautifulSoup import BeautifulSoup
import re

# The URL has to be a quoted string literal.
comment_url = 'http://community.nytimes.com/comments/www.nytimes.com/2011/08/24/world/africa/24libya.html'

response_new = urllib.urlopen(comment_url)
html_new = response_new.read()  # read the page body from the response opened above
soup_new = BeautifulSoup(html_new)
# Look for every <h3 class="share"> element on the page.
tags = soup_new.findAll('h3', {'class': 'share'})
for tag in tags:
    a = tag.renderContents()
    print a

print "done!"

I am trying to obtain the number of comments readers have made on a certain New York Times article by using the BeautifulSoup parser to look for information within specific tags. On a standard NYTimes article community page, the information is located like this:

<p>Share your thoughts.</p> 
</div> 
<div id="commentsWell"> 
<div id="readerComments"> 
<div class="header clearfix"> 
<h3 class="share">185
 Readers' Comments</h3>
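As a dependency-free cross-check of the markup above, here is a minimal sketch that uses the standard library's `html.parser.HTMLParser` instead of BeautifulSoup. It runs on a copy of the fragment rather than the live page (whose markup may differ), and it assumes the count is the first integer inside an `<h3>` whose class list contains `share`. The names `ShareCountParser` and `comment_count` are hypothetical helpers introduced for illustration.

```python
import re
from html.parser import HTMLParser


class ShareCountParser(HTMLParser):
    """Hypothetical helper: collect the text of every <h3> whose class list contains "share"."""

    def __init__(self):
        super().__init__()
        self._in_share_h3 = False
        self._buffer = []
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            # attrs is a list of (name, value) pairs; value may be None.
            classes = (dict(attrs).get("class") or "").split()
            if "share" in classes:
                self._in_share_h3 = True
                self._buffer = []

    def handle_data(self, data):
        if self._in_share_h3:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "h3" and self._in_share_h3:
            self.texts.append("".join(self._buffer))
            self._in_share_h3 = False


def comment_count(html):
    """Return the first integer found in an <h3 class="share"> heading, or None."""
    parser = ShareCountParser()
    parser.feed(html)
    parser.close()
    for text in parser.texts:
        match = re.search(r"\d+", text)
        if match:
            return int(match.group())
    return None


snippet = """<p>Share your thoughts.</p>
</div>
<div id="commentsWell">
<div id="readerComments">
<div class="header clearfix">
<h3 class="share">185
 Readers' Comments</h3>"""

print(comment_count(snippet))  # 185
```

If this finds the count in the saved fragment but your script finds nothing on the live page, the HTML actually downloaded is worth inspecting, since it may not match what the browser shows.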

However, when I run the code, I get only the word "done!". Apparently my code isn't picking up any of the tags I specified. My question is: am I using BeautifulSoup incorrectly? If so, how would you suggest I amend my code to get the desired information?

Thanks
Sneha


枕花眠 2024-12-08 06:26:49


Use the attrs keyword parameter explicitly:

tags = soup_new.findAll('h3', attrs={'class': 'share'})

The call signature for findAll is:

soup_new.findAll(self, name=None, attrs={}, recursive=True, text=None, limit=None, **kwargs)

so when you omit attrs=, you are assigning the second argument, {'class': 'share'}, to name rather than attrs.
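A minimal sketch of the keyword form, assuming Python 3 with the BeautifulSoup 4 package (`bs4`) installed. In bs4 the method is spelled `find_all` (the camelCase `findAll` is still accepted as a legacy alias), whereas the question imports the older BeautifulSoup 3. The sketch parses a copy of the question's fragment rather than the live page, and the regex step that pulls the number out of the heading text is an illustrative addition, not part of the answer.

```python
import re

from bs4 import BeautifulSoup  # BeautifulSoup 4; the question uses BeautifulSoup 3

snippet = """<div id="readerComments">
<div class="header clearfix">
<h3 class="share">185
 Readers' Comments</h3>
</div>
</div>"""

soup = BeautifulSoup(snippet, "html.parser")

# Pass the attribute filter by keyword, as recommended above.
tags = soup.find_all("h3", attrs={"class": "share"})

counts = []
for tag in tags:
    text = tag.get_text()  # "185\n Readers' Comments"
    match = re.search(r"\d+", text)
    if match:
        counts.append(int(match.group()))

print(counts)  # [185]
```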


About the author: 笙痞
