
Beautiful Soup cannot find a CSS class if the object also has other classes

Posted 2024-07-30 07:49:04

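If a page has two elements whose class attribute is exactly class1, soup.findAll(True, 'class1') finds both of them.

However, if an element has class1 together with another class, it is not found. How can I find all objects that have a certain class, regardless of whether they also have other classes?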


﹉夏雨初晴づ 2024-08-06 07:49:04

It's very useful to search for a tag that has a certain CSS class, but the name of the CSS attribute, "class", is a reserved word in Python. Using class as a keyword argument will give you a syntax error. As of Beautiful Soup 4.1.2, you can search by CSS class using the keyword argument class_.

For example:

soup.find_all("a", class_="class1")
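To check that class_ also matches tags that carry additional classes, here is a minimal sketch. It assumes a recent bs4 release and the built-in "html.parser"; the sample markup and the variable names are made up for illustration and are not from the original question.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: one tag with only class1, one with two
# classes, and one whose class merely starts with the same characters.
html = """
<div class="class1">only class1</div>
<p class="class1 class2">class1 and class2</p>
<p class="class10">a different class</p>
"""

soup = BeautifulSoup(html, "html.parser")

# class_ is compared against each individual class of a tag, so the
# two-class <p> is found while class10 is not.
matches = soup.find_all(class_="class1")
tags = [tag.name for tag in matches]

# Restricting the search to one tag name works the same way.
paragraphs = soup.find_all("p", class_="class1")

print(tags)
print([p.get_text() for p in paragraphs])
```

If the search should cover every tag name, leave out the first argument as above instead of passing "a".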


海未深 2024-08-06 07:49:04

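You should use lxml. It works with multiple class values separated by spaces ("class1 class2").

Despite its name, lxml is also meant for parsing and scraping HTML. It is much faster than BeautifulSoup, and it even handles "broken" HTML better than BeautifulSoup does (which was BeautifulSoup's claim to fame). It also has a compatibility API for BeautifulSoup if you don't want to learn the lxml API.

Ian Bicking agrees and prefers lxml over BeautifulSoup.

There is no reason to use BeautifulSoup anymore, unless you are on Google App Engine or something else where anything that is not pure Python is not allowed.

You can even use CSS selectors with lxml, so it is far easier to use than BeautifulSoup. Try playing around with it in an interactive Python console.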
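As a rough sketch of what this looks like in practice, the snippet below selects every element that has class1 among its classes. It deliberately uses an XPath expression rather than a CSS selector, on the assumption that only lxml itself is installed (CSS selector support in lxml relies on the separate cssselect package). The sample markup is hypothetical.

```python
import lxml.html

# Hypothetical sample markup, wrapped in a full document so that
# fromstring() returns the <html> root element.
html = """
<html><body>
<div class="class1">only class1</div>
<p class="class1 class2">class1 and class2</p>
<p class="class10">a different class</p>
</body></html>
"""

doc = lxml.html.fromstring(html)

# Pad the normalized class attribute with spaces, then look for
# " class1 " so that only whole class names match.
xpath = "//*[contains(concat(' ', normalize-space(@class), ' '), ' class1 ')]"
elements = doc.xpath(xpath)
found_tags = [el.tag for el in elements]

print(found_tags)
```

With cssselect installed, the CSS selector ".class1" expresses the same query more briefly.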


澜川若宁 2024-08-06 07:49:04

Just in case anybody comes across this question: BeautifulSoup now supports this.

Python 2.7.5 (default, May 15 2013, 22:43:36) [MSC v.1500 32 bit (Intel)]
Type "copyright", "credits" or "license" for more information.

In [1]: import bs4

In [2]: soup = bs4.BeautifulSoup('<div class="foo bar"></div>')

In [3]: soup(attrs={'class': 'bar'})
Out[3]: [<div class="foo bar"></div>]

Also, you don't have to type findAll anymore: calling the soup object directly, as in In [3], runs the same search.


蓦然回首 2024-08-06 07:49:04

Unfortunately, BeautifulSoup treats this as a single class with a space in it, 'class1 class2', rather than as two classes, ['class1', 'class2']. A workaround is to search for the class with a regular expression instead of a string.

This works:

import re
soup.findAll(True, {'class': re.compile(r'\bclass1\b')})
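To see what the word-boundary pattern accepts, the following sketch applies it to plain class strings using only the standard re module, with no BeautifulSoup involved. The sample strings are invented for illustration. Note one caveat: a hyphen is not a word character, so \b also matches in front of it, and a class such as class1-wide is accepted by the regex as well.

```python
import re

pattern = re.compile(r"\bclass1\b")

# Hypothetical class attribute values as they appear in the HTML source.
samples = [
    "class1",
    "class1 class2",
    "class2 class1",
    "class10",
    "class1-wide",
]

# The regex approach from the answer: search inside the raw string.
regex_hits = [s for s in samples if pattern.search(s)]

# A stricter comparison: split on whitespace and compare whole tokens.
token_hits = [s for s in samples if "class1" in s.split()]

print(regex_hits)  # includes "class1-wide"
print(token_hits)  # excludes "class1-wide"
```

If hyphenated class names occur in the page, comparing whole whitespace-separated tokens is the safer test.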
