BeautifulSoup - scraping by a condition on the class name

Posted 2025-02-07 10:00:03 | Length: 454 | Views: 2 | Comments: 0

For example, I have this HTML:

<div class="item-1">a</div>
<div class="item-3">b</div>
<div class="item-6">c</div>
<div class="item-8">aaaaaa</div>
...... (the number in the item-x class keeps increasing irregularly)
<div class="item-100">aaaaaa</div>

I want to scrape every element whose class is item-X, where X is between 5 and 10.

I know how to search by a partial class name:

text = soup.select('div[class*="item-"]')

but I don't know how to add a condition on the number.

Comments (2)

趁微风不噪 2025-02-14 10:00:03

You can simply use a for loop.

import bs4 as bs

html = """
<div class="item-1">a</div>
<div class="item-3">b</div>
<div class="item-6">c</div>
<div class="item-8">aaaaaa</div>
<div class="item-100">aaaaaa</div>
"""

soup = bs.BeautifulSoup(html, 'lxml')

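# range(5, 11) covers 5 through 10 inclusive
for i in range(5, 11):
    # div.item-N matches the exact class token, so item-10 cannot also match item-100
    text = soup.select(f'div.item-{i}')
    if text:
        print(text)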
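
If a single query is preferred over one query per number, one option is to keep the partial-class selector from the question and filter the results in Python. The following is a minimal sketch rather than part of the original answer: the helper item_number and the pattern ITEM_CLASS are illustrative names, and the bounds 5 and 10 are assumed to be inclusive.

```python
import re

from bs4 import BeautifulSoup

html = """
<div class="item-1">a</div>
<div class="item-3">b</div>
<div class="item-6">c</div>
<div class="item-8">aaaaaa</div>
<div class="item-100">aaaaaa</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Hypothetical pattern: a class that is exactly "item-" followed by digits.
ITEM_CLASS = re.compile(r"item-(\d+)")


def item_number(tag):
    """Return the number from the tag's item-N class, or None if it has none."""
    for css_class in tag.get("class", []):
        match = ITEM_CLASS.fullmatch(css_class)
        if match:
            return int(match.group(1))
    return None


selected = []
# One broad query, then a numeric filter (assumed inclusive range 5..10).
for tag in soup.select('div[class*="item-"]'):
    number = item_number(tag)
    if number is not None and 5 <= number <= 10:
        selected.append(tag)

for tag in selected:
    print(tag)
```

Because the comparison is done on integers, the condition can be swapped for any other numeric test without changing the selector.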
沉默的熊 2025-02-14 10:00:03

You can use multiple CSS selectors joined by a comma (,):

from bs4 import BeautifulSoup

html_doc = """\
<div class="item-1">a</div>
<div class="item-3">b</div>
<div class="item-6">c</div>
<div class="item-8">aaaaaa</div>
<div class="item-100">aaaaaa</div>
"""

soup = BeautifulSoup(html_doc, "html.parser")

texts = soup.select(",".join(f"div.item-{i}" for i in range(5, 11)))
for text in texts:
    print(text)

Prints:

<div class="item-6">c</div>
<div class="item-8">aaaaaa</div>
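
For wide ranges the joined selector can get long. An alternative sketch, relying on Beautiful Soup's documented support for passing a function as class_ to find_all, tests each class with a predicate instead. The names class_in_range and ITEM_CLASS are hypothetical, and the bounds are again assumed to be inclusive.

```python
import re

from bs4 import BeautifulSoup

html_doc = """\
<div class="item-1">a</div>
<div class="item-3">b</div>
<div class="item-6">c</div>
<div class="item-8">aaaaaa</div>
<div class="item-100">aaaaaa</div>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# Hypothetical pattern: the whole class must be "item-" followed by digits.
ITEM_CLASS = re.compile(r"item-(\d+)")


def class_in_range(css_class):
    """Predicate for class_: True for item-N with 5 <= N <= 10."""
    # Beautiful Soup may pass None when a tag has no class attribute.
    if css_class is None:
        return False
    match = ITEM_CLASS.fullmatch(css_class)
    return match is not None and 5 <= int(match.group(1)) <= 10


matches = soup.find_all("div", class_=class_in_range)
for tag in matches:
    print(tag)
```

With the sample document above, this selects the same two elements as the joined selector.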