BS4: separating text from elements with the same class - Python

Posted on 2025-02-01 08:23:27

I am web scraping for the first time and ran into a problem: several elements share the same class name.

This is the code:

import requests
from bs4 import BeautifulSoup

testlink = 'https://www.ah.nl/producten/product/wi387906/wasa-volkoren'

r = requests.get(testlink)
soup = BeautifulSoup(r.content, 'html.parser')

# find_all is the current name of the legacy findAll alias
products = soup.find_all('dd', class_='product-info-definition-list_value__kspp6')

And this is the output

[<dd class="product-info-definition-list_value__kspp6">13 g</dd>, <dd class="product-info-definition-list_value__kspp6">20</dd>, <dd class="product-info-definition-list_value__kspp6">Rogge, Glutenbevattende Granen</dd>, <dd class="product-info-definition-list_value__kspp6">Sesamzaad, Melk</dd>]

I need to get the 3rd element (Rogge, Glutenbevattende Granen). I am using this link to test and eventually want to scrape multiple pages of the website. Does anyone have any tips?

Thank you!

如此安好 2025-02-08 08:23:27

You can select all dd tags whose class is product-info-definition-list_value__kspp6 and then use list slicing to keep the values you need:

import requests
from bs4 import BeautifulSoup

# Category listing; {page} is filled in with the page number
url = 'https://www.ah.nl/producten/pasta-rijst-en-wereldkeuken?page={page}'

for page in range(1, 11):
    req = requests.get(url.format(page=page))
    soup = BeautifulSoup(req.content, 'html.parser')

    # Each product card links to a product detail page
    for link in soup.select('div[class="product-card-portrait_content__2xN-b"] a'):
        abs_url = 'https://www.ah.nl' + link.get('href')

        req2 = requests.get(abs_url)
        soup2 = BeautifulSoup(req2.content, 'html.parser')
        # Text of every matching dd; [2:-2] drops the first two and the last two values
        dd = [d.get_text() for d in soup2.select('dd[class="product-info-definition-list_value__kspp6"]')][2:-2]
        print(dd)
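
To make the indexing part concrete, here is a minimal offline sketch that uses only the four dd elements quoted in the question, so it runs without a network request. The helper names dd_values and nth_value are hypothetical, and the prefix selector assumes that only the hashed suffix of the class name (the __kspp6 part) changes between site builds, which I have not verified against the live page. Note that on a list of exactly four values, [2] returns the third item while the slice [2:-2] is empty, so the slice only makes sense on pages that expose more dd elements.

```python
from bs4 import BeautifulSoup

# Inline copy of the <dd> elements shown in the question (no network needed).
html = """
<dl>
<dd class="product-info-definition-list_value__kspp6">13 g</dd>
<dd class="product-info-definition-list_value__kspp6">20</dd>
<dd class="product-info-definition-list_value__kspp6">Rogge, Glutenbevattende Granen</dd>
<dd class="product-info-definition-list_value__kspp6">Sesamzaad, Melk</dd>
</dl>
"""


def dd_values(soup):
    """Return the stripped text of every matching <dd>, in document order."""
    # Assumption: matching on the class prefix is enough to find these values.
    nodes = soup.select('dd[class^="product-info-definition-list_value"]')
    return [node.get_text(strip=True) for node in nodes]


def nth_value(values, index, default=None):
    """Return values[index], or default when the page has too few entries."""
    if -len(values) <= index < len(values):
        return values[index]
    return default


soup = BeautifulSoup(html, "html.parser")
values = dd_values(soup)

third = nth_value(values, 2)   # index 2 is the 3rd element
print(third)
print(values[2:-2])            # empty here, because there are only four values
```

When looping over many product pages, the default returned by nth_value lets you skip products that list fewer values instead of crashing with an IndexError.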
