Getting a list of items in a div using Beautiful Soup

I am trying to use Beautiful Soup to pull a list of courses from a website, but with little success. I have attached the HTML structure of the website below. I am trying to pull the list of elements from the <ul class="catalog-v2_results__1FjDi"> element. The code below returns nothing. I have very little familiarity with HTML and am trying to find the easiest way of doing this.

import requests
from bs4 import BeautifulSoup

# Fetch the catalog page (verify=False skips TLS certificate verification)
page = requests.get("https://www.udacity.com/courses/all?price=Free", verify=False)
soup = BeautifulSoup(page.content, 'html.parser')

ls = soup.find_all('ul', {'class': 'catalog-v2_results__1FjDi'})
print(ls)

HTML Structure
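
One quick sanity check (my own sketch, not part of the original question) is to test whether the class name appears in the raw response at all; if it does not, the list is being injected by JavaScript after the page loads, which is what the answer below addresses.

import requests

# Sketch: check whether the course list exists in the server-rendered HTML.
# If this prints False, the <ul> is added client-side by JavaScript.
page = requests.get("https://www.udacity.com/courses/all?price=Free", verify=False)
print('catalog-v2_results__1FjDi' in page.text)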

1 Answer

知足的幸福 2025-02-20 08:38:01

The webpage is dynamic: bs4 can't render JavaScript, but you can pair bs4 with Selenium, which loads the page in a real browser and renders it. I use CSS selectors to parse the HTML DOM elements.

Example:

import time

import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

# webdriver-manager downloads a chromedriver that matches the installed Chrome
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))

data = []
driver.get('https://www.udacity.com/courses/all?price=Free')
time.sleep(5)            # give the JavaScript time to render the catalog
driver.maximize_window()
time.sleep(3)

# Parse the fully rendered page source with BeautifulSoup
soup = BeautifulSoup(driver.page_source, 'lxml')

# Each course card is an <li> inside the results <ul>
for course in soup.select('.catalog-v2_results__1FjDi > li'):
    title = course.select_one('.card_title__35G97').text
    data.append({'title': title})

df = pd.DataFrame(data)
print(df)
driver.quit()

Output:

                        title
0                   Intro to Data Analysis
1                    SQL for Data Analysis
2       Database Systems Concepts & Design
3          Intro to Inferential Statistics
4                                    Spark
..                                     ...
186               Front-End Interview Prep
187              Full-Stack Interview Prep
188  Data Structures & Algorithms in Swift
189                     iOS Interview Prep
190                      VR Interview Prep

[191 rows x 1 columns]
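
A side note on the fixed time.sleep() calls: an explicit wait blocks only until the results list actually shows up. The sketch below is my own assumption rather than part of the original answer; it reuses the driver and the BeautifulSoup import from the script above, along with the same CSS selector.

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Sketch: wait up to 20 seconds for at least one course card to be present,
# then hand the rendered HTML to BeautifulSoup (driver comes from the script above).
WebDriverWait(driver, 20).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, '.catalog-v2_results__1FjDi > li'))
)
soup = BeautifulSoup(driver.page_source, 'lxml')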