Getting a list of items in a div using Beautiful Soup

I am trying to use Beautiful Soup to pull a list of courses from a website, but with little success. I have attached the HTML structure of the website below. I am trying to pull the list of elements from the <ul class="catalog-v2_results__1FjDi"> element. The code below returns nothing. I have very little familiarity with HTML and am trying to find the easiest way of doing this.

import requests
from bs4 import BeautifulSoup

# Fetch the catalog page (verify=False skips TLS certificate verification)
page = requests.get("https://www.udacity.com/courses/all?price=Free", verify=False)
soup = BeautifulSoup(page.content, 'html.parser')

ls = soup.find_all('ul', {'class': 'catalog-v2_results__1FjDi'})
print(ls)

HTML Structure
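
One quick sanity check (my own sketch, not part of the original question) is to test whether the class name appears in the raw response at all; if it does not, the list is being injected by JavaScript after the page loads, which is what the answer below addresses.

import requests

# Sketch: check whether the course list exists in the server-rendered HTML.
# If this prints False, the <ul> is added client-side by JavaScript.
page = requests.get("https://www.udacity.com/courses/all?price=Free", verify=False)
print('catalog-v2_results__1FjDi' in page.text)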

1 Answer

知足的幸福 2025-02-20 08:38:01

The webpage is dynamic: bs4 can't render JavaScript, but you can pair bs4 with Selenium, which loads the page in a real browser and renders it. I use CSS selectors to parse the HTML DOM elements.

Example:

import time

import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

# webdriver-manager downloads a chromedriver that matches the installed Chrome
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))

data = []
driver.get('https://www.udacity.com/courses/all?price=Free')
time.sleep(5)            # give the JavaScript time to render the catalog
driver.maximize_window()
time.sleep(3)

# Parse the fully rendered page source with BeautifulSoup
soup = BeautifulSoup(driver.page_source, 'lxml')

# Each course card is an <li> inside the results <ul>
for course in soup.select('.catalog-v2_results__1FjDi > li'):
    title = course.select_one('.card_title__35G97').text
    data.append({'title': title})

df = pd.DataFrame(data)
print(df)
driver.quit()

Output:

                        title
0                   Intro to Data Analysis
1                    SQL for Data Analysis
2       Database Systems Concepts & Design
3          Intro to Inferential Statistics
4                                    Spark
..                                     ...
186               Front-End Interview Prep
187              Full-Stack Interview Prep
188  Data Structures & Algorithms in Swift
189                     iOS Interview Prep
190                      VR Interview Prep

[191 rows x 1 columns]
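
A side note on the fixed time.sleep() calls: an explicit wait blocks only until the results list actually shows up. The sketch below is my own assumption rather than part of the original answer; it reuses the driver and the BeautifulSoup import from the script above, along with the same CSS selector.

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Sketch: wait up to 20 seconds for at least one course card to be present,
# then hand the rendered HTML to BeautifulSoup (driver comes from the script above).
WebDriverWait(driver, 20).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, '.catalog-v2_results__1FjDi > li'))
)
soup = BeautifulSoup(driver.page_source, 'lxml')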