Web scraping in Python does not return the whole list

Posted on 2025-02-10 15:47:56

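I am just starting out in Python. When I scrape this page, my script does not show the whole list. My code is below; I am trying to pull the A24 films ranked on IMDb.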

from cmath import e
from pydoc import synopsis
from bs4 import BeautifulSoup
import requests


try:
    source =requests.get('https://www.imdb.com/list/ls024372673/')
    source.raise_for_status()  

    soup=BeautifulSoup(source.text,'html.parser')
    movies=soup.find('div',class_="lister-list").find_all('div')
   
    for movie in movies :
        name= movie.find('h3',class_="lister-item-header").a.text

        rank= movie.find('span',class_="lister-item-index unbold text-primary").text
        
        year= movie.find('span',class_="lister-item-year text-muted unbold").text

        star= movie.find('span',class_="ipl-rating-star__rating").text
        
        metascore= movie.find('div',class_="inline-block ratings-metascore").span.text

        score=movie.find('div',class_="list-description").text

        genre=movie.find('span',class_="genre").text
        
        runtime=movie.find('span',class_="runtime").text

        about=movie.find('p',class_="").text
       
        elements = movie.findAll('span', attrs = {'name':'nv'})
        votes = elements[0]['data-value']
        gross = elements[1]['data-value']

    print(name,rank,year,star,metascore,score,genre,runtime,about,votes,gross)
except Exception as e:
         print(e) 



Comments (2)

活泼老夫 2025-02-17 15:47:56

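You should check what actually happens inside your try/except block, and handle missing elements explicitly, e.g. with a conditional expression: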

'score': movie.find('div',class_="list-description").text if movie.find('div',class_="list-description") else None,
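
The conditional above calls `movie.find(...)` twice for the same field. A small helper could keep it to one lookup. This is only a minimal sketch: `text_or_none` is a hypothetical name, not part of BeautifulSoup, and it assumes nothing more than that its argument is either `None` or an object with a `.text` attribute, which is what `find()` returns.

```python
from types import SimpleNamespace


def text_or_none(node, default=None):
    """Return the stripped text of a tag, or `default` if the tag is missing.

    `node` is whatever `movie.find(...)` returned: a tag-like object with a
    `.text` attribute, or None when nothing matched.
    """
    if node is None:
        return default
    return node.text.strip()


# A stand-in for a found tag, so the sketch runs without a network request.
found = SimpleNamespace(text="  some description \n")
print(text_or_none(found))
print(text_or_none(None))
```

Inside the loop this would read `'score': text_or_none(movie.find('div', class_="list-description"))`.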
Example

You could also hold your results in a more structured way:

import requests
from bs4 import BeautifulSoup

headers = {"User-Agent": "Mozilla/5.0"}
page = requests.get('https://www.imdb.com/list/ls024372673/', headers=headers)
# Name the parser explicitly so BeautifulSoup does not have to guess one
soup = BeautifulSoup(page.content, 'html.parser')

data = []
for movie in soup.select('.lister-item'):
    # Look these up before building the dict so 'votes' and 'gross' can use them
    elements = movie.find_all('span', attrs = {'name':'nv'})
    data.append({
        'name': movie.find('h3',class_="lister-item-header").a.text,
        'rank': movie.find('span',class_="lister-item-index unbold text-primary").text,
        'year': movie.find('span',class_="lister-item-year text-muted unbold").text,
        'star': movie.find('span',class_="ipl-rating-star__rating").text,
        'metascore': movie.find('div',class_="inline-block ratings-metascore").span.text,
        'score': movie.find('div',class_="list-description").text if movie.find('div',class_="list-description") else None,
        'genre': movie.find('span',class_="genre").text.strip(),
        'runtime': movie.find('span',class_="runtime").text,
        'about': movie.find('p',class_="").text,
        'votes': elements[0]['data-value'],
        'gross': elements[1]['data-value']
    })
print(data)

╭⌒浅淡时光〆 2025-02-17 15:47:56

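movies is not the list of films you expect. .find() returns only the first matching element, so soup.find('div', class_="lister-list") gives you the single container, and the chained .find_all('div') then returns every div nested inside it rather than one element per film. Use .find_all() on the per-film class instead, which returns a list with one entry per film.

In other words, you are searching inside the element with class="lister-list", so you start from that one container instead of a list of movies. You should search for all the elements with class="lister-item-content".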
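
The difference between the two lookups can be seen offline on a small fragment. The markup below is hypothetical and heavily simplified: it only mimics the class names used in the question and is not the real IMDb page.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup that reuses the class names from the question.
HTML = """
<div class="lister-list">
  <div class="lister-item">
    <div class="lister-item-image"></div>
    <div class="lister-item-content"><h3 class="lister-item-header"><a>Film A</a></h3></div>
  </div>
  <div class="lister-item">
    <div class="lister-item-image"></div>
    <div class="lister-item-content"><h3 class="lister-item-header"><a>Film B</a></h3></div>
  </div>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# find() returns the single container; find_all("div") on it then returns
# every nested div, including ones that contain no header at all.
nested = soup.find("div", class_="lister-list").find_all("div")

# Searching by the per-film class returns exactly one element per film.
movies = soup.find_all("div", class_="lister-item-content")

titles = [m.find("h3", class_="lister-item-header").a.text for m in movies]
print(len(nested), len(movies), titles)
```

Some of the elements in `nested`, such as the image wrappers, have no `h3` inside them, so `.find("h3", ...)` on those returns `None` and any attribute access on the result fails.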

import requests
from bs4 import BeautifulSoup

source = requests.get("https://www.imdb.com/list/ls024372673/")
source.raise_for_status()

soup = BeautifulSoup(source.text, "html.parser")
# One element per film, instead of every div inside the container
movies = soup.find_all("div", class_="lister-item-content")

for movie in movies:
    name      = (movie.find("h3", class_="lister-item-header").find("a").text).strip()
    rank      = (movie.find("span", class_="lister-item-index unbold text-primary").text).strip()
    year      = (movie.find("span", class_="lister-item-year text-muted unbold").text).strip()
    stars     = (movie.find("span", class_="ipl-rating-star__rating").text).strip()
    metascore = (movie.find("div", class_="inline-block ratings-metascore").find("span").text).strip()
    # score   = movie.find("div", class_="list-description").text  # no element with this class inside movie
    genre     = (movie.find("span", class_="genre").text).strip()
    runtime   = (movie.find("span", class_="runtime").text).strip()
    about     = (movie.find("p", class_="").text).strip()

    elements = movie.find_all("span", attrs = {"name":"nv"})
    votes    = elements[0]['data-value']
    gross    = elements[1]['data-value']

    # Print inside the loop so every film is shown, not just the last one
    print(name, rank, year, stars, metascore, genre, runtime, about, votes, gross)

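Another problem is the score variable. There is no div with class="list-description" inside your movie element, so .find() returns None and accessing .text raises an error, because a NoneType object has no attribute text. I have also added .strip() to remove the surrounding whitespace.

Edit: I agree with HedgeHog. His example is a perfect solution for this type of code structure. Just remember to add .strip().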
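
One more fragile spot in the loops above is the direct `elements[1]` lookup, which would raise `IndexError` for any entry with fewer than two `name="nv"` spans. Whether this particular list contains such entries is an assumption, not something established in this thread. The sketch below shows one way to guard against it; `votes_and_gross` is a hypothetical helper name, and plain dicts stand in for the tags because both support `["data-value"]` lookups.

```python
def votes_and_gross(elements):
    """Return (votes, gross) from a list of 'nv' spans, using None for gaps."""
    values = [el["data-value"] for el in elements]
    votes = values[0] if len(values) > 0 else None
    gross = values[1] if len(values) > 1 else None
    return votes, gross


# Plain dicts stand in for the tags returned by movie.find_all(...).
print(votes_and_gross([{"data-value": "123456"}, {"data-value": "1,234,567"}]))
print(votes_and_gross([{"data-value": "123456"}]))
print(votes_and_gross([]))
```

In the loop this would replace the two indexing lines with `votes, gross = votes_and_gross(elements)`.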

