Web scraping in Python won't show the whole list

Posted on 2025-02-10 15:47:56

I am just starting out in Python. When I scrape this page, my script does not show the whole list. My code is below; I am trying to pull the A24 films ranked on IMDb.

from cmath import e
from pydoc import synopsis
from bs4 import BeautifulSoup
import requests


try:
    source =requests.get('https://www.imdb.com/list/ls024372673/')
    source.raise_for_status()  

    soup=BeautifulSoup(source.text,'html.parser')
    movies=soup.find('div',class_="lister-list").find_all('div')
   
    for movie in movies :
        name= movie.find('h3',class_="lister-item-header").a.text

        rank= movie.find('span',class_="lister-item-index unbold text-primary").text
        
        year= movie.find('span',class_="lister-item-year text-muted unbold").text

        star= movie.find('span',class_="ipl-rating-star__rating").text
        
        metascore= movie.find('div',class_="inline-block ratings-metascore").span.text

        score=movie.find('div',class_="list-description").text

        genre=movie.find('span',class_="genre").text
        
        runtime=movie.find('span',class_="runtime").text

        about=movie.find('p',class_="").text
       
        elements = movie.findAll('span', attrs = {'name':'nv'})
        votes = elements[0]['data-value']
        gross = elements[1]['data-value']

    print(name,rank,year,star,metascore,score,genre,runtime,about,votes,gross)
except Exception as e:
         print(e)

活泼老夫 2025-02-17 15:47:56

You should check what is happening in your try/except block and handle the failing cases, e.g. with a conditional expression:

'score': movie.find('div',class_="list-description").text if movie.find('div',class_="list-description") else None,
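
The same guard can be factored into a small helper so the lookup is not written twice per field. This is only a sketch: the helper name text_or_none and the inline sample markup are made up for illustration and do not come from IMDb's page.

```python
from bs4 import BeautifulSoup

# Inline HTML standing in for one list entry (invented markup, not IMDb's real page).
SAMPLE = """
<div class="lister-item">
  <h3 class="lister-item-header"><a href="#">First Film</a></h3>
  <span class="runtime"> 101 min </span>
</div>
"""


def text_or_none(parent, name, class_):
    """Return the stripped text of the first match, or None if nothing matches."""
    element = parent.find(name, class_=class_)
    return element.get_text(strip=True) if element is not None else None


movie = BeautifulSoup(SAMPLE, "html.parser").find("div", class_="lister-item")

runtime = text_or_none(movie, "span", "runtime")        # element exists
score = text_or_none(movie, "div", "list-description")  # element is missing
print(runtime, score)
```

A missing element then shows up as None in the result instead of raising an exception.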

Example

You could also hold your results in a more structured way:

import requests
from bs4 import BeautifulSoup

headers = {"User-Agent": "Mozilla/5.0"}
page = requests.get('https://www.imdb.com/list/ls024372673/', headers=headers)
# Name the parser explicitly to avoid the "no parser specified" warning
soup = BeautifulSoup(page.content, 'html.parser')

data = []
for movie in soup.select('.lister-item'):
    # Look up the votes/gross spans once, before building the dict
    elements = movie.find_all('span', attrs={'name': 'nv'})
    data.append({
        'name': movie.find('h3',class_="lister-item-header").a.text,
        'rank': movie.find('span',class_="lister-item-index unbold text-primary").text,
        'year': movie.find('span',class_="lister-item-year text-muted unbold").text,
        'star': movie.find('span',class_="ipl-rating-star__rating").text,
        'metascore': movie.find('div',class_="inline-block ratings-metascore").span.text,
        'score': movie.find('div',class_="list-description").text if movie.find('div',class_="list-description") else None,
        'genre': movie.find('span',class_="genre").text.strip(),
        'runtime': movie.find('span',class_="runtime").text,
        'about': movie.find('p',class_="").text,
        # Not every entry has both spans, so guard the indexing
        'votes': elements[0]['data-value'] if len(elements) > 0 else None,
        'gross': elements[1]['data-value'] if len(elements) > 1 else None,
    })
print(data)

╭⌒浅淡时光〆 2025-02-17 15:47:56

movies is not a list of movie entries. .find() returns only the first matching element, which here is the single container with class="lister-list". Calling .find_all('div') on that container then returns every div nested inside it, not one element per movie.

Instead, search for all the elements with class="lister-item-content", which gives you one element per movie.
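
To see the difference between the two searches, here is a small sketch on invented markup that only imitates the nesting of the list page. The sample HTML and the "ratings" class are assumptions for illustration, not IMDb's real structure.

```python
from bs4 import BeautifulSoup

# Invented markup: one container, two movie entries, and an extra
# nested div inside each entry.
SAMPLE = """
<div class="lister-list">
  <div class="lister-item">
    <div class="lister-item-content"><h3><a>First Film</a></h3></div>
    <div class="ratings"><span>7.1</span></div>
  </div>
  <div class="lister-item">
    <div class="lister-item-content"><h3><a>Second Film</a></h3></div>
    <div class="ratings"><span>6.4</span></div>
  </div>
</div>
"""

soup = BeautifulSoup(SAMPLE, "html.parser")

# find() returns the single container; find_all("div") on it returns
# every div nested anywhere below it.
all_divs = soup.find("div", class_="lister-list").find_all("div")

# Searching by the per-movie class yields exactly one element per movie.
movies = soup.find_all("div", class_="lister-item-content")

print(len(all_divs), len(movies))
titles = [m.find("a").get_text(strip=True) for m in movies]
print(titles)

# A nested div that is not a movie entry has no h3, so find() gives None.
print(all_divs[2].find("h3"))
```

Looping over all_divs and calling .text on each lookup would fail as soon as the loop reaches a div that is not a movie entry.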

import requests
from bs4 import BeautifulSoup

source = requests.get("https://www.imdb.com/list/ls024372673/")
source.raise_for_status()

soup = BeautifulSoup(source.text, "html.parser")
# One element per movie
movies = soup.find_all("div", class_="lister-item-content")

for movie in movies:
    name      = (movie.find("h3", class_="lister-item-header").find("a").text).strip()
    rank      = (movie.find("span", class_="lister-item-index unbold text-primary").text).strip()
    year      = (movie.find("span", class_="lister-item-year text-muted unbold").text).strip()
    stars     = (movie.find("span", class_="ipl-rating-star__rating").text).strip()
    metascore = (movie.find("div", class_="inline-block ratings-metascore").find("span").text).strip()
    # score   = movie.find("div", class_="list-description").text  # no element with this class inside movie
    genre     = (movie.find("span", class_="genre").text).strip()
    runtime   = (movie.find("span", class_="runtime").text).strip()
    about     = (movie.find("p", class_="").text).strip()

    # find_all is the current name; findAll is a legacy alias
    elements = movie.find_all("span", attrs = {"name":"nv"})
    votes    = elements[0]['data-value']
    gross    = elements[1]['data-value']

Another problem is the score variable. There is no div with class="list-description" inside your movie element, so .find() returns None, and accessing .text on None raises an AttributeError. I have also added .strip() to remove the surrounding whitespace.
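
As a rough illustration of that failure, the sketch below catches the AttributeError per movie, so one incomplete entry is reported instead of ending the whole loop. The sample markup is invented; the rows and failures lists are names chosen for this example only.

```python
from bs4 import BeautifulSoup

# Invented markup: the second entry lacks the runtime span.
SAMPLE = """
<div class="lister-item-content"><h3><a>First Film</a></h3><span class="runtime">101 min</span></div>
<div class="lister-item-content"><h3><a>Second Film</a></h3></div>
"""

soup = BeautifulSoup(SAMPLE, "html.parser")

rows, failures = [], []
for movie in soup.find_all("div", class_="lister-item-content"):
    title = movie.find("a").get_text(strip=True)
    try:
        # find() returns None for the second entry, so .text raises AttributeError
        runtime = movie.find("span", class_="runtime").text.strip()
    except AttributeError as error:
        failures.append((title, str(error)))
        continue
    rows.append((title, runtime))

print(rows)
print(failures)
```

Keeping the try/except inside the loop and catching only AttributeError makes it clear which movie and which lookup failed.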

Edit: I agree with HedgeHog. His example is a perfect solution for this type of code structure. Just remember to add .strip().
