Python BeautifulSoup fails to get data from a DIV with a certain class

Posted 2025-01-31 05:17:20

I am working on a program that scrapes Metacritic for information on the movies in my library and displays it, but certain parts, such as grabbing the rating, always return nothing. What am I doing wrong?

from bs4 import BeautifulSoup
import requests
import os

def ratingsGet(headers, movie):
    movie = movie.lower().replace(" ","-")
    detail_link="https://www.metacritic.com/movie/" + movie + "/details"
    detail_page = requests.get(detail_link, headers = headers) 
    soup = BeautifulSoup(detail_page.content, "html.parser")
    #g_data = soup.select('tr.movie_rating td.data span')
    g_data = soup.find_all("div", {"class": "movie_rating"})
    print(g_data)

    if g_data!= []:
        return g_data[0].text
    else:
        return "Failed"

def getMovieInfo():
    headers={'User-Agent': 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_0) AppleWebKit/536.1 (KHTML, like Gecko) Chrome/58.0.849.0 Safari/536.1'}
    
    for movie in os.listdir("D:/Movies/"):
        movie = movie.lower().replace(".mp4","")
        print(movie)
        print("Rating: " + ratingsGet(headers,movie))
        # rYearGet is not shown in the question; it is assumed to be defined elsewhere
        print("Home release year: " + rYearGet(headers,movie))
        break
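The URL in ratingsGet is built by lowercasing the file name and replacing spaces with hyphens. A small helper doing the same thing, which also drops any file extension (not just .mp4) via os.path.splitext, might look like the sketch below. make_slug and detail_url are hypothetical names, and the sketch assumes Metacritic slugs follow this simple pattern; titles with punctuation may need extra handling.

```python
import os


def make_slug(filename):
    """Hypothetical helper: turn a file name like '13 Going on 30.mp4'
    into the slug used in the question's Metacritic URL."""
    # Drop the extension, whatever it is (.mp4, .mkv, ...)
    title, _ext = os.path.splitext(filename)
    # Same transformation as ratingsGet: lowercase, spaces -> hyphens
    return title.lower().replace(" ", "-")


def detail_url(filename):
    # Base URL pattern taken from the question's code
    return "https://www.metacritic.com/movie/" + make_slug(filename) + "/details"


print(detail_url("13 Going on 30.mp4"))
# https://www.metacritic.com/movie/13-going-on-30/details
```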

HTML snippet:

<table class="details" summary="13 Going on 30 Details and Credits">
<tr class="runtime">
<td class="label">Runtime:</td>
<td class="data">98 min</td>
</tr>
<tr class="movie_rating">
<td class="label">Rating:</td>
<td class="data">
                                                                            Rated PG-13 for some sexual content and brief drug references.
                                                                    </td>
</tr>
<tr class="company">
<td class="label">Production:</td>
<td class="data">Revolution Studios</td>
</tr>
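One way to check what the original query matches, without hitting the site, is to feed the snippet above straight into BeautifulSoup. In this sketch SNIPPET is just a trimmed copy of that HTML (with the table closed), and it compares the question's div search against a tr search:

```python
from bs4 import BeautifulSoup

# Trimmed copy of the HTML snippet from the question
SNIPPET = """
<table class="details" summary="13 Going on 30 Details and Credits">
<tr class="runtime">
<td class="label">Runtime:</td>
<td class="data">98 min</td>
</tr>
<tr class="movie_rating">
<td class="label">Rating:</td>
<td class="data">
    Rated PG-13 for some sexual content and brief drug references.
</td>
</tr>
</table>
"""

soup = BeautifulSoup(SNIPPET, "html.parser")

# The question's query: there is no <div class="movie_rating"> in this markup
div_hits = soup.find_all("div", {"class": "movie_rating"})

# The movie_rating class is actually on a <tr> element
tr_hits = soup.find_all("tr", {"class": "movie_rating"})

print(len(div_hits))  # 0
print(len(tr_hits))   # 1
```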

迎风吟唱 2025-02-07 05:17:20

As you said, you need to look for a "tr" (not a "div"). I will add that to the answer as well.

Try using only find (there is no need for find_all).
If the result of find is not None, call find on it again to get just the text of the data cell, like this:

g_data.find("td", { "class": "data" }).text
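The same two-step lookup can also be written as one CSS selector. The commented-out line in the question, soup.select('tr.movie_rating td.data span'), would also come back empty against this HTML, because the data cell holds plain text and no span. A sketch showing that dropping "span" gives a selector that matches; select_one returns None when nothing matches:

```python
from bs4 import BeautifulSoup

# Copy of the rating row from the question's HTML snippet
html = """
<table class="details">
<tr class="movie_rating">
<td class="label">Rating:</td>
<td class="data">
    Rated PG-13 for some sexual content and brief drug references.
</td>
</tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# The question's selector: td.data contains no <span>, so nothing matches
with_span = soup.select("tr.movie_rating td.data span")
print(len(with_span))  # 0

# Without "span" the selector targets the data cell directly
cell = soup.select_one("tr.movie_rating td.data")
if cell is not None:
    print(cell.get_text(strip=True))
# Rated PG-13 for some sexual content and brief drug references.
```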

The general code will look something like this:

def ratingsGet(headers, movie):
    movie = movie.lower().replace(" ","-")
    detail_link="https://www.metacritic.com/movie/" + movie + "/details"
    detail_page = requests.get(detail_link, headers = headers)
    soup = BeautifulSoup(detail_page.content, "html.parser")
    g_data = soup.find("tr", {"class": "movie_rating"})

    # Check if that tr exists
    if g_data is not None:
        g_data = g_data.find("td", { "class": "data" })

    # Check if the td inside of it exists
    if g_data is not None:
        return g_data.text.strip()
    return "Failed"
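The details page in the question stores other fields (Runtime, Production) in rows with the same tr/td layout, so this nested lookup can be generalized into one helper keyed by the row's class. The sketch below separates parsing from fetching so it can be tried on saved HTML; get_detail is a hypothetical name, the row classes are the ones visible in the question's snippet, and "release_date" is a made-up class used only to show the fallback.

```python
from bs4 import BeautifulSoup


def get_detail(html, row_class, default="Failed"):
    """Hypothetical helper: return the text of <td class="data"> inside
    <tr class=row_class>, or `default` if either element is missing."""
    soup = BeautifulSoup(html, "html.parser")
    row = soup.find("tr", {"class": row_class})
    if row is None:
        return default
    cell = row.find("td", {"class": "data"})
    if cell is None:
        return default
    return cell.text.strip()


# Copy of the question's HTML snippet, with the table closed
SNIPPET = """
<table class="details" summary="13 Going on 30 Details and Credits">
<tr class="runtime">
<td class="label">Runtime:</td>
<td class="data">98 min</td>
</tr>
<tr class="movie_rating">
<td class="label">Rating:</td>
<td class="data">
    Rated PG-13 for some sexual content and brief drug references.
</td>
</tr>
<tr class="company">
<td class="label">Production:</td>
<td class="data">Revolution Studios</td>
</tr>
</table>
"""

print(get_detail(SNIPPET, "runtime"))       # 98 min
print(get_detail(SNIPPET, "company"))       # Revolution Studios
print(get_detail(SNIPPET, "release_date"))  # Failed
```

With the question's code, the html argument would be detail_page.content from the requests call.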

追我者格杀勿论 2025-02-07 05:17:20

I was just searching for the wrong element: the rating is in a tr, not a div.

def ratingsGet(headers, movie):
    movie = movie.lower().replace(" ","-")
    detail_link="https://www.metacritic.com/movie/" + movie + "/details"
    detail_page = requests.get(detail_link, headers = headers)
    soup = BeautifulSoup(detail_page.content, "html.parser")
    #g_data = soup.select('tr.movie_rating td.data span')
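    # The rating lives in <tr class="movie_rating">, not in a <div>
    g_data = soup.find_all("tr", {"class": "movie_rating"})

    # Check for an empty result before indexing; g_data[0] on an empty list raises IndexError
    if g_data:
        # strip() with no argument also removes the newlines around the text
        return g_data[0].text.strip()
    else: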
        return "Failed"
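Two details are worth knowing when cleaning up this text. strip(" ") removes only space characters, so the newlines around the text survive, while strip() with no argument removes all leading and trailing whitespace. Also, the text of the whole tr includes the "Rating:" label along with the rating itself. A quick comparison on a copy of the row from the question; get_text with a separator and strip=True is one way to get a single clean line:

```python
from bs4 import BeautifulSoup

# Copy of the rating row from the question's HTML snippet
ROW = """<table><tr class="movie_rating">
<td class="label">Rating:</td>
<td class="data">
    Rated PG-13 for some sexual content and brief drug references.
</td>
</tr></table>"""

row = BeautifulSoup(ROW, "html.parser").find("tr", {"class": "movie_rating"})

only_spaces = row.text.strip(" ")         # newlines are not spaces, so they remain
all_ws = row.text.strip()                 # leading/trailing newlines removed too
one_line = row.get_text(" ", strip=True)  # each text node stripped, joined by " "

print(repr(one_line))
# 'Rating: Rated PG-13 for some sexual content and brief drug references.'
```

If only the rating is wanted, without the label, narrowing the search to the td with class "data" inside the row avoids the label entirely.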
