Python BeautifulSoup fails to get data from a div with a certain class
I am working on a program that scrapes Metacritic for information about the movies in my library and displays it. Certain parts, such as grabbing the rating, always return nothing. What am I doing wrong?
from bs4 import BeautifulSoup
import requests
import os

def ratingsGet(headers, movie):
    movie = movie.lower().replace(" ","-")
    detail_link="https://www.metacritic.com/movie/" + movie + "/details"
    detail_page = requests.get(detail_link, headers = headers)
    soup = BeautifulSoup(detail_page.content, "html.parser")
    #g_data = soup.select('tr.movie_rating td.data span')
    g_data = soup.find_all("div", {"class": "movie_rating"})
    print(g_data)
    if g_data!= []:
        return g_data[0].text
    else:
        return "Failed"

def getMovieInfo():
    headers={'User-Agent': 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_0) AppleWebKit/536.1 (KHTML, like Gecko) Chrome/58.0.849.0 Safari/536.1'}
    for movie in os.listdir("D:/Movies/"):
        movie = movie.lower().replace(".mp4","")
        print(movie)
        print("Rating: " + ratingsGet(headers,movie))
        print("Home release year: " + rYearGet(headers,movie))
        break
HTML snippet:

<table class="details" summary="13 Going on 30 Details and Credits">
    <tr class="runtime">
        <td class="label">Runtime:</td>
        <td class="data">98 min</td>
    </tr>
    <tr class="movie_rating">
        <td class="label">Rating:</td>
        <td class="data">
            Rated PG-13 for some sexual content and brief drug references.
        </td>
    </tr>
    <tr class="company">
        <td class="label">Production:</td>
        <td class="data">Revolution Studios</td>
    </tr>
Comments (2)
As you said, you need to look for a "tr" (not a "div"). I would also add this to the answer: use find (there is no need for find_all), and if the result of find is not None, do another find inside it to get only the text.
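The answer's original code did not survive on this page, so what follows is only a minimal sketch of the approach it describes, not the answerer's exact code. It assumes the markup matches the HTML snippet in the question (a closing </table> is added here so the sample is well-formed), and the names SAMPLE_HTML and extract_rating are hypothetical, introduced just for this illustration.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question's HTML snippet
# (assumption: the live page has the same structure).
SAMPLE_HTML = """
<table class="details" summary="13 Going on 30 Details and Credits">
<tr class="runtime">
<td class="label">Runtime:</td>
<td class="data">98 min</td>
</tr>
<tr class="movie_rating">
<td class="label">Rating:</td>
<td class="data">
Rated PG-13 for some sexual content and brief drug references.
</td>
</tr>
</table>
"""


def extract_rating(html):
    """Return the rating text, or None if the row or its data cell is missing."""
    soup = BeautifulSoup(html, "html.parser")

    # Look for the <tr> carrying the class, not a <div>.
    row = soup.find("tr", class_="movie_rating")
    if row is None:
        return None

    # Second find inside the row, so only the data cell's text is returned.
    cell = row.find("td", class_="data")
    if cell is None:
        return None

    return cell.get_text(strip=True)


print(extract_rating(SAMPLE_HTML))
```

In the question's ratingsGet, the same two find calls could replace the find_all("div", ...) line, with the None case mapped to the existing "Failed" return value.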
I was just searching for the wrong element....
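To make the mismatch concrete, here is a small sketch on markup reduced from the question's snippet (the shortened cell text is an assumption for brevity). It compares the original div lookup with a tr lookup, and also shows a CSS selector equivalent; note that the commented-out selector in the question ends in span, which the snippet does not contain.

```python
from bs4 import BeautifulSoup

# Reduced version of the question's markup (assumption: same tag and class layout).
html = '<table><tr class="movie_rating"><td class="data">Rated PG-13</td></tr></table>'
soup = BeautifulSoup(html, "html.parser")

# The class sits on a <tr>, so filtering on <div> matches nothing.
div_matches = soup.find_all("div", {"class": "movie_rating"})

# The same class filter on <tr> finds the row.
tr_matches = soup.find_all("tr", {"class": "movie_rating"})

# CSS selector form: the data cell inside the rating row.
cell = soup.select_one("tr.movie_rating td.data")

print(div_matches)
print(len(tr_matches))
print(cell.get_text(strip=True))
```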