Web scraping in Python won't return the whole list
I am just starting out in Python, and my web scraping script won't show the whole list. My code is below. I was trying to pull the A24 films ranked on IMDb.
from cmath import e
from pydoc import synopsis
from bs4 import BeautifulSoup
import requests

try:
    source = requests.get('https://www.imdb.com/list/ls024372673/')
    source.raise_for_status()
    soup = BeautifulSoup(source.text, 'html.parser')
    movies = soup.find('div', class_="lister-list").find_all('div')
    for movie in movies:
        name = movie.find('h3', class_="lister-item-header").a.text
        rank = movie.find('span', class_="lister-item-index unbold text-primary").text
        year = movie.find('span', class_="lister-item-year text-muted unbold").text
        star = movie.find('span', class_="ipl-rating-star__rating").text
        metascore = movie.find('div', class_="inline-block ratings-metascore").span.text
        score = movie.find('div', class_="list-description").text
        genre = movie.find('span', class_="genre").text
        runtime = movie.find('span', class_="runtime").text
        about = movie.find('p', class_="").text
        elements = movie.findAll('span', attrs={'name': 'nv'})
        votes = elements[0]['data-value']
        gross = elements[1]['data-value']
        print(name, rank, year, star, metascore, score, genre, runtime, about, votes, gross)
except Exception as e:
    print(e)
Comments (2)
You should check what happens in your try / except block and handle the failing cases explicitly, e.g. with if statements, instead of letting the first exception end the loop. You could also use a more structured way to hold your results.
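The answer's original examples did not survive on this page, so the following is only a minimal sketch of the two ideas, not the answerer's code. It assumes a small, hypothetical HTML snippet in place of the live IMDb page (whose markup may differ), and the helper name text_or_none and the dictionary keys are made up for illustration. Each lookup is checked with an if statement so a missing element yields None instead of an exception, and each film is stored as a dict in a list.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down markup standing in for two list items.
# The second item has no metascore, the kind of gap that makes
# `.find(...).text` raise AttributeError in the question's loop.
SAMPLE_HTML = """
<div class="lister-list">
  <div class="lister-item-content">
    <h3 class="lister-item-header"><a href="#">Film A</a></h3>
    <span class="genre"> Drama </span>
    <div class="inline-block ratings-metascore"><span>90</span></div>
  </div>
  <div class="lister-item-content">
    <h3 class="lister-item-header"><a href="#">Film B</a></h3>
    <span class="genre"> Comedy </span>
  </div>
</div>
"""


def text_or_none(element):
    """Return the stripped text of a tag, or None if the tag was not found."""
    if element is None:
        return None
    return element.get_text(strip=True)


def parse_movies(html):
    """Collect one dict per film; missing fields become None."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for movie in soup.find_all("div", class_="lister-item-content"):
        header = movie.find("h3", class_="lister-item-header")
        metascore_box = movie.find("div", class_="ratings-metascore")
        results.append({
            "name": text_or_none(header.a) if header else None,
            "genre": text_or_none(movie.find("span", class_="genre")),
            "metascore": text_or_none(metascore_box.span) if metascore_box else None,
        })
    return results


movies = parse_movies(SAMPLE_HTML)
for movie in movies:
    print(movie)
```

A list of dicts like this can be passed on to csv.DictWriter or a DataFrame later, which is harder to do with values that are only printed.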
movies is not the list of films you expect. .find() returns only the first matching element, while .find_all() returns a list. You take the single element with class="lister-list" and then call .find_all('div') on it, which gives you every nested div inside that element, not one element per movie. You should search for all the elements with class="lister-item-content" instead.

Another problem is the score variable. There is no div with class="list-description" inside your movie element, so .find() returns None, and you get an error because a NoneType object has no attribute text.

I have also added a .strip() to remove the surrounding whitespace.

Edit: I agree with HedgeHog. His example is a perfect solution for this type of code structure. Just remember to add the .strip().
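To make the difference concrete, here is a small sketch that runs against a hypothetical HTML fragment rather than the real IMDb page (the class names are taken from the question, but the nesting is an assumption). It compares what find_all('div') on the container returns with what a search for class="lister-item-content" returns.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: each film is wrapped in several nested <div>s.
SAMPLE_HTML = """
<div class="lister-list">
  <div class="lister-item">
    <div class="lister-item-content">
      <h3 class="lister-item-header"><a href="#"> Film A </a></h3>
      <div class="ratings-bar"></div>
    </div>
  </div>
  <div class="lister-item">
    <div class="lister-item-content">
      <h3 class="lister-item-header"><a href="#"> Film B </a></h3>
      <div class="ratings-bar"></div>
    </div>
  </div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find() returns a single Tag (or None), never a list.
container = soup.find("div", class_="lister-list")

# find_all("div") searches recursively, so it returns wrappers and
# inner helper <div>s too, not one element per film.
all_divs = container.find_all("div")

# Some of those <div>s contain no header at all; calling `.a.text`
# on the result of find() there would raise AttributeError.
none_hits = [d for d in all_divs
             if d.find("h3", class_="lister-item-header") is None]

# Searching for the per-film class gives exactly one element per film.
items = soup.find_all("div", class_="lister-item-content")
names = [item.find("h3", class_="lister-item-header").a.text.strip()
         for item in items]

print(len(all_divs), len(none_hits), len(items), names)
```

With the question's try / except wrapped around the whole loop, the first such AttributeError ends the loop, which would be consistent with only part of the list being printed.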