I am trying to give a variable a path so that I can scrape the information found at that path, but I get an empty list
I am trying to make a web scraper using Python. The basic approach I am using is:
create empty lists --> use a 'for' loop to go through the elements on the web page --> append that info to the empty lists --> convert the lists to rows and columns using pandas --> finally write them to a CSV.
The code that I wrote is:
import requests
from requests import get
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np

headers = {"Accept-Language": "en-US, en;q=0.5"}
url = "https://www.imdb.com/find?q=top+1000+movies&ref_=nv_sr_sm"
results = requests.get(url, headers=headers)
soup = BeautifulSoup(results.text, "html.parser")
# print(soup.prettify())

# initializing empty lists where the data will go
titles = []
years = []
times = []
imdb_rating = []
metascores = []
votes = []
us_gross = []

movie_div = soup.find_all('div', class_='lister-list')

# initiating the loop for scraper
for container in movie_div:
    # title
    name = container.tr.td.a.text
    titles.append(name)

print(titles)
The website I want to scrape is 'https://www.imdb.com/chart/top/?ref_=nv_mv_250'. I need help working out the correct path to assign to the variable 'name', so that I can extract the name of each movie, given in name_of_movei in the page's HTML. Each time I run the code, the output is an empty list.
Comments (1)
This example parses the name, year, and rating of each movie from the table and creates a dataframe from them.
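The answer's original code and printed output are missing from this page, so what follows is only a minimal sketch of the approach it describes, not the answerer's actual code. It assumes the chart page kept its movies in a `<tbody class="lister-list">` with one `<tr>` per movie, the title link inside `td.titleColumn`, the year in `span.secondaryInfo`, and the rating in `td.imdbRating strong`. That layout is an assumption and IMDb's markup may differ today, so check the live HTML before relying on these selectors. Under that assumption, two things would explain the empty list in the question: the code requests the search URL rather than the chart URL, and it looks for a `div` with class `lister-list` where the class would sit on a `tbody`. The sample HTML, its movie names and values, and the helper `parse_chart` are all hypothetical, used so the sketch runs without a network request.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical sample that mimics the ASSUMED table layout of the chart page.
# The titles, years, and ratings are placeholders, not real data.
SAMPLE_HTML = """
<table>
  <tbody class="lister-list">
    <tr>
      <td class="titleColumn">1. <a href="/title/x1/">Example Movie One</a>
        <span class="secondaryInfo">(1994)</span></td>
      <td class="ratingColumn imdbRating"><strong>9.2</strong></td>
    </tr>
    <tr>
      <td class="titleColumn">2. <a href="/title/x2/">Example Movie Two</a>
        <span class="secondaryInfo">(1972)</span></td>
      <td class="ratingColumn imdbRating"><strong>9.1</strong></td>
    </tr>
  </tbody>
</table>
"""


def parse_chart(html):
    """Return a DataFrame with name, year, and rating for each table row."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # Select every row of the tbody (not div) that carries the lister-list class.
    for tr in soup.select("tbody.lister-list tr"):
        title_link = tr.select_one("td.titleColumn a")
        year_span = tr.select_one("td.titleColumn span.secondaryInfo")
        rating_tag = tr.select_one("td.imdbRating strong")
        # Skip rows that do not match the assumed layout.
        if not (title_link and year_span and rating_tag):
            continue
        rows.append({
            "name": title_link.get_text(strip=True),
            "year": int(year_span.get_text(strip=True).strip("()")),
            "rating": float(rating_tag.get_text(strip=True)),
        })
    return pd.DataFrame(rows, columns=["name", "year", "rating"])


df = parse_chart(SAMPLE_HTML)
print(df)

# Last step of the workflow from the question: turn the table into CSV text.
csv_text = df.to_csv(index=False)
```

To try it against the real page, the HTML passed to `parse_chart` would come from `requests.get("https://www.imdb.com/chart/top/?ref_=nv_mv_250", headers=headers).text`, and `df.to_csv("movies.csv", index=False)` would write a file instead of returning a string. If the resulting dataframe is empty, the selectors no longer match the page and need to be adjusted to the current markup.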