I am trying to give a variable a path so that I can scrape the information at that path, but I get an empty list

Posted on 2025-02-13 08:50:12, 1021 characters, 1 view, 0 comments

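I am trying to build a web scraper in Python. The basic approach I am using is:

Create an empty list --> use a 'for' loop to iterate over the elements on the web page --> append that information to the empty list --> convert the list to rows and columns using pandas --> finally, write it to a CSV.

The code I wrote is: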

import requests 
from requests import get
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np

headers = {"Accept-Language": "en-US, en;q=0.5"}
url = "https://www.imdb.com/find?q=top+1000+movies&ref_=nv_sr_sm"
results=requests.get(url,headers=headers)
soup=BeautifulSoup(results.text,"html.parser")
# print(soup.prettify())

#initializing empty lists where the data will go
titles =[]
years = []
times = []
imdb_rating = []
metascores = []
votes = []
us_gross = []
movie_div = soup.find_all('div',class_='lister-list')

#initiating the loop for scraper 
for container in movie_div:
    #tiles 
    name=container.tr.td.a.text
    titles.append(name)
print(titles)

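The website I want to scrape is 'https://www.imdb.com/chart/top/?ref_=nv_mv_250'. I need help working out the correct path to assign to the variable 'name', so that I can extract the movie name given in name_of_movei in the page's HTML. Every time I run this, the output is an empty list.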

染年凉城似染瑾 2025-02-20 08:50:13

This example parses the name, year, and rating from the table and creates a DataFrame from it:

import requests
import pandas as pd
from bs4 import BeautifulSoup

url = "https://www.imdb.com/chart/top/"
headers = {"Accept-Language": "en-US, en;q=0.5"}
soup = BeautifulSoup(requests.get(url, headers=headers).content, "html.parser")

all_data = []
for row in soup.select(".lister-list > tr"):
    name = row.select_one(".titleColumn a").text.strip()
    year = row.select_one(".titleColumn .secondaryInfo").text.strip()
    rating = row.select_one(".imdbRating").text.strip()
    # ...other variables

    all_data.append([name, year, rating])


df = pd.DataFrame(all_data, columns=["Name", "Year", "Rating"])
print(df.head().to_markdown(index=False))

Prints:

| Name                     | Year   |   Rating |
|:-------------------------|:-------|---------:|
| The Shawshank Redemption | (1994) |      9.2 |
| The Godfather            | (1972) |      9.2 |
| The Dark Knight          | (2008) |      9   |
| The Godfather: Part II   | (1974) |      9   |
| 12 Angry Men             | (1957) |      8.9 |
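
To round out the pipeline described in the question (list, then pandas, then CSV), here is a minimal offline sketch that reuses the same selectors. The `SAMPLE_HTML` markup, the `text_or_none` and `parse_rows` helpers, and the third row with a missing rating are all made up for illustration; the live page's markup may differ. It also shows one possible reason for an empty list: in this assumed markup, `lister-list` sits on a `tbody`, so searching for a `div` with that class matches nothing.

```python
import io

import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical, hand-written markup that mimics the class names used above;
# the real page may be structured differently.
SAMPLE_HTML = """
<table>
  <tbody class="lister-list">
    <tr>
      <td class="titleColumn">1. <a href="#">The Shawshank Redemption</a>
        <span class="secondaryInfo">(1994)</span></td>
      <td class="imdbRating"><strong>9.2</strong></td>
    </tr>
    <tr>
      <td class="titleColumn">2. <a href="#">The Godfather</a>
        <span class="secondaryInfo">(1972)</span></td>
      <td class="imdbRating"><strong>9.2</strong></td>
    </tr>
    <tr>
      <td class="titleColumn">3. <a href="#">Row without a rating</a>
        <span class="secondaryInfo">(2000)</span></td>
    </tr>
  </tbody>
</table>
"""


def text_or_none(row, selector):
    """Return the stripped text of the first match, or None if nothing matches."""
    tag = row.select_one(selector)
    return tag.get_text(strip=True) if tag is not None else None


def parse_rows(html):
    """Collect one dict per table row using the selectors from the answer."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for row in soup.select(".lister-list > tr"):
        rows.append(
            {
                "Name": text_or_none(row, ".titleColumn a"),
                "Year": text_or_none(row, ".titleColumn .secondaryInfo"),
                "Rating": text_or_none(row, ".imdbRating"),
            }
        )
    return rows


# In this sample the class is on a <tbody>, so a <div> search finds nothing.
div_matches = BeautifulSoup(SAMPLE_HTML, "html.parser").find_all(
    "div", class_="lister-list"
)
print(len(div_matches))

rows = parse_rows(SAMPLE_HTML)
df = pd.DataFrame(rows, columns=["Name", "Year", "Rating"])

# "(1994)" -> 1994; ratings become floats, missing ones become NaN.
df["Year"] = df["Year"].str.strip("()").astype(int)
df["Rating"] = pd.to_numeric(df["Rating"], errors="coerce")

# Written to an in-memory buffer here; a file path could be passed instead.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

Checking each `select_one` result before reading its text avoids an `AttributeError` when a row lacks one of the cells. To write a real file, pass a path such as `"movies.csv"` (a placeholder name) to `df.to_csv` in place of the buffer.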
