How do I use BeautifulSoup to extract the actors of a show from IMDb?

Posted on 2025-02-07 23:26:25

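I am trying to extract the cast list of The Office by scraping this IMDb page with BeautifulSoup: https://www.imdb.com/title/tt0386676/fullcredits/?ref_=tt_ql_cl

actors = soup.findAll('table',{'cast_list'})

How would I change this so that it only gives me the actors' names? An example of the HTML is:

<td> <a href="/name/nm0933988/?ref_=ttfc_fc_cl_t1"> Rainn Wilson </a> </td>

I would like to extract only the text 'Rainn Wilson'.

Any help is appreciated. This is my first question here, so please go easy on me.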



梦巷 2025-02-14 23:26:25

Try this:

import requests
from bs4 import BeautifulSoup

headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:101.0) Gecko/20100101 Firefox/101.0"
}
url = "https://www.imdb.com/title/tt0386676/fullcredits/?ref_=tt_ql_cl"

actors = (
    BeautifulSoup(requests.get(url, headers=headers).text, "lxml")
    .find('table', class_='cast_list')
    .select_one("a img")["title"]
)
print(actors)

Output:

Rainn Wilson
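
The snippet above prints a single name because `select_one` stops at the first match. Below is a minimal sketch of collecting every name with `select` instead. It runs against a small inline HTML sample rather than the live page, so the sample markup, the placeholder actor "Example Actor A" and the helper name `extract_cast_names` are all assumptions for illustration; the real page may be structured differently.

```python
from bs4 import BeautifulSoup

# Assumed sample markup, loosely modelled on the cast_list table used above.
# "Example Actor A" and its href are placeholders, not real IMDb data.
SAMPLE_HTML = """
<table class="cast_list">
  <tr>
    <td class="primary_photo">
      <a href="/name/nm0933988/?ref_=ttfc_fc_cl_t1"><img title="Rainn Wilson" alt="Rainn Wilson"></a>
    </td>
  </tr>
  <tr>
    <td class="primary_photo">
      <a href="/name/example-a/"><img title="Example Actor A" alt="Example Actor A"></a>
    </td>
  </tr>
</table>
"""


def extract_cast_names(html):
    """Return the title attribute of every <img> inside a link in the cast table."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="cast_list")
    if table is None:
        # The table was not found, e.g. because the page layout differs.
        return []
    # select() returns all matches, unlike select_one(), which returns the first.
    return [img["title"] for img in table.select("a img[title]")]


photo_names = extract_cast_names(SAMPLE_HTML)
print(photo_names)
```

To try this on the live page, the same helper could be given `requests.get(url, headers=headers).text`, provided the page still contains a `cast_list` table.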


葬シ愛 2025-02-14 23:26:25

You can get all the actors from that page as follows:

import requests
from bs4 import BeautifulSoup

url = "https://www.imdb.com/title/tt0386676/fullcredits/?ref_=tt_ql_cl"
req = requests.get(url)
soup = BeautifulSoup(req.content, "lxml")
table_actors = soup.find("table", class_="simpleCreditsTable")

for td_actor in table_actors.find_all("td", class_="name"):
    print(td_actor.a.get_text(strip=True))

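This first locates the table holding the actors and then finds all of the name <td> elements. For each element, it then gets the text inside the next <a> tag.

This would give you output starting: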

Paul Feig
Randall Einhorn
Ken Kwapis
Greg Daniels
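
The same find-then-iterate pattern can also be pointed at the `cast_list` table and the `<td> <a>` markup quoted in the question. The following is only a sketch under stated assumptions: it parses an inline sample instead of the live page, the character link and the "Example Actor A" row are placeholders, and `cast_names_from_links` is a hypothetical helper name. It keeps only links whose `href` starts with `/name/`, so that other links in the same row are skipped.

```python
from bs4 import BeautifulSoup

# Assumed sample markup. The first name cell is the snippet from the question;
# the character link and the second row are placeholders for illustration.
SAMPLE_HTML = """
<table class="cast_list">
  <tr>
    <td> <a href="/name/nm0933988/?ref_=ttfc_fc_cl_t1"> Rainn Wilson </a> </td>
    <td class="character"> <a href="/title/tt0386676/characters/example"> Dwight Schrute </a> </td>
  </tr>
  <tr>
    <td> <a href="/name/example-a/"> Example Actor A </a> </td>
  </tr>
</table>
"""


def cast_names_from_links(html):
    """Collect the stripped text of every /name/ link inside the cast table."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="cast_list")
    if table is None:
        return []
    names = []
    # The attribute selector keeps person links and ignores character links.
    for link in table.select('td a[href^="/name/"]'):
        text = link.get_text(strip=True)
        # Skip links with no text (such as photo links) and repeated names.
        if text and text not in names:
            names.append(text)
    return names


link_names = cast_names_from_links(SAMPLE_HTML)
print(link_names)
```

`get_text(strip=True)` removes the surrounding whitespace, which is why ' Rainn Wilson ' in the markup comes out as 'Rainn Wilson'.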

