How do I extract the actors from an IMDb show page using BeautifulSoup?
I am trying to extract the cast list of The Office by using BeautifulSoup to scrape this IMDb page: https://www.imdb.com/title/tt0386676/fullcredits/?ref_=tt_ql_cl
actors = soup.find_all('table', class_='cast_list')
How would I change this so that it gives me only the actors' names? An example of the HTML is:
<td> <a href="/name/nm0933988/?ref_=ttfc_fc_cl_t1"> Rainn Wilson </a> </td>
I would like to extract only the text 'Rainn Wilson'.
Any help is appreciated; this is my first question here, so please go easy on me.
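For the single cell quoted above, a minimal sketch (assuming bs4 is installed and using Python's built-in html.parser) shows why get_text(strip=True) is useful here: the link text in that cell carries leading and trailing spaces. The function name name_from_cell is hypothetical.

```python
from bs4 import BeautifulSoup

# The single cast-list cell quoted in the question
CELL_HTML = '<td> <a href="/name/nm0933988/?ref_=ttfc_fc_cl_t1"> Rainn Wilson </a> </td>'


def name_from_cell(cell_html):
    """Return the link text of one cell without surrounding whitespace (hypothetical helper)."""
    soup = BeautifulSoup(cell_html, "html.parser")
    link = soup.find("a")
    if link is None:
        return None
    # link.text would keep the spaces around the name (" Rainn Wilson ");
    # get_text(strip=True) trims them.
    return link.get_text(strip=True)


print(name_from_cell(CELL_HTML))  # Rainn Wilson
```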
Answers (2)
Try this:
Output:
You can get all the actors from that page as follows:
This first locates the table holding the actors and then finds all of the name <td> elements. For each element, it then gets the text inside the next <a> tag. This would give you output starting:
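A minimal sketch of the steps described above, assuming the cast table still carries the class cast_list and that actor links point to /name/ URLs as in the snippet from the question. It works on an HTML string so it can be tried offline; the function name extract_cast_names and the sample markup are hypothetical and much simpler than the real page.

```python
from bs4 import BeautifulSoup


def extract_cast_names(html):
    """Collect actor names from the table with class "cast_list".

    Assumption: each actor's name is the text of an <a> whose href
    starts with "/name/", as in the snippet quoted in the question.
    """
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="cast_list")
    if table is None:
        return []
    names = []
    for td in table.find_all("td"):
        link = td.find("a", href=True)
        # Skip cells without a person link (e.g. character cells)
        if link is None or not link["href"].startswith("/name/"):
            continue
        name = link.get_text(strip=True)
        # A photo cell may link to the same person but contain no text
        if name:
            names.append(name)
    return names


# Hypothetical, heavily simplified markup for illustration only;
# the live IMDb page has more columns and attributes.
SAMPLE_HTML = """
<table class="cast_list">
  <tr>
    <td><a href="/name/nm0933988/?ref_=ttfc_fc_cl_i1"><img alt="Rainn Wilson"></a></td>
    <td> <a href="/name/nm0933988/?ref_=ttfc_fc_cl_t1"> Rainn Wilson </a> </td>
    <td><a href="/title/tt0386676/characters/nm0933988">Dwight Schrute</a></td>
  </tr>
</table>
"""

print(extract_cast_names(SAMPLE_HTML))  # ['Rainn Wilson']
```

To try it on the live page, download the HTML first (for example with requests.get(url).text) and pass that string in. IMDb's markup may have changed since this question was asked, so the class name and link pattern might need adjusting.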