BeautifulSoup can't find the h3 tags

Posted on 2025-02-10 22:30:25 · 387 characters · 1 view · 0 comments

The URL in this question is: https://www.empireonline.com/movies/features/best-movies-2/
As you can see, the h3 tags are present in the page, but BeautifulSoup doesn't print them.

Comments (2)

葬花如无物 2025-02-17 22:30:25

All of the information is inside the returned HTML, in a <script> tag containing JSON data.

The page's JavaScript normally converts this data into HTML, but you can still extract it yourself: use BeautifulSoup to find the tag, then Python's json library to convert the data into a Python structure.

For example:

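import json

import requests
from bs4 import BeautifulSoup

# Fetch the page; the film data is embedded in the HTML as JSON.
response = requests.get("https://www.empireonline.com/movies/features/best-movies-2/")
response.raise_for_status()

# Find the <script type="application/json"> tag and parse its contents.
soup = BeautifulSoup(response.content, "html.parser")
script = soup.find("script", type="application/json")
data = json.loads(script.string)

# This key path (including index 7) may change if the site changes its layout.
films = data["props"]["pageProps"]["data"]["getArticleByFurl"]["_layout"][7]["content"]["images"]
for film in films:
    print(film["titleText"])
    print(film["description"])
    print("-------------")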

The hard part is finding the information you want inside the data structure. I suggest you print data and have a closer look.
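
If printing the whole structure is too much to read, one option is to search it for a key you already know. The following is only a sketch: `find_key_paths` is a hypothetical helper, not part of BeautifulSoup or the json module, and `sample` is a made-up miniature of the real data, which is much larger and may be shaped differently.

```python
def find_key_paths(node, target_key, path=()):
    """Yield the path (dict keys and list indexes) to every occurrence of target_key."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == target_key:
                yield path + (key,)
            # Keep descending in case the key also appears deeper down.
            yield from find_key_paths(value, target_key, path + (key,))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_key_paths(item, target_key, path + (index,))


# Hypothetical stand-in for the parsed JSON; the real structure is far larger.
sample = {
    "props": {
        "pageProps": {
            "_layout": [
                {"content": {"text": "intro"}},
                {"content": {"images": [
                    {"titleText": "100) Reservoir Dogs"},
                    {"titleText": "99) Groundhog Day"},
                ]}},
            ]
        }
    }
}

paths = list(find_key_paths(sample, "titleText"))
for p in paths:
    print(p)
```

Each printed tuple can be read left to right as the chain of keys and indexes to use on `data`, which is how a path like the one in the loop above could be discovered.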

This would give you output starting:

100) Reservoir Dogs
**1992**<br>[Quentin Tarantino](https://www.empireonline.com/people/quentin-tarantino/)'s terrific twist on the heist-gone-wrong thriller ricochets the zing and fizz of its dialogue around a gloriously intense single setting (for the most part) and centres the majority of its action around one long and incredibly bloody death scene. Oh, and by the way: Nice Guy Eddie was shot by Mr. White. Who fired twice. Case closed.<br>[Read Empire's review of Reservoir Dogs](https://www.empireonline.com/movies/reviews/empire-essay-reservoir-dogs-review/)<br>
-------------
99) Groundhog Day
**1993**<br>[Bill Murray](https://www.empireonline.com/people/bill-murray/) at the height of his loveable (eventually) schmuck powers. [Andie McDowell](https://www.empireonline.com/people/andie-macdowell/) bringing the brains and the heart. And [Harold Ramis](https://www.empireonline.com/people/harold-ramis/) (directing and co-writing with Danny Rubin) managing to find gold in the story of a man trapped in a time loop. It might not have been the first to tap this particular trope, but it's head and shoulders above the rest. Murray's snarktastic delivery makes the early going easy to laugh at, but as the movie finds deeper things to say about existence and morals, it never feels like a polemic.<br>[Read Empire's review of Groundhog Day](https://www.empireonline.com/movies/reviews/groundhog-day-review/)<br>
-------------
98) Paddington 2
**2017**<br>When the first *Paddington* was on the way, early trailers didn't look entirely promising. Yet co-writer/director [Paul King](https://www.empireonline.com/people/paul-king/) delivered a truly wonderful film bursting with joy, imagination, kindness and just one or two hard stares. How was he going to follow that? Turns out, with more of the same, but also plenty of fresh pleasures. Paddington (bouncily voiced by [Ben Whishaw](https://www.empireonline.com/people/ben-whishaw/)) matches wits with washed-up actor Phoenix Buchanan ([Hugh Grant](https://www.empireonline.com/people/hugh-grant/), chewing scenery like fine steak), being framed for theft and getting sent to prison. Like all great sequels, it works superbly as a double bill with the original.<br>[Read Empire's review of Paddington 2](https://www.empireonline.com/movies/reviews/paddington-2-review/)<br>
-------------
天荒地未老 2025-02-17 22:30:25

You can't statically scrape that website because some of it is rendered dynamically: some of its contents (including the h3 tags) are available only after a browser executes the page's JavaScript. This is common on sites built with modern web frameworks such as React (which is the case here).

To solve this, use a scraping tool that can run a site's scripts, such as Selenium.
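
A quick way to see this effect is to parse the HTML as the server sends it and count the tags. The sketch below uses a made-up HTML string rather than the real page, so it runs offline; the `id` values, the `films` key, and the JSON layout are assumptions for illustration only.

```python
import json

from bs4 import BeautifulSoup

# Hypothetical stand-in for the HTML a server might send before any JavaScript runs.
STATIC_HTML = """
<html><body>
<div id="app"></div>
<script id="page-data" type="application/json">
{"props": {"pageProps": {"films": [
  {"titleText": "100) Reservoir Dogs"},
  {"titleText": "99) Groundhog Day"}
]}}}
</script>
</body></html>
"""

soup = BeautifulSoup(STATIC_HTML, "html.parser")

# No h3 elements exist yet: a browser would create them from the JSON.
h3_tags = soup.find_all("h3")

# The data itself is already present, inside the JSON script tag.
script = soup.find("script", type="application/json")
data = json.loads(script.string)
titles = [film["titleText"] for film in data["props"]["pageProps"]["films"]]

print(len(h3_tags))
print(titles)
```

In this sample the h3 count is zero even though the titles are in the document, which matches the behaviour described in the question.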
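
A minimal sketch of that approach, assuming Selenium 4 and a locally installed Chrome. `fetch_h3_texts` and `clean_headings` are hypothetical helper names, the 10-second timeout is arbitrary, and the real page may need different waits or cookie-banner handling.

```python
def clean_headings(raw_texts):
    """Strip surrounding whitespace and drop empty heading texts."""
    return [text.strip() for text in raw_texts if text and text.strip()]


def fetch_h3_texts(url, timeout=10):
    """Load url in headless Chrome and return the text of every rendered h3."""
    # Imported here so clean_headings stays usable without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless=new")  # run Chrome without a visible window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait until the page's JavaScript has added at least one h3 element.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_all_elements_located((By.TAG_NAME, "h3"))
        )
        elements = driver.find_elements(By.TAG_NAME, "h3")
        return clean_headings(element.text for element in elements)
    finally:
        driver.quit()  # always close the browser, even if the wait times out
```

Calling `fetch_h3_texts("https://www.empireonline.com/movies/features/best-movies-2/")` should then return the headings as the browser renders them, provided the page still uses h3 tags for them.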
