Is it possible to use BeautifulSoup to get child content nested within many div layers?

Posted 2025-01-31 20:03:56 | 623 characters | 3 views | 0 comments

I want to access the contents of the div with class name "ar-faculty-section-content" on https://www.fed.cuhk.edu.hk/cri/faculty/prof-yin-hong-biao/. I tried the method below to get the target content, but it doesn't work.

 import requests
 from bs4 import BeautifulSoup

 profile = requests.get("https://www.fed.cuhk.edu.hk/cri/faculty/dr-sze-man-man-paul/")
 x = BeautifulSoup(profile.text, "html.parser")
 x.find_all("div", {"class": "ar-faculty-section-content"})

The result is as follows:

[<div class="ar-faculty-section-content" style="font-weight: 400 !important">

]

How can I get the entire content of that div section, such as the h5 and li elements?

Comments (1)

时常饿 2025-02-07 20:03:56

Here's how:

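import requests
from bs4 import BeautifulSoup

url = "https://www.fed.cuhk.edu.hk/cri/faculty/prof-yin-hong-biao/"

# Fetch the page and parse it with the lxml parser (requires the lxml package).
source_html = requests.get(url).text
soup = BeautifulSoup(source_html, "lxml")

# Text of the first h5 nested under an element with the target class.
h5 = soup.select_one(".ar-faculty-section-content h5").get_text()
# Text of every li nested under an element with the target class.
li_elements = [li.get_text() for li in soup.select(".ar-faculty-section-content li")]

print(h5)
print("\n".join(li_elements))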

Output:

Introduction
Yin, H., & Huang, S. (2021). Applying structural equation modelling to research on teaching and teacher education: Looking back and forward. Teaching and Teacher Education. DOI: 10.1016/j.tate.2021.103438
Yin, H., & Shi, L. (2021). Which type of interpersonal interaction better facilitates college student learning and development in China: Face-to-face or online? ECNU Review of Education. DOI: 10.1177/20965311211010818

and a lot more ...
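
If you want everything in the section in page order rather than the h5 and the li items separately, something along these lines might work. This is only a sketch: SAMPLE_HTML is a made-up stand-in for the real page (whose markup may differ), and section_items is a hypothetical helper name, not part of BeautifulSoup. The parser is left as a parameter so you can pass "lxml" as in the answer above if you have it installed.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page; the actual markup may differ.
SAMPLE_HTML = """
<div class="ar-faculty-section-content" style="font-weight: 400 !important">
  <h5>Introduction</h5>
  <ul>
    <li>First publication</li>
    <li>Second publication</li>
  </ul>
</div>
"""


def section_items(html, parser="html.parser"):
    """Return (tag name, text) pairs for each h5 and li in the first section div."""
    soup = BeautifulSoup(html, parser)
    section = soup.select_one("div.ar-faculty-section-content")
    if section is None:
        # No matching div: return an empty list instead of raising AttributeError.
        return []
    # find_all accepts a list of tag names and yields matches in document order.
    return [
        (tag.name, tag.get_text(strip=True))
        for tag in section.find_all(["h5", "li"])
    ]


items = section_items(SAMPLE_HTML)
for name, text in items:
    print(f"{name}: {text}")
```

Against the real site you would pass requests.get(url).text instead of SAMPLE_HTML. The None check is there because select_one returns None when nothing matches, which would otherwise fail on the next attribute access.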