Trying to scrape text from a website with BeautifulSoup4, but nothing happens
I want to scrape data from this website: https://playvalorant.com/en-us/news/game-updates/
from bs4 import BeautifulSoup
import requests

# Download the raw HTML of the news listing page
site_text = requests.get('https://playvalorant.com/en-us/news/game-updates/').text
soup = BeautifulSoup(site_text, 'lxml')

# Each news card is an <li> with this auto-generated CSS-module class
posts = soup.find_all('li', class_="ContentListing-module--contentListingItem--3GAoa")
for post in posts:
    post_title = post.find(
        'h3', class_="heading-05 bold ContentListingCard-module--title--1vIFy").text
    post_title = post_title.lower()
    # Only handle posts whose title mentions patch notes
    if "patch notes" in post_title:
        patch_ver = post_title.replace('valorant patch notes ', '')
        print(f'Patch version: {patch_ver}')
        print("")
But when I run it, nothing happens at all.
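A quick way to see why nothing is printed: `find_all` returns an empty list when no element matches in the downloaded HTML, and looping over an empty list does nothing, so the script ends without printing or raising an error. The sketch below reproduces that with an invented stand-in page (the HTML string is hypothetical, not the real site) and uses BeautifulSoup's built-in `html.parser` so it runs without lxml. Adding `print(len(posts))` right after the `find_all` call in the script above shows the same thing against the live page.

```python
from bs4 import BeautifulSoup

# Class name copied from the question; it only matches if the card markup
# is present in the HTML that was actually downloaded.
CARD_CLASS = "ContentListing-module--contentListingItem--3GAoa"


def count_cards(html: str) -> int:
    """Return how many news-card <li> elements the static HTML contains."""
    soup = BeautifulSoup(html, "html.parser")
    return len(soup.find_all("li", class_=CARD_CLASS))


# Hypothetical stand-in for a page whose list is filled in later by
# JavaScript: the server sends only an empty container.
static_html = '<html><body><ul id="news"></ul></body></html>'
print(count_cards(static_html))  # 0, so `for post in posts:` never runs
```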
What I want to do is to see if the h3 includes the text "patch notes" and if so, check what version it is and go to https://playvalorant.com/en-us/news/game-updates/valorant-patch-notes-(patch-number)-(patch-number)/ (for example, if the text was "VALORANT Patch Notes 3213.07", then I want to go to https://playvalorant.com/en-us/news/game-updates/valorant-patch-notes-3213-07, and so on.)
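The title-to-URL step described above is plain string handling, so it can be written and tested independently of the scraping problem. A minimal sketch, assuming every patch-notes URL really follows the `valorant-patch-notes-<major>-<minor>/` pattern from the question (the function name `patch_notes_url` is hypothetical, and it keeps the trailing slash from the question's URL template):

```python
import re
from typing import Optional

BASE_URL = "https://playvalorant.com/en-us/news/game-updates/"


def patch_notes_url(title: str) -> Optional[str]:
    """Build the patch-notes URL from a title such as
    'VALORANT Patch Notes 3213.07'; return None for any other title."""
    # Case-insensitive, so 'Patch Notes' and 'patch notes' both match
    match = re.search(r"patch notes\s+(\d+)\.(\d+)", title, re.IGNORECASE)
    if match is None:
        return None
    major, minor = match.groups()
    # '3213.07' becomes the slug suffix '3213-07'
    return f"{BASE_URL}valorant-patch-notes-{major}-{minor}/"


print(patch_notes_url("VALORANT Patch Notes 3213.07"))
# https://playvalorant.com/en-us/news/game-updates/valorant-patch-notes-3213-07/
```

Matching with a regex rather than `str.replace` also rejects titles that merely mention patch notes without a version number.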
I'm getting ahead of myself, but the point is, how can I get the text from this website, and then print it out?
The data you see is loaded via JavaScript, so BeautifulSoup doesn't see it. You can use the requests module to simulate the request that loads it, fetching the data directly instead of the HTML page.
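A minimal sketch of that approach, using only the standard library. The endpoint URL is an assumption: the `Module--name--hash` class names suggest a Gatsby-built site, and Gatsby serves each page's data at `/page-data/<path>/page-data.json`, but the real request should be confirmed in the browser DevTools Network tab. Because the JSON layout is also unknown here, the hypothetical helper `find_titles` walks the whole structure instead of hard-coding a path.

```python
import json
import urllib.request
from typing import Any, Iterator

# ASSUMPTION: Gatsby-style page-data URL; verify it in DevTools first.
DATA_URL = ("https://playvalorant.com/page-data/en-us/news/"
            "game-updates/page-data.json")


def fetch_json(url: str) -> Any:
    """Download a URL and decode its body as JSON."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def find_titles(node: Any) -> Iterator[str]:
    """Walk nested dicts/lists and yield every string stored under a
    'title' key, wherever it appears in the structure."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "title" and isinstance(value, str):
                yield value
            else:
                yield from find_titles(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_titles(item)


def print_patch_titles(url: str = DATA_URL) -> None:
    """Print every title containing 'patch notes' (makes a network request)."""
    for title in find_titles(fetch_json(url)):
        if "patch notes" in title.lower():
            print(title)
```

Calling `print_patch_titles()` performs the live request. Once the actual JSON layout is known, replacing the generic walk with a direct key path is simpler and less likely to pick up unrelated `title` fields.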
Try lxml, which lets you use XPath to access the required HTML nodes easily.
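A short sketch of the XPath approach with `lxml.html`. Note that XPath can only select nodes that are present in the HTML you parse; per the first answer, this listing is loaded by JavaScript, so on the live page the query would have to run against rendered HTML rather than the raw `requests` response. The sample markup below is hypothetical, modeled on the class names from the question.

```python
from typing import List

from lxml import html

# Hypothetical markup modeled on the class names from the question.
SAMPLE = """
<ul>
  <li class="ContentListing-module--contentListingItem--3GAoa">
    <h3 class="heading-05 bold ContentListingCard-module--title--1vIFy">VALORANT Patch Notes 3.07</h3>
  </li>
  <li class="ContentListing-module--contentListingItem--3GAoa">
    <h3 class="heading-05 bold ContentListingCard-module--title--1vIFy">Agent spotlight</h3>
  </li>
</ul>
"""


def card_titles(page: str) -> List[str]:
    """Return the text of every card <h3> found in the given HTML."""
    tree = html.fromstring(page)
    # contains() does a substring match on the whole class attribute, so the
    # hashed suffix and the extra classes on the <h3> need not be spelled out.
    nodes = tree.xpath('.//li[contains(@class, "contentListingItem")]'
                       '//h3[contains(@class, "title")]')
    return [node.text_content().strip() for node in nodes]


print(card_titles(SAMPLE))  # ['VALORANT Patch Notes 3.07', 'Agent spotlight']
```

Substring matching is looser than BeautifulSoup's exact class match, which helps when the hashed part of a CSS-module class changes between site builds, at the cost of possibly matching other classes that contain the same text.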