Why is this text attribute breaking my BeautifulSoup?
I'm new to BeautifulSoup, so I'm practicing web scraping on this website, and the .text attribute keeps breaking the .find() call. This is the code:
from bs4 import BeautifulSoup
import requests
url = 'https://montanahistoriclandscape.com/tag/glasgow-montana/'
page = requests.get(url)
soup = BeautifulSoup(page.text, 'lxml')
article = soup.find('article')
first_p = article.find('div', class_='entry-content').p.text
print(first_p)
The code runs fine if I remove .text from the end of the first_p assignment; however, it then gives me the paragraph still as HTML. But when I add .text, it gives me nothing at all as output.
Does anyone know what's going on here? I feel like I'm looking right at it but can't figure it out. Any help would be appreciated!
Comments (2)
There are multiple <p> tags inside that <div>, and not all of them contain text. You could get all the text from the <div> rather than only from the first <p>.
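The code this comment originally showed is not available here, so the following is only a sketch of one way to collect the text of every <p>. The inline HTML is a hypothetical stand-in for the page's markup, and 'html.parser' is used instead of 'lxml' just to keep the example dependency-free; find_all() and get_text() are standard BeautifulSoup calls.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page: the first <p> holds only an image.
html = """
<article>
  <div class="entry-content">
    <p><img src="photo.jpg" alt=""></p>
    <p>First paragraph of text.</p>
    <p>Second paragraph of text.</p>
  </div>
</article>
"""

soup = BeautifulSoup(html, "html.parser")
content = soup.find("article").find("div", class_="entry-content")

# Take the text of every <p>, then drop the ones that are empty after stripping.
paragraphs = [p.get_text(strip=True) for p in content.find_all("p")]
paragraphs = [text for text in paragraphs if text]

print("\n".join(paragraphs))
```

With the real page, the same loop should work on the soup built from page.text, assuming the div still carries the entry-content class.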
Look at the HTML that ends up in your first_p variable when you leave off .text: there is no text in the <p> tag, only an image tag. That is why .text gives you an empty string.
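To make that concrete, here is a small sketch, again on hypothetical markup rather than the real page, showing that .text on an image-only <p> is an empty string, and one possible way to pick the first <p> that actually has text. The variable name first_with_text is made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: an image-only <p> followed by a <p> with text.
html = (
    '<div class="entry-content">'
    '<p><img src="photo.jpg" alt=""></p>'
    "<p>Some text.</p>"
    "</div>"
)

soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", class_="entry-content")

first_p = div.p              # same as div.find("p"): the first <p> only
print(repr(first_p.text))    # the <p> has no text nodes, so this is an empty string
print(first_p.img)           # the image tag is what the paragraph really contains

# Pick the first <p> whose stripped text is non-empty (None if there is none).
first_with_text = next(
    (p for p in div.find_all("p") if p.get_text(strip=True)),
    None,
)
if first_with_text is not None:
    print(first_with_text.get_text(strip=True))
```

Printing an empty string still emits a blank line, which is easy to mistake for no output at all.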