Why is this text attribute breaking my BeautifulSoup?

Posted on 2025-02-10 04:50:56

I'm new to BeautifulSoup, so I'm practicing web scraping on this website, and the .text attribute keeps breaking the .find() call. This is the code:

from bs4 import BeautifulSoup
import requests

url = 'https://montanahistoriclandscape.com/tag/glasgow-montana/'
page = requests.get(url)

soup = BeautifulSoup(page.text, 'lxml')

article = soup.find('article')

first_p = article.find('div', class_='entry-content').p.text
print(first_p)

The code runs fine if I remove .text from the end of the first_p line, but then it gives me the paragraph still as HTML. When I add .text back, it gives me no output at all.

Does anyone know what's going on here? I feel like I'm looking right at it but can't figure it out. Any help would be appreciated!

吲‖鸣 2025-02-17 04:50:56

There are multiple <p> tags inside that <div>, and not all of them contain text; .p only returns the first one. You could get all the text as follows:

from bs4 import BeautifulSoup
import requests

url = 'https://montanahistoriclandscape.com/tag/glasgow-montana/'
page = requests.get(url)
soup = BeautifulSoup(page.text, 'lxml')
article = soup.find('article')
div_entry = article.find('div', class_='entry-content')

for p in div_entry.find_all('p'):
    text = p.get_text(strip=True)
    
    if text:    # skip empty lines
        print(text)

This gives you:

It has been five years since I revisited the historic built environment of northeast Montana.  My last posting took a second look at Wolf Point, the seat of Roosevelt County.  I thought a perfect follow-up would be second looks at the different county seats of the region–a part of the Treasure State that I have always enjoyed visiting, and would strongly encourage you to do the same.
Grain elevators along the Glasgow railroad corridor.
Like Wolf Point, Glasgow is another of the county seats created in the wake of the Manitoba Road/Great Northern Railway building through the state in the late 1880s.  Glasgow is the seat of Valley County.  The courthouse grounds include not only the modernist building above from 1973 but a WPA-constructed courthouse annex/ public building from 1939-1940 behind the courthouse.
The understated WPA classic look of this building fits into the architectural legacies of Glasgow.  My first post about the town looked at its National Register buildings and the blending of classicism and modernism.  Here I want to highlight other impressive properties that I left out of the original Glasgow entry.  St. Michael’s Episcopal Church is an excellent late 19th century of Gothic Revival style in Montana.
The town has other architecturally distinctive commercial buildings that document its transition from late Victorian era railroad town to am early 20th century homesteading boom town.
The fact that these buildings are well-kept and in use speaks to the local commitment to stewardship and effective adaptive reuse projects.  As part of Glasgow’s architectural legacy I should have said more about its Craftsman-style buildings, beyond the National
Register-listed Rundle Building.  The Rundle is truly eye-catching but Glasgow also has a Mission-styled apartment row and then its historic Masonic Lodge.
I have always been impressed with the public landscapes of Glasgow, from the courthouse grounds to the city-county library (and its excellent local history collection)
and on to Valley County Fairgrounds which are located on the boundaries of town.
Another key public institution is the Valley County Pioneer Museum, which proudly emphasizes the theme of from dinosaur bones to moon walk–just see its entrance.
The museum was a fairly new institution when I first visited in 1984 and local leaders proudly took me through the collection as a way of emphasizing what themes and what places they wanted to be considered in the state historic preservation plan.  Then I spoke with the community that evening at the museum.  Not surprisingly then, the museum has ever since been a favorite place.  Its has grown substantially in 35 years to include buildings and other large items on a lot adjacent to the museum collections.  I have earlier discussed its collection of Thomas Moleworth furniture–a very important bit of western material culture from the previous town library.  In the images below, I want to suggest its range–from the deep Native American past to the railroad era to the county’s huge veteran story and even its high school band and sports history.
A new installation, dating to the Lewis and Clark Bicentennial of 2003, is a mural depicting the Corps of Discovery along the Missouri River in Valley County.  The mural is signed by artist Jesse W. Henderson, who also identifies himself as a Chippewa-Cree.  The mural is huge, and to adequately convey its details I have divided my images into the different groups of people Henderson interprets in the mural.
The Henderson mural, together with the New Deal mural of the post office/courthouse discussed in my first Glasgow posting (below is a single image of that work by Forrest
Hill), are just two of the reasons to stop in Glasgow–it is one of those county seats where I discover something new every time I travel along U.S. Highway 2.
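If you only want the first paragraph that actually has text (presumably what first_p was meant to hold), BeautifulSoup's find() also accepts a function as a filter and returns the first tag for which it returns True. The sketch below runs offline against a small hypothetical HTML snippet that mimics the page's structure (an image-only <p> followed by text paragraphs); the snippet and the helper name first_text_paragraph are illustrative assumptions, and it uses the built-in "html.parser" instead of lxml so only bs4 needs to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample mimicking the page: the first <p> holds only an <img>.
SAMPLE_HTML = """
<article>
  <div class="entry-content">
    <p><img alt="Valley Co Glasgow courthouse" src="courthouse.jpg"/></p>
    <p>It has been five years since I revisited northeast Montana.</p>
    <p>Grain elevators along the Glasgow railroad corridor.</p>
  </div>
</article>
"""


def first_text_paragraph(html):
    """Return the stripped text of the first <p> in div.entry-content that has text, or None."""
    soup = BeautifulSoup(html, "html.parser")
    div_entry = soup.find("article").find("div", class_="entry-content")
    # find() calls the function with each tag and keeps the first one that returns True.
    p = div_entry.find(lambda tag: tag.name == "p" and tag.get_text(strip=True) != "")
    return p.get_text(strip=True) if p is not None else None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
print(repr(soup.find("div", class_="entry-content").p.text))  # '' : .p is the image-only <p>
print(first_text_paragraph(SAMPLE_HTML))
```

Against the live page you would pass page.text instead of SAMPLE_HTML; whether the first text paragraph is the one you actually want depends on the page's current markup.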

人间☆小暴躁 2025-02-17 04:50:56

This is the HTML your first_p variable holds when you leave off .text:

<p><img alt="Valley Co Glasgow courthouse" class="alignnone size-full wp-image-18200" data-attachment-id="18200" data-comments-opened="1" data-image-caption="" data-image-description="" data-image-meta='{"aperture":"10","credit":"","camera":"Canon EOS REBEL T2i","caption":"","created_timestamp":"946684800","copyright":"","focal_length":"24","iso":"100","shutter_speed":"0.005","title":"Valley Co Glasgow courthouse","orientation":"1"}' data-image-title="Valley Co Glasgow courthouse" data-large-file="https://carrollvanwest.files.wordpress.com/2019/03/img_7957.jpg?w=584" data-medium-file="https://carrollvanwest.files.wordpress.com/2019/03/img_7957.jpg?w=300" data-orig-file="https://carrollvanwest.files.wordpress.com/2019/03/img_7957.jpg" data-orig-size="5083,2348" data-permalink="https://montanahistoriclandscape.com/2019/03/03/eastern-montana-county-seats-glasgow/valley-co-glasgow-courthouse/" sizes="(max-width: 584px) 100vw, 584px" src="https://carrollvanwest.files.wordpress.com/2019/03/img_7957.jpg?w=584" srcset="https://carrollvanwest.files.wordpress.com/2019/03/img_7957.jpg?w=584 584w, https://carrollvanwest.files.wordpress.com/2019/03/img_7957.jpg?w=1168 1168w, https://carrollvanwest.files.wordpress.com/2019/03/img_7957.jpg?w=150 150w, https://carrollvanwest.files.wordpress.com/2019/03/img_7957.jpg?w=300 300w, https://carrollvanwest.files.wordpress.com/2019/03/img_7957.jpg?w=768 768w, https://carrollvanwest.files.wordpress.com/2019/03/img_7957.jpg?w=1024 1024w"/></p>

There is no text in the <p> tag, only an <img> tag, so .text returns an empty string.
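To see this without hitting the network, here is a small sketch using a trimmed copy of that <p> (only the alt and src attributes are kept, taken from the HTML above; parsing with the built-in "html.parser" is an assumption to avoid needing lxml). .text is empty because the <p> has no text nodes, only an <img> element; what describes the image lives in attributes such as alt, which you read with Tag.get().

```python
from bs4 import BeautifulSoup

# Trimmed copy of the first <p> shown above; only alt and src are kept.
html = (
    '<p><img alt="Valley Co Glasgow courthouse" '
    'src="https://carrollvanwest.files.wordpress.com/2019/03/img_7957.jpg?w=584"/></p>'
)

p = BeautifulSoup(html, "html.parser").p

print(repr(p.text))  # '' : the <p> contains no text nodes, just an <img>

# The image's description is stored in attributes, not in text.
img = p.find("img")
print(img.get("alt"))  # Valley Co Glasgow courthouse
print(img.get("src"))
```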
