美丽的协助

发布于 2025-01-30 15:01:13 字数 1895 浏览 4 评论 0原文

我正在尝试刮擦以下网站（ https://www.english-heritage.org.uk.uk/visit/blue-plaques/#?pagebp = 1＆sizebp; sizebp; sizebp =12＆ampp = ampp; ampp; ambp; ambp; ambp; ambp; amp.amp.amkybp = umkekeybp =＆amp ump atampt = ; catbp = 0 ），最终有兴趣存储每个'li class =“ search-result-item”中的某些数据以执行进一步的分析。

一个“搜索result-item”的示例，

我想捕获＆lt; h3＆gt;，＆lt; span class =“ plaque-lole”＆gt;和＆lt; span class =“ Plaque-Location”＆gt;在Python词典中：

<li class="search-result-item"><a href="/visit/blue-plaques/helen-gwynne-vaughan/"><img class="search-result-image max-width" src="/siteassets/home/visit/blue-plaques/find-a-plaque/blue-plaques-f-j/helen-gwynne-vaughan-plaque.jpg?w=732&amp;h=465&amp;mode=crop&amp;scale=both&amp;cache=always&amp;quality=60&amp;anchor=&amp;WebsiteVersion=20220516171525" alt="" title=""><div class="search-result-info"><h3>GWYNNE-VAUGHAN, Dame Helen (1879-1967)</h3><span class="plaque-role">Botanist and Military Officer</span><span class="plaque-location">Flat 93, Bedford Court Mansions, Fitzrovia, London, WC1B 3AE, London Borough of Camden</span></div></a></li>

到目前为止，我正在尝试隔离所有“搜索result-item”，但我当前的代码绝对没有打印。如果有人可以帮助我解决这个问题，并将我指向正确的方向，将每个数据元素存储到Python词典中，我将非常感激。

from bs4 import BeautifulSoup
import requests

url = 'https://www.english-heritage.org.uk/visit/blue-plaques/#?pageBP=1&sizeBP=12&borBP=0&keyBP=&catBP=0'
page = requests.get(url)

soup = BeautifulSoup(page.text, 'html.parser')
#print(soup.prettify())
print(soup.find_all(class_='search-result-item')).get_text()

原文

I am trying to scrape the following website (https://www.english-heritage.org.uk/visit/blue-plaques/#?pageBP=1&sizeBP=12&borBP=0&keyBP=&catBP=0) and ultimately am interested in storing some of the data inside each 'li class="search-result-item"' to perform further analytics.

Example of one "search-result-item"

I want to capture the <h3>,<span class="plaque-role"> and <span class="plaque-location"> in a python dictionary:

<li class="search-result-item"><a href="/visit/blue-plaques/helen-gwynne-vaughan/"><img class="search-result-image max-width" src="/siteassets/home/visit/blue-plaques/find-a-plaque/blue-plaques-f-j/helen-gwynne-vaughan-plaque.jpg?w=732&h=465&mode=crop&scale=both&cache=always&quality=60&anchor=&WebsiteVersion=20220516171525" alt="" title=""><div class="search-result-info"><h3>GWYNNE-VAUGHAN, Dame Helen (1879-1967)</h3><span class="plaque-role">Botanist and Military Officer</span><span class="plaque-location">Flat 93, Bedford Court Mansions, Fitzrovia, London, WC1B 3AE, London Borough of Camden</span></div></a></li>

So far I am trying to isolate all the "search-result-item" but my current code prints absolutely nothing. If someone can help me sort that problem out and point me in the right direction to storing each data element into a python dictionary I would be very grateful.

from bs4 import BeautifulSoup
import requests

url = 'https://www.english-heritage.org.uk/visit/blue-plaques/#?pageBP=1&sizeBP=12&borBP=0&keyBP=&catBP=0'
page = requests.get(url)

soup = BeautifulSoup(page.text, 'html.parser')
#print(soup.prettify())
print(soup.find_all(class_='search-result-item')).get_text()

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

你在看孤独的风景 2025-02-06 15:01:14

内容是通过javaScript动态生成的，因此您不会使用beautifulsoup而不是使用其API找到所需的元素/信息。

示例

import requests

url = 'https://www.english-heritage.org.uk/api/BluePlaqueSearch/GetMatchingBluePlaques?pageBP=1&sizeBP=12&borBP=0&keyBP=&catBP=0'
page = requests.get(url).json()
data = []
for e in page['plaques']:
    data.append(dict((k,v) for k,v in e.items() if k in ['title','professions','address']))
data

输出

[{'title': 'GWYNNE-VAUGHAN, Dame Helen (1879-1967)', 'address': 'Flat 93, Bedford Court Mansions, Fitzrovia, London, WC1B 3AE, London Borough of Camden', 'professions': 'Botanist and Military Officer'}, {'title': 'READING, Lady Stella (1894-1971)', 'address': '41 Tothill Street, London, City of Westminster, SW1H 9LQ, City Of Westminster', 'professions': "Founder of the Women's Voluntary Service"}, {'title': '32 SOHO SQUARE', 'address': '32 Soho Square, Soho, London, W1D 3AP, City Of Westminster', 'professions': 'Botanists'}, {'title': '14 BUCKINGHAM STREET', 'address': '14 Buckingham Street, Covent Garden, London, WC2N 6DF, City Of Westminster', 'professions': 'Statesman, Diarist, Naval Official, Painter'}, {'title': 'ABRAHAMS, Harold (1899-1978)', 'address': 'Hodford Lodge, 2 Hodford Road, Golders Green, London, NW11 8NP, London Borough of Barnet', 'professions': 'Athlete'}, ...]

Content is generated dynamically by JavaScript so you wont find the elements / info you are looking for with BeautifulSoup, instead use their API.

Example

import requests

url = 'https://www.english-heritage.org.uk/api/BluePlaqueSearch/GetMatchingBluePlaques?pageBP=1&sizeBP=12&borBP=0&keyBP=&catBP=0'
page = requests.get(url).json()
data = []
for e in page['plaques']:
    data.append(dict((k,v) for k,v in e.items() if k in ['title','professions','address']))
data

Output

[{'title': 'GWYNNE-VAUGHAN, Dame Helen (1879-1967)', 'address': 'Flat 93, Bedford Court Mansions, Fitzrovia, London, WC1B 3AE, London Borough of Camden', 'professions': 'Botanist and Military Officer'}, {'title': 'READING, Lady Stella (1894-1971)', 'address': '41 Tothill Street, London, City of Westminster, SW1H 9LQ, City Of Westminster', 'professions': "Founder of the Women's Voluntary Service"}, {'title': '32 SOHO SQUARE', 'address': '32 Soho Square, Soho, London, W1D 3AP, City Of Westminster', 'professions': 'Botanists'}, {'title': '14 BUCKINGHAM STREET', 'address': '14 Buckingham Street, Covent Garden, London, WC2N 6DF, City Of Westminster', 'professions': 'Statesman, Diarist, Naval Official, Painter'}, {'title': 'ABRAHAMS, Harold (1899-1978)', 'address': 'Hodford Lodge, 2 Hodford Road, Golders Green, London, NW11 8NP, London Borough of Barnet', 'professions': 'Athlete'}, ...]

回复收藏 0 原文

像你 2025-02-06 15:01:13

您不会得到任何东西，因为搜索结果是由JavaScript生成的。使用他们从中获取数据的API端点。

例如：

import requests

api_url = "https://www.english-heritage.org.uk/api/BluePlaqueSearch/GetMatchingBluePlaques?pageBP=1&sizeBP=12&borBP=0&keyBP=&catBP=0&_=1653043005731"

plaques = requests.get(api_url).json()["plaques"]

for plaque in plaques:
    print(plaque["title"])
    print(plaque["address"])
    print(f"https://www.english-heritage.org.uk{plaque['path']}")
    print("-" * 80)

输出：

GWYNNE-VAUGHAN, Dame Helen (1879-1967)
Flat 93, Bedford Court Mansions, Fitzrovia, London, WC1B 3AE, London Borough of Camden
https://www.english-heritage.org.uk/visit/blue-plaques/helen-gwynne-vaughan/
--------------------------------------------------------------------------------
READING, Lady Stella (1894-1971)
41 Tothill Street, London, City of Westminster, SW1H 9LQ, City Of Westminster
https://www.english-heritage.org.uk/visit/blue-plaques/stella-lady-reading/
--------------------------------------------------------------------------------
32 SOHO SQUARE
32 Soho Square, Soho, London, W1D 3AP, City Of Westminster
https://www.english-heritage.org.uk/visit/blue-plaques/soho-square/
--------------------------------------------------------------------------------
14 BUCKINGHAM STREET
14 Buckingham Street, Covent Garden, London, WC2N 6DF, City Of Westminster
https://www.english-heritage.org.uk/visit/blue-plaques/buckingham-street/
--------------------------------------------------------------------------------
ABRAHAMS, Harold (1899-1978)
Hodford Lodge, 2 Hodford Road, Golders Green, London, NW11 8NP, London Borough of Barnet
https://www.english-heritage.org.uk/visit/blue-plaques/abrahams-harold/
--------------------------------------------------------------------------------
ADAM, ROBERT and HOOD, THOMAS and GALSWORTHY, JOHN and BARRIE, SIR JAMES
1-3 Robert Street, Adelphi, Charing Cross, London, WC2N 6BN, City Of Westminster
https://www.english-heritage.org.uk/visit/blue-plaques/adam-hood-galsworthy-barrie/
--------------------------------------------------------------------------------
ADAMS, Henry Brooks (1838-1918)
98 Portland Place, Marylebone, London, W1B 1ET, City Of Westminster
https://www.english-heritage.org.uk/visit/blue-plaques/united-states-embassy/
--------------------------------------------------------------------------------
ADELPHI, The
The Adelphi Terrace, Charing Cross, London, WC2N 6BJ, City Of Westminster
https://www.english-heritage.org.uk/visit/blue-plaques/adelphi/
--------------------------------------------------------------------------------
ALDRIDGE, Ira (1807-1867)
5 Hamlet Road, Upper Norwood, London, SE19 2AP, London Borough of Bromley
https://www.english-heritage.org.uk/visit/blue-plaques/aldridge-ira/
--------------------------------------------------------------------------------
ALEXANDER, Sir George (1858-1918)
57 Pont Street, Chelsea, London, SW1X 0BD, London Borough of Kensington And Chelsea
https://www.english-heritage.org.uk/visit/blue-plaques/george-alexander/
--------------------------------------------------------------------------------
ALLENBY, Field Marshal Edmund Henry Hynman, Viscount Allenby  (1861-1936)
24 Wetherby Gardens, South Kensington, London, SW5 0JR, London Borough of Kensington And Chelsea
https://www.english-heritage.org.uk/visit/blue-plaques/field-marshal-viscount-allenby/
--------------------------------------------------------------------------------
ALMA-TADEMA, Sir Lawrence, O.M.  (1836-1912)
44 Grove End Road, St John's Wood, London, NW8 9NE, City Of Westminster
https://www.english-heritage.org.uk/visit/blue-plaques/lawrence-alma-tadema/
--------------------------------------------------------------------------------

You're not getting anything because the search results are generated by JavaScript. Use the API endpoint they fetch the data from.

For example:

import requests

api_url = "https://www.english-heritage.org.uk/api/BluePlaqueSearch/GetMatchingBluePlaques?pageBP=1&sizeBP=12&borBP=0&keyBP=&catBP=0&_=1653043005731"

plaques = requests.get(api_url).json()["plaques"]

for plaque in plaques:
    print(plaque["title"])
    print(plaque["address"])
    print(f"https://www.english-heritage.org.uk{plaque['path']}")
    print("-" * 80)

Output:

GWYNNE-VAUGHAN, Dame Helen (1879-1967)
Flat 93, Bedford Court Mansions, Fitzrovia, London, WC1B 3AE, London Borough of Camden
https://www.english-heritage.org.uk/visit/blue-plaques/helen-gwynne-vaughan/
--------------------------------------------------------------------------------
READING, Lady Stella (1894-1971)
41 Tothill Street, London, City of Westminster, SW1H 9LQ, City Of Westminster
https://www.english-heritage.org.uk/visit/blue-plaques/stella-lady-reading/
--------------------------------------------------------------------------------
32 SOHO SQUARE
32 Soho Square, Soho, London, W1D 3AP, City Of Westminster
https://www.english-heritage.org.uk/visit/blue-plaques/soho-square/
--------------------------------------------------------------------------------
14 BUCKINGHAM STREET
14 Buckingham Street, Covent Garden, London, WC2N 6DF, City Of Westminster
https://www.english-heritage.org.uk/visit/blue-plaques/buckingham-street/
--------------------------------------------------------------------------------
ABRAHAMS, Harold (1899-1978)
Hodford Lodge, 2 Hodford Road, Golders Green, London, NW11 8NP, London Borough of Barnet
https://www.english-heritage.org.uk/visit/blue-plaques/abrahams-harold/
--------------------------------------------------------------------------------
ADAM, ROBERT and HOOD, THOMAS and GALSWORTHY, JOHN and BARRIE, SIR JAMES
1-3 Robert Street, Adelphi, Charing Cross, London, WC2N 6BN, City Of Westminster
https://www.english-heritage.org.uk/visit/blue-plaques/adam-hood-galsworthy-barrie/
--------------------------------------------------------------------------------
ADAMS, Henry Brooks (1838-1918)
98 Portland Place, Marylebone, London, W1B 1ET, City Of Westminster
https://www.english-heritage.org.uk/visit/blue-plaques/united-states-embassy/
--------------------------------------------------------------------------------
ADELPHI, The
The Adelphi Terrace, Charing Cross, London, WC2N 6BJ, City Of Westminster
https://www.english-heritage.org.uk/visit/blue-plaques/adelphi/
--------------------------------------------------------------------------------
ALDRIDGE, Ira (1807-1867)
5 Hamlet Road, Upper Norwood, London, SE19 2AP, London Borough of Bromley
https://www.english-heritage.org.uk/visit/blue-plaques/aldridge-ira/
--------------------------------------------------------------------------------
ALEXANDER, Sir George (1858-1918)
57 Pont Street, Chelsea, London, SW1X 0BD, London Borough of Kensington And Chelsea
https://www.english-heritage.org.uk/visit/blue-plaques/george-alexander/
--------------------------------------------------------------------------------
ALLENBY, Field Marshal Edmund Henry Hynman, Viscount Allenby  (1861-1936)
24 Wetherby Gardens, South Kensington, London, SW5 0JR, London Borough of Kensington And Chelsea
https://www.english-heritage.org.uk/visit/blue-plaques/field-marshal-viscount-allenby/
--------------------------------------------------------------------------------
ALMA-TADEMA, Sir Lawrence, O.M.  (1836-1912)
44 Grove End Road, St John's Wood, London, NW8 9NE, City Of Westminster
https://www.english-heritage.org.uk/visit/blue-plaques/lawrence-alma-tadema/
--------------------------------------------------------------------------------

回复收藏 0 原文

~没有更多了~