Multiple values under the same tag are not being scraped

Posted on 2025-02-01 20:14:59

I'm getting no values for my "Number of Rooms" and "Room" searches.

https://www.zoopla.co.uk/property/uprn/906032139/

I can see on that page that the searches should return something, but I'm not getting anything back.

Can anyone point me in the right direction on how to solve this? I'm not even sure what to search for, because the code doesn't raise an error. I expected it to collect all the data, after which I would need to figure out a way to separate it. Do I maybe need to scrape it into a dictionary?

import requests
from bs4 import BeautifulSoup as bs
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import time

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.157 Safari/537.36",
    "Accept-Language": "en-US,en;q=0.5",
    "Referer": "https://google.co.uk",
    "DNT": "1"
}

page = 1
addresses = []
while page != 2:
    url = f"https://www.zoopla.co.uk/house-prices/edinburgh/?pn={page}"
    print(url)
    response = requests.get(url, headers=headers)
    print(response)
    html = response.content
    soup = bs(html, "lxml")
    time.sleep(1)
    for address in soup.find_all("div", class_="c-rgUPM c-rgUPM-pnwXf-hasUprn-true"):
        details = {}
        # Getting the address
        details["Address"] = address.h2.get_text(strip=True)
        # Getting each address's unique URL
        scotland_house_url = f'https://www.zoopla.co.uk{address.find("a")["href"]}'
        details["URL"] = scotland_house_url
        scotland_house_url_response = requests.get(
            scotland_house_url, headers=headers)
        scotland_house_soup = bs(scotland_house_url_response.content, "lxml")
        # Lists status of the property
        try:
            details["Status"] = [status.get_text(strip=True) for status in scotland_house_soup.find_all(
                "span", class_="css-10o3xac-Tag e164ranr11")]
        except AttributeError:
            details["Status"] = ""
        # Lists the date of the status of the property
        try:
            details["Status Date"] = [status_date.get_text(
                strip=True) for status_date in scotland_house_soup.find_all("p", class_="css-1jq4rzj e164ranr10")]
        except AttributeError:
            details["Status Date"] = ""
        # Lists the value of the property
        try:
            details["Value"] = [value.get_text(strip=True).replace(",", "").replace(
                "£", "") for value in scotland_house_soup.find_all("p", class_="css-1x01gac-Text eczcs4p0")]
        except AttributeError:
            details["Value"] = ""
        # Lists the number of rooms
        try:
            details["Number of Rooms"] = [number_of_rooms.get_text(strip=True) for number_of_rooms in scotland_house_soup.find_all(
                "p", class_="css-82kmy1 e13gx5i3")]
        except AttributeError:
            details["Number of Rooms"] = ""
        # Lists type of room
        try:
            details["Room"] = [room.get_text(strip=True) for room in scotland_house_soup.find_all(
                "span", class_="css-1avcdf2 e13gx5i4")]
        except AttributeError:
            details["Room"] = ""
        addresses.append(details)
    page = page + 1

for address in addresses[:]:
    print(address)
print(response)


Comments (1)

小情绪 2025-02-08 20:14:59

Selecting by class_="css-1avcdf2 e13gx5i4" seems brittle, because generated class names like these can change at any time. Try a different CSS selector:

import requests
from bs4 import BeautifulSoup

url = "https://www.zoopla.co.uk/property/uprn/906032139/"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

tag = soup.select_one('#timeline p:has(svg[data-testid="bed"]) + p')

no_beds, beds = tag.get_text(strip=True, separator=" ").split()
print(no_beds, beds)

Prints:

1 bed
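
As a side note on why the original code fails silently: find_all() returns an empty list when nothing matches, so an except AttributeError block around it never runs, and select_one() returns None in the same situation. The snippet below is a minimal sketch of that behaviour and of a guard against it. The HTML string is a made-up stand-in, not the real Zoopla markup.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in markup (an assumption): it contains no room details at all.
html = '<div id="timeline"><p>Sold</p></div>'
soup = BeautifulSoup(html, "html.parser")

# find_all() with a class that is not on the page gives back an empty list,
# so no exception is raised and the result is simply [].
missing = soup.find_all("span", class_="css-1avcdf2 e13gx5i4")
print(len(missing))

# select_one() returns None when nothing matches, so check before calling get_text().
tag = soup.select_one('#timeline p:has(svg[data-testid="bed"]) + p')
beds = tag.get_text(strip=True, separator=" ") if tag is not None else ""
print(repr(beds))
```

An empty result here usually means the selector no longer matches the page, not that the request itself failed.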

If you want all types of rooms:

for detail in soup.select("#timeline p:has(svg[data-testid]) + p"):
    n, type_ = detail.get_text(strip=True, separator="|").split("|")
    print(n, type_)

Prints:

1 bed
1 bath
1 reception
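
Regarding the question about scraping into a dictionary: one possible approach is to key each count by its room type. This is only a sketch. The rooms_to_dict helper and the sample_html string are hypothetical and only imitate the structure the selector above relies on, so the real page may need adjustments.

```python
from bs4 import BeautifulSoup

# Hypothetical markup (an assumption) shaped like the part of the page the selector targets.
sample_html = (
    '<div id="timeline">'
    '<p><svg data-testid="bed"></svg></p><p><span>1</span><span>bed</span></p>'
    '<p><svg data-testid="bath"></svg></p><p><span>1</span><span>bath</span></p>'
    '<p><svg data-testid="reception"></svg></p><p><span>1</span><span>reception</span></p>'
    '</div>'
)


def rooms_to_dict(soup):
    """Map each room type to its count, e.g. "bed" -> 1 (hypothetical helper)."""
    rooms = {}
    for detail in soup.select("#timeline p:has(svg[data-testid]) + p"):
        parts = detail.get_text(strip=True, separator="|").split("|")
        if len(parts) != 2:
            # Skip entries that are not a "count|type" pair.
            continue
        count, room_type = parts
        # Keep the raw text if the count is not a plain number.
        rooms[room_type] = int(count) if count.isdigit() else count
    return rooms


rooms = rooms_to_dict(BeautifulSoup(sample_html, "html.parser"))
print(rooms)  # {'bed': 1, 'bath': 1, 'reception': 1}
```

In the question's loop, the result could then be stored with something like details["Rooms"] = rooms_to_dict(scotland_house_soup), which keeps each count attached to its room type instead of two parallel lists.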