How to enter/search text on a JavaScript-based website using Selenium Python?

Posted 2025-02-13 20:50:46


I want to enter an address in the search field at https://eircode-finder.com/search/ and get the Eircode/zipcode/postal code from the results page.
I want to search a list of addresses like: 8 old bawn court tallaght dublin
From the results I want to fetch the Eircode/zipcode/postal code and save it in a .txt file.
I have used BeautifulSoup to fetch the data, but it is not fetching even the HTML of the page. I don't know the details, but something on the website, like JavaScript, is preventing me from getting data from it.


2 Answers

醉酒的小男人 2025-02-20 20:50:46


You can use the following example of how to make a request to this page's API:

import requests
import pandas as pd

# HERE geocoding endpoint that the eircode-finder page calls; the Referer
# header and apiKey below are taken from the requests the page itself makes.
url = "https://geocode.search.hereapi.com/v1/geocode"

to_search = [
    "Coolboy Wicklow",
    "8 old bawn court tallaght dublin",
]

headers = {"Referer": "https://eircode-finder.com/"}
params = {
    "q": "",
    "lang": "en",
    "in": "countryCode:IRL",
    "apiKey": "BegLfP-EDdyWflI0fRrP3HJ7IDSK_0878_n2fbct1wE",
}


def get_item(q):
    # Query the geocoder for one address and collect (title, postal code)
    # pairs; postalCode may be missing, hence .get().
    data = requests.get(url, params={**params, "q": q}, headers=headers).json()
    return [[i["title"], i["address"].get("postalCode")] for i in data["items"]]


all_data = []
for q in to_search:
    all_data += get_item(q)

df = pd.DataFrame(all_data, columns=["title", "postal_code"])
df = df.drop_duplicates()
print(df.to_markdown(index=False))  # requires the tabulate package

Prints:

| title                                                      | postal_code |
|:-----------------------------------------------------------|:------------|
| Coolboy, Arklow, County Wicklow, Ireland                   |             |
| 8 Old Bawn Court, Dublin, County Dublin, D24 N1YH, Ireland | D24 N1YH    |
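
The question also asks to save the results to a .txt file. A minimal sketch, assuming the df from the snippet above (the eircodes.txt filename is just an example):

# Minimal sketch: write "title: postal_code" lines to a text file.
# Assumes `df` from the snippet above; rows without a postal code are skipped.
with open("eircodes.txt", "w", encoding="utf-8") as f:
    for title, code in df.itertuples(index=False):
        if code:
            f.write(f"{title}: {code}\n")
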
人事已非 2025-02-20 20:50:46


The mentioned website is developed using React, which requires a JavaScript engine to render the HTML pages.
Beautiful Soup just sends the request and takes the response; for a normal website the response will be HTML, but otherwise it will be JSON data that requires a JavaScript engine to render.
Websites that require a JavaScript engine can be scraped with Selenium, as it uses an actual browser to request and load the page.

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
import time

# Path to the ChromeDriver executable.
path = r"./chromedriver.exe"
driver = webdriver.Chrome(service=Service(executable_path=path))

url = "https://eircode-finder.com/search/"

driver.get(url)
search_input = driver.find_element(By.ID, "outlined")
search_input.send_keys("8 old bawn court tallaght dublin")  # the address you want to search
search_input.send_keys(Keys.ENTER)
time.sleep(10)  # wait for the results page to load
eircode = driver.find_element(
    By.CSS_SELECTOR,
    "#root > div:nth-child(2) > div > div.MuiBox-root.jss12 > div > div > div "
    "> div:nth-child(1) > div.MuiBox-root.jss13 > div > div > h3 > div",
)
print(eircode.text)
time.sleep(10)  # buffer

# You can pass this page source to Beautiful Soup and scrape it,
# or you can continue scraping with Selenium.
soup = BeautifulSoup(driver.page_source, "html.parser")
print(driver.page_source)
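
Instead of the fixed time.sleep() calls, an explicit wait is usually more reliable. A minimal sketch, assuming the driver and the same result selector from the snippet above:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Minimal sketch: wait up to 20 seconds for the result element to become
# visible instead of sleeping a fixed 10 seconds. Assumes `driver` and the
# CSS selector from the snippet above.
result_selector = (
    "#root > div:nth-child(2) > div > div.MuiBox-root.jss12 > div > div > div "
    "> div:nth-child(1) > div.MuiBox-root.jss13 > div > div > h3 > div"
)
eircode = WebDriverWait(driver, 20).until(
    EC.visibility_of_element_located((By.CSS_SELECTOR, result_selector))
)
print(eircode.text)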

You can check out this video, which can help you understand it better.
