How to enter/search text on a JavaScript-based website using Selenium Python?

Posted 2025-02-13 20:50:46


I want to enter an address in the search field at https://eircode-finder.com/search/ and get the Eircode/zipcode/postal code from the results page.
I want to search a list of addresses like: 8 old bawn court tallaght dublin
From the results I want to fetch the Eircode/zipcode/postal code and save it in a .txt file.
I have used BeautifulSoup to fetch the data, but it is not fetching even the HTML of the page. I don't know the details, but something on the website, like JavaScript, is preventing me from getting data from it.


2 Answers

醉酒的小男人 2025-02-20 20:50:46


You can use the following example of how to make a request to this page's API:

import requests
import pandas as pd

# HERE geocoding endpoint that the eircode-finder page calls; the Referer
# header and apiKey below are taken from the requests the page itself makes.
url = "https://geocode.search.hereapi.com/v1/geocode"

to_search = [
    "Coolboy Wicklow",
    "8 old bawn court tallaght dublin",
]

headers = {"Referer": "https://eircode-finder.com/"}
params = {
    "q": "",
    "lang": "en",
    "in": "countryCode:IRL",
    "apiKey": "BegLfP-EDdyWflI0fRrP3HJ7IDSK_0878_n2fbct1wE",
}


def get_item(q):
    # Query the geocoder for one address and collect (title, postal code)
    # pairs; postalCode may be missing, hence .get().
    data = requests.get(url, params={**params, "q": q}, headers=headers).json()
    return [[i["title"], i["address"].get("postalCode")] for i in data["items"]]


all_data = []
for q in to_search:
    all_data += get_item(q)

df = pd.DataFrame(all_data, columns=["title", "postal_code"])
df = df.drop_duplicates()
print(df.to_markdown(index=False))  # requires the tabulate package

Prints:

| title                                                      | postal_code |
|:-----------------------------------------------------------|:------------|
| Coolboy, Arklow, County Wicklow, Ireland                   |             |
| 8 Old Bawn Court, Dublin, County Dublin, D24 N1YH, Ireland | D24 N1YH    |
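
The question also asks to save the results to a .txt file. A minimal sketch, assuming the df from the snippet above (the eircodes.txt filename is just an example):

# Minimal sketch: write "title: postal_code" lines to a text file.
# Assumes `df` from the snippet above; rows without a postal code are skipped.
with open("eircodes.txt", "w", encoding="utf-8") as f:
    for title, code in df.itertuples(index=False):
        if code:
            f.write(f"{title}: {code}\n")
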
人事已非 2025-02-20 20:50:46


The mentioned website is developed using React, which requires a JavaScript engine to render the HTML pages.
Beautiful Soup just sends the request and takes the response; for a normal website the response will be HTML, but otherwise it will be JSON data that requires a JavaScript engine to render.
Websites that require a JavaScript engine can be scraped with Selenium, as it uses an actual browser to request and load the page.

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
import time

# Path to the ChromeDriver executable.
path = r"./chromedriver.exe"
driver = webdriver.Chrome(service=Service(executable_path=path))

url = "https://eircode-finder.com/search/"

driver.get(url)
search_input = driver.find_element(By.ID, "outlined")
search_input.send_keys("8 old bawn court tallaght dublin")  # the address you want to search
search_input.send_keys(Keys.ENTER)
time.sleep(10)  # wait for the results page to load
eircode = driver.find_element(
    By.CSS_SELECTOR,
    "#root > div:nth-child(2) > div > div.MuiBox-root.jss12 > div > div > div "
    "> div:nth-child(1) > div.MuiBox-root.jss13 > div > div > h3 > div",
)
print(eircode.text)
time.sleep(10)  # buffer

# You can pass this page source to Beautiful Soup and scrape it,
# or you can continue scraping with Selenium.
soup = BeautifulSoup(driver.page_source, "html.parser")
print(driver.page_source)
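
Instead of the fixed time.sleep() calls, an explicit wait is usually more reliable. A minimal sketch, assuming the driver and the same result selector from the snippet above:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Minimal sketch: wait up to 20 seconds for the result element to become
# visible instead of sleeping a fixed 10 seconds. Assumes `driver` and the
# CSS selector from the snippet above.
result_selector = (
    "#root > div:nth-child(2) > div > div.MuiBox-root.jss12 > div > div > div "
    "> div:nth-child(1) > div.MuiBox-root.jss13 > div > div > h3 > div"
)
eircode = WebDriverWait(driver, 20).until(
    EC.visibility_of_element_located((By.CSS_SELECTOR, result_selector))
)
print(eircode.text)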

You can check out this video, which can help you understand it better.
