Python Selenium Chrome: send_keys() not sending keys during the second iteration when scraping WhitePages

Posted 2025-02-07 15:22:06


I'm writing a function that opens WhitePages, searches a person's name and location, and scrapes their phone number and address. It does this by:

  1. Navigating to whitepages.com
  2. Finding the name <input> and sending it keys (send_keys(persons_name))
  3. Finding the location <input> and sending it keys (send_keys(my_city))
  4. Finding the search button <button> and clicking it
  5. On the search results page, finding the link <a> to the person's page
  6. On the person's page, finding and returning the person's landline and address

When I run the function in a loop on a list of names, the function runs successfully on the first iteration, but not the second. For testing purposes, I'm running the WebDriver with a head/GUI so that I can verify what is going on, and it seems as though on the second iteration, the function successfully finds the name <input> but doesn't input the person's name via send_keys(), then successfully finds the location <input> and successfully inputs the location, then successfully finds and click()s the search button.

Since there must be a name in the name <input> for a search to be done, no search occurs and red text under the name <input> appears saying "Last name is required" (that's how I know for sure send_keys() is failing), and then I get a NoSuchElementException when the program tries to find a search result element that doesn't exist since no search results page was loaded.
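
One way to confirm this programmatically, rather than by watching the GUI, is to read the field's value back right after sending keys (a minimal sketch reusing the relative XPath from the full code below):

from selenium.webdriver.common.by import By

name_input = driver.find_element(By.XPATH, "//form/div/div/div/div/input")
name_input.send_keys(individual_name)

# If send_keys() silently failed, the value read back will be empty
typed = name_input.get_attribute("value")
print(repr(typed))  # expected to print '' on the failing iteration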

(Note: by default, WhitePages denies access to the program when trying to hit search; the options.add_argument('--disable-blink-features=AutomationControlled') in the code below circumvents that.)

So, what may be happening that is causing send_keys() to fail, and how do I fix it?

Full code:

from selenium import webdriver
# for passing a URL as a service object into the Chrome webdriver initializing method
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
# for clicking buttons
from selenium.webdriver.common.action_chains import ActionChains
# raised when using find_element() and no element exists matching the given criteria
from selenium.common.exceptions import NoSuchElementException
# for specifying to run the browser headless (w/o UI) and to suppress warnings in console output
from selenium.webdriver.chrome.options import Options
# for choosing an option from a dropdown menu
from selenium.webdriver.support.select import Select

def scrape_individual_info_wp(driver, individual_name, city_state):

    # FIND INDIVIDUAL ON WHITEPAGES & NAVIGATE TO THEIR INDIVIDUAL PAGE

    driver.get('https://www.whitepages.com/')

    # find name input
    driver.find_element(By.XPATH, "//form/div/div/div/div/input").send_keys(individual_name) # attempt to find the input *relatively*
    
    # find location input
    driver.find_element(By.XPATH, "//form/div/div/following-sibling::div/div/div/input").send_keys(city_state)

    # find & click search button
    driver.find_element(By.XPATH, "//form/div/div/button").click()

    # FIND INDIVIDUAL IN SEARCH RESULTS

    # click (first) free search result link
    driver.find_element(By.XPATH, "//div[@class='results-container']/a").click()


    # SCRAPE PERSON'S INFO
    
    landline = driver.find_element(By.XPATH, "//div[contains(text(),'Landlines')]/following-sibling::div/a").text.strip()
    address_info = driver.find_element(By.XPATH, "//p[contains(text(),'Current Address')]/parent::div/div/div/div/a").text.strip().split('\n')

    address = address_info[0]
    city_state_zip = address_info[1]

    return [driver.current_url, address, city_state_zip, landline]



# selenium webdriver setup
options = webdriver.ChromeOptions()

# for the webdriver; suppresses warnings in terminal
options.add_experimental_option('excludeSwitches', ['enable-logging'])

# options.add_argument("--headless")
# options.add_argument('--disable-gpu')
# options.add_argument('--no-sandbox')
options.add_argument('--start-maximized')

options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
options.add_argument('--disable-blink-features=AutomationControlled')

# below, you provide the path to the WebDriver executable for your browser, not the path to the browser .exe itself
# the WebDriver is a separate driver binary that Selenium uses to control that browser
driver = webdriver.Chrome(service=Service(r'C:\Users\Owner\OneDrive\Documents\Gray Property Group\Prospecting\Python\Selenium WebDriver for Chrome\chromedriver.exe'), options=options)
driver.implicitly_wait(10)

# driver.maximize_window()

from time import sleep

names = ['Kevin J Haggerty', 'Patricia B Halliday', 'David R Harb', 'Jeffrey E Hathway', 'Hanshin Hsieh']

for name in names:

    print(name + ':')

    individual_info = scrape_individual_info_wp(driver, name, 'Manchester, NH')

    for field in individual_info:

        print('\t' + field)

    print('\n')

driver.quit()

Output:

Kevin J Haggerty:
        https://www.whitepages.com/name/Kevin-J-Haggerty/Bedford-NH/PLyZ4BaGl8Q
        26 Southgate Dr  
        Bedford, NH 03110
        (603) 262-9114   


Patricia B Halliday:     
Traceback (most recent call last):

(...)

selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//div[@class='results-container']/a"}

Screenshot of browser (see arrow / red text):

3 Answers

你爱我像她 2025-02-14 15:22:06


Try the code below.

Navigate to the website once, and use driver.back() afterwards to come back to the home page.

Use explicit waits for the elements to appear, and use good locators such as ID or CLASS_NAME to find them.

from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import time


def scrape_individual_info_wp(driver, individual_name, city_state):
    name_field = wait.until(EC.element_to_be_clickable((By.ID, "desktopSearchBar")))
    name_field.clear() # Clear the field to enter a new name
    name_field.send_keys(individual_name)
    state_field = wait.until(EC.element_to_be_clickable((By.CLASS_NAME, "pa-3")))
    state_field.clear() # Clear the field and enter the state
    state_field.send_keys(city_state)
    search = wait.until(EC.element_to_be_clickable((By.ID, "wp-search")))
    search.click()
    time.sleep(2) # Crude pause so the results can load; an explicit wait would be better (see the sketch after this code).
    # Code to scrape data.
    driver.back()


driver = webdriver.Chrome(service=Service("C:/expediaproject/Chromedriver/chromedriver.exe"))
driver.maximize_window()

wait = WebDriverWait(driver, 30)

driver.get("https://www.whitepages.com/")

names = ['Kevin J Haggerty', 'Patricia B Halliday', 'David R Harb', 'Jeffrey E Hathway', 'Hanshin Hsieh']
state = "Manchester, NH"

for name in names:
    scrape_individual_info_wp(driver, name, state)

driver.quit()
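
As the comment in the code above notes, the fixed time.sleep(2) can be swapped for an explicit wait; a minimal sketch, reusing the results-container XPath from the question:

from selenium.webdriver.support import expected_conditions as EC

# Wait for the first free search-result link to become clickable instead of sleeping
result_link = wait.until(
    EC.element_to_be_clickable((By.XPATH, "//div[@class='results-container']/a"))
)
result_link.click()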
风情万种。 2025-02-14 15:22:06


One of the page's scripts reloads the page if any input is non-empty when that script runs.
(There are too many scripts on the page; I was not able to narrow down which one it is.)

A simple solution is to add sleep(1) after driver.get(...).

driver.get('https://www.whitepages.com/')
sleep(1)  # Add this

A faster approach

sleep(1) significantly slows down the script, especially if looping through many names.

Instead, we can use two tabs alternately and prepare each tab for the next-next iteration.

Setup before loop:

driver.get('https://www.whitepages.com/')
driver.execute_script('window.open("https://www.whitepages.com/")')
driver.switch_to.window(driver.window_handles[0])
next_index = 1

Prepare for the next and next-next iterations after processing each name:

for name in names:

    ...

    # Prepare this tab for next-next iteration
    driver.get('https://www.whitepages.com/')
    # Switch to another tab for next iteration
    driver.switch_to.window(driver.window_handles[next_index])
    next_index = 1 - next_index

Comment out the driver.get(...) call in the scrape_individual_info_wp function:

# driver.get('https://www.whitepages.com/')
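
Putting the fragments together, a minimal sketch of the alternating-tab loop (assuming scrape_individual_info_wp no longer calls driver.get() itself):

driver.get('https://www.whitepages.com/')
driver.execute_script('window.open("https://www.whitepages.com/")')
# Two tabs are now open on the home page; start working in the first one
driver.switch_to.window(driver.window_handles[0])
next_index = 1

for name in names:
    individual_info = scrape_individual_info_wp(driver, name, 'Manchester, NH')
    print(individual_info)

    # Reload the home page in this tab so it is fresh two iterations from now
    driver.get('https://www.whitepages.com/')
    # Switch to the other tab, which has had time to settle since it was loaded
    driver.switch_to.window(driver.window_handles[next_index])
    next_index = 1 - next_index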
赠我空喜 2025-02-14 15:22:06


You can create a new driver for each person, so that each iteration starts fresh from the home page and navigates to the desired pages.

I have done this kind of thing with Selenium in Python, and it was my usual approach when scraping multiple pages.


from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.wait import WebDriverWait

names = ['Kevin J Haggerty', 'Patricia B Halliday', 'David R Harb', 'Jeffrey E Hathway', 'Hanshin Hsieh']

for name in names:

    print(name + ':')

    driver = webdriver.Chrome(service=Service("C:/expediaproject/Chromedriver/chromedriver.exe"))

    wait = WebDriverWait(driver,30)

    driver.get("https://www.whitepages.com/")

    individual_info = scrape_individual_info_wp(driver, name, 'Manchester, NH')

    for field in individual_info:

        print('\t' + field)

    print('\n')

    driver.quit()
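
One caveat with this approach: if the scrape raises an exception mid-loop, driver.quit() is skipped and a Chrome process is left behind. A small variation (a sketch, not required) using try/finally guarantees cleanup:

for name in names:
    driver = webdriver.Chrome(service=Service("C:/expediaproject/Chromedriver/chromedriver.exe"))
    wait = WebDriverWait(driver, 30)
    try:
        driver.get("https://www.whitepages.com/")
        individual_info = scrape_individual_info_wp(driver, name, 'Manchester, NH')
        print(name + ':')
        for field in individual_info:
            print('\t' + field)
    finally:
        driver.quit()  # runs even if the scrape raises, so no orphaned browsers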
