I'm writing a function that opens WhitePages, searches a person's name and location, and scrapes their phone number and address. It does this by:
- Navigating to whitepages.com
- Finding the name <input> and sending it keys (send_keys(persons_name))
- Finding the location <input> and sending it keys (send_keys(my_city))
- Finding the search button <button> and clicking it
- On the search results page, finding the link <a> to the person's page
- On the person's page, finding and returning the person's landline and address
When I run the function in a loop on a list of names, it runs successfully on the first iteration, but not the second. For testing purposes, I'm running the WebDriver with a head/GUI so that I can verify what is going on, and it seems as though on the second iteration the function successfully finds the name <input> but doesn't input the person's name via send_keys(), then successfully finds the location <input> and successfully inputs the location, then successfully finds and click()s the search button.
Since there must be a name in the name <input> for a search to be done, no search occurs, and red text appears under the name <input> saying "Last name is required" (that's how I know for sure send_keys() is failing). Then I get a NoSuchElementException when the program tries to find a search result element that doesn't exist, since no search results page was loaded.
(Note: by default, WhitePages denies access to the program when trying to hit search; the options.add_argument('--disable-blink-features=AutomationControlled') in the code below circumvents that.)
So, what may be happening that is causing send_keys() to fail, and how do I fix it?
Full code:
from selenium import webdriver
# for passing a URL as a service object into the Chrome webdriver initializing method
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
# for clicking buttons
from selenium.webdriver.common.action_chains import ActionChains
# raised when using find_element() and no element exists matching the given criteria
from selenium.common.exceptions import NoSuchElementException
# for specifying to run the browser headless (w/o UI) and to suppress warnings in console output
from selenium.webdriver.chrome.options import Options
# for choosing an option from a dropdown menu
from selenium.webdriver.support.select import Select

def scrape_individual_info_wp(driver, individual_name, city_state):
    # FIND INDIVIDUAL ON WHITEPAGES & NAVIGATE TO THEIR INDIVIDUAL PAGE
    driver.get('https://www.whitepages.com/')
    # find name input
    driver.find_element(By.XPATH, "//form/div/div/div/div/input").send_keys(individual_name)  # attempt to find the input *relatively*
    # find location input
    driver.find_element(By.XPATH, "//form/div/div/following-sibling::div/div/div/input").send_keys(city_state)
    # find & click search button
    driver.find_element(By.XPATH, "//form/div/div/button").click()

    # FIND INDIVIDUAL IN SEARCH RESULTS
    # click (first) free search result link
    driver.find_element(By.XPATH, "//div[@class='results-container']/a").click()

    # SCRAPE PERSON'S INFO
    landline = driver.find_element(By.XPATH, "//div[contains(text(),'Landlines')]/following-sibling::div/a").text.strip()
    address_info = driver.find_element(By.XPATH, "//p[contains(text(),'Current Address')]/parent::div/div/div/div/a").text.strip().split('\n')
    address = address_info[0]
    city_state_zip = address_info[1]

    return [driver.current_url, address, city_state_zip, landline]

# selenium webdriver setup
options = webdriver.ChromeOptions()
# for the webdriver; suppresses warnings in terminal
options.add_experimental_option('excludeSwitches', ['enable-logging'])
# options.add_argument("--headless")
# options.add_argument('--disable-gpu')
# options.add_argument('--no-sandbox')
options.add_argument('--start-maximized')
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
options.add_argument('--disable-blink-features=AutomationControlled')

# below, you provide the path to the WebDriver for the browser of your choice, not the path to the browser .exe itself
# the WebDriver is a browser extension that you must install in order for Selenium to work with that browser
driver = webdriver.Chrome(service=Service(r'C:\Users\Owner\OneDrive\Documents\Gray Property Group\Prospecting\Python\Selenium WebDriver for Chrome\chromedriver.exe'), options=options)
driver.implicitly_wait(10)
# driver.maximize_window()

from time import sleep

names = ['Kevin J Haggerty', 'Patricia B Halliday', 'David R Harb', 'Jeffrey E Hathway', 'Hanshin Hsieh']
for name in names:
    print(name + ':')
    individual_info = scrape_individual_info_wp(driver, name, 'Manchester, NH')
    for field in individual_info:
        print('\t' + field)
    print('\n')

driver.quit()
Output:
Kevin J Haggerty:
https://www.whitepages.com/name/Kevin-J-Haggerty/Bedford-NH/PLyZ4BaGl8Q
26 Southgate Dr
Bedford, NH 03110
(603) 262-9114
Patricia B Halliday:
Traceback (most recent call last):
(...)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//div[@class='results-container']/a"}
Screenshot of browser (see arrow / red text):

Try with the below code once.
Navigate to the website once, and use driver.back() to come back to the original page. Use explicit waits to wait for the elements to appear, and use good locators like ID or CLASS_NAME to locate the elements.
One of the scripts on the page is reopening the page if any input is not empty when the script runs. (There are too many scripts on the page; I was not able to narrow down which one it is.)
A simple solution is to add sleep(1) after driver.get(...). However, sleep(1) significantly slows down the script, especially if looping through many names. Instead, we can use two tabs alternately and prepare each tab for the next-next iteration:
- Set up the two tabs before the loop.
- After processing each name, prepare that tab for the next-next iteration.
- Comment out driver.get(...) in the scrape_individual_info_wp function.
You may create a new driver for each person, so in each iteration you will start off from the home page and navigate to your desired pages.
I did these kinds of things using Selenium in Python, and this was my common approach when scraping multiple pages.