如何使用 python、beautifulsoup 将 Excel 工作表的名称拆分为 3 个单元格

发布于 2025-01-11 07:22:33 字数 3181 浏览 0 评论 0原文

我正在尝试刮掉名字并将它们导入到 Excel 工作表中以供以后使用。问题是我需要它们在 3 个不同的单元格中,firstlastinitial。该脚本查找关键字(在本例中为 est of)并打印整行,其中包含全名和“est of”。我需要它:

  1. 从末尾删除 est 。
  2. 将全名拆分为 3 个,以便可以将其导出到工作表中。

代码如下:

#!python
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.common.exceptions import NoSuchElementException
from random import randint
import pickle
import datetime
import os
import time
import sys
import openpyxl
from openpyxl import Workbook
import re

url = 'https://www.miamidade.gov/global/home.page'

current_time = datetime.datetime.now()
current_time.strftime("%m/%d/%Y")
options = webdriver.ChromeOptions()
options.headless = True
chromedriver = "chromedriver.exe"
number = "2080"
driver = webdriver.Chrome(chromedriver) #chromedriver
driver.get(url)
pickle.dump(driver.get_cookies() , open("cookies.pkl","wb"))
time.sleep(3)
nav1 = driver.find_element_by_xpath('/html/body/div[2]/div/div[1]/div/header/div[2]/nav/div/div[1]/div/div[1]/a').click()
time.sleep(1)
nav2 = driver.find_element_by_xpath('/html/body/div[2]/div/div[1]/div/header/div[2]/div[2]/div/div/div/ul/li[1]/button').click()
propsrch1 = driver.find_element_by_xpath('/html/body/div[2]/div/div[1]/div/header/div[2]/div[2]/div/div/div/ul/li[1]/ul/li[2]/ul/li[5]/a').click()

time.sleep(2)
propsrch2 = driver.find_element_by_xpath('/html/body/div[2]/div/main/div[2]/div/div[2]/div/div[1]/div[1]/ul/li[1]/span/a').click()
time.sleep(5)



subdivision = driver.find_element_by_xpath('/html/body/div/div[2]/div[3]/div[1]/ul/li[3]/a').click()
searchbar = driver.find_element_by_xpath('/html/body/div/div[2]/div[3]/div[1]/div[2]/div[2]/div/div[3]/div/input')
time.sleep(2)
searchbar.send_keys("RICHMOND HGTS")
search = driver.find_element_by_xpath('/html/body/div/div[2]/div[3]/div[1]/div[2]/div[2]/div/div[3]/div/span/button/span').click()
time.sleep(10)
table = driver.find_element_by_xpath('/html/body/div/div[2]/div[3]/div[1]/div[2]/div[4]/a').click()
main_window_handle = None
while not main_window_handle:
    main_window_handle = driver.current_window_handle
#driver.find_element_by_xpath(u'//a[text()="click here"]').click()
signin_window_handle = None
while not signin_window_handle:
    for handle in driver.window_handles:
        if handle != main_window_handle:
            signin_window_handle = handle
            break
driver.switch_to.window(signin_window_handle)
time.sleep(20)
page_source = driver.page_source
soup = BeautifulSoup(page_source, 'html.parser')

keyword = 'est of'
#keywords = soup.find(keyword)
counts = soup.find_all(text=re.compile("EST OF"))
for count in counts:
    print(count)

现在它打印到cmd中,这样我就可以看到它的工作原理。 看起来像这样:

GRACE K ROLLE EST OF    
ETHEL H FIFE EST OF 
BARBARA J BROUSSARD EST OF  
CLEMENTINA D RAHMING EST OF 
CHARLES B  CAMBRIDGE JR EST OF  
EMILY STATEN EST OF 
HATTIE S KING  EST OF   

拆分名称的最佳方法是什么?

I'm trying to scrape names off and import them into a excel sheet for them to be used later. Issue is i need them in 3 different cells, first,last and initial. The script looks for a keyword in this case its est of and prints the whole line, which has the the full name along with the "est of". I need it to:

  1. Drop the est of from the end.
  2. Split the full name into 3 so it can be exported out to a sheet.

Heres the code:

#!python
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.common.exceptions import NoSuchElementException
from random import randint
import pickle
import datetime
import os
import time
import sys
import openpyxl
from openpyxl import Workbook
import re

url = 'https://www.miamidade.gov/global/home.page'

current_time = datetime.datetime.now()
current_time.strftime("%m/%d/%Y")
options = webdriver.ChromeOptions()
options.headless = True
chromedriver = "chromedriver.exe"
number = "2080"
driver = webdriver.Chrome(chromedriver) #chromedriver
driver.get(url)
pickle.dump(driver.get_cookies() , open("cookies.pkl","wb"))
time.sleep(3)
nav1 = driver.find_element_by_xpath('/html/body/div[2]/div/div[1]/div/header/div[2]/nav/div/div[1]/div/div[1]/a').click()
time.sleep(1)
nav2 = driver.find_element_by_xpath('/html/body/div[2]/div/div[1]/div/header/div[2]/div[2]/div/div/div/ul/li[1]/button').click()
propsrch1 = driver.find_element_by_xpath('/html/body/div[2]/div/div[1]/div/header/div[2]/div[2]/div/div/div/ul/li[1]/ul/li[2]/ul/li[5]/a').click()

time.sleep(2)
propsrch2 = driver.find_element_by_xpath('/html/body/div[2]/div/main/div[2]/div/div[2]/div/div[1]/div[1]/ul/li[1]/span/a').click()
time.sleep(5)



subdivision = driver.find_element_by_xpath('/html/body/div/div[2]/div[3]/div[1]/ul/li[3]/a').click()
searchbar = driver.find_element_by_xpath('/html/body/div/div[2]/div[3]/div[1]/div[2]/div[2]/div/div[3]/div/input')
time.sleep(2)
searchbar.send_keys("RICHMOND HGTS")
search = driver.find_element_by_xpath('/html/body/div/div[2]/div[3]/div[1]/div[2]/div[2]/div/div[3]/div/span/button/span').click()
time.sleep(10)
table = driver.find_element_by_xpath('/html/body/div/div[2]/div[3]/div[1]/div[2]/div[4]/a').click()
main_window_handle = None
while not main_window_handle:
    main_window_handle = driver.current_window_handle
#driver.find_element_by_xpath(u'//a[text()="click here"]').click()
signin_window_handle = None
while not signin_window_handle:
    for handle in driver.window_handles:
        if handle != main_window_handle:
            signin_window_handle = handle
            break
driver.switch_to.window(signin_window_handle)
time.sleep(20)
page_source = driver.page_source
soup = BeautifulSoup(page_source, 'html.parser')

keyword = 'est of'
#keywords = soup.find(keyword)
counts = soup.find_all(text=re.compile("EST OF"))
for count in counts:
    print(count)

Right now its printing into the cmd, just so i can see that its working.
which looks like this:

GRACE K ROLLE EST OF    
ETHEL H FIFE EST OF 
BARBARA J BROUSSARD EST OF  
CLEMENTINA D RAHMING EST OF 
CHARLES B  CAMBRIDGE JR EST OF  
EMILY STATEN EST OF 
HATTIE S KING  EST OF   

What is the best way to go about splitting up the name?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

櫻之舞 2025-01-18 07:22:33

您可以使用 split 方法分割以下空间

for count in counts:
    count= count.split(' ')
    First_name=counnt[0]
    mid_name=count[1]
    Last_name=count[2]

You can split following space using split methods

for count in counts:
    count= count.split(' ')
    First_name=counnt[0]
    mid_name=count[1]
    Last_name=count[2]
寄与心 2025-01-18 07:22:33

如果您知道它总是由空格分隔的 3 个单词,则可以使用 count.split(' ')[:3]

如果您不知道名称有多长,可以使用 count.rstrip('EST OF').split(' ')

If you know it's always going to be 3 words separated by a space, you can use count.split(' ')[:3].

If you don't know how long the name will be, you can use count.rstrip('EST OF').split(' ').

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文