Python - How to search for email addresses across multiple websites via Google

Posted on 2025-01-09 02:12:39 · 1202 characters · 6 views · 0 comments

I am trying to retrieve the email addresses of different companies by searching the web.
I have an Excel file with the companies' names, and I came up with a little script that:

1. searches Google for each name followed by " email" and then tries to click the first Google result;
2. parses that web page for a match with the regex " * @ * .", that is, anything on the page shaped like an email address (some text, an "@", more text, then a dot);
3. eventually extracts the text and stores it in a list.
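
The parsing and extraction steps could look roughly like the sketch below. It is only an illustration: the pattern `EMAIL_RE` and the helper `extract_emails` are hypothetical names that do not appear in the script, and the pattern is a deliberately loose approximation of an email address, not a full validator.

```python
import re

# Assumption: a simple "local part @ domain . suffix" shape is good enough here.
# This is an illustrative pattern, not an RFC 5322 address validator.
EMAIL_RE = re.compile(
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}"
)


def extract_emails(page_text):
    """Return the unique email-like strings in page_text, in order of first appearance."""
    found = []
    for match in EMAIL_RE.findall(page_text):
        if match not in found:  # skip addresses already collected
            found.append(match)
    return found
```

With Selenium, the text to scan could come from the driver's `page_source` attribute, for example `extract_emails(g.page_source)`.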

Unfortunately, I'm stuck at point 1, when trying to click on the first Google result of each search.
Here's the code:

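from urllib.parse import quote_plus

import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

g = webdriver.Chrome()
df = pd.read_excel(path)  # path: location of the Excel file with the company names
for name in df['Company name']:
    # Search Google for "<company name> email", URL-encoding the query
    g.get("https://www.google.com/search?q=" + quote_plus(f"{name} email"))
    # Tab to the cookie dialog's button and press Enter to accept it
    cookies_accept = ActionChains(g)
    cookies_accept.send_keys(Keys.TAB * 7).send_keys(Keys.ENTER).perform()
    # find_elements_by_xpath was removed in Selenium 4; use find_elements(By.XPATH, ...)
    results = g.find_elements(By.XPATH, '//*[@id="rso"]/div/div/div/div/div')
    # this xpath does not work properly with each one of the query results pages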
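
One direction that might be worth trying is sketched below, under assumptions rather than as a verified fix. Instead of a fixed chain of `div` elements, it waits for the first link inside `#rso` that wraps an `h3` title. That each organic result has this shape is an assumption about Google's current markup, which changes often. The helper names `build_search_url` and `first_result_url` are hypothetical.

```python
from urllib.parse import quote_plus


def build_search_url(company_name):
    """Return a Google search URL for '<company_name> email' with the query URL-encoded."""
    return "https://www.google.com/search?q=" + quote_plus(f"{company_name} email")


def first_result_url(driver, timeout=10):
    """Return the href of the first result link on the current page, or None.

    Assumption: each organic result is an <a> element wrapping an <h3> title
    inside the #rso container. Adjust the XPath if the markup differs.
    """
    # Imported here so build_search_url stays usable without Selenium installed.
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    try:
        # Wait until at least one matching link is present in the DOM.
        link = WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.XPATH, '//*[@id="rso"]//a[h3]'))
        )
    except TimeoutException:
        return None  # no result matched within the timeout
    return link.get_attribute("href")
```

Inside the loop this could be used as `g.get(build_search_url(name))` followed by `url = first_result_url(g)`, then `g.get(url)` when `url` is not `None`. Navigating to the link's address avoids clicking an element that may be covered by the cookie dialog.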

Any hints on how to continue?
TIA
