Python - How to search for email addresses across multiple websites via Google
I am trying to retrieve the email addresses of different companies by searching the web.
I have an Excel file with the companies' names, and I came up with a little script that
- searches Google for each name followed by " email" and then tries to click the first Google result,
- parses that webpage for a match with the regex " * @ * .", meaning: find anything in the page that has the shape of an email address (some text, an "@", more text, a dot), and
- finally extracts the text and stores it in a list.
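Points 2 and 3 of this plan can be sketched with the standard `re` module alone. This is only a rough illustration: it assumes the page's HTML is already available as a string (for example from Selenium's `page_source`), the pattern is a deliberately simple approximation of an email address rather than a full validator, and the names `EMAIL_RE` and `extract_emails` are made up for this example.

```python
import re

# Simplified pattern: local part, "@", domain, a dot, and a top-level domain.
# It will miss unusual addresses and can match strings that are not real emails.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(html):
    """Return the email-like strings found in html, de-duplicated, in order of first appearance."""
    matches = EMAIL_RE.findall(html)
    # dict.fromkeys keeps insertion order while dropping repeats.
    return list(dict.fromkeys(matches))


if __name__ == "__main__":
    sample = "Contact info@example.com or sales@example.org, info@example.com"
    print(extract_emails(sample))
```

With Selenium, the input string would typically be `g.page_source` after the result page has loaded.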
Unfortunately I'm stuck at point 1, when trying to click the first Google result of each search.
Here's the code:
from selenium import webdriver
import pandas as pd
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.action_chains import ActionChains

g = webdriver.Chrome()
df = pd.read_excel(path)
for i in range(len(df['Company name'])):
    g.get("https://www.google.com/search?q=" + df['Company name'][i] + " email")
    cookies_accept = ActionChains(g)
    cookies_accept.send_keys(Keys.TAB*7).send_keys(Keys.ENTER).perform()
    results = g.find_elements_by_xpath('//*[@id="rso"]/div/div/div/div/div')
    # this XPath does not work properly on every query results page.
Any hints on how to continue?
TIA
Comments (1)
The problem might be that Google results come in different formats. Some show just the link to the homepage, others also show several sub-pages. Here's an example search: [screenshot]
If your approach already works for some of the results, then you are on the right track. A fix could be to look at the different formats and then include some try/except logic that checks each result format, i.e. separate XPaths for results shaped like the first and like the second "Windows" search result in the screenshot.
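A minimal sketch of that try/except idea: keep a list of candidate XPaths and return the first one that finds something. The helper `first_match` and the XPaths in `RESULT_XPATHS` are hypothetical; the XPaths in particular are guesses at Google's markup, not verified selectors, and would need to be replaced with ones taken from the actual result pages. The helper is kept independent of Selenium so the fallback logic can be tried without a browser.

```python
# Hypothetical candidate locators, one per result layout; adjust to the real pages.
RESULT_XPATHS = [
    '//*[@id="rso"]/div/div/div/div/div//a',  # layout the original XPath targets (assumed)
    '//*[@id="rso"]//a[h3]',                  # fallback: any result link wrapping a title (assumed)
]


def first_match(xpaths, find, not_found=(LookupError,)):
    """Try each XPath in order and return the first successful lookup.

    find      -- callable that takes an XPath and returns an element,
                 raising one of the not_found exceptions when nothing matches
    not_found -- exception types that mean "this layout is not present"
    Returns None if no XPath matched.
    """
    for xpath in xpaths:
        try:
            return find(xpath)
        except not_found:
            continue  # this layout is absent, try the next one
    return None


if __name__ == "__main__":
    # Stand-in for a browser: a dict lookup raises KeyError for unknown XPaths.
    fake_page = {'//*[@id="rso"]//a[h3]': "<first result link>"}
    print(first_match(RESULT_XPATHS, fake_page.__getitem__, not_found=(KeyError,)))
```

With Selenium 4 the call inside the loop from the question might look like `first_match(RESULT_XPATHS, lambda xp: g.find_element(By.XPATH, xp), not_found=(NoSuchElementException,))`, with `By` imported from `selenium.webdriver.common.by` and `NoSuchElementException` from `selenium.common.exceptions`. If it returns an element, calling `.click()` on it opens the first result.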