How to automatically add Google Alerts using Python Mechanize
I'm aware of a Python API for sale here (http://oktaykilic.com/my-projects/google-alerts-api-python/), but I'd like to understand why the way I'm doing it now isn't working.
Here is what I have so far:
import mechanize
import ClientForm

class GAlerts():
    def __init__(self, uName = 'USERNAME', passWord = 'PASSWORD'):
        self.uName = uName
        self.passWord = passWord

    def addAlert(self):
        self.cj = mechanize.CookieJar()
        loginURL = 'https://www.google.com/accounts/ServiceLogin?hl=en&service=alerts&continue=http://www.google.com/alerts'
        alertsURL = 'http://www.google.com/alerts'

        # log into Google
        initialRequest = mechanize.Request(loginURL)
        response = mechanize.urlopen(initialRequest)

        # put in form info
        forms = ClientForm.ParseResponse(response, backwards_compat=False)
        forms[0]['Email'] = self.uName
        forms[0]['Passwd'] = self.passWord

        # click form and get cookies
        request2 = forms[0].click()
        response2 = mechanize.urlopen(request2)
        self.cj.extract_cookies(response, initialRequest)

        # now go to alerts page with cookies
        request3 = mechanize.Request(alertsURL)
        self.cj.add_cookie_header(request3)
        response3 = mechanize.urlopen(request3)

        # parse forms on this page
        formsAdd = ClientForm.ParseResponse(response3, backwards_compat=False)
        formsAdd[0]['q'] = 'Hines Ward'

        # click it and submit
        request4 = formsAdd[0].click()
        self.cj.add_cookie_header(request4)
        response4 = mechanize.urlopen(request4)
        print response4.read()

myAlerter = GAlerts()
myAlerter.addAlert()
As far as I can tell, it successfully logs in and gets to the adding alerts homepage, but when I enter a query and "click" submit, it sends me to a page that says "Please enter a valid e-mail address". Is there some kind of authentication I'm missing? I also don't understand how to change the values on Google's custom drop-down menus. Any ideas?
Thanks
Comments (2)
The custom drop-down menus are done using JavaScript, so the proper solution would be to figure out the URL parameters and then try to reproduce them (this might be the reason it doesn't work as expected right now: you are omitting required URL parameters that are normally set by JavaScript when you visit the site in a browser).
The lazy solution is to use the galerts library; it looks like it does exactly what you need.
A few hints for future projects involving mechanize (or screen-scraping in general):
- [...] set_proxies on your browser object to use it with Fiddler, see documentation)
- [...] for f in self.forms(): print f. This shows you all forms mechanize recognized on a page, along with their names.
- [...] self.set_cookiejar(cookielib.CookieJar()). This keeps track of cookies automatically.
- [...] lxml has a very good implementation).
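One way to find parameters that JavaScript normally adds is to capture the browser's request in an HTTP debugger and compare it with what the script sends. The sketch below is only an illustration of that comparison, written for Python 3 with the standard library: the captured body is made up, every field name other than `q` is hypothetical, and `missing_fields` is a helper invented for this example rather than part of mechanize.

```python
from urllib.parse import parse_qsl, urlencode


def missing_fields(captured_body, scripted_fields):
    """Return the fields the browser sent that the script does not send."""
    captured = dict(parse_qsl(captured_body, keep_blank_values=True))
    return {name: value for name, value in captured.items()
            if name not in scripted_fields}


# Hypothetical request body as it might be copied out of an HTTP debugger.
# Only 'q' comes from the question; 't', 'f' and 'e' are invented names.
captured_body = "q=Hines+Ward&t=7&f=1&e=user%40example.com"

# What the script currently fills in before submitting the form.
scripted_fields = {"q": "Hines Ward"}

gaps = missing_fields(captured_body, scripted_fields)
print(gaps)

# Merge the missing fields in and re-encode the body for the scripted request.
full_body = urlencode({**scripted_fields, **gaps})
print(full_body)
```

Any field that shows up in `gaps` is a candidate for a value the page sets through JavaScript, and it would have to be supplied by hand when the request is built without a browser.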
Mechanize doesn't handle JavaScript, and those drop-down menus are JS. If you want to automate something where JavaScript is involved, I suggest using Selenium, which also has Python bindings.
http://seleniumhq.org/
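To see what a client without JavaScript has to work with, it can help to list the form fields that exist in the raw HTML. The following is a rough sketch using Python 3's standard `html.parser`, not mechanize's own form handling; the sample page, its field names, and the `FormLister` class are all assumptions made for illustration.

```python
from html.parser import HTMLParser


class FormLister(HTMLParser):
    """Collect each <form> and the named fields found in the static HTML."""

    def __init__(self):
        super().__init__()
        self.forms = []       # one dict per form, in document order
        self._current = None  # form being parsed, or None outside a form

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {"action": attrs.get("action"), "fields": []}
            self.forms.append(self._current)
        elif self._current is not None and tag in ("input", "select", "textarea"):
            name = attrs.get("name")
            if name:  # unnamed controls are not submitted, so skip them
                kind = attrs.get("type", "text") if tag == "input" else tag
                self._current["fields"].append((name, kind))

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


# Hypothetical page: the drop-down is a script-driven <div>, so only the
# text box and the hidden input exist as real form fields.
SAMPLE_HTML = """
<form action="/alerts/create">
  <input type="text" name="q">
  <input type="hidden" name="x" value="abc">
  <div class="dropdown" onclick="pick()">How often</div>
  <input type="submit" value="Create Alert">
</form>
"""

lister = FormLister()
lister.feed(SAMPLE_HTML)
for form in lister.forms:
    print(form["action"], form["fields"])
```

In this made-up page the drop-down never appears in the field list, which is the situation described above: a value chosen through a scripted widget has to be set by the script itself or by a tool that runs the page's JavaScript.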