How to automatically add Google Alerts using Python Mechanize

Posted 2024-12-01 15:04:46

I'm aware of a Python API for sale here (http://oktaykilic.com/my-projects/google-alerts-api-python/), but I'd like to understand why the way I'm doing it now isn't working.

Here is what I have so far:

import mechanize
import ClientForm  # Python 2 era form parser used by this script


class GAlerts(object):

    def __init__(self, uName='USERNAME', passWord='PASSWORD'):
        self.uName = uName
        self.passWord = passWord

    def addAlert(self):
        self.cj = mechanize.CookieJar()
        loginURL = 'https://www.google.com/accounts/ServiceLogin?hl=en&service=alerts&continue=http://www.google.com/alerts'
        alertsURL = 'http://www.google.com/alerts'

        # log into google: fetch the login page
        initialRequest = mechanize.Request(loginURL)
        response = mechanize.urlopen(initialRequest)

        # put in form info
        forms = ClientForm.ParseResponse(response, backwards_compat=False)
        forms[0]['Email'] = self.uName
        forms[0]['Passwd'] = self.passWord

        # click form and get cookies; the session cookies arrive with the
        # login response, so read them from response2/request2
        request2 = forms[0].click()
        response2 = mechanize.urlopen(request2)
        self.cj.extract_cookies(response2, request2)

        # now go to alerts page with cookies
        request3 = mechanize.Request(alertsURL)
        self.cj.add_cookie_header(request3)
        response3 = mechanize.urlopen(request3)

        # parse forms on this page; only the query field is set here,
        # every other field keeps whatever default the page supplies
        formsAdd = ClientForm.ParseResponse(response3, backwards_compat=False)
        formsAdd[0]['q'] = 'Hines Ward'

        # click it and submit
        request4 = formsAdd[0].click()
        self.cj.add_cookie_header(request4)
        response4 = mechanize.urlopen(request4)
        print(response4.read())


myAlerter = GAlerts()
myAlerter.addAlert()

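As far as I can tell, it successfully logs in and reaches the add-alert home page, but when I enter a query and "click" submit, it sends me to a page that says "Please enter a valid e-mail address". Is there some kind of authentication I'm missing? I also don't understand how to change the values of Google's custom drop-down menus. Any ideas?

Thanks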

尤怨 2024-12-08 15:04:46

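The custom drop-down menus are built with JavaScript, so the proper solution is to work out which URL parameters the page sends and then reproduce them yourself. This may be why it doesn't work as expected right now: you are omitting required URL parameters that JavaScript normally sets when you visit the site in a browser.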
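As a minimal sketch of "reproducing the parameters": once the real submission has been captured in a browser, every parameter it carries can be copied into a dict and encoded by hand. The example below uses only the Python 3 standard library, not mechanize. The URL `https://example.com/alerts/create` and the field names `delivery_email` and `frequency` are hypothetical placeholders, not Google's actual parameters; substitute whatever the captured request really contains.

```python
from urllib.parse import urlencode, parse_qs
from urllib.request import Request

# Hypothetical parameters, as if copied from a captured browser request.
# Only "q" appears in the original script; the other names are made up.
captured_params = {
    "q": "Hines Ward",                     # the search query
    "delivery_email": "user@example.com",  # hypothetical: normally filled in by JavaScript
    "frequency": "daily",                  # hypothetical: value behind a JavaScript drop-down
}


def build_submit_request(url, params):
    """Return a POST request whose body carries every captured parameter."""
    body = urlencode(params).encode("ascii")
    return Request(url, data=body, method="POST")


# Build (but do not send) the request, then inspect what would go on the wire.
request = build_submit_request("https://example.com/alerts/create", captured_params)
print(request.get_method(), request.full_url)
print(parse_qs(request.data.decode("ascii")))
```

Comparing the decoded body with what the browser sends shows at a glance which parameters a script is leaving out.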

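The lazy solution is to use the galerts library, which looks like it does exactly what you need.

A few hints for future projects involving mechanize (or screen scraping in general):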

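Use Fiddler, an extremely useful HTTP debugging tool. It captures HTTP traffic from most browsers and lets you see exactly what your browser requests. You can then craft the desired request by hand, and if it doesn't work, you only have to compare the two. Tools like Firebug or Google Chrome's developer tools come in handy too, especially when there are many asynchronous requests. (You have to call set_proxies on your browser object to use it with Fiddler; see the documentation.)
For debugging purposes, do something like for f in self.forms(): print(f). This shows you every form mechanize recognized on the page, along with its name.
Handling cookies is repetitive, so (surprise!) there is an easy way to automate it. Just do this in your browser class constructor: self.set_cookiejar(cookielib.CookieJar()). This keeps track of cookies automatically.
For a long time I relied on custom parsers like BeautifulSoup (and I still use it for some special cases), but in most cases the fastest approach to web screen scraping is XPath (for example, lxml has a very good implementation).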
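The form-listing and XPath hints can be combined in a small sketch. It uses the standard library's `xml.etree.ElementTree` as a stand-in for lxml. That is an assumption made to keep the example dependency-free: ElementTree supports only a subset of XPath and needs well-formed markup, whereas lxml copes with real-world HTML. The sample page, form names, and field names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Invented, well-formed sample page; a real page would need a tolerant parser.
PAGE = """<html><body>
<form name="login" action="/login" method="post">
  <input type="text" name="Email" />
  <input type="password" name="Passwd" />
</form>
<form name="create" action="/alerts/create" method="post">
  <input type="text" name="q" />
  <input type="hidden" name="token" value="abc123" />
  <select name="frequency"><option value="daily">daily</option></select>
</form>
</body></html>"""


def list_form_fields(markup):
    """Map each form's name to the names of the fields inside it."""
    root = ET.fromstring(markup)
    summary = {}
    for form in root.findall(".//form"):
        # ".//*[@name]" selects every descendant that has a name attribute.
        summary[form.get("name")] = [el.get("name") for el in form.findall(".//*[@name]")]
    return summary


def hidden_inputs(markup, form_name):
    """Return {name: value} for the hidden inputs of one form."""
    root = ET.fromstring(markup)
    form = root.find(".//form[@name='%s']" % form_name)
    return {el.get("name"): el.get("value") for el in form.findall(".//input[@type='hidden']")}


print(list_form_fields(PAGE))
print(hidden_inputs(PAGE, "create"))
```

Listing the fields this way makes it easy to spot inputs, hidden ones in particular, that a script never fills in before submitting.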