Using Mechanize with Google Docs

Posted 2024-09-05 00:29:15 | 680 words | 3 views | 0 comments

I'm trying to use Mechanize to log in to Google Docs so that I can scrape some content (which isn't available through the API), but I keep getting a 404 when I try to follow the meta refresh redirect:

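require 'rubygems'
require 'mechanize'
require 'pp'   # pp needs an explicit require on older Rubies

USERNAME = "..."
PASSWORD = "..."

# Google login page; after authentication it should continue to Google Docs
LOGIN_URL = "https://www.google.com/accounts/Login?hl=en&continue=http://docs.google.com/"

agent = Mechanize.new
login_page = agent.get(LOGIN_URL)

# Fill in the first form on the page (the login form) and submit it
login_form = login_page.forms.first
login_form.Email = USERNAME
login_form.Passwd = PASSWORD
login_response_page = agent.submit(login_form)

# The response page redirects via a <meta http-equiv="refresh"> tag
redirect = login_response_page.meta[0].uri.to_s

puts "redirect: #{redirect}"

followed_page = agent.get(redirect) # fails with an HTTPNotFound (404) error

pp followed_page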

Can anyone see why this isn't working?
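For context, a meta refresh redirect stores its target in the content attribute of a <meta http-equiv="refresh"> tag, and in the raw HTML the & separators in that URL are often written as &amp;. The stdlib-only sketch below shows one way to pull the URL out and decode those entities before reusing it. The helper name meta_refresh_url and the example.com URL are hypothetical, and this is not Mechanize's own parsing code.

```ruby
require 'cgi'
require 'uri'

# Hypothetical helper (not part of Mechanize): extract the target URL from a
# meta refresh tag's content attribute, e.g. "0; url=https://example.com/?a=1".
# The attribute text may contain HTML entities such as &amp;, so it is decoded
# with CGI.unescapeHTML before being treated as a URL.
def meta_refresh_url(content)
  match = content.match(/url\s*=\s*['"]?([^'"]+)['"]?/i)
  return nil unless match

  CGI.unescapeHTML(match[1].strip)
end

# Hypothetical attribute value; example.com stands in for the real login host.
raw = '0; url=https://example.com/check?ticket=abc&amp;continue=http%3A%2F%2Fdocs.example.com%2F'

url = meta_refresh_url(raw)
# Split the decoded query string into a hash of parameters
params = URI.decode_www_form(URI(url).query).to_h

puts url
puts params['continue']
```

Without the CGI.unescapeHTML step, the query would still contain "&amp;", and the parameter after it would be read under a mangled key rather than as continue.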

Comments (1)

故事灯 2024-09-12 00:29:16

Andy, you're awesome!
Your code helped me get my script working and log in to my Google account. After a couple of hours I found your error: it was an HTML escaping problem. As far as I can tell, Mechanize automatically escapes the URI it receives as the argument to its 'get' method. So my solution is:

require 'mechanize'
require 'logger'
require 'pp'

EMAIL  = ".."
PASSWD = ".."
LOGIN_URL = "https://www.google.com/accounts/Login?hl=en"

# Log all HTTP traffic to mech.log for debugging
agent = Mechanize.new { |a| a.log = Logger.new("mech.log") }
agent.user_agent_alias = 'Linux Mozilla'
agent.open_timeout = 3
agent.read_timeout = 4
agent.keep_alive   = true
agent.redirect_ok  = true

login_page = agent.get(LOGIN_URL)
login_form = login_page.forms.first
login_form.Email = EMAIL
login_form.Passwd = PASSWD
login_response_page = agent.submit(login_form)

redirect = login_response_page.meta[0].uri.to_s

# Drop the last query parameter (the already-escaped "continue" from the
# meta tag) and append a new, unescaped one. Build the URL once so the
# printed URL and the fetched URL are the same.
fixed_redirect = redirect.split('&')[0..-2].join('&') + "&continue=https://www.google.com/adplanner"

puts fixed_redirect
followed_page = agent.get(fixed_redirect)
pp followed_page

This works fine for me. I replaced the continue parameter taken from the meta tag (which is already escaped) with a new one.
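The fix above assumes continue is the last query parameter, which is what split('&')[0..-2] relies on. A slightly more defensive sketch, using only Ruby's standard URI module, removes the parameter by name and lets URI.encode_www_form escape the new value exactly once. replace_continue is a hypothetical helper, not part of Mechanize, and the example.com URL is made up. The last lines show how escaping an already-escaped value a second time turns each % into %25, which is one way an already-escaped parameter can end up corrupted.

```ruby
require 'uri'

# Hypothetical helper (not part of Mechanize): return +url+ with any existing
# "continue" query parameter removed and a new one appended. Unlike
# split('&')[0..-2], this does not assume "continue" is the last parameter.
def replace_continue(url, new_continue)
  uri = URI(url)
  # Decode the existing query into [key, value] pairs (values come back unescaped)
  params = URI.decode_www_form(uri.query || '')
  params.reject! { |key, _value| key == 'continue' }
  params << ['continue', new_continue]
  # Re-encode: every value, including the new one, is escaped exactly once
  uri.query = URI.encode_www_form(params)
  uri.to_s
end

# Hypothetical redirect URL; example.com stands in for the real login host.
redirect = 'https://example.com/check?ticket=abc&continue=http%3A%2F%2Fdocs.example.com%2F'
fixed = replace_continue(redirect, 'https://www.google.com/adplanner')
puts fixed
# → https://example.com/check?ticket=abc&continue=https%3A%2F%2Fwww.google.com%2Fadplanner

# Escaping an already-escaped value again turns each % into %25:
once  = URI.encode_www_form_component('http://docs.example.com/')
twice = URI.encode_www_form_component(once)
puts once   # → http%3A%2F%2Fdocs.example.com%2F
puts twice  # → http%253A%252F%252Fdocs.example.com%252F
```

With Mechanize, the resulting string would be passed to agent.get in place of the hand-built one; whether that alone avoids the 404 against Google's current login flow is untested.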
