Ruby script to download private Google Docs

Posted on 2024-10-17 12:44:20

I would like to write a script in Ruby (using the gdata gem, the rest-client gem, or plain Net::HTTP) that authenticates with Google Docs using my Gmail user ID and password, and then downloads a list of my private documents as well as the documents themselves.

The GData documents guide makes it clear how to get publicly visible documents, but it's not clear how I can authenticate myself in my script to get access to private documents. The authentication methods it describes all seem to require human intervention, either a CAPTCHA or some form of OAuth/OpenID redirection.

Is there some way to access my private documents with just a userid/password combination? Or perhaps that along with an API key? If so, can anybody show me how to do this?

Comments (4)

失而复得 2024-10-24 12:44:20

So, sometimes giving up, moving on to something else, and coming back with a fresh mindset can do wonders. I started looking at this again this morning and within a couple of hours got it working.

I ditched OAuth because the Ruby OAuth gem seems to be centered around web-based applications. I started poking around in Google Data on Rails and tried ClientLogin. Getting authenticated was no problem, and as far as I can tell you don't get CAPTCHA requests unless you enter the wrong credentials... or at least I haven't seen any so far.

Here is a simple code snippet to export a spreadsheet file:

require 'gdata/client'
require 'gdata/http'
require 'gdata/auth'

# Replace with the key of the spreadsheet to export
resource_id = 'resource_ID'

client = GData::Client::Spreadsheets.new
client.clientlogin('username', 'password')
# Request the spreadsheet in xls format
test = client.get("http://spreadsheets.google.com/feeds/download/spreadsheets/Export?key=#{resource_id}&fmcmd&exportFormat=xls")
# Write in binary mode so the xls data is not altered
File.open('spreadsheet.xls', 'wb') { |file| file.write(test.body) }
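
For anyone who prefers the plain Net::HTTP route mentioned in the question, here is a rough sketch of what a ClientLogin exchange might look like without the gdata gem. It assumes the endpoint URL, the form field names, the "wise" service name for spreadsheets, and the `GoogleLogin auth=` header format as I remember them from Google's ClientLogin documentation of the time; the helper names and the `source` value are made up for illustration. Only the response parsing is exercised here, since the network call needs real credentials and a live endpoint.

```ruby
require 'net/http'
require 'uri'

# Parse a ClientLogin-style response body ("SID=...\nLSID=...\nAuth=...") into a hash.
def parse_clientlogin_response(body)
  body.each_line.map(&:chomp).select { |line| line.include?('=') }
      .to_h { |line| line.split('=', 2) }
end

# Build the Authorization header value sent with later GData requests.
def google_login_header(auth_token)
  "GoogleLogin auth=#{auth_token}"
end

# Hypothetical helper: request a token for the spreadsheets service ("wise").
# Not called below; the endpoint and field names are assumptions.
def client_login(email, password, service: 'wise', source: 'example-script-1.0')
  uri = URI('https://www.google.com/accounts/ClientLogin')
  response = Net::HTTP.post_form(uri,
                                 'accountType' => 'HOSTED_OR_GOOGLE',
                                 'Email' => email,
                                 'Passwd' => password,
                                 'service' => service,
                                 'source' => source)
  raise "ClientLogin failed: #{response.code}" unless response.is_a?(Net::HTTPSuccess)

  parse_clientlogin_response(response.body).fetch('Auth')
end

# Offline demonstration with a made-up response body.
sample = "SID=sid-value\nLSID=lsid-value\nAuth=token-value\n"
tokens = parse_clientlogin_response(sample)
puts google_login_header(tokens['Auth'])
```

The gdata gem hides this exchange behind `clientlogin`, so the sketch is only useful if you want to avoid the dependency.
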
终止放荡 2024-10-24 12:44:20

I started on this exact same project today and have run into the same issue. I've managed to get around using OAuth or OpenID, but I'm still working on actually getting a file downloaded... which seems like it should be the easy part. Anyway, here's what I've done:

I'm using the Mechanize gem to scrape the docs.google.com page for the username and password form. I submit my credentials via Mechanize and now have access to my Google Docs.

At this point I find I can use the download URL mentioned in this Google documentation:

http://code.google.com/apis/documents/docs/3.0/developers_guide_protocol.html#DownloadingDocs

The URL looks like this (I'm working with spreadsheets):

http://spreadsheets.google.com/feeds/download/spreadsheets/Export?key=resource_id_goes_here&exportFormat=xls

For tinkering/testing, I'm just taking the resource id of my spreadsheet from the address bar of my web browser (when I have the spreadsheet open in my browser) and plugging it into the above URL in another tab of my browser. This seems to work because when I submit the URL the spreadsheet is downloaded as an .xls file. Note this is all using my web browser.
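
To avoid hand-editing the URL each time, the resource ID can be slotted in programmatically. This is only a sketch: the method name `export_url` and its `format` and `base` parameters are my own, and it assumes the export endpoint keeps the shape shown above.

```ruby
require 'uri'

# Build the spreadsheet export URL for a given resource ID and format.
def export_url(resource_id, format: 'xls',
               base: 'http://spreadsheets.google.com/feeds/download/spreadsheets/Export')
  # encode_www_form escapes any characters that are unsafe in a query string
  query = URI.encode_www_form(key: resource_id, exportFormat: format)
  "#{base}?#{query}"
end

puts export_url('resource_id_goes_here')
puts export_url('resource_id_goes_here', format: 'csv')
```
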

I haven't been able to successfully initiate a download via my Ruby script. That URL isn't a direct link to the file, so I'm not quite sure how to properly capture the file data. The script runs without errors, but if I store the output of the Ruby 'get' method (called with that URL as its argument) in an object, it appears to contain some JavaScript redirection code rather than the file. I'm probably overlooking something obvious, but that's where I'm at. I blame being stuck on the hours I spent reading about OAuth and OpenID... that wasn't much fun.
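
One way to tell a real download from a redirect page is to inspect the first bytes of the response body before saving it. The sketch below assumes the export is a legacy .xls file, which begins with the OLE2 compound-file signature; the helper names are hypothetical and the HTML check is a crude heuristic, not a parser.

```ruby
# Return true when the body starts with the OLE2 signature used by legacy .xls files.
def xls_data?(body)
  signature = "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1".b
  body.b.start_with?(signature)
end

# Return true when the body looks like an HTML page, such as a login or redirect page.
def html_page?(body)
  body.b.lstrip.match?(/\A<(!doctype|html|script)/i)
end

# Save the body only if it looks like spreadsheet data; otherwise explain what went wrong.
def save_if_xls(body, path)
  raise 'Got an HTML page instead of a spreadsheet; authentication probably failed' if html_page?(body)
  raise 'Response does not look like an .xls file' unless xls_data?(body)

  File.binwrite(path, body)
end
```

Calling `save_if_xls(response_body, 'spreadsheet.xls')` on the body returned by the 'get' call would then fail loudly on a redirect page instead of silently writing HTML into an .xls file.
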

Hopefully some of that is useful. Here's another interesting Ruby gem I came across in my research on the authentication stuff:

OAuth Ruby Gem:
http://oauth.rubyforge.org/

早茶月光 2024-10-24 12:44:20

Sure, here's a basic version of what I'm doing:

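require 'mechanize'

agent = Mechanize.new
# Load the Google Docs page, which presents the login form
page = agent.get "https://docs.google.com"
form = page.forms.first
form.Email = "your_username"
form.Passwd = "your_password"
# Submit the credentials; Mechanize keeps the session cookies
page = agent.submit form
# Request the export URL with the same agent
test = agent.get "google_download_url_goes_here"
puts test.body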

If you look at test you'll see the JavaScript redirection stuff instead of the xls file.

I haven't worked on it in a couple of days, but I suspect I'm getting the redirection because the script isn't "properly" authenticated. Mechanize is supposed to handle cookies and redirects, so I would think this should simply work, but it doesn't.

UPDATE:

The export URLs are a little farther down on the same page in the documentation you linked to in your comment. The URL for exporting a spreadsheet looks like this:

http://spreadsheets.google.com/feeds/download/spreadsheets/Export?key=document_resource_id_goes_here&exportFormat=xls

You should be able to plug that into a browser and download a file (if you are logged in, of course). The document resource id is just the unique key for whatever document you are working with; you can manually paste it into the URL for testing in a browser.

However, I'm pretty sure none of these API URLs will work in a script unless it handles authentication the way Google expects. I'm not exactly sure what I'm looking at, but when I sniff packets with Wireshark I can see some errors when using a script that I don't get when using my browser. These errors seem to occur when the server and script are exchanging some kind of certificate info. Anyway, I've been looking at the OAuth gem some more and think I am starting to understand it better.

If you go here:

http://googlecodesamples.com/oauth_playground/

You can play around with the OAuth stuff there; it's kind of crazy how it works. You ask for a request token with a bunch of parameters that must be 'just' right. Google sends back the request token, which you then use to reference a login page where you enter your Google credentials (as you would when you manually work with Google Docs). Once your credentials are verified, it asks you to grant permission to the request token. The request token is then upgraded to an access token and passed back to your script, and you can start working with the rest of the API by referencing this access token... seems like overkill, but I'm no security expert.
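
The three legs described above can be laid out as a small sketch. The endpoint URLs are what I recall from Google's OAuth 1.0 documentation and should be treated as assumptions, as should the helper names; the sketch only builds the login/grant URL and does not sign or send any request.

```ruby
require 'uri'

# The three steps of the flow, in order: get a request token, let the user
# authorize it, then exchange it for an access token. URLs are assumptions.
def oauth_endpoints
  base = 'https://www.google.com/accounts'
  {
    request_token: "#{base}/OAuthGetRequestToken",
    authorize: "#{base}/OAuthAuthorizeToken",
    access_token: "#{base}/OAuthGetAccessToken"
  }
end

# Build the URL of the page where the user logs in and grants access
# to the given request token.
def authorize_url(request_token)
  "#{oauth_endpoints[:authorize]}?#{URI.encode_www_form(oauth_token: request_token)}"
end

puts authorize_url('example-request-token')
```

The signing of the request-token and access-token calls is the part the OAuth gem is meant to take care of, which is why it is left out here.
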

Here's what I'm hoping to do:

  • Figure out how to use the OAuth Ruby gem to request and send tokens to Google.

  • Use Mechanize to scrape the Google login page and enter credentials once I can send it the request token it wants.

  • Use Mechanize to click on the "Grant Access" button once my credentials are submitted.

  • Then hopefully find that I can actually use the rest of the API to work with files.

(Grrr! learning how to properly format text on this website is about as difficult!! :))

柠栀 2024-10-24 12:44:20

The code in the first answer didn't quite work for me. Here's what I used.

require 'gdata/client'
require 'gdata/http'
require 'gdata/auth'

KEY = 'YOUR_DOCUMENT_KEY'
URL = "https://docs.google.com/feeds/download/spreadsheets"

client = GData::Client::Spreadsheets.new
client.clientlogin('REPLACE_WITH_LOGIN', 'REPLACE_WITH_PASSWORD')

# Change the csv at the end to match your required format
test = client.get("#{URL}/Export?key=#{KEY}&fmcmd&exportFormat=csv")

puts test.body