How can I programmatically retrieve the access_token from the client-side OAuth flow using Python?



This question was posted on StackApps, but the issue may be more a programming issue than an authentication issue, hence it may deserve a better place here.

I am working on a desktop inbox notifier for StackOverflow, using the API with Python.

The script I am working on first logs the user in on StackExchange, and then requests authorisation for the application. Assuming the application has been authorised through web-browser interaction of the user, the application should be able to make requests to the API with authentication, hence it needs the access token specific to the user. This is done with the URL: https://stackexchange.com/oauth/dialog?client_id=54&scope=read_inbox&redirect_uri=https://stackexchange.com/oauth/login_success.
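That dialog URL can be assembled from its three parameters; a small sketch using Python 3's urllib.parse with the same client_id, scope, and redirect_uri as above (note that urlencode percent-encodes the redirect_uri, unlike the literal URL shown):

```python
from urllib.parse import urlencode

# Same parameters as in the question's dialog URL.
params = {
    'client_id': '54',
    'scope': 'read_inbox',
    'redirect_uri': 'https://stackexchange.com/oauth/login_success',
}
oauth_url = 'https://stackexchange.com/oauth/dialog?' + urlencode(params)
print(oauth_url)
# https://stackexchange.com/oauth/dialog?client_id=54&scope=read_inbox&redirect_uri=https%3A%2F%2Fstackexchange.com%2Foauth%2Flogin_success
```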

When requesting authorisation via the web browser, the redirect takes place and an access token is returned after the #. However, when requesting this same URL with Python (urllib2), no hash or key is returned in the response.

Why is my urllib2 request handled differently from the same request made in Firefox or W3m? What should I do to programmatically simulate this request and retrieve the access_token?

Here is my script (it's experimental) and remember: it assumes the user has already authorised the application.

#!/usr/bin/env python

import urllib
import urllib2
import cookielib
from BeautifulSoup import BeautifulSoup
from getpass import getpass    

# Define URLs
parameters = [ 'client_id=54',
               'scope=read_inbox',
               'redirect_uri=https://stackexchange.com/oauth/login_success'
             ]

oauth_url = 'https://stackexchange.com/oauth/dialog?' + '&'.join(parameters)
login_url = 'https://openid.stackexchange.com/account/login'
submit_url = 'https://openid.stackexchange.com/account/login/submit'
authentication_url = 'http://stackexchange.com/users/authenticate?openid_identifier='

# Set counter for requests:
counter = 0

# Build opener
jar = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(jar))

def authenticate(username='', password=''):

    '''
        Authenticates to StackExchange using user-provided username and password
    '''

    # Build up headers
    user_agent = 'Mozilla/5.0 (Ubuntu; X11; Linux i686; rv:8.0) Gecko/20100101 Firefox/8.0'
    headers = {'User-Agent' : user_agent}

    # Set Data to None
    data = None

    # 1. Build up URL request with headers and data    
    request = urllib2.Request(login_url, data, headers)
    response = opener.open(request)

    # Build up POST data for authentication
    html = response.read()
    fkey = BeautifulSoup(html).findAll(attrs={'name' : 'fkey'})[0].get('value').encode()

    values = {'email' : username,
              'password' : password,
              'fkey' : fkey}

    data = urllib.urlencode(values)

    # 2. Build up URL for authentication
    request = urllib2.Request(submit_url, data, headers)
    response = opener.open(request)

    # Check if logged in
    if response.url == 'https://openid.stackexchange.com/user':
        print ' Logged in! :) '
    else:
        print ' Login failed! :( '

    # Find user ID URL    
    html = response.read()
    id_url = BeautifulSoup(html).findAll('code')[0].text.split('"')[-2].encode()

    # 3. Build up URL for OpenID authentication
    data = None
    url = authentication_url + urllib.quote_plus(id_url)
    request = urllib2.Request(url, data, headers)
    response = opener.open(request)

    # 4. Build up URL request with headers and data
    request = urllib2.Request(oauth_url, data, headers)
    response = opener.open(request)

    if '#' in response.url:
        print 'Access code provided in URL.'
    else:
        print 'No access code provided in URL.'

if __name__ == '__main__':
    username = raw_input('Enter your username: ')
    password = getpass('Enter your password: ')
    authenticate(username, password)
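The script above is Python 2 (urllib2, cookielib, print statements). For readers following along on Python 3, the cookie-aware opener it builds at the top maps directly onto the renamed modules; a minimal sketch of just that setup (the URL and User-Agent come from the script, nothing else is implied):

```python
import http.cookiejar
import urllib.request

# Python 3 names for cookielib.CookieJar and urllib2.build_opener:
jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

# Every request made through `opener` now sends and stores cookies
# (e.g. the se-usr session cookie) automatically, as in the script.
oauth_url = ('https://stackexchange.com/oauth/dialog?client_id=54'
             '&scope=read_inbox'
             '&redirect_uri=https://stackexchange.com/oauth/login_success')
request = urllib.request.Request(oauth_url,
                                 headers={'User-Agent': 'Mozilla/5.0'})
# response = opener.open(request)  # network call, left commented out here
```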

To respond to comments below:

Tamper Data in Firefox requests the above URL (oauth_url in the code) with the following headers:

Host=stackexchange.com
User-Agent=Mozilla/5.0 (Ubuntu; X11; Linux i686; rv:9.0.1) Gecko/20100101 Firefox/9.0.1
Accept=text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language=en-us,en;q=0.5
Accept-Encoding=gzip, deflate
Accept-Charset=ISO-8859-1,utf-8;q=0.7,*;q=0.7
Connection=keep-alive
Cookie=m=2; __qca=P0-556807911-1326066608353; __utma=27693923.1085914018.1326066609.1326066609.1326066609.1; __utmb=27693923.3.10.1326066609; __utmc=27693923; __utmz=27693923.1326066609.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); gauthed=1; ASP.NET_SessionId=nt25smfr2x1nwhr1ecmd4ok0; se-usr=t=z0FHKC6Am06B&s=pblSq0x3B0lC

In the urllib2 request, the headers provide the user-agent value only. The cookie is not passed explicitly, but the se-usr cookie is available in the cookie jar at the time of the request.

The first response headers are those of the redirect:

Status=Found - 302
Server=nginx/0.7.65
Date=Sun, 08 Jan 2012 23:51:12 GMT
Content-Type=text/html; charset=utf-8
Connection=keep-alive
Cache-Control=private
Location=https://stackexchange.com/oauth/login_success#access_token=OYn42gZ6r3WoEX677A3BoA))&expires=86400
Set-Cookie=se-usr=t=kkdavslJe0iq&s=pblSq0x3B0lC; expires=Sun, 08-Jul-2012 23:51:12 GMT; path=/; HttpOnly
Content-Length=218

Then the redirect will take place through another request with the fresh se-usr value from that header.

I don't know how to catch the 302 in urllib2; it handles it by itself (which is great). It would be nice, however, to see whether the access token provided in the Location header could be made available.
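Once that Location value is in hand, extracting the token is ordinary URL parsing; a sketch with Python 3's urllib.parse, using the example Location value from the 302 response above:

```python
from urllib.parse import urlsplit, parse_qs

# Example Location header value from the 302 response above.
location = ('https://stackexchange.com/oauth/login_success'
            '#access_token=OYn42gZ6r3WoEX677A3BoA))&expires=86400')

# The token lives in the URL fragment (after '#'), which the browser
# never sends back to the server -- it must be read client-side.
params = parse_qs(urlsplit(location).fragment)

access_token = params['access_token'][0]
expires = int(params['expires'][0])
print(access_token)  # OYn42gZ6r3WoEX677A3BoA))
print(expires)       # 86400
```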

There's nothing special in the last response header; both Firefox and urllib2 return something like:

Server: nginx/0.7.65
Date: Sun, 08 Jan 2012 23:48:16 GMT
Content-Type: text/html; charset=utf-8
Connection: close
Cache-Control: private
Content-Length: 5664

I hope I didn't provide confidential info, let me know if I did :D

-小熊_ 2025-01-01 11:05:02


The token does not appear because of the way urllib2 handles the redirect. I am not familiar with the details so I won't elaborate here.

The solution is to catch the 302 before urllib2 handles the redirect. This can be done by subclassing urllib2.HTTPRedirectHandler to see the redirect with its hash fragment and token. Here is a short example of subclassing the handler:

class MyHTTPRedirectHandler(urllib2.HTTPRedirectHandler):
    def http_error_302(self, req, fp, code, msg, headers):
        print "Going through 302:\n"
        print headers
        return urllib2.HTTPRedirectHandler.http_error_302(self, req, fp, code, msg, headers)

In those headers, the Location field provides the redirect URL in full, i.e. including the hash fragment and token:

Output extract:

...
Going through 302:

Server: nginx/0.7.65
Date: Mon, 09 Jan 2012 20:20:11 GMT
Content-Type: text/html; charset=utf-8
Connection: close
Cache-Control: private
Location: https://stackexchange.com/oauth/login_success#access_token=K4zKd*HkKw5Opx(a8t12FA))&expires=86400
Content-Length: 218
...

More on catching redirects with urllib2 on StackOverflow (of course).
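The same interception works in Python 3, where urllib2's pieces live in urllib.request; here is a hedged sketch that records each redirect target instead of printing the headers (the class and helper names are illustrative, not from the answer above):

```python
import urllib.request
from urllib.parse import parse_qs, urlsplit

class RecordingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Remember every redirect target before letting urllib follow it."""
    def __init__(self):
        self.locations = []

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        self.locations.append(newurl)
        return super().redirect_request(req, fp, code, msg, headers, newurl)

def token_from_locations(locations):
    # The access_token travels in the fragment of the login_success URL.
    for url in locations:
        params = parse_qs(urlsplit(url).fragment)
        if 'access_token' in params:
            return params['access_token'][0]
    return None

handler = RecordingRedirectHandler()
opener = urllib.request.build_opener(handler)
# After opener.open(oauth_url) completes the OAuth dance,
# handler.locations holds the 302 targets, including the one
# carrying the token, which token_from_locations() extracts.
```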
