How do I authenticate a urllib2 script in order to access HTTPS web services from a Django site?

Posted 2024-10-19 02:27:45

Hi everybody. I'm working on a django/mod_wsgi/apache2 website that serves sensitive information, using https for all requests and responses. All views are written to redirect if the user isn't authenticated. It also has several views that are meant to function like RESTful web services.

I'm now in the process of writing a script that uses urllib/urllib2 to contact several of these services in order to download a series of very large files. I'm running into problems with 403: FORBIDDEN errors when attempting to log in.

The (rough-draft) method I'm using for authentication and log in is:

def login( base_address, username=None, password=None ):

    # prompt for the username (if needed), password
    if username is None:
        username = raw_input( 'Username: ' )
    if password is None:
        password = getpass.getpass( 'Password: ' )
    log.info( 'Logging in %s' % username )

    # fetch the login page in order to get the csrf token
    cookieHandler = urllib2.HTTPCookieProcessor()
    opener = urllib2.build_opener( urllib2.HTTPSHandler(), cookieHandler )
    urllib2.install_opener( opener )

    login_url = base_address + PATH_TO_LOGIN
    log.debug( "login_url: " + login_url )
    login_page = opener.open( login_url )

    # attempt to get the csrf token from the cookie jar
    csrf_cookie = None
    for cookie in cookieHandler.cookiejar:
        if cookie.name == 'csrftoken':
             csrf_cookie = cookie
             break
    if csrf_cookie is None:
        raise IOError( "No csrf cookie found" )
    log.debug(  "found csrf cookie: " + str( csrf_cookie ) )
    log.debug(  "csrf_token = %s" % csrf_cookie.value )

    # login using the usr, pwd, and csrf token
    login_data = urllib.urlencode( dict(
        username=username, password=password,
        csrfmiddlewaretoken=csrf_cookie.value ) )
    log.debug( "login_data: %s" % login_data )

    req = urllib2.Request( login_url, login_data )
    response = urllib2.urlopen( req )
    # <--- 403: FORBIDDEN here

    log.debug( 'response url:\n' + str( response.geturl() ) + '\n' )
    log.debug( 'response info:\n' + str( response.info() ) + '\n' )

    # should redirect to the welcome page here, if back at log in - refused
    if response.geturl() == login_url:
        raise IOError( 'Authentication refused' )

    log.info( '\t%s is logged in' % username )
    # save the cookies/opener for further actions
    return opener

I'm using the HTTPCookieProcessor to store Django's authentication cookies on the script side so I can access the web services and get through my redirects.

I know that Django's CSRF middleware is going to bump me out if I don't pass the csrf token along with the login information, so I first pull the token from the cookie jar filled by the initial load of the login page/form. This works with the http/development version of the site.

Specifically, I'm getting a 403 when trying to post the credentials to the login page/form over the https connection. This method works when used on the development server which uses an http connection.

There is no Apache directory directive that prevents access to that area (that I can see). The script connects successfully to the login page without post data so I'm thinking that would leave Apache out of the problem (but I could be wrong).

The python installations I'm using are both compiled with SSL.

I've also read that urllib2 doesn't allow https connections via proxy. I'm not very experienced with proxies, so I don't know if using a script from a remote machine is actually a proxy connection and whether that would be the problem. Is this causing the access problem?
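On the proxy question: a script run from a remote machine is not a proxy connection by itself. urllib2 only goes through a proxy when one is configured, typically through environment variables such as https_proxy. A minimal sketch of how one might check this, written so it runs on both Python 2 and Python 3 (the helper name https_proxy_in_use is made up for illustration):

```python
try:
    from urllib import getproxies          # Python 2, as used in this thread
except ImportError:
    from urllib.request import getproxies  # Python 3 equivalent


def https_proxy_in_use():
    """Return the HTTPS proxy URL that urllib would pick up by default, or None."""
    # getproxies() maps a scheme name ("http", "https", ...) to a proxy URL,
    # based on the environment (and system settings on some platforms).
    return getproxies().get("https")


if __name__ == "__main__":
    proxy = https_proxy_in_use()
    if proxy is None:
        print("No HTTPS proxy configured; connections are direct.")
    else:
        print("HTTPS requests would go through: %s" % proxy)
```

If this reports no proxy, the proxy limitation is unlikely to be what causes the 403.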

From what I can tell, the problem is in the combination of cookies and the post data, but I'm unclear as to where to take it from here.

Any help would be appreciated. Thanks

丑疤怪 2024-10-26 02:27:45

Please excuse my answering my own question, but - for the record this seems to have solved it:

It turns out I needed to set the HTTP Referer header to the login page url in the request where I post the login information.

req.add_header( 'Referer', login_url )

The reason is explained in the Django CSRF documentation, specifically step 4.
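Putting the fix together, the login POST could be built roughly like this. It is a sketch, not the exact code from the script: build_login_request is a hypothetical helper, the form field names are the ones used in the question, and the try/except import is only there so the same snippet also runs on Python 3.

```python
try:
    import urllib2                       # Python 2, as used in this thread
    from urllib import urlencode
except ImportError:
    import urllib.request as urllib2     # Python 3 equivalents
    from urllib.parse import urlencode


def build_login_request(login_url, username, password, csrf_token):
    """Build (but do not send) the login POST with the Referer header set."""
    data = urlencode({
        "username": username,
        "password": password,
        "csrfmiddlewaretoken": csrf_token,
    }).encode("ascii")  # urlencode output is percent-encoded, so ASCII is safe
    req = urllib2.Request(login_url, data)  # passing data makes this a POST
    # Point the Referer at the login page itself, as described above.
    req.add_header("Referer", login_url)
    return req
```

The request would then be sent with the same opener that holds the cookie jar, for example `response = opener.open(req)`, so the csrftoken cookie travels along with the form data.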

Due to our somewhat peculiar server setup, where we use HTTPS on the production side with DEBUG=False, I wasn't seeing the csrf_failure reason (in this case: 'Referer checking failed - no referer') that is normally output in the DEBUG info. I ended up printing that failure reason to the Apache error_log and STFW'd on it. That led me to code.djangoproject/.../csrf.py and the Referer header fix.
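This also explains why the same script worked against the plain-http development server: Django applies the strict Referer check only to HTTPS requests. As a rough model of that check (a simplified illustration, not Django's actual implementation; referer_check and its arguments are made up here, and only the 'no referer' reason string is taken from the post above):

```python
try:
    from urlparse import urlparse        # Python 2
except ImportError:
    from urllib.parse import urlparse    # Python 3


def referer_check(is_secure, referer, host):
    """Return None if the request passes, otherwise a reason string."""
    if not is_secure:
        # Plain-http requests skip the Referer check entirely.
        return None
    if referer is None:
        return "Referer checking failed - no referer"
    parsed = urlparse(referer)
    # Simplification: compare scheme and host only; ports are ignored.
    if parsed.scheme != "https" or parsed.netloc != host:
        return "Referer checking failed - bad referer"  # illustrative wording
    return None
```

With this model, a POST over http passes without any Referer, while the same POST over https is rejected until the header names an https URL on the same host.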

戏舞 2024-10-26 02:27:45

This works on my Django setup over https; the code is inspired by yours. I'm starting to think that the problem is outside this code... Is the server saying anything? In your place I would be looking into Apache.

I'm using the following code from my local machine to my server using ssl on nginx, so apache might be the place to look. I suppose one way to narrow it down is to try your script on my login page :) Shoot me an email!

import urllib
import urllib2
import contextlib


def login(login_url, username, password):
    """
    Login to site
    """
    cookies = urllib2.HTTPCookieProcessor()
    opener = urllib2.build_opener(cookies)
    urllib2.install_opener(opener)

    opener.open(login_url)

    try:
        token = [x.value for x in cookies.cookiejar if x.name == 'csrftoken'][0]
    except IndexError:
        return False, "no csrftoken"

    params = dict(username=username, password=password, \
        this_is_the_login_form=True,
        csrfmiddlewaretoken=token,
         )
    encoded_params = urllib.urlencode(params)

    with contextlib.closing(opener.open(login_url, encoded_params)) as f:
        html = f.read()

        print html
        # we're in.
