Browsing an NTLM-protected website using Python and python-ntlm

Posted on 2024-11-16 09:01:42

I have been tasked with creating a script that logs on to a corporate portal, goes to a particular page, downloads the page, compares it to an earlier version, and then emails a certain person depending on the changes that have been made. The last parts are easy enough, but the first step is giving me the most trouble.

After unsuccessfully using urllib2 to connect (I am trying to do this in Python) and about 4 or 5 hours of googling, I have determined that the reason I can't connect is the NTLM authentication on the web page. I have tried a number of different connection approaches found on this site and others, to no avail. Based on the NTLM example, I have done the following:

import urllib2
from ntlm import HTTPNtlmAuthHandler

user = 'username'
password = "password"
url = "https://portal.whatever.com/"

passman = urllib2.HTTPPasswordMgrWithDefaultRealm()
passman.add_password(None, url, user, password)
# create the NTLM authentication handler
auth_NTLM = HTTPNtlmAuthHandler.HTTPNtlmAuthHandler(passman)

# create and install the opener
opener = urllib2.build_opener(auth_NTLM)
urllib2.install_opener(opener)

# create a header
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
header = { 'Connection' : 'Keep-alive', 'User-Agent' : user_agent}

response = urllib2.urlopen(urllib2.Request(url, None, header))

When I run this (with a real username, password and URL), I get the following:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "ntlm2.py", line 21, in <module>
    response = urllib2.urlopen(urllib2.Request(url, None, header))
  File "C:\Python27\lib\urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\Python27\lib\urllib2.py", line 400, in open
    response = meth(req, response)
  File "C:\Python27\lib\urllib2.py", line 513, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python27\lib\urllib2.py", line 432, in error
    result = self._call_chain(*args)
  File "C:\Python27\lib\urllib2.py", line 372, in _call_chain
    result = func(*args)
  File "C:\Python27\lib\urllib2.py", line 619, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "C:\Python27\lib\urllib2.py", line 400, in open
    response = meth(req, response)
  File "C:\Python27\lib\urllib2.py", line 513, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python27\lib\urllib2.py", line 432, in error
    result = self._call_chain(*args)
  File "C:\Python27\lib\urllib2.py", line 372, in _call_chain
    result = func(*args)
  File "C:\Python27\lib\urllib2.py", line 619, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "C:\Python27\lib\urllib2.py", line 400, in open
    response = meth(req, response)
  File "C:\Python27\lib\urllib2.py", line 513, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python27\lib\urllib2.py", line 438, in error
     return self._call_chain(*args)
  File "C:\Python27\lib\urllib2.py", line 372, in _call_chain
     result = func(*args)
  File "C:\Python27\lib\urllib2.py", line 521, in http_error_default
     raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
  urllib2.HTTPError: HTTP Error 401: Unauthorized

The most interesting thing about this trace to me is that the final line says a 401 error was sent back. From what I have read, the 401 error is the first message sent back to the client when NTLM authentication starts. I was under the impression that the purpose of python-ntlm was to handle the NTLM process for me. Is that wrong, or am I just using it incorrectly? Also, I'm not bound to using Python for this, so if there is an easier way to do this in another language, let me know (from what I have seen while googling, there isn't).
Thanks!
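For reference, the handshake mentioned above normally involves two 401 responses: the first carries a bare `WWW-Authenticate: NTLM` header, and the second carries `NTLM` followed by a base64-encoded Type 2 (challenge) message. A 401 that arrives after the client has already answered the challenge usually means the credentials were rejected. The following is a minimal Python 3 sketch for telling these cases apart from a single header value. `describe_ntlm_challenge` is a hypothetical helper written for illustration; it is not part of python-ntlm or urllib2, and it only inspects the message signature and type field.

```python
import base64
import struct

# Every NTLM message starts with this 8-byte signature,
# followed by a 4-byte little-endian message type.
NTLM_SIGNATURE = b"NTLMSSP\x00"


def describe_ntlm_challenge(www_authenticate):
    """Classify one WWW-Authenticate header value from a 401 response.

    Hypothetical helper for illustration only.
    """
    scheme, _, token = www_authenticate.strip().partition(" ")
    if scheme.upper() != "NTLM":
        return "not NTLM"

    token = token.strip()
    if not token:
        # Bare "NTLM": the server is only advertising the scheme.
        return "initial 401: server offers NTLM, handshake not started"

    try:
        raw = base64.b64decode(token)
    except ValueError:
        return "malformed NTLM token"

    if len(raw) < 12 or not raw.startswith(NTLM_SIGNATURE):
        return "malformed NTLM token"

    (message_type,) = struct.unpack("<I", raw[8:12])
    if message_type == 2:
        return "challenge 401: server sent a Type 2 message"
    return "unexpected NTLM message type %d" % message_type
```

Calling the helper on the header of the final 401 would show whether the handshake never started (a bare `NTLM`) or whether something else went wrong later in the exchange.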

Comments (1)

时光匆匆的小流年 2024-11-23 09:01:42

If the site is using NTLM authentication, the headers attribute of the resulting HTTPError should say so:

>>> try:
...   handle = urllib2.urlopen(req)
... except IOError, e:
...   print e.headers
... 
<other headers>
WWW-Authenticate: Negotiate
WWW-Authenticate: NTLM
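
The same check can be written for Python 3, where urllib2 was split into `urllib.request` and `urllib.error`. This is only a sketch under that assumption: `offered_auth_schemes` and `probe` are hypothetical names chosen for illustration, and `probe` sends an unauthenticated request purely to read the headers of the 401 response.

```python
import urllib.error
import urllib.request
from email.message import Message


def offered_auth_schemes(headers):
    """Return the scheme name from every WWW-Authenticate header."""
    values = headers.get_all("WWW-Authenticate") or []
    return [value.split(None, 1)[0] for value in values if value.strip()]


def probe(url):
    """Request url without credentials and list the schemes a 401 offers."""
    try:
        urllib.request.urlopen(url)
    except urllib.error.HTTPError as err:
        if err.code == 401:
            # err.headers is an http.client.HTTPMessage, a Message subclass.
            return offered_auth_schemes(err.headers)
        raise
    return []  # no 401: the page did not ask for authentication


if __name__ == "__main__":
    # Offline demonstration with hand-built headers.
    demo = Message()
    demo["WWW-Authenticate"] = "Negotiate"
    demo["WWW-Authenticate"] = "NTLM"
    print(offered_auth_schemes(demo))  # ['Negotiate', 'NTLM']
```

If `NTLM` does not appear in the returned list, the 401 is coming from some other authentication scheme and an NTLM handler will not help.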