HTTP basic authentication doesn't seem to work with urllib2 in Python
I'm trying to download a page protected with basic authentication using urllib2. I'm using Python 2.7, but I also tried it on another computer with Python 2.5 and saw exactly the same behavior. I followed the example given in this guide as closely as I could, and here is the code I produced:
import urllib2
passman = urllib2.HTTPPasswordMgrWithDefaultRealm()
passman.add_password(None, "http://authenticationsite.com/", "protected", "password")
authhandler = urllib2.HTTPBasicAuthHandler(passman)
opener = urllib2.build_opener(authhandler)
f = opener.open("http://authenticationsite.com/content.html")
print f.read()
f.close()
Unfortunately the server isn't mine so I can't share the details; I swapped them out above and below. When I run it I get the following Traceback:
  File "/usr/lib/python2.7/urllib2.py", line 397, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 510, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 435, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 369, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 518, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 401: Authorization Required
Now, the interesting part is when I monitor the tcp traffic on the computer using ngrep:
ngrep host 74.125.224.49
interface: wlan0 (192.168.1.0/255.255.255.0)
filter: (ip) and ( host 74.125.224.49 )
####
T 192.168.1.74:34366 -> 74.125.224.49:80 [AP]
  GET /content.html HTTP/1.1..Accept-Encoding: identity..Host: authenticationsite.com..Connection: close..User-Agent: Python-urllib/2.7....
##
T 74.125.224.49:80 -> 192.168.1.74:34366 [AP]
  HTTP/1.1 401 Authorization Required..Date: Sun, 27 Feb 2011 03:39:31 GMT..Server: Apache/2.2.3 (Red Hat)..WWW-Authenticate: Digest realm="protected", nonce="6NSgTzudBAA=ac585d1f7ae0632c4b90324aff5e39e0f1fc2505", algorithm=MD5, qop="auth"..Content-Length: 486..Connection: close..Content-Type: text/html; charset=iso-8859-1....<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">.<html><head>.<title>401 Authorization Required</title>.</head><body>.<h1>Authorization Required</h1>.<p>This server could not verify that you.are authorized to access the document.requested. Either you supplied the wrong.credentials (e.g., bad password), or your.browser doesn't understand how to supply.the credentials required.</p>.<hr>.<address>Apache/2.2.3 (Red Hat) Server at authenticationsite.com Port 80</address>.</body></html>.
####
It appears as though urllib2 is throwing that exception without even attempting to supply the credentials after getting the initial 401 error.
For comparison, here is the output of ngrep when I authenticate in a web browser instead:
ngrep host 74.125.224.49
interface: wlan0 (192.168.1.0/255.255.255.0)
filter: (ip) and ( host 74.125.224.49 )
####
T 192.168.1.74:36102 -> 74.125.224.49:80 [AP]
  GET /content.html HTTP/1.1..Host: authenticationsite.com..User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.12) Gecko/20101027 Firefox/3.6.12..Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8..Accept-Language: en-us,en;q=0.5..Accept-Encoding: gzip,deflate..Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7..Keep-Alive: 115..Connection: keep-alive....
##
T 74.125.224.49:80 -> 192.168.1.74:36102 [AP]
  HTTP/1.1 401 Authorization Required..Date: Sun, 27 Feb 2011 03:43:42 GMT..Server: Apache/2.2.3 (Red Hat)..WWW-Authenticate: Digest realm="protected", nonce="rKCfXjudBAA=0c1111321169e30f689520321dbcce37a1876bbe", algorithm=MD5, qop="auth"..Content-Length: 486..Connection: close..Content-Type: text/html; charset=iso-8859-1....<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">.<html><head>.<title>401 Authorization Required</title>.</head><body>.<h1>Authorization Required</h1>.<p>This server could not verify that you.are authorized to access the document.requested. Either you supplied the wrong.credentials (e.g., bad password), or your.browser doesn't understand how to supply.the credentials required.</p>.<hr>.<address>Apache/2.2.3 (Red Hat) Server at authenticationsite.com Port 80</address>.</body></html>.
#########
T 192.168.1.74:36103 -> 74.125.224.49:80 [AP]
  GET /content.html HTTP/1.1..Host: authenticationsite.com..User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.12) Gecko/20101027 Firefox/3.6.12..Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8..Accept-Language: en-us,en;q=0.5..Accept-Encoding: gzip,deflate..Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7..Keep-Alive: 115..Connection: keep-alive..Authorization: Digest username="protected", realm="protected", nonce="rKCfXjudBAA=0c1111199162342689520550dbcce37a1876bbe", uri="/content.html", algorithm=MD5, response="3b65dadaa00e1d6a1892ffff49f9f325", qop=auth, nc=00000001, cnonce="7636125b7fde3d1b"....
##
This is then followed by the content of the site.
I've been playing around with this for a while and am not able to figure out what I'm doing wrong. I would be very thankful if somebody can help me out!
Comments (3)
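I think this is caused by the "WWW-Authenticate: Digest" header in the server's 401 response in your ngrep capture.
The resource appears to be protected with Digest authentication rather than Basic, which means you should use urllib2.HTTPDigestAuthHandler instead.
In your code, that means replacing HTTPBasicAuthHandler with HTTPDigestAuthHandler and leaving the rest unchanged.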
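As a rough sketch of the Digest variant (not tested against the real server): it reuses the placeholder URL and credentials from the question. The helper names build_digest_opener and fetch are made up for illustration, and the try/except import is an addition so the same snippet also loads on Python 3, where urllib2 became urllib.request.

```python
try:
    import urllib2  # Python 2, as used in the question
except ImportError:
    import urllib.request as urllib2  # Python 3 home of the same classes


def build_digest_opener(base_url, username, password):
    """Return an opener that answers Digest challenges under base_url.

    Hypothetical helper name; the calls inside are the same ones the
    question uses, with the Digest handler swapped in for the Basic one.
    """
    passman = urllib2.HTTPPasswordMgrWithDefaultRealm()
    # realm=None means "use these credentials for any realm at this URL"
    passman.add_password(None, base_url, username, password)
    authhandler = urllib2.HTTPDigestAuthHandler(passman)
    return urllib2.build_opener(authhandler)


def fetch(opener, url):
    """Open url with the given opener and return the response body."""
    f = opener.open(url)
    try:
        return f.read()
    finally:
        f.close()


# Placeholder site and credentials taken from the question.
opener = build_digest_opener("http://authenticationsite.com/", "protected", "password")
# body = fetch(opener, "http://authenticationsite.com/content.html")
```

The fetch call is left commented out because it needs network access to the real site. If you are not sure which scheme a server uses, passing both an HTTPBasicAuthHandler and an HTTPDigestAuthHandler built on the same password manager to build_opener should cover either challenge.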
You have to use the python-ntlm module for this:
from ntlm import HTTPNtlmAuthHandler
import urllib2
user = "Your_username"
password = "your_Passwrd"
passman = urllib2.HTTPPasswordMgrWithDefaultRealm()
passman.add_password(None, "http://your_Home_location/", user, password)
auth_NTLM = HTTPNtlmAuthHandler.HTTPNtlmAuthHandler(passman)
opener = urllib2.build_opener(auth_NTLM)
urllib2.install_opener(opener)
url = "http://Your_home_location/sub_locations"
response = urllib2.urlopen(url)
headers = response.info()
print("headers: {}".format(headers))
body = response.read()
print("response: " + body)
-- http://docs.python.org/library/urllib2.html#examples