302s and losing cookies with urllib2
I am using urllib2 with CookieJar / HTTPCookieProcessor in an attempt to simulate a login to a page to automate an upload.
I've seen some questions and answers on this, but nothing which solves my problem. I am losing my cookie when I simulate the login, which ends up at a 302 redirect. The 302 response is where the cookie gets set by the server, but urllib2's HTTPCookieProcessor does not seem to save the cookie during a redirect. I tried creating an HTTPRedirectHandler class to ignore the redirect, but that didn't seem to do the trick. I tried referencing the CookieJar globally to handle the cookies from the HTTPRedirectHandler, but 1. this didn't work (because I was handling the headers from the redirect handler, and the CookieJar method I was using, extract_cookies, needed a full request), and 2. it's an ugly way to handle it.
I probably need some guidance on this as I'm fairly green with Python. I think I'm mostly barking up the right tree here, but maybe focusing on the wrong branch.
import cookielib
import urllib2

cj = cookielib.CookieJar()
cookieprocessor = urllib2.HTTPCookieProcessor(cj)

class MyHTTPRedirectHandler(urllib2.HTTPRedirectHandler):
    def http_error_302(self, req, fp, code, msg, headers):
        global cj
        cookie = headers.get("set-cookie")
        if cookie:
            # Doesn't work, but you get the idea
            cj.extract_cookies(headers, req)
        return urllib2.HTTPRedirectHandler.http_error_302(self, req, fp, code, msg, headers)

    http_error_301 = http_error_303 = http_error_307 = http_error_302

cookieprocessor = urllib2.HTTPCookieProcessor(cj)

# Oh yeah. I'm using a proxy too, to follow traffic.
proxy = urllib2.ProxyHandler({'http': '127.0.0.1:8888'})
opener = urllib2.build_opener(MyHTTPRedirectHandler, cookieprocessor, proxy)
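For reference, cookielib's CookieJar.extract_cookies() expects a full response object (anything with an .info() method returning the headers) together with the request, which is why handing it the bare headers object fails. A minimal sketch of a redirect handler built around that signature, meant only as an illustration of the API rather than a verified fix for this particular login, might look like:

import cookielib
import urllib2

cj = cookielib.CookieJar()

class CookieSavingRedirectHandler(urllib2.HTTPRedirectHandler):
    def http_error_302(self, req, fp, code, msg, headers):
        # In urllib2 the fp argument here is the response object itself,
        # so it has the .info() method that extract_cookies() expects.
        cj.extract_cookies(fp, req)
        return urllib2.HTTPRedirectHandler.http_error_302(
            self, req, fp, code, msg, headers)

    http_error_301 = http_error_303 = http_error_307 = http_error_302

opener = urllib2.build_opener(CookieSavingRedirectHandler,
                              urllib2.HTTPCookieProcessor(cj))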
Addition: I had tried using mechanize as well, without success. This is probably a new question, but I'll pose it here since it has the same ultimate goal:

This simple code using mechanize fails when used with a 302-emitting URL (http://fxfeeds.mozilla.com/firefox/headlines.xml) and produces the traceback below. Note that the same behavior occurs when not using set_handle_robots(False); I just wanted to ensure that wasn't it:
import urllib2, mechanize
browser = mechanize.Browser()
browser.set_handle_robots(False)
opener = mechanize.build_opener(*(browser.handlers))
r = opener.open("http://fxfeeds.mozilla.com/firefox/headlines.xml")
Output:
Traceback (most recent call last):
File "redirecttester.py", line 6, in <module>
r = opener.open("http://fxfeeds.mozilla.com/firefox/headlines.xml")
File "build/bdist.macosx-10.6-universal/egg/mechanize/_opener.py", line 204, in open
File "build/bdist.macosx-10.6-universal/egg/mechanize/_urllib2_fork.py", line 457, in http_response
File "build/bdist.macosx-10.6-universal/egg/mechanize/_opener.py", line 221, in error
File "build/bdist.macosx-10.6-universal/egg/mechanize/_urllib2_fork.py", line 332, in _call_chain
File "build/bdist.macosx-10.6-universal/egg/mechanize/_urllib2_fork.py", line 571, in http_error_302
File "build/bdist.macosx-10.6-universal/egg/mechanize/_opener.py", line 188, in open
File "build/bdist.macosx-10.6-universal/egg/mechanize/_mechanize.py", line 71, in http_request
AttributeError: OpenerDirector instance has no attribute '_add_referer_header'
Any ideas?
4 Answers
I have been having the exact same problem recently but in the interest of time scrapped it and decided to go with mechanize. It can be used as a total replacement for urllib2 that behaves exactly as you would expect a browser to behave with regards to Referer headers, redirects, and cookies.

The Browser object can be used as an opener itself (using the .open() method). It maintains state internally but also returns a response object on every call, so you get a lot of flexibility.

Also, if you don't have a need to inspect the cookiejar manually or pass it along to something else, you can omit the explicit creation and assignment of that object as well.

I am fully aware this doesn't address what is really going on and why urllib2 can't provide this solution out of the box, or at least not without a lot of tweaking, but if you're short on time and just want it to work, just use mechanize.
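For illustration, a login done entirely through the Browser object might look roughly like the following; the URL and form field names here are made up, the point is simply that any cookie set along the redirect chain stays inside the Browser:

import mechanize

br = mechanize.Browser()
br.set_handle_robots(False)          # same setting as in the question

br.open("http://example.com/login")  # hypothetical login page
br.select_form(nr=0)                 # pick the first form on the page
br["username"] = "me"                # hypothetical field names
br["password"] = "secret"
resp = br.submit()                   # follows the 302; cookies stay in the Browser

print resp.geturl()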
Depends on how the redirect is done. If it's done via an HTTP Refresh, then mechanize has an HTTPRefreshProcessor you can use. Try to create an opener like this:
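(The code block that originally followed this answer did not survive; a minimal sketch of such an opener, assuming mechanize's top-level build_opener, HTTPRefreshProcessor, HTTPCookieProcessor and CookieJar, might be:)

import mechanize

cj = mechanize.CookieJar()
opener = mechanize.build_opener(
    mechanize.HTTPRefreshProcessor(),   # follows HTTP Refresh "redirects"
    mechanize.HTTPCookieProcessor(cj),  # carries cookies across requests
)
response = opener.open("http://fxfeeds.mozilla.com/firefox/headlines.xml")
print response.geturl()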
I've just got a variation of the below working for me, at least when trying to read Atom from http://www.fudzilla.com/home?format=feed&type=atom
I can't verify that the below snippet will run as-is, but might give you a start:
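(The snippet itself is missing from this answer as captured; a plain urllib2 opener with a CookieJar and a browser-like User-Agent, which is roughly the shape such snippets take, would be something like:)

import cookielib
import urllib2

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
opener.addheaders = [('User-Agent', 'Mozilla/5.0')]  # some servers refuse the default urllib2 UA

response = opener.open("http://www.fudzilla.com/home?format=feed&type=atom")
print response.geturl()   # final URL after any redirects
print list(cj)            # cookies collected along the way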
I was also having the same problem where the server would respond to the login POST request with a 302 and the session token in the Set-Cookie header. Using Wireshark it was clearly visible that urllib was following the redirect but not including the session token in the Cookie.
I literally just ripped out urllib and did a direct replacement with requests, and it worked perfectly the first time without having to change a thing. Big props to those guys.
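For comparison, the requests version is short because a Session keeps cookies across the 302 automatically; the URL and form fields below are placeholders:

import requests

session = requests.Session()

# Hypothetical login form; the Session stores any cookie set on the 302
# response and sends it on the follow-up request automatically.
resp = session.post("http://example.com/login",
                    data={"username": "me", "password": "secret"})

print(resp.status_code)
print(session.cookies.get_dict())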