302s and losing cookies with urllib2

Posted on 2024-10-30 23:41:58

I am using urllib2 with CookieJar / HTTPCookieProcessor in an attempt to simulate a login to a page to automate an upload.

I've seen some questions and answers on this, but nothing that solves my problem. I am losing my cookie when I simulate the login, which ends in a 302 redirect. The 302 response is where the server sets the cookie, but urllib2's HTTPCookieProcessor does not seem to save the cookie during a redirect. I tried creating an HTTPRedirectHandler subclass to ignore the redirect, but that didn't do the trick. I also tried referencing the CookieJar globally to handle the cookies from the HTTPRedirectHandler, but (1) this didn't work, because I was handling the headers from the redirector and the CookieJar function I was using, extract_cookies, needed a full request, and (2) it's an ugly way to handle it.

I probably need some guidance on this as I'm fairly green with Python. I think I'm mostly barking up the right tree here, but maybe focusing on the wrong branch.

import cookielib
import urllib2

cj = cookielib.CookieJar()
cookieprocessor = urllib2.HTTPCookieProcessor(cj)


class MyHTTPRedirectHandler(urllib2.HTTPRedirectHandler):
  def http_error_302(self, req, fp, code, msg, headers):
    global cj
    cookie = headers.get("set-cookie")
    if cookie:
      # Doesn't work, but you get the idea
      cj.extract_cookies(headers, req)

    return urllib2.HTTPRedirectHandler.http_error_302(self, req, fp, code, msg, headers)

  http_error_301 = http_error_303 = http_error_307 = http_error_302

cookieprocessor = urllib2.HTTPCookieProcessor(cj)

# Oh yeah.  I'm using a proxy too, to follow traffic.
proxy = urllib2.ProxyHandler({'http': '127.0.0.1:8888'})
opener = urllib2.build_opener(MyHTTPRedirectHandler, cookieprocessor, proxy)
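On the extract_cookies point: the method expects a response-like object exposing info(), not the bare headers mapping, and in a redirect handler the `fp` argument is that kind of object. The following is only a minimal sketch of that call shape, written against Python 3's http.cookiejar and urllib.request (the renamed cookielib and urllib2). The FakeResponse class, the helper name, and the example.com URLs are hypothetical stand-ins for illustration, not part of the original code.

```python
import email.message
import http.cookiejar
import urllib.request


class FakeResponse:
    """Hypothetical stand-in for the `fp` object a redirect handler receives."""

    def __init__(self, set_cookie):
        self._headers = email.message.Message()
        self._headers["Set-Cookie"] = set_cookie

    def info(self):
        # extract_cookies() calls response.info() to reach the headers.
        return self._headers


def cookie_header_after_redirect(set_cookie):
    """Store a cookie from a fake 302 response, then attach it to a follow-up request."""
    jar = http.cookiejar.CookieJar()
    login = urllib.request.Request("http://example.com/login")
    # Pass the response object itself, not the headers mapping.
    jar.extract_cookies(FakeResponse(set_cookie), login)

    follow_up = urllib.request.Request("http://example.com/home")
    jar.add_cookie_header(follow_up)
    return follow_up.get_header("Cookie")
```

In the handler above, the equivalent call would presumably be cj.extract_cookies(fp, req), though whether that alone explains the lost cookie is not something this sketch can show.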

Addition: I had tried using mechanize as well, without success. This is probably a new question, but I'll pose it here since it is the same ultimate goal:

This simple code using mechanize fails when used with a URL that emits a 302 (http://fxfeeds.mozilla.com/firefox/headlines.xml). Note that the same behavior occurs when not using set_handle_robots(False); I just wanted to ensure that wasn't it:

import urllib2, mechanize

browser = mechanize.Browser()
browser.set_handle_robots(False)
opener = mechanize.build_opener(*(browser.handlers))
r = opener.open("http://fxfeeds.mozilla.com/firefox/headlines.xml")

Output:

Traceback (most recent call last):
  File "redirecttester.py", line 6, in <module>
    r = opener.open("http://fxfeeds.mozilla.com/firefox/headlines.xml")
  File "build/bdist.macosx-10.6-universal/egg/mechanize/_opener.py", line 204, in open
  File "build/bdist.macosx-10.6-universal/egg/mechanize/_urllib2_fork.py", line 457, in http_response
  File "build/bdist.macosx-10.6-universal/egg/mechanize/_opener.py", line 221, in error
  File "build/bdist.macosx-10.6-universal/egg/mechanize/_urllib2_fork.py", line 332, in _call_chain
  File "build/bdist.macosx-10.6-universal/egg/mechanize/_urllib2_fork.py", line 571, in http_error_302
  File "build/bdist.macosx-10.6-universal/egg/mechanize/_opener.py", line 188, in open
  File "build/bdist.macosx-10.6-universal/egg/mechanize/_mechanize.py", line 71, in http_request
AttributeError: OpenerDirector instance has no attribute '_add_referer_header'

Any ideas?

Comments (4)

热风软妹 2024-11-06 23:41:58

I have been having the exact same problem recently, but in the interest of time I scrapped it and decided to go with mechanize. It can be used as a total replacement for urllib2 that behaves exactly as you would expect a browser to behave with regard to Referer headers, redirects, and cookies.

import mechanize
cj = mechanize.CookieJar()
browser = mechanize.Browser()
browser.set_cookiejar(cj)
browser.set_proxies({'http': '127.0.0.1:8888'})

# Use browser's handlers to create a new opener
opener = mechanize.build_opener(*browser.handlers)

The Browser object can be used as an opener itself (using the .open() method). It maintains state internally but also returns a response object on every call. So you get a lot of flexibility.

Also, if you don't have a need to inspect the cookiejar manually or pass it along to something else, you can omit the explicit creation and assignment of that object as well.

I am fully aware this doesn't address what is really going on and why urllib2 can't provide this solution out of the box or at least without a lot of tweaking, but if you're short on time and just want it to work, just use mechanize.

吾家有女初长成 2024-11-06 23:41:58

It depends on how the redirect is done. If it's done via an HTTP Refresh, then mechanize has an HTTPRefreshProcessor you can use. Try creating an opener like this:

cj = mechanize.CookieJar()
opener = mechanize.build_opener(
    mechanize.HTTPCookieProcessor(cj),
    mechanize.HTTPRefererProcessor,
    mechanize.HTTPEquivProcessor,
    mechanize.HTTPRefreshProcessor)

夏了南城 2024-11-06 23:41:58

I've just got a variation of the below working for me, at least when trying to read Atom from http://www.fudzilla.com/home?format=feed&type=atom

I can't verify that the below snippet will run as-is, but might give you a start:

import cookielib
import urllib2

cookie_jar = cookielib.LWPCookieJar()
cookie_handler = urllib2.HTTPCookieProcessor(cookie_jar)
handlers = [cookie_handler]  # + others; we have proxy and progress handlers
# See http://code.google.com/p/feedparser/source/browse/trunk/feedparser/feedparser.py#2848 for the implementation of _FeedURLHandler
opener = urllib2.build_opener(*(handlers + [_FeedURLHandler()]))
opener.addheaders = []  # may not be needed, but see the comments around the link referred to below
# The rest of this fragment sits inside a function, hence the return
try:
    # See http://code.google.com/p/feedparser/source/browse/trunk/feedparser/feedparser.py#2954 for the implementation of request
    return opener.open(request)
finally:
    opener.close()

多孤肩上扛 2024-11-06 23:41:58

I was also having the same problem where the server would respond to the login POST request with a 302 and the session token in the Set-Cookie header. Using Wireshark it was clearly visible that urllib was following the redirect but not including the session token in the Cookie.

I literally just ripped out urllib and did a direct replacement with requests and it worked perfectly first time without having to change a thing. Big props to those guys.
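For anyone who wants to reproduce the "302 plus Set-Cookie on login" situation without Wireshark, here is a rough, self-contained sketch using only the Python 3 standard library (urllib.request and http.cookiejar rather than urllib2 and cookielib). The toy LoginHandler server, the /login and /home paths, and the cookie value are all made up for illustration. It shows only what a stock HTTPCookieProcessor does against this toy server; it says nothing about what happened in the setups described in this thread.

```python
import http.cookiejar
import http.server
import threading
import urllib.parse
import urllib.request


class LoginHandler(http.server.BaseHTTPRequestHandler):
    """Toy server: POST /login sets a cookie and redirects to /home."""

    def do_POST(self):
        # Drain the request body before answering.
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        self.send_response(302)
        self.send_header("Set-Cookie", "session=abc123; Path=/")
        self.send_header("Location", "/home")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_GET(self):
        # Echo back whatever Cookie header the client sent.
        body = self.headers.get("Cookie", "").encode("ascii")
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def login_and_follow_redirect():
    """Return the Cookie header the server saw after the redirect, plus the jar's cookie names."""
    server = http.server.HTTPServer(("127.0.0.1", 0), LoginHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        jar = http.cookiejar.CookieJar()
        opener = urllib.request.build_opener(
            urllib.request.ProxyHandler({}),  # ignore any system proxy settings
            urllib.request.HTTPCookieProcessor(jar),
        )
        url = "http://127.0.0.1:%d/login" % server.server_port
        data = urllib.parse.urlencode({"user": "demo"}).encode("ascii")
        with opener.open(url, data, timeout=5) as response:
            echoed = response.read().decode("ascii")
        return echoed, [cookie.name for cookie in jar]
    finally:
        server.shutdown()
        server.server_close()
```

Calling login_and_follow_redirect() returns the Cookie header the server saw on the redirected request, so an empty string there would point at the client dropping the cookie rather than at the server.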
