urlfetch redirects into an infinite loop in Python
I am trying to load a URL which redirects to itself. I'm assuming it sets a cookie and then looks for it, but it never sees the cookie, so there is an infinite loop of requests.
I have tried urllib2, urlfetch, and httplib2. None of them work.
I did try this, though:
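import urllib2

# Build an opener that follows redirects and keeps cookies between requests.
redirect_handler = urllib2.HTTPRedirectHandler()
cookie_processor = urllib2.HTTPCookieProcessor()
opener = urllib2.build_opener(redirect_handler, cookie_processor)
url = "http://www.cafebonappetit.com/menu/your-cafe/collins-cmc/cafes/details/50/collins-bistro"
# Note: this second assignment overrides the first, so this is the URL fetched.
url = 'http://www.nytimes.com/2005/10/26/business/26fed.html?pagewanted=print'
page = opener.open(url)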
This works in shell, but not on the Google App Engine. In the documentation for urlfetch:
http://code.google.com/appengine/docs/python/urlfetch/fetchfunction.html
under follow_redirects, it says:
"Cookies are not handled upon redirection. If cookie handling is needed, set follow_redirects to False and handle both cookies and redirects manually."
I have no idea how to do this, and the documentation doesn't seem to give any clues either.
I googled the hell out of this issue and found NO reported issue whose solution works for my problem.
Comments (1)
A little more explanation. Glad that at least the website's behavior is explained: it wants some cookie, and if the cookie isn't set, it redirects to itself with a cookie-setting header. You should probably read up on how cookies work; the website sends the cookie using a Set-Cookie header, and the browser must echo it back (with some variations) in a Cookie header. Python has a library for managing collections of cookies, cookielib, to help you with this.
It's best to use the native urlfetch API; its return object has a headers object, which is a dict giving all the headers (e.g. the Set-Cookie header). To send specific headers, use the headers argument to the urlfetch.fetch() function. Here you will use the Cookie header (but remember that the format of the Cookie header you set is not the same as that of the Set-Cookie header you receive; that's where cookielib comes in).
Good luck!
PS. Using curl -v it's easy to see that the site actually sends three different Set-Cookie headers. You probably have to deal with all three.
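To make the manual approach above a bit more concrete, here is a minimal sketch of a redirect loop that echoes cookies back on every request. Several things in it are assumptions rather than anything from the urlfetch docs: `fetch_with_cookies` and the `fetch(url, headers)` callable are hypothetical names, and the callable is assumed to wrap something like `urlfetch.fetch(url, headers=headers, follow_redirects=False)` and return the status code, the Location header, a list with one string per Set-Cookie header, and the body. For brevity it parses cookies with `SimpleCookie` instead of cookielib, and it ignores cookie domain, path, and expiry, so treat it as an illustration only.

```python
try:
    from http.cookies import SimpleCookie   # Python 3
    from urllib.parse import urljoin
except ImportError:
    from Cookie import SimpleCookie         # Python 2
    from urlparse import urljoin


def fetch_with_cookies(url, fetch, max_redirects=10):
    """Follow redirects by hand, sending back every cookie received so far.

    `fetch(url, headers)` is a hypothetical wrapper supplied by the caller.
    It must return (status_code, location, set_cookie_values, body), where
    set_cookie_values is a list with one string per Set-Cookie header.
    """
    cookies = {}  # cookie name -> encoded value (domain/path/expiry ignored)
    for _ in range(max_redirects + 1):
        headers = {}
        if cookies:
            # The Cookie header carries only "name=value" pairs, without the
            # attributes (Path, Expires, ...) found in Set-Cookie.
            headers["Cookie"] = "; ".join(
                "%s=%s" % (name, value)
                for name, value in sorted(cookies.items()))
        status, location, set_cookie_values, body = fetch(url, headers)
        # Handle every Set-Cookie header, not just the first one.
        for raw in set_cookie_values:
            parsed = SimpleCookie()
            parsed.load(raw)
            for name, morsel in parsed.items():
                cookies[name] = morsel.coded_value
        if status not in (301, 302, 303, 307) or not location:
            return status, body
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    raise RuntimeError("too many redirects")
```

The `max_redirects` cap is there so that a site which keeps redirecting even with cookies set raises an error instead of looping forever.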