How do I get the complete header information from a urllib2 request?

Posted 2024-11-29 18:24:22

I am using the Python urllib2 library to open URLs, and I want to get the complete header information for the request. When I use response.info() I only get this:

Date: Mon, 15 Aug 2011 12:00:42 GMT
Server: Apache/2.2.0 (Unix)
Last-Modified: Tue, 01 May 2001 18:40:33 GMT
ETag: "13ef600-141-897e4a40"
Accept-Ranges: bytes
Content-Length: 321
Connection: close
Content-Type: text/html

I expect the complete information that live_http_headers (a Firefox add-on) shows, e.g.:

http://www.yellowpages.com.mt/Malta-Web/127151.aspx

GET /Malta-Web/127151.aspx HTTP/1.1
Host: www.yellowpages.com.mt
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0.1) Gecko/20100101 Firefox/4.0.1
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-gb,en;q=0.5
Accept-Encoding: gzip, deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive
Cookie: __utma=156587571.1883941323.1313405289.1313405289.1313405289.1;    __utmz=156587571.1313405289.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)

HTTP/1.1 302 Found
Connection: Keep-Alive
Content-Length: 141
Date: Mon, 15 Aug 2011 12:17:25 GMT
Location: http://www.trucks.com.mt
Content-Type: text/html; charset=utf-8
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET, UrlRewriter.NET 2.0.0
X-AspNet-Version: 2.0.50727
Set-Cookie: ASP.NET_SessionId=zhnqh5554omyti55dxbvmf55; path=/; HttpOnly
Cache-Control: private

My request function is:

import cookielib
import urllib
import urllib2

def dorequest(url, post=None, headers=None):
    # Opener that keeps cookies across requests (and across redirects).
    cOpener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookielib.CookieJar()))
    urllib2.install_opener(cOpener)
    if post:
        post = urllib.urlencode(post)  # a request body turns this into a POST
    req = urllib2.Request(url, post, headers or {})
    response = cOpener.open(req)
    print response.info()  # this does not give the complete header info; how can I get it?
    return response.read()

url = 'http://www.yellowpages.com.mt/Malta-Web/127151.aspx'
html = dorequest(url)

Is it possible to get these header details with urllib2? I don't want to switch to httplib.
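A minimal sketch of one way to see the full exchange (request headers included) without leaving urllib: turn on the HTTP handler's debug output, which makes http.client print the raw request, the status line and the response headers. It is written for Python 3, where urllib2 became urllib.request; in Python 2 the corresponding handler is urllib2.HTTPHandler(debuglevel=1). The local server, HelloHandler and fetch_with_wire_log are hypothetical stand-ins so the example runs offline, and the result is printed debug text rather than a parsed header object.

```python
import contextlib
import io
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical local stand-in for the real site, so the sketch runs offline."""

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence the server's per-request log lines


def fetch_with_wire_log(url, headers=None):
    """Open url and return (body, wire_log).

    wire_log is whatever http.client prints at debuglevel 1: the raw request
    ("send: ..."), the status line ("reply: ...") and the response headers.
    """
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({}),           # ignore proxy env vars for this local demo
        urllib.request.HTTPHandler(debuglevel=1),  # replaces the default HTTPHandler
    )
    req = urllib.request.Request(url, headers=headers or {})
    log = io.StringIO()
    with contextlib.redirect_stdout(log):          # http.client writes its log with print()
        with opener.open(req) as resp:
            body = resp.read()
    return body, log.getvalue()


server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: let the OS pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_port

body, wire_log = fetch_with_wire_log(url, {"User-Agent": "Mozilla/5.0"})
print(wire_log)
```

Against the real site the "send:" part shows what urllib actually sent, which is the half that response.info() never contains.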

沦落红尘 2024-12-06 18:24:22

Those are all the headers the server sends when you make the request with urllib2.

Firefox is also showing you the headers it sends to the server.

When the server receives those headers from Firefox, some of them may trigger it to send back additional headers, so you end up with more response headers as well.

Send exactly the headers Firefox sends, and you should get back the same response.
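A sketch of copying the request headers from the question's live_http_headers capture onto a request, using Python 3's urllib.request (urllib2's successor). FIREFOX_HEADERS and firefox_like_request are illustrative names, and the request is only built here, not sent. Accept-Encoding, Connection/Keep-Alive and Cookie are deliberately left out, and even with matching headers the server may still answer differently, for example because it sees different cookies.

```python
import urllib.request

# Request headers copied from the live_http_headers capture in the question.
# Left out on purpose: Accept-Encoding (urllib does not decompress gzip bodies),
# Connection/Keep-Alive (urllib sends "Connection: close" itself) and Cookie
# (those values belong to the browser's session).
FIREFOX_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0.1) Gecko/20100101 Firefox/4.0.1",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-gb,en;q=0.5",
    "Accept-Charset": "ISO-8859-1,utf-8;q=0.7,*;q=0.7",
}


def firefox_like_request(url, data=None):
    """Build (but do not send) a Request that carries Firefox's request headers."""
    return urllib.request.Request(url, data=data, headers=FIREFOX_HEADERS)


req = firefox_like_request("http://www.yellowpages.com.mt/Malta-Web/127151.aspx")

# Request stores header names via str.capitalize(), e.g. "User-Agent" -> "User-agent";
# they are title-cased again when the request goes out on the wire.
for name, value in req.header_items():
    print(name, value)
```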

Edit: The Location header is sent by the page that performs the redirect, not by the page you are redirected to. Use response.url (or response.geturl()) to get the URL of the page you ended up on.
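A small offline sketch of that point with Python 3's urllib.request: a hypothetical local server answers /old with a 302 to /new, urllib follows it, and the response you get back reports the final URL while carrying only the final page's headers (so no Location). The paths and RedirectingHandler are assumptions made for the demo.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class RedirectingHandler(BaseHTTPRequestHandler):
    """Hypothetical local stand-in: /old answers 302 -> /new, anything else 200."""

    def do_GET(self):
        if self.path == "/old":
            self.send_response(302)
            self.send_header("Location", "/new")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"final page"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo server quiet


server = HTTPServer(("127.0.0.1", 0), RedirectingHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_port

# ProxyHandler({}) ignores proxy environment variables for this local demo.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
with opener.open(base + "/old") as resp:
    final_url = resp.geturl()                      # same value as resp.url
    final_location = resp.headers.get("Location")  # None: these are /new's headers
    body = resp.read()

print(final_url, final_location, body)
```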

That first URL uses a 302 redirect. If you don't want to follow the redirect and would rather see the headers of the first page, use urllib.URLopener instead of urllib.FancyURLopener; FancyURLopener follows redirects automatically.
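If you want to stay with urllib2-style openers instead of switching to URLopener, a commonly used alternative (a different technique from the one named above) is a redirect handler that declines every redirect, so the 302 surfaces as an HTTPError whose headers are those of the first page. A sketch for Python 3's urllib.request; the same redirect_request override should carry over to urllib2.HTTPRedirectHandler in Python 2. The local server, NoRedirectHandler and first_response_headers are hypothetical.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class RedirectingHandler(BaseHTTPRequestHandler):
    """Hypothetical local stand-in: /old answers 302 -> /new, anything else 200."""

    def do_GET(self):
        if self.path == "/old":
            self.send_response(302)
            self.send_header("Location", "/new")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"final page"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo server quiet


class NoRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Decline every redirect; urllib then raises HTTPError for the 3xx response."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def first_response_headers(opener, url):
    """Return (status, headers) of the first response, without following redirects."""
    try:
        with opener.open(url) as resp:
            return resp.status, dict(resp.headers)
    except urllib.error.HTTPError as err:  # the declined 302 ends up here
        return err.code, dict(err.headers)


server = HTTPServer(("127.0.0.1", 0), RedirectingHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_port

opener = urllib.request.build_opener(urllib.request.ProxyHandler({}), NoRedirectHandler())
status, first_headers = first_response_headers(opener, base + "/old")
print(status, first_headers.get("Location"))
```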

小清晰的声音 2024-12-06 18:24:22

I see that the server returns HTTP/1.1 302 Found, i.e. an HTTP redirect.

urllib2 follows redirects automatically, so the headers it returns are the headers from http://www.trucks.com.mt, not from http://www.yellowpages.com.mt/Malta-Web/127151.aspx.
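A sketch of how to keep following redirects but still see the headers of every intermediate response: subclass the redirect handler, record each redirect response, then let the base class follow it as usual. Written for Python 3's urllib.request; RecordingRedirectHandler, its hops list and the local /old and /new paths are hypothetical names for illustration.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class RedirectingHandler(BaseHTTPRequestHandler):
    """Hypothetical local stand-in: /old answers 302 -> /new, anything else 200."""

    def do_GET(self):
        if self.path == "/old":
            self.send_response(302)
            self.send_header("Location", "/new")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"final page"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo server quiet


class RecordingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Follow redirects as usual, but remember each redirect response first."""

    def __init__(self):
        self.hops = []  # (requested URL, status code, response headers) per redirect

    def http_error_302(self, req, fp, code, msg, headers):
        self.hops.append((req.full_url, code, dict(headers)))
        return super().http_error_302(req, fp, code, msg, headers)

    # Route the other common redirect codes through the same recorder.
    http_error_301 = http_error_303 = http_error_307 = http_error_302


server = HTTPServer(("127.0.0.1", 0), RedirectingHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_port

recorder = RecordingRedirectHandler()
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}), recorder)
with opener.open(base + "/old") as resp:
    final_url = resp.geturl()

for hop_url, code, hop_headers in recorder.hops:
    print(code, hop_url, "->", hop_headers.get("Location"))
print("final:", final_url)
```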
