Python urllib2 response headers

Posted on 2024-08-09 01:44:38

I'm trying to extract the response headers of a URL request. When I use Firebug to analyze the response to the request, it shows:

Content-Type text/html

However, when I use this Python code:

urllib2.urlopen(URL).info()

the resulting output returns:

Content-Type: video/x-flv

I am new to Python, and to web programming in general; any helpful insight is much appreciated. Also, if more info is needed, please let me know.

Thanks in advance for reading this post

梦开始←不甜 2024-08-16 01:44:38

Try to request as Firefox does. You can see the request headers in Firebug, so add them to your request object:

import urllib2

request = urllib2.Request('http://your.tld/...')
request.add_header('User-Agent', 'some fake agent string')
request.add_header('Referer', 'fake referrer')
# ... add any other request headers that Firebug shows
response = urllib2.urlopen(request)
# check content type:
print response.info().getheader('Content-Type')

There's also HTTPCookieProcessor, which can bring the request even closer to what the browser sends, but I don't think you'll need it in most cases. Have a look at Python's documentation:

http://docs.python.org/library/urllib2.html
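
For readers on Python 3, where urllib2 no longer exists, roughly the same idea can be sketched with urllib.request. This is only a sketch under assumptions: the URL, the header values, and the helper name build_browser_like_request are placeholders made up for illustration and do not come from the question. It builds the request without sending it, so you can check which headers would go out.

```python
import urllib.request


def build_browser_like_request(url, extra_headers):
    """Return a Request carrying the given headers (hypothetical helper)."""
    request = urllib.request.Request(url)
    for name, value in extra_headers.items():
        request.add_header(name, value)
    return request


# Placeholder values: copy the real ones from the browser's network panel.
request = build_browser_like_request(
    "http://example.com/",
    {
        "User-Agent": "some fake agent string",
        "Referer": "http://example.com/start",
        "Accept": "text/html",
    },
)

# Request normalises names with str.capitalize(), so "User-Agent" is stored
# as "User-agent"; look it up with that spelling.
print(request.get_header("User-agent"))
print(sorted(request.header_items()))

# To actually send it (network access required):
# with urllib.request.urlopen(request) as response:
#     print(response.headers.get("Content-Type"))
```

Comparing the printed header list with what the browser sends is usually the quickest way to spot which header the server is reacting to.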

绝不服输 2024-08-16 01:44:38

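This particular discrepancy might be explained by different headers (perhaps ones of the Accept kind) being sent by the two requests. Can you check that? Alternatively, if JavaScript is running in Firefox (which I assume you are using, since you are running Firebug), then all bets are off, as they say, because it is definitely not running in the Python case ;-)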

隱形的亼 2024-08-16 01:44:38

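Keep in mind that a web server may return different results for the same URL depending on differences in the request. One example is content-type negotiation: the requester can specify a list of content types it will accept, and the server can return different results to try to accommodate different needs.

Also, you may be getting an error page for one of your requests, for example because it is malformed, or because you don't have cookies set that properly authenticate you, and so on. Look at the response itself to see what you are getting.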
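
To make the negotiation idea concrete, here is a minimal sketch of how a server might pick a content type from the Accept header. It is an assumption-laden toy, not how any particular server works: the function name negotiate, the list of available types, and the simplified q-value handling are all invented for illustration.

```python
def negotiate(accept_header, available):
    """Pick one of `available` media types for an Accept header value.

    Simplified: honours q-values and wildcards, ignores other parameters.
    """
    # A request without an Accept header is treated as accepting anything.
    if not accept_header.strip():
        accept_header = "*/*"

    prefs = []
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((q, media_type))

    # Highest q first; the sort is stable, so ties keep header order.
    prefs.sort(key=lambda pair: pair[0], reverse=True)

    for q, media_type in prefs:
        if q <= 0:
            continue
        for candidate in available:
            main_type = candidate.split("/")[0]
            if media_type in ("*/*", candidate, main_type + "/*"):
                return candidate
    return None


available = ["video/x-flv", "text/html"]  # hypothetical server-side options
browser_accept = "text/html,application/xhtml+xml;q=0.9,*/*;q=0.8"
print(negotiate(browser_accept, available))  # browser-style request
print(negotiate("", available))              # client sending no Accept header
```

With these made-up inputs, the browser-style header selects text/html, while a client that sends no Accept header gets the server's first option. That is one plausible way the same URL could yield two different Content-Type values.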

趁年轻赶紧闹 2024-08-16 01:44:38

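Content-Type text/html

Really, like that, without the colon?

If so, that might explain it: it is an invalid header, so it gets ignored, and urllib guesses the content type by looking at the filename instead. If the URL happens to end in ".flv", it will guess that the type should be video/x-flv.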
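
For illustration, this is what a filename-based guess looks like with the standard mimetypes module (written for Python 3). Whether urllib2 really falls back to such a guess for an HTTP response is not confirmed here; the sketch only shows the mechanism the answer describes. The helper name guess_type_from_url and the example URLs are hypothetical.

```python
import mimetypes
from urllib.parse import urlparse


def guess_type_from_url(url):
    """Guess a media type from the file extension in the URL path only."""
    path = urlparse(url).path
    media_type, _encoding = mimetypes.guess_type(path)
    return media_type


print(guess_type_from_url("http://example.com/index.html"))

# The result for ".flv" depends on the MIME tables available on the machine,
# so this may print video/x-flv, another type, or None.
print(guess_type_from_url("http://example.com/clip.flv?x=1"))
```

If the guess for the real URL matches what Python reported, that supports this explanation; if not, the differing request headers are the more likely cause.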

朮生 2024-08-16 01:44:38

According to http://docs.python.org/library/urllib2.html there is only a get_header() method and nothing about getheader.

I'm asking because your code works fine for

response.info().getheader('Set-Cookie')

but once I execute

response.info().get_header('Set-Cookie')

I get:

Traceback (most recent call last):
  File "baza.py", line 11, in <module>
    cookie = response.info().get_header('Set-Cookie')
AttributeError: HTTPMessage instance has no attribute 'get_header'

Edit: moreover, response.headers.get('Set-Cookie') works fine as well, and it is not mentioned in the urllib2 docs either.
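
The likely source of the confusion is that get_header() is documented for Request objects (the headers you send), while info() returns a separate message object holding the headers you received. A small Python 3 sketch of the two sides follows. It uses urllib.request and http.client rather than urllib2, and parses a canned header block instead of making a network call; the header values and URL are invented for the example.

```python
import io
import http.client
import urllib.request

# Parse a canned header block instead of making a network call.
raw = b"Content-Type: text/html\r\nSet-Cookie: session=abc123\r\n\r\n"
response_headers = http.client.parse_headers(io.BytesIO(raw))

# Response side: a mapping-like message object with case-insensitive get().
print(response_headers.get("Set-Cookie"))
print(response_headers.get("content-type"))
print(hasattr(response_headers, "get_header"))

# Request side: this is where get_header() lives.
request = urllib.request.Request("http://example.com/")
request.add_header("Referer", "http://example.com/start")
print(request.get_header("Referer"))
```

In Python 3 the response-side object no longer has getheader() either, so get() is the spelling that carries over between versions.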

且行且努力 2024-08-16 01:44:38

To get the raw header data in Python 2, here is a bit of a hack, but it works.

"".join(urllib2.urlopen("http://google.com/").info().__dict__["headers"])

Basically, "".join(list) concatenates the list of header lines, each of which already ends with a newline.

__dict__ is the attribute dictionary that ordinary Python objects carry; it maps the object's attribute names to their values.

And of course ["headers"] selects the list stored under that name on the object returned by .info().

Hope this helped you learn a few easy Python tricks :)
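
A version that avoids reaching into __dict__ can be sketched with the public items() method. This is written for Python 3 and rebuilds the "Name: value" lines rather than returning the exact bytes the server sent. The helper name headers_as_text and the canned header block are assumptions for the example.

```python
import io
import http.client


def headers_as_text(headers):
    """Join header fields back into 'Name: value' lines (hypothetical helper)."""
    return "".join("%s: %s\n" % (name, value) for name, value in headers.items())


# Canned header block standing in for a real response.
raw = b"Content-Type: text/html\r\nServer: demo\r\n\r\n"
headers = http.client.parse_headers(io.BytesIO(raw))
print(headers_as_text(headers))
```

With a live Python 3 response, the same helper should work on response.headers.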
