Python urllib2 response headers
I'm trying to extract the response headers of a URL request. When I use Firebug to analyze the response to a URL request, it shows:
Content-Type text/html
However, when I use this Python code:
urllib2.urlopen(URL).info()
the resulting output is:
Content-Type: video/x-flv
I am new to Python, and to web programming in general; any helpful insight is much appreciated. Also, if more info is needed, please let me know.
Thanks in advance for reading this post
Try to make the request the way Firefox does. You can see the request headers in Firebug, so add them to your request object.
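A minimal sketch of that idea follows. The header values are hypothetical placeholders (copy the real ones from Firebug's Net panel), the URL is a stand-in, and `build_browser_like_request` and `fetch_headers` are illustrative names, not part of urllib2. The import fallback is only there so the same code also runs on Python 3, where `urllib.request` provides the same `Request` class.

```python
try:
    import urllib2                      # Python 2
except ImportError:
    import urllib.request as urllib2    # Python 3 equivalent

# Hypothetical values: replace them with what Firebug shows for your request.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:3.6) Gecko/20100101 Firefox/3.6",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-us,en;q=0.5",
}


def build_browser_like_request(url, headers=BROWSER_HEADERS):
    """Return a Request carrying the same headers the browser sent."""
    request = urllib2.Request(url)
    for name, value in headers.items():
        request.add_header(name, value)
    return request


def fetch_headers(url):
    """Perform the request (needs network access) and return the response headers."""
    response = urllib2.urlopen(build_browser_like_request(url))
    try:
        return response.info()
    finally:
        response.close()
```

Calling `fetch_headers(URL)` with and without the extra headers should show whether the server's Content-Type depends on them.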
There's also HTTPCookieProcessor, which can make it better, but I don't think you'll need it in most cases. Have a look at Python's documentation:
http://docs.python.org/library/urllib2.html
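If cookies do turn out to matter, a sketch along these lines might help. `build_cookie_opener` is a hypothetical helper name; `build_opener` and `HTTPCookieProcessor` are the urllib2 pieces mentioned above, and the import fallback only exists so the snippet also runs on Python 3.

```python
try:
    import urllib2                      # Python 2
except ImportError:
    import urllib.request as urllib2    # Python 3 equivalent


def build_cookie_opener():
    """Return an opener that stores cookies and sends them back on later requests."""
    # With no argument, HTTPCookieProcessor creates its own in-memory cookie jar.
    return urllib2.build_opener(urllib2.HTTPCookieProcessor())


opener = build_cookie_opener()
# opener.open(url) is used in place of urllib2.urlopen(url); cookies set by one
# response are then included in the following requests made through this opener.
```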
This peculiar discrepancy might be explained by different headers (maybe ones of the Accept kind) being sent by the two requests. Can you check that? Or, if JavaScript is running in Firefox (which I assume you're using when you're running Firebug), then all bets are off, as they say, since it's definitely NOT running in the Python case ;-).
Keep in mind that a web server can return different results for the same URL based on differences in the request. One example is content negotiation: the requester can specify a list of content types it will accept, and the server can return different results to try to accommodate different needs.
Also, you may be getting an error page for one of your requests, for example, because it is malformed, or you don't have cookies set that authenticate you properly, etc. Look at the response itself to see what you are getting.
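One way to look at the response itself is sketched below. `summarize_response` is a hypothetical helper that takes the object returned by `urllib2.urlopen()` and collects the status code, the final URL after redirects, the Content-Type header, and the first bytes of the body.

```python
def summarize_response(response, preview_bytes=200):
    """Collect the parts of a urlopen() response that usually explain a surprise."""
    info = response.info()
    return {
        "status": response.getcode(),               # HTTP status code, e.g. 200
        "final_url": response.geturl(),             # differs from the request URL after a redirect
        "content_type": info.get("Content-Type"),   # header as sent by the server
        "preview": response.read(preview_bytes),    # start of the body: error page or real content?
    }


# Usage (needs network access):
#   summary = summarize_response(urllib2.urlopen(URL))
#   print(summary)
```

A final URL that differs from the requested one, or a body preview that starts with a login or error page, would point to a redirect or an authentication problem rather than a header-parsing issue.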
Really, like that, without the colon?
If so, that might explain it: it's an invalid header, so it gets ignored, and urllib guesses the content type instead by looking at the filename. If the URL happens to have '.flv' at the end, it will guess that the type should be video/x-flv.
According to http://docs.python.org/library/urllib2.html there is only a get_header() method, and nothing about getheader.
I'm asking because your code works fine for
but once I execute
I get:
Edit:
Moreover, response.headers.get('Set-Cookie') works fine as well, even though it is not mentioned in the urllib2 docs.
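The two names belong to different objects, which may be the source of the confusion: get_header() is a method of urllib2.Request and reads headers you are about to send, while in Python 2 getheader() is a method of the message object returned by response.info(). A small sketch, assuming a placeholder URL and a hypothetical helper `received_header`; it uses .get() on the response side because that method is available on the message object in both Python 2 and Python 3:

```python
try:
    import urllib2                      # Python 2
except ImportError:
    import urllib.request as urllib2    # Python 3 equivalent

# Request side: get_header() reads a header that will be *sent*.
# Note that Request stores names capitalized, e.g. "User-Agent" becomes "User-agent".
request = urllib2.Request("http://example.com/", headers={"Accept": "text/html"})
sent_accept = request.get_header("Accept")


def received_header(response, name):
    """Response side: read a header that the server *returned*."""
    # response.info() is the same message object as response.headers.
    return response.info().get(name)


# Usage (needs network access):
#   response = urllib2.urlopen(request)
#   print(received_header(response, "Set-Cookie"))
```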
To get the raw data for the headers in Python 2, this is a bit of a hack, but it works.
Basically, "".join(list) will join the list of headers, each of which includes "\n" at the end. And of course, ["headers"] selects the list value from the .info() response value dict.
Hope this helped you learn a few ez Python tricks :)
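A sketch of the same trick as a function, under the assumption that the list described above is the `headers` attribute that the Python 2 message object carries (a list of raw header lines, each with its trailing newline). `raw_headers` is a hypothetical helper name; the fallback branch rebuilds the lines from name/value pairs for message objects without that attribute, as on Python 3.

```python
def raw_headers(info):
    """Return the response headers as a single string, one header per line."""
    lines = getattr(info, "headers", None)
    if lines is not None:
        # Python 2: the raw lines are already there, newline included.
        return "".join(lines)
    # Otherwise rebuild the lines from the parsed (name, value) pairs.
    return "".join("%s: %s\n" % (name, value) for name, value in info.items())


# Usage (needs network access):
#   print(raw_headers(urllib2.urlopen(URL).info()))
```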