Checking HTTP POST response headers in Python without downloading the body
A web server responds to a POST request with a file to download (the response has a Content-Disposition header). When using a urllib or mechanize opener, at what point is the response body downloaded?
import mechanize
from mechanize import HTTPRefererProcessor, HTTPEquivProcessor, HTTPRefreshProcessor

opener = mechanize.build_opener(HTTPRefererProcessor, HTTPEquivProcessor, HTTPRefreshProcessor)
r = make_post_request()  # builds the Request object to send
res = opener.open(r)
info = res.info()  # headers of the response returned by open()
content_disp = info.getheader('content-disposition')
filename = content_disp.split('=')[1]
content = res.read()  # or skip based on filename
I was under the impression that the body isn't downloaded until read() is called, which would be useful for skipping certain downloads (such as files already downloaded), but I am not seeing much of a performance improvement.
Comments (2)
HTTP is a stateless protocol, meaning that no channel is established across requests in which a server could write data in several steps. So if a POST or a GET request is sent to a server, it MUST respond with a complete response, as it can't know whether it was the first or second request. Cookies, AJAX, and Comet help to emulate something like a channel, but there isn't one. That's why there is the HEAD request: with it, the browser can determine whether a resource must be loaded or not.
Well, when you just want headers, you should be using HTTP HEAD. POST and GET will by definition return content.
In terms of stopping the download, the web server won't wait to start sending you data, and everything from Python to your network card will start receiving and buffering the data immediately.
So your best bet is to find a better way of doing this -- HTTP HEAD, for example. If that's not an option, call close() on the response object immediately after getting whatever headers you need and hope you didn't waste too much bandwidth.
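To make the close() suggestion concrete, here is a minimal sketch in Python 3 using urllib.request rather than the mechanize opener from the question. The names save_unless_present and post_and_save, the download_dir parameter, and the 30-second timeout are made up for illustration. It assumes the server sends a Content-Disposition header with a filename parameter. Whatever part of the body has already arrived before close() is simply thrown away, so this saves time on large files but is not free.

```python
import os
import urllib.request


def save_unless_present(resp, download_dir):
    """Save the body of an open response unless the named file already exists.

    Returns the path written, or None if the download was skipped.
    (Hypothetical helper for illustration.)
    """
    try:
        # resp.headers is an email.message.Message subclass; get_filename()
        # parses the filename parameter of Content-Disposition.
        filename = resp.headers.get_filename()
        if filename is None:
            return None  # no filename advertised; this sketch just skips
        target = os.path.join(download_dir, os.path.basename(filename))
        if os.path.exists(target):
            return None  # already downloaded, so read() is never called
        with open(target, "wb") as out:
            out.write(resp.read())
        return target
    finally:
        # Release the response; any unread part of the body is discarded.
        resp.close()


def post_and_save(url, form_data, download_dir="."):
    """POST form_data (bytes) to url and save the reply unless already present."""
    req = urllib.request.Request(url, data=form_data, method="POST")
    # urlopen returns once the status line and headers have been parsed;
    # the body is only pulled from the socket when read() is called.
    resp = urllib.request.urlopen(req, timeout=30)
    return save_unless_present(resp, download_dir)
```

Splitting the header check from the network call keeps the skip logic easy to try out with a stand-in response object.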
(And for an example on using HTTP HEAD in Python, see this answer from a while ago.)
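For reference, a HEAD request in Python 3 might look roughly like the sketch below. It uses urllib.request; the function names and the default timeout are made up for illustration. Note the assumption that the server answers HEAD for the same URL with the same headers it would send for the real request, which may not hold for an endpoint that only generates the file in response to a POST.

```python
import urllib.request
from email.message import Message


def build_head_request(url):
    """Build a Request that asks the server for headers only."""
    return urllib.request.Request(url, method="HEAD")


def filename_from_disposition(value):
    """Extract the filename parameter from a Content-Disposition value."""
    if not value:
        return None
    msg = Message()
    msg["Content-Disposition"] = value
    # get_filename() handles both quoted and unquoted filename values.
    return msg.get_filename()


def head_filename(url, timeout=10):
    """Send HEAD to url and return the advertised filename, if any."""
    with urllib.request.urlopen(build_head_request(url), timeout=timeout) as resp:
        return filename_from_disposition(resp.headers.get("Content-Disposition"))
```

Parsing the header with the email package avoids the stray quotes that a plain split('=') leaves around the filename.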