Is it possible to read only the first N bytes from an HTTP server using a Linux command?
Given the URL http://www.example.com, can we read only the first N bytes of the page?
Using wget, we can download the whole page.
Using curl, there is the -r option; 0-499 specifies the first 500 bytes. That seems to solve the problem, but:
"You should also be aware that many HTTP/1.1 servers do not have this feature enabled, so that when you attempt to get a range, you'll instead get the whole document."
Using urllib in Python. There is a similar question here, but according to Konstantin's comment, is that really true?
"Last time I tried this technique it failed because it was actually impossible to read from the HTTP server only specified amount of data, i.e. you implicitly read all HTTP response and only then read first N bytes out of it. So at the end you ended up downloading the whole 1Gb malicious response."
So, how can we read the first N bytes from an HTTP server in practice?
You can do it natively with the following curl command (no need to download the whole document). According to the curl man page:
It works for me even with a Java web app deployed to GigaSpaces.
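Since the question also asks about Python, here is a minimal sketch of the same range idea using urllib. The function names `build_range_request` and `fetch_first_bytes` are made up for illustration. It sends a `Range` header and also caps the read at N bytes, on the assumption that some servers ignore the header and send the whole document anyway.

```python
import urllib.request


def build_range_request(url: str, n: int) -> urllib.request.Request:
    """Build a GET request asking only for the first n bytes (hypothetical helper)."""
    # HTTP byte ranges are inclusive, so the first n bytes are 0..n-1.
    return urllib.request.Request(url, headers={"Range": f"bytes=0-{n - 1}"})


def fetch_first_bytes(url: str, n: int, timeout: float = 10.0) -> bytes:
    """Return at most the first n bytes of the response body."""
    if n <= 0:
        return b""
    request = build_range_request(url, n)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # A server that honours the range replies with 206 and n bytes.
        # A server that ignores it replies with 200 and the full body,
        # so the read is capped at n bytes in both cases.
        return response.read(n)
```

Calling `fetch_first_bytes("http://www.example.com", 500)` would correspond to the `-r 0-499` case from the question. Leaving the `with` block closes the connection, so the rest of the body is not read.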
or
should do
Also, there are simpler utilities with perhaps broader availability, like
Or
You will have to fetch the whole page anyway, so you can fetch it with curl and pipe the output to head, for example.
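For comparison, the "read a stream and keep only the first N bytes" pattern can be sketched in Python as well. This is only an illustration: `take_first_bytes` is a hypothetical helper, and it works on any binary file-like object, not just HTTP responses.

```python
from typing import BinaryIO


def take_first_bytes(stream: BinaryIO, n: int, chunk_size: int = 4096) -> bytes:
    """Read at most n bytes from stream in chunks, similar in spirit to `head -c n`."""
    collected = bytearray()
    while len(collected) < n:
        # Never ask for more than is still needed, so nothing past n is consumed.
        chunk = stream.read(min(chunk_size, n - len(collected)))
        if not chunk:  # the stream ended before n bytes arrived
            break
        collected.extend(chunk)
    return bytes(collected)
```

With urllib, the object returned by `urllib.request.urlopen(url)` can be passed in as `stream`. If the response is closed right after the call, the client stops reading, although the server may already have sent somewhat more than N bytes onto the wire.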
I came here looking for a way to time the server's processing time, which I thought I could measure by telling curl to stop downloading after 1 byte or something.
For me, the better solution turned out to be to do a HEAD request, since this usually lets the server process the request as normal but does not return any response body:
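A rough Python equivalent of timing a HEAD request might look like the sketch below. The helper names are hypothetical, and the measured time covers the whole round trip (DNS lookup, connection setup, and the server's work), so it is only an upper bound on the server's processing time.

```python
import time
import urllib.request


def build_head_request(url: str) -> urllib.request.Request:
    """Build a request that asks for headers only (hypothetical helper)."""
    return urllib.request.Request(url, method="HEAD")


def time_head_request(url: str, timeout: float = 10.0) -> tuple[int, float]:
    """Send a HEAD request and return (status code, elapsed seconds)."""
    request = build_head_request(url)
    start = time.perf_counter()
    with urllib.request.urlopen(request, timeout=timeout) as response:
        status = response.status
    # Elapsed time includes network latency, not just server processing.
    elapsed = time.perf_counter() - start
    return status, elapsed
```

Note that `urlopen` raises `urllib.error.HTTPError` for 4xx and 5xx responses, so a caller who wants to time those too would need to catch that exception.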
Make a socket connection. Read the bytes you want. Close, and you're done.
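A minimal sketch of that approach in Python, assuming a plain http:// URL (no TLS, no redirects, no userinfo in the URL). All function names here are made up for illustration. It speaks HTTP/1.0 so that the server should not reply with chunked encoding, skips the response headers, and stops as soon as N body bytes have arrived.

```python
import socket
from urllib.parse import urlsplit

HEADER_END = b"\r\n\r\n"


def build_get_request(url: str) -> tuple[str, int, bytes]:
    """Return (host, port, raw request bytes) for a plain http:// URL."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    # HTTP/1.0 keeps the sketch simple: no chunked transfer encoding to decode.
    raw = f"GET {path} HTTP/1.0\r\nHost: {parts.netloc}\r\n\r\n".encode("ascii")
    return parts.hostname, parts.port or 80, raw


def body_so_far(received: bytes) -> bytes:
    """Return the body bytes received so far (empty until the headers end)."""
    _, separator, body = received.partition(HEADER_END)
    return body if separator else b""


def read_first_bytes(url: str, n: int, timeout: float = 10.0) -> bytes:
    """Return up to n bytes of the response body, then close the connection."""
    host, port, raw = build_get_request(url)
    received = b""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(raw)
        while len(body_so_far(received)) < n:
            chunk = sock.recv(4096)
            if not chunk:  # server closed the connection early
                break
            received += chunk
    # Leaving the with-block closes the socket, so the rest is never read.
    return body_so_far(received)[:n]
```

Because each `recv` can return up to 4096 bytes, slightly more than N bytes may be received before the loop stops; the result is trimmed to N.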