HTTP protocol, Content-Length, getting page content in Python
I'm trying to code my own Python 3 HTTP library to learn more about sockets and the HTTP protocol. My question is: if I do a recv(bytesToRead) on my socket, how can I get only the headers and then, using the Content-Length information, continue receiving the page content? Isn't that the purpose of the Content-Length header?
Thanks in advance
Comments (1)
In the past, to accomplish this, I have read a portion of the socket data into memory and then scanned that buffer until a "\r\n\r\n" sequence is encountered (you could use a state machine to do this, or simply use the string.find() function). Once you reach that sequence, you know all of the headers have been read. You can then parse the headers and keep reading until you have Content-Length bytes of body in total, counting any body bytes that already arrived in your buffer after the "\r\n\r\n". You should also be prepared to handle a response that does not include a Content-Length header, since not all responses contain one.
If you run out of buffer before seeing that sequence, simply read more data from the socket into your buffer and continue processing.
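Since the question is about Python 3, here is a rough sketch of that approach. The function name read_response and the chunk_size parameter are made up for illustration, and it assumes a plain (non-TLS) socket and a response that is not chunk-encoded. When there is no Content-Length header it simply reads until the server closes the connection, which is only one of the cases a real client has to handle.

```python
import socket


def read_response(sock, chunk_size=4096):
    """Read one HTTP response from sock; return (status_line, headers, body).

    Hypothetical helper for illustration only: it does not handle
    Transfer-Encoding: chunked, repeated headers, or timeouts.
    """
    buf = b""
    # Keep receiving until the blank line that ends the headers shows up.
    while b"\r\n\r\n" not in buf:
        chunk = sock.recv(chunk_size)
        if not chunk:
            raise ConnectionError("connection closed before headers ended")
        buf += chunk

    # Everything after the first blank line is the start of the body.
    head, _, body = buf.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    status_line = lines[0]

    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them lowercased.
        headers[name.strip().lower()] = value.strip()

    if "content-length" in headers:
        length = int(headers["content-length"])
        # Part of the body may already be in the buffer; read only the rest.
        while len(body) < length:
            chunk = sock.recv(chunk_size)
            if not chunk:
                break  # peer closed early; return what we have
            body += chunk
        body = body[:length]
    else:
        # No Content-Length: fall back to reading until the peer closes.
        while True:
            chunk = sock.recv(chunk_size)
            if not chunk:
                break
            body += chunk

    return status_line, headers, body
```

To try it, you would connect a socket with socket.create_connection((host, 80)), send a request such as b"GET / HTTP/1.0\r\nHost: " + host.encode() + b"\r\n\r\n" with sendall(), and then call read_response(sock). Note that in Python 3 recv() returns bytes, so the search for the header terminator uses a bytes pattern rather than a str.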
I can post a C# example if you would like to look at it.