HTTP request message boundaries
I'm writing a client to upload files to megaupload via regular HTTP multipart/form-data. The point here is not megaupload per se, but the behaviour of their web server.
curl could upload without any problem, while my client couldn't, even though it sent the exact same request (verified by sniffing with Wireshark): it got stuck waiting for the response and eventually timed out after 30 minutes.
After playing with raw sockets and strace for a while, it turned out that the only difference between the two is that curl sends the header block with a single call to sendto(2), and then the rest with further calls to sendto(2). My client, on the other hand, sent every header separately with its own write(2).
Now, sendto and write should be equivalent if the send call doesn't specify any flags, and it didn't. In fact I made it work with write, but only by sending the header block in a single call. Every other sequence of write calls left the request stuck waiting.
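For illustration, here is a minimal Python sketch of the variant that worked: the request line and all headers are joined into one buffer and handed to the socket in one call, followed by the body. The names build_header_block and send_request, the /upload path and the example.com host are made up for this sketch and are not taken from my actual client.

```python
import socket


def build_header_block(request_line, headers):
    """Join the request line and all headers into one bytes buffer."""
    lines = [request_line] + [f"{name}: {value}" for name, value in headers]
    # An empty line (CRLF CRLF) terminates the header block.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def send_request(sock, request_line, headers, body):
    """Send the header block in one call, then the body."""
    # One sendall() for the whole header block, instead of one write per header.
    sock.sendall(build_header_block(request_line, headers))
    sock.sendall(body)


if __name__ == "__main__":
    # Local demonstration over a connected socket pair (no network needed).
    client, server = socket.socketpair()
    send_request(
        client,
        "POST /upload HTTP/1.1",
        [("Host", "example.com"), ("Content-Length", "3")],
        b"abc",
    )
    client.close()
    print(server.recv(4096))
    server.close()
```

Note that this only reduces the number of syscalls. It does not guarantee that the headers travel in a single TCP segment, and a compliant server should not depend on that anyway.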
So the question is: how is this even possible? TCP doesn't preserve message boundaries; it is a stream protocol.
The only thing I can think of is that every write/send syscall causes a packet to be sent, and that the remote server is sniffing raw packets and lying about being Apache.
Ideas? Or am I being a moron, and this is normal behaviour for a compliant HTTP server?
It sure is the first web server to behave that way for me.

The HTTP protocol contains mechanisms that let the client and server determine message boundaries.
For uploaded data (POST, PUT), either the Content-Length request header or chunked transfer encoding is required. Content-Length lets the server know exactly how many bytes to receive from the socket. Once those bytes have been received, it will send its response in the other direction. That's effectively the message boundary here. Chunked encoding also tells the server how many bytes to expect, just in several pieces.
For the response, Content-Length (or chunked encoding) is optional. It tells the client how many bytes to expect, which is required for persistent connections to work. If the length can't be determined, the server simply closes the socket, and the client then knows it has the whole response :)
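As a rough sketch of how a receiver can find the end of a body from these two mechanisms alone, regardless of how the bytes were split into writes or packets, something like the following Python would do. The function names and the convention that headers is a dict with lower-cased names are assumptions of this sketch; stream is any binary file-like object, for example what sock.makefile("rb") returns.

```python
import io


def read_exact(stream, n):
    """Read exactly n bytes or fail if the stream ends early."""
    data = stream.read(n)
    if len(data) != n:
        raise EOFError(f"expected {n} bytes, got {len(data)}")
    return data


def read_chunked(stream):
    """Decode a chunked body: hex size line, data, CRLF, repeated until size 0."""
    body = b""
    while True:
        size_line = stream.readline()
        # Ignore optional chunk extensions after ';'.
        size = int(size_line.split(b";", 1)[0].strip(), 16)
        if size == 0:
            # Skip optional trailer fields up to the terminating blank line.
            while stream.readline() not in (b"\r\n", b"\n", b""):
                pass
            return body
        body += read_exact(stream, size)
        stream.readline()  # consume the CRLF that follows the chunk data


def read_body(stream, headers):
    """Pick the framing mechanism announced in the headers."""
    if headers.get("transfer-encoding", "").lower() == "chunked":
        return read_chunked(stream)
    return read_exact(stream, int(headers.get("content-length", "0")))


if __name__ == "__main__":
    wire = io.BytesIO(b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n")
    print(read_body(wire, {"transfer-encoding": "chunked"}))
```

Either way the boundary comes from the byte counts in the stream, never from where one write or packet ends and the next begins.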
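The response side can be sketched the same way. In this hypothetical Python helper (the name read_response_body and the lower-cased headers dict are assumptions, and chunked responses are left out for brevity), the client reads a fixed number of bytes when Content-Length is present and otherwise reads until the server closes the connection.

```python
import io


def read_response_body(stream, headers):
    """Return (body, reusable) for a response read from a binary stream.

    reusable is True only when the body length was known in advance, so the
    connection can carry another request afterwards.
    """
    length = headers.get("content-length")
    if length is not None:
        # Known length: read exactly that many bytes and keep the connection.
        return stream.read(int(length)), True
    # Unknown length: the body ends when the server closes the socket (EOF).
    return stream.read(), False


if __name__ == "__main__":
    print(read_response_body(io.BytesIO(b"okMORE"), {"content-length": "2"}))
    print(read_response_body(io.BytesIO(b"everything"), {}))
```

The second case is why a response without a length cannot share a persistent connection: end of stream is the only marker left.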