How do I handle very large file uploads in an Erlang web server?
Suppose I'm writing a web server and I want to support "very large" file uploads. Let's further assume that I mean to do this via the standard multipart/form-data MIME type. I should say that I'm using Erlang and that I plan to collect HTTP packets as they are returned from erlang:decode_packet/3, but I do not want to actually collect the request body until the HTTP request handler has found a place for the uploaded content to go. Should I
a) go ahead and collect the body anyway, ignoring the possibility that it is very, very large and could crash the server by exhausting its memory?
b) refrain from receiving any (possibly non-existent) request body on the socket until after the headers have been processed?
c) do something else?
An example of answer c might be: spawn another process to collect the uploaded content and write it to a temporary location (in order to minimize memory use), while giving that location to the HTTP request handler for later processing. But I just don't know: is there a standard technique here?
Comments (3)
In my opinion option b is clearly the superior one.
During the period in which you are not reading the socket, the TCP code will continue to buffer the incoming data within the kernel. As it does so, it will advertise a smaller and smaller TCP window size to the sender (the HTTP client), until eventually, when the TCP receive buffers in the kernel are full, the TCP window will close and the client will stop sending.
In other words, by not reading the socket, you are allowing TCP flow control to do its job.
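As a rough illustration of option b, here is a minimal sketch that reads only the request line and headers from a passive-mode socket and then stops, so the body is not pulled into the Erlang process until the caller asks for it. The module name upload_headers and the function read_headers/1 are made up for this example, and it relies on the {packet, http_bin} socket option instead of calling erlang:decode_packet/3 by hand. Some body bytes may already sit in the driver's buffer; they are returned by the next gen_tcp:recv after the switch to raw mode.

```erlang
-module(upload_headers).
-export([read_headers/1]).

%% Hypothetical helper: read the request line and headers, nothing more.
%% The socket is put in passive mode, so no data reaches this process
%% unless gen_tcp:recv is called explicitly.
read_headers(Socket) ->
    ok = inet:setopts(Socket, [binary, {active, false}, {packet, http_bin}]),
    {ok, {http_request, Method, Uri, _Version}} = gen_tcp:recv(Socket, 0),
    Headers = collect(Socket, []),
    %% Switch to raw mode so later recv calls return plain body bytes.
    ok = inet:setopts(Socket, [{packet, raw}]),
    {Method, Uri, Headers}.

%% Accumulate {Name, Value} pairs until the end-of-headers marker.
collect(Socket, Acc) ->
    case gen_tcp:recv(Socket, 0) of
        {ok, {http_header, _, Name, _, Value}} ->
            collect(Socket, [{Name, Value} | Acc]);
        {ok, http_eoh} ->
            lists:reverse(Acc)
    end.
```

The handler can inspect the returned headers (Content-Length, Content-Type and so on), decide where the upload should go, and only then start calling gen_tcp:recv for the body. Until it does, the kernel buffer fills and flow control holds the client back.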
In my implementation I used your example for answer c: I read from the socket chunk by chunk and store the chunks in a temporary file. Also, AFAIK Yaws uses a similar technique; you can see it in yaws/src/yaws_multipart.erl
Storing to a temporary file is also the way PHP does things, so it's a tried and tested approach. You could count the bytes received and disconnect if the total reaches a size that makes no sense.
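A minimal sketch of the temporary-file approach with a size cap might look like the following. It assumes a passive, raw-mode socket, a body length already known from the Content-Length header, and it copies the raw body without parsing the multipart boundaries. The module name upload_stream, the function to_file/4, the 64 KB chunk size and the 30 second receive timeout are all illustrative choices, not taken from Yaws or PHP.

```erlang
-module(upload_stream).
-export([to_file/4]).

-define(CHUNK, 65536).          %% assumed chunk size
-define(RECV_TIMEOUT, 30000).   %% assumed per-chunk timeout in ms

%% Copy exactly Length body bytes from Socket into the file at Path.
%% If the declared length exceeds Max, nothing is read and the caller
%% can close the connection.
to_file(_Socket, _Path, Length, Max) when Length > Max ->
    {error, too_large};
to_file(Socket, Path, Length, _Max) ->
    {ok, Fd} = file:open(Path, [write, raw, binary]),
    try
        copy(Socket, Fd, Length)
    after
        file:close(Fd)
    end.

%% Remaining counts down the bytes still expected, so at most one
%% chunk is held in memory and no more than Length bytes are read.
copy(_Socket, _Fd, 0) ->
    ok;
copy(Socket, Fd, Remaining) ->
    Want = min(Remaining, ?CHUNK),
    case gen_tcp:recv(Socket, Want, ?RECV_TIMEOUT) of
        {ok, Data} ->
            ok = file:write(Fd, Data),
            copy(Socket, Fd, Remaining - byte_size(Data));
        {error, Reason} ->
            {error, Reason}
    end.
```

On {error, too_large} the caller would typically send an error response or simply close the socket. For requests without a Content-Length (for example chunked transfer encoding), the same loop could count bytes upward and stop once the running total passes Max.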