Node.js and CPU-intensive requests

Posted 2024-09-14 04:17:17

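I've started tinkering with the Node.js HTTP server and really enjoy writing server-side JavaScript, but something is keeping me from using Node.js for my web application.

I understand the whole async I/O concept, but I'm somewhat concerned about the edge cases where procedural code is very CPU-intensive, such as image manipulation or sorting large data sets.

As I understand it, the server will be very fast for simple web page requests such as viewing a list of users or a blog post. However, if I want to write very CPU-intensive code (in the admin back end, for example) that generates graphics or resizes thousands of images, the request will be very slow (a few seconds). Since this code is not async, every request that reaches the server during those few seconds will be blocked until my slow request is done.

One suggestion was to use Web Workers for CPU-intensive tasks. However, I'm afraid Web Workers will make it hard to write clean code, since they work by including a separate JS file. What if the CPU-intensive code is located in an object's method? It kind of sucks to write a JS file for every method that is CPU-intensive.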
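One way to ease the one-file-per-method concern, sketched under the assumption that the Node.js release in use ships the built-in `worker_threads` module (it is newer than this question): a worker can be started from a source string with `eval: true`, so the heavy function can sit next to the class that uses it. `runInWorker`, `Report`, and `sumSquares` are hypothetical names for illustration.

```javascript
const { Worker } = require('worker_threads');

// Runs `fn(input)` on a separate thread and resolves with its return value.
// `fn` must be a self-contained arrow function or function expression: it is
// serialized with toString(), so it cannot use closures or `this`.
function runInWorker(fn, input) {
  const source = `
    const { parentPort, workerData } = require('worker_threads');
    const fn = ${fn.toString()};
    parentPort.postMessage(fn(workerData));
  `;
  return new Promise((resolve, reject) => {
    const worker = new Worker(source, { eval: true, workerData: input });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`worker exited with code ${code}`));
    });
  });
}

// Hypothetical class: the CPU-heavy method delegates to a worker thread,
// so the event loop stays free while the loop runs.
class Report {
  sumSquares(n) {
    return runInWorker((limit) => {
      let total = 0;
      for (let i = 1; i <= limit; i++) total += i * i;
      return total;
    }, n);
  }
}

new Report().sumSquares(1000).then((total) => console.log(total));
```

The trade-off is that the function body is copied as text, so anything it needs has to be passed in as data or required inside the function itself.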

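Another suggestion was to spawn a child process, but that makes the code even harder to maintain.

Do you have any suggestions for overcoming this (perceived) obstacle? How do you write clean object-oriented code with Node.js while making sure that CPU-heavy tasks are executed async?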

谜兔 2024-09-21 04:17:17

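This is a misunderstanding of what a web server is for: it should only be used to "talk" with clients. Heavy tasks should be delegated to standalone programs (which can of course also be written in JS).
You would probably say that this is dirty, but I assure you that a web server process stuck resizing images is worse (even for, say, Apache, where it does not block other queries). Still, you can use a common library to avoid code redundancy.

EDIT: I have come up with an analogy: a web application should be like a restaurant. You have waiters (the web server) and cooks (the workers). Waiters are in contact with customers and do simple tasks such as handing out menus or explaining whether a dish is vegetarian. They delegate the harder tasks to the kitchen. Because waiters only do simple things, they respond quickly, and the cooks can concentrate on their job.

Node.js here would be a single but very talented waiter who can handle many requests at a time, and Apache would be a gang of dumb waiters who each handle just one request. If this one Node.js waiter started to cook, it would be an immediate catastrophe. Still, cooking could exhaust even a large supply of Apache waiters, not to mention the chaos in the kitchen and the progressive loss of responsiveness.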

白日梦 2024-09-21 04:17:17

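What you need is a task queue! Moving your long-running tasks out of the web server is a GOOD thing. Keeping each task in a "separate" JS file promotes modularity and code reuse. It forces you to think about how to structure your program in a way that makes it easier to debug and maintain in the long run. Another benefit of a task queue is that the workers can be written in a different language. Just pop a task, do the work, and write the response back.

Something like this: https://github.com/resque/resque

Here is an article from GitHub about why they built it: http://github.com/blog/542-introducing-resque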
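The pop-a-task, do-the-work, write-the-response-back loop can be sketched in a few lines. This is an in-memory stand-in and not Resque's API: a real setup would keep the queue in an external store so that separate worker processes can reach it. `TaskQueue`, `runWorker`, and the `resize` task are hypothetical names.

```javascript
// In-memory stand-in for a task queue (hypothetical; a real queue lives
// outside the web server process so that other processes can consume it).
class TaskQueue {
  constructor() {
    this.pending = [];         // tasks waiting for a worker
    this.results = new Map();  // task id -> result written back by a worker
    this.nextId = 1;
  }

  // Called by the web server: record the task and return immediately.
  enqueue(type, payload) {
    const id = this.nextId++;
    this.pending.push({ id, type, payload });
    return id;
  }

  // Called by a worker: take the oldest task, or undefined if none is left.
  pop() {
    return this.pending.shift();
  }
}

// One handler per task type; each could live in its own module.
const handlers = {
  resize: ({ width, height, scale }) => ({
    width: Math.round(width * scale),
    height: Math.round(height * scale),
  }),
};

// Worker loop: pop a task, do the work, write the response back.
function runWorker(queue) {
  let task;
  while ((task = queue.pop()) !== undefined) {
    queue.results.set(task.id, handlers[task.type](task.payload));
  }
}

const queue = new TaskQueue();
const id = queue.enqueue('resize', { width: 800, height: 600, scale: 0.5 });
runWorker(queue);
console.log(queue.results.get(id));
```

Because the web server only calls `enqueue`, the request handler returns right away; the client can fetch the result later by task id.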

戏剧牡丹亭 2024-09-21 04:17:17

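You don't want your CPU-intensive code to execute async, you want it to execute in parallel. You need to get the processing work out of the thread that is serving HTTP requests. It's the only way to solve this problem. With NodeJS the answer is the cluster module, for spawning child processes to do the heavy lifting. (AFAIK Node doesn't have any concept of threads/shared memory; it's processes or nothing.) You have two options for how you structure your application. You can get the 80/20 solution by spawning 8 HTTP servers and handling compute-intensive tasks synchronously on the child processes. Doing that is fairly simple. You could take an hour to read about it at that link. In fact, if you just rip off the example code at the top of that link you will get yourself 95% of the way there.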
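A minimal sketch of the 80/20 approach using Node's built-in `cluster` and `http` modules. It is not the sample code the answer refers to; the port, the default worker count, and `fib` as the workload are assumptions made for illustration.

```javascript
const cluster = require('cluster');
const http = require('http');

// Deliberately slow, synchronous stand-in for a compute-intensive task.
function fib(n) {
  return n < 2 ? n : fib(n - 1) + fib(n - 2);
}

// Call this once at the bottom of the file. The first process forks
// `workerCount` copies of the script; each copy runs its own HTTP server,
// and the cluster module shares the listening port between them.
function startClusteredServer(workerCount = 8, port = 8000) {
  if (!cluster.isWorker) {
    for (let i = 0; i < workerCount; i++) cluster.fork();
    cluster.on('exit', (worker) => {
      console.log(`worker ${worker.process.pid} exited`);
    });
    return;
  }
  http.createServer((req, res) => {
    // Blocks only this worker; the other workers keep accepting requests.
    res.end(`fib(30) = ${fib(30)} from pid ${process.pid}\n`);
  }).listen(port);
}
```

Adding `startClusteredServer();` as the last line and running the file with `node` starts the workers. A slow request then ties up one of the eight processes instead of the whole server.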

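The other way to structure this is to set up a job queue and send big compute tasks over the queue. Note that there is a lot of overhead associated with the IPC for a job queue, so this is only useful when the tasks are appreciably larger than the overhead.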

I'm surprised that none of these other answers even mention cluster.

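Background:
Asynchronous code is code that suspends until something happens somewhere else, at which point the code wakes up and continues execution. One very common case where something slow must happen somewhere else is I/O.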

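Asynchronous code isn't useful if it's your processor that is responsible for doing the work. That is precisely the case with "compute-intensive" tasks.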

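Now, it might seem that asynchronous code is niche, but in fact it's very common. It just happens not to be useful for compute-intensive tasks.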
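A small sketch of why this is so: wrapping CPU-bound work in an `async` function does not move it off the thread. `spin` and `spinAsync` are hypothetical helpers, and the 200 ms figure is arbitrary.

```javascript
// CPU-bound stand-in: spins for `ms` milliseconds without yielding.
function spin(ms) {
  const end = Date.now() + ms;
  let iterations = 0;
  while (Date.now() < end) iterations++;
  return iterations;
}

// Marking the function async changes how the result is delivered,
// not where the loop runs: it still occupies the only JavaScript thread.
async function spinAsync(ms) {
  return spin(ms);
}

const start = Date.now();
let timerDelayMs = null;

// Due immediately; stands in for another client's request.
setTimeout(() => {
  timerDelayMs = Date.now() - start;
  console.log(`0 ms timer fired after ${timerDelayMs} ms`);
}, 0);

// Returns a promise, but only after the loop has already run to completion.
spinAsync(200);
```

The timer cannot fire until the loop finishes, so the reported delay is at least as long as the spin, which is the same stall every other request would see.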

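Waiting on I/O is a pattern that happens all the time in web servers, for example. Every client who connects to your server gets a socket. Most of the time the sockets are empty. You don't want to do anything until a socket receives some data, at which point you want to handle the request. Under the hood an HTTP server like Node is using an eventing library (libev in early Node versions; libuv in current ones) to keep track of the thousands of open sockets. The OS notifies the library, the library notifies NodeJS when one of the sockets gets data, NodeJS puts an event on the event queue, and your HTTP code kicks in at this point and handles the events one after the other. Events don't get put on the queue until the socket has some data, so events are never waiting on data: it's already there for them.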

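Single-threaded, event-based web servers make sense as a paradigm when the bottleneck is waiting on a bunch of mostly empty socket connections, you don't want a whole thread or process for every idle connection, and you don't want to poll your 250k sockets to find the next one that has data on it.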
