How do I throttle (or queue) calls to an external process in Node.JS?

Posted on 2024-12-02 12:20:35

Scenario

I have a Node.JS service (written using ExpressJS) that accepts image uploads via drag and drop. After an image is uploaded, I do a few things to it:

  1. Pull EXIF data from it
  2. Resize it

These calls are being handled via the node-imagemagick module at the moment and my code looks something like this:

app.post('/upload', function(req, res){
  ... <stuff here> ....

  im.readMetadata('./upload/image.jpg', function(err, meta) {
      // handle EXIF data.
  });

  im.resize(..., function(err, stdout, stderr) {
      // handle resize.
  });
});

Question

As some of you have already spotted, the problem is that if I get enough simultaneous uploads, every single one of them will spawn an 'identify' call and then a resize operation (both from ImageMagick), effectively killing the server under high load.

Just testing with ab -c 100 -n 100 locks my little 512 Linode dev server up such that I have to force a reboot. I understand that my test may just be too much load for the server, but I would like a more robust approach to processing these requests so I have a more graceful failure than total VM suicide.

In Java I solved this issue by creating a fixed-thread ExecutorService that queues up the work and executes it on at most X number of threads.

In Node.JS, I am not even sure where to start to solve a problem like this. I don't quite have my brain wrapped around the non-threaded nature and how I can create an async JavaScript function that queues up the work while another... (thread?) processes the queue.

Any pointers on how to think about this or how to approach this would be appreciated.

Addendum

This is not the same as this question about FFMpeg, although I imagine that person will have this exact same question as soon as his webapp is under load as it boils down to the same problem (firing off too many simultaneous native processes in parallel).

魄砕の薆 2024-12-09 12:20:35

The threads module should be just what you need:

https://github.com/robtweed/threads

那请放手 2024-12-09 12:20:35

Since Node does not allow threading, you can do the work in another process. You can use a background job system, like resque, where you queue up jobs in a datastore of some type and then run a process (or several processes) that pulls jobs from the datastore and does the processing; or use something like node-worker and queue your jobs in the worker's memory. Either way, your main application is freed up from doing all the processing and can focus on serving web requests.

[Update] Another interesting library to check out is hook.io, especially if you like the idea of node-workers but want to run multiple background processes. [/Update]

[Edit]

Here's a quick and dirty example of pushing work that takes a while to run to a worker process using node-worker; the worker queues jobs and processes them one by one:

app.js

var Worker = require('worker').Worker;
var processor = new Worker('image_processor.js');

for(var i = 0; i <= 100; i++) {
  console.log("adding a new job");
  processor.postMessage({job: i});
}

processor.onmessage = function(msg) {
  console.log("worker done with job " + msg.job);
  console.log("result is " + msg.data.result);
};

image_processor.js

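var worker = require('worker').worker;
var queue = [];

// Incoming jobs are only queued here; process_job picks them up.
worker.onmessage = function(msg) {
  var job = msg.job;
  queue.push(job);
};

var process_job = function() {
  // Nothing queued yet: check again in 100 ms.
  if(queue.length === 0) {
    setTimeout(process_job, 100);
    return;
  }

  var job = queue.shift();
  var data = {};

  data.result = job * 10;

  // The 1-second timeout stands in for slow work; report the result,
  // then move on to the next job.
  setTimeout(function() {
    worker.postMessage({job: job, data: data});
    process_job();
  }, 1000);
};

process_job();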

[旋木] 2024-12-09 12:20:35

For anyone who thought Brandon's quick-and-dirty example might be too quick-and-dirty, here's a variation that is no longer in length and doesn't have the unnecessary busy-wait. I'm not in a position to test it, but it should work.

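// enqueue(fn) runs fn immediately when idle; otherwise it queues fn, and
// a timer starts one queued function per second until the queue drains.
var enqueue = function() {
  var queue = [];
  var execImmediate = function(fImmediate) {
    // While the timer is active, later calls only add to the queue.
    enqueue = function(fDelayed) {
      queue.push(fDelayed);
    };
    fImmediate();

    var ic = setInterval(function() {
      var fQueued = queue.shift();
      if (fQueued) {
        fQueued();
      } else {
        // Queue drained: stop the timer and return to immediate execution.
        clearInterval(ic);
        enqueue = execImmediate;
      }
    }, 1000);
  };
  return execImmediate;
}();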
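
The timer above starts one job per second whether or not the previous one has finished. If the goal is closer to the fixed-size ExecutorService from the question, a completion-driven queue may fit better: start the next job only when a running one reports that it is done, and never run more than a fixed number at once. The following is a minimal sketch under that assumption. The names createLimiter, push, stats and maxQueued are hypothetical and not part of node-imagemagick or any other library.

```javascript
'use strict';

// Hypothetical helper (an illustration, not a library API): runs at most
// `limit` tasks at once and keeps the rest in an in-memory queue.
// `maxQueued` is an assumed optional cap on the number of waiting tasks.
function createLimiter(limit, maxQueued) {
  var queue = [];   // tasks waiting for a free slot
  var active = 0;   // tasks currently running
  if (maxQueued === undefined) maxQueued = Infinity;

  // Start queued tasks while there are free slots.
  function next() {
    while (active < limit && queue.length > 0) {
      start(queue.shift());
    }
  }

  // Run one task; the task must call done() exactly when it has finished.
  function start(task) {
    var finished = false;
    active++;
    task(function done() {
      if (finished) return;   // ignore duplicate done() calls
      finished = true;
      active--;
      next();                 // a slot is free: start the next queued task
    });
  }

  return {
    // Returns false (and drops the task) when all slots are busy and the
    // queue is already full, so the caller can fail fast.
    push: function (task) {
      if (active >= limit && queue.length >= maxQueued) return false;
      queue.push(task);
      next();
      return true;
    },
    stats: function () {
      return { active: active, queued: queue.length };
    }
  };
}
```

In the upload handler, each task would wrap the readMetadata and resize calls and invoke done() from the last callback, so only `limit` ImageMagick processes exist at a time. When push returns false, the route could answer with an HTTP 503 instead of queuing more work, which gives a more graceful failure under load than letting the machine lock up.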