What is the cleanest way to write a non-blocking for loop in JavaScript?

Posted on 2024-12-14 02:19:59

So, I've been thinking about a brain teaser: what if I had a large object that, for some reason, I had to iterate through in Node.js, and I didn't want to block the event loop while doing so?

Here's an off-the-top-of-my-head example; I'm sure it can be much cleaner:

var forin = function(obj,callback){
    var keys = Object.keys(obj),
        index = 0,
        interval = setInterval(function(){
            if(index < keys.length){
                callback(keys[index],obj[keys[index]],obj);
            } else {
                clearInterval(interval);
            }
            index ++;
        },0);
}

While I'm sure there are other reasons it's messy, this will execute more slowly than a regular for loop, because setInterval with a delay of 0 doesn't actually fire every 0 ms. I'm not sure, though, how to write the loop with the much faster process.nextTick.

In my tests, I found this example takes 7 ms to run, as opposed to a native for loop (with hasOwnProperty() checks, logging the same info), which takes 4 ms.

So, what's the cleanest/fastest way to write this same code in Node.js?

Comments (4)

晨曦÷微暖 2024-12-21 02:20:00

The behavior of process.nextTick has changed since the question was asked. The earlier answers also don't address the question's emphasis on a clean and efficient function.

// in node 0.9.0, process.nextTick fired before IO events, but setImmediate did
// not yet exist. before 0.9.0, process.nextTick fired between IO events, and
// from 0.9.0 on it fires before IO events. if setImmediate and process.nextTick
// are both missing, fall back to the tick shim.
var tick =
  (root.process && process.versions && process.versions.node === '0.9.0') ?
  tickShim :
  (root.setImmediate || (root.process && process.nextTick) || tickShim);

function tickShim(fn) {setTimeout(fn, 1);}

// executes the iter function for the first object key immediately, can be
// tweaked to instead defer immediately
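function asyncForEach(object, iter) {
  var keys = Object.keys(object), offset = 0;

  // nothing to iterate over for an empty object
  if (!keys.length) return;

  (function next() {
    // invoke the iterator function
    iter.call(object, keys[offset], object[keys[offset]], object);

    // defer the next key so other queued work gets a turn
    if (++offset < keys.length) {
      tick(next);
    }
  })();
}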
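
The change in behavior described above can be observed directly. The following is a minimal sketch, not part of the original answer: `iterationsSeenByObserver` is a hypothetical helper that starts a loop of deferred steps and reports how many steps had completed by the time an unrelated `setImmediate` callback (standing in for any other work waiting on the event loop) got to run.

```javascript
// Hypothetical helper (not from the original answer): runs `iterations`
// deferred steps using `schedule`, and resolves with the number of steps that
// had completed when an unrelated setImmediate callback finally ran.
function iterationsSeenByObserver(schedule, iterations) {
  return new Promise((resolve) => {
    let completed = 0;
    let seen = -1;

    // The observer stands in for other work waiting on the event loop.
    setImmediate(() => { seen = completed; });

    (function step() {
      completed++;
      if (completed < iterations) {
        schedule(step);
      } else {
        // Queued behind the observer, so `seen` is set before we resolve.
        setImmediate(() => resolve(seen));
      }
    })();
  });
}

iterationsSeenByObserver((fn) => process.nextTick(fn), 1000)
  .then((n) => console.log('process.nextTick: observer saw', n, 'steps'));
iterationsSeenByObserver((fn) => setImmediate(fn), 1000)
  .then((n) => console.log('setImmediate: observer saw', n, 'steps'));
```

On current Node.js versions the process.nextTick loop should report all 1000 steps, meaning the observer was held up until the loop had finished, while the setImmediate loop should report 1, because every later step queues behind the observer.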

Do take note of @alessioalex's comments regarding Kue and proper job queueing.

See also: share-time, a module I wrote to do something similar to the intent of the original question.

牵强ㄟ 2024-12-21 02:20:00

There are many things to be said here.

  • If you have a web application for example, you wouldn't want to do "heavy lifting" in that application's process. Even though your algorithm is efficient, it would still most probably slow down the app.
  • Depending on what you're trying to achieve, you would probably use one of the following approaches:

    a) put your "for in" loop in a child process and get the result in your main app once it's over
    b) if you are trying to achieve something like delayed jobs (for example, sending emails), you should try https://github.com/LearnBoost/kue
    c) make a Kue-like program of your own using Redis to communicate between the main app and the "heavy lifting" app.

For these approaches you could also use multiple processes (for concurrency).
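
Approach (a) could look roughly like the sketch below. It is only an illustration under stated assumptions: `sumInChild` and the inline `childScript` are hypothetical names, the "heavy" work is just summing the object's values, and the object is passed to the child as JSON over stdin, so it must be JSON-serializable.

```javascript
const { execFile } = require('child_process');

// Script executed by the child process (hypothetical example workload):
// read a JSON object from stdin, loop over it, write the result to stdout.
const childScript = `
  let input = '';
  process.stdin.setEncoding('utf8');
  process.stdin.on('data', (chunk) => { input += chunk; });
  process.stdin.on('end', () => {
    const obj = JSON.parse(input);
    let total = 0;
    for (const key of Object.keys(obj)) total += obj[key];
    process.stdout.write(JSON.stringify({ total: total }));
  });
`;

// Hypothetical helper: runs the loop in a separate Node.js process, so the
// parent's event loop only has to handle the final result.
function sumInChild(obj) {
  return new Promise((resolve, reject) => {
    const child = execFile(process.execPath, ['-e', childScript], (err, stdout) => {
      if (err) return reject(err);
      resolve(JSON.parse(stdout).total);
    });
    // Send the object to the child and close its stdin.
    child.stdin.end(JSON.stringify(obj));
  });
}

sumInChild({ a: 1, b: 2, c: 3 }).then((total) => console.log('total:', total));
```

Starting a process and serializing the data both have a cost, so this only pays off when the loop itself is expensive.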

Now for some sample code (it may not be perfect, so please correct me if you have a better suggestion):

var forIn, obj;

// the "for in" loop
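forIn = function(obj, callback){
  var keys = Object.keys(obj), index = 0;
  if (!keys.length) { return; } // nothing to iterate over
  (function iterate() {
    // setImmediate lets pending I/O run between iterations; in current
    // Node.js, process.nextTick callbacks all run before I/O is serviced
    setImmediate(function () {
      callback(keys[index], obj[keys[index]]);
      // advance an index instead of copying the array with slice(1)
      if (++index < keys.length) { iterate(); }
    });
  })();
};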

// example usage of forIn
// console.log the key-val pair in the callback
function start_processing_the_big_object(my_object) {
  forIn(my_object, function (key, val) { console.log("key: %s; val: %s;", key, val); });
}

// Let's simulate a big object here
// and call the function above once the object is created
obj = {};
(function test(obj, i) {
  obj[i--] = "blah_blah_" + i;
  if (!i) { start_processing_the_big_object(obj); }
  return (i && process.nextTick(function() { test(obj, i); }));
})(obj, 30000);

好听的两个字的网名 2024-12-21 02:20:00

Instead of:

for (var i=0; i<len; i++) {
  doSomething(i);
}

do something like this:

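var i = 0;
// Process up to 100 items, then yield so other events can run.
function runChunk() {
  var limit = Math.min(i + 100, len);
  for (; i < limit; i++) {
    doSomething(i);
  }
  if (i < len) {
    // setImmediate queues the next chunk behind pending I/O callbacks
    setImmediate(runChunk);
  }
}
runChunk();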

This runs 100 iterations of the loop, returns control to the event loop for a moment, then picks up where it left off, until it's done.

Edit: here it is adapted for your particular case (with the number of iterations per chunk passed in as an argument):

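var forin = function(obj, callback, numPerChunk){
  var keys = Object.keys(obj);
  var len = keys.length;
  var i = 0;
  numPerChunk = numPerChunk || 100; // default chunk size if none is given
  // Process one chunk, then yield to the event loop before the next one.
  (function runChunk() {
    var limit = Math.min(i + numPerChunk, len);
    for (; i < limit; i++) {
      callback(keys[i], obj[keys[i]], obj);
    }
    if (i < len) {
      setImmediate(runChunk);
    }
  })();
};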
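
The same chunking idea can also be written with async/await, which keeps the loop body looking like an ordinary for loop. This is a minimal sketch, not the answer's own code: `yieldToEventLoop` and `forEachChunked` are hypothetical names, and it assumes a Node.js version that supports async functions and setImmediate.

```javascript
// Resolves on a later turn of the event loop, after pending I/O callbacks.
function yieldToEventLoop() {
  return new Promise((resolve) => setImmediate(resolve));
}

// Hypothetical helper: calls `callback` for every key, pausing after each
// chunk of `numPerChunk` keys so other queued work can run.
async function forEachChunked(obj, callback, numPerChunk = 100) {
  const keys = Object.keys(obj);
  for (let i = 0; i < keys.length; i++) {
    callback(keys[i], obj[keys[i]], obj);
    // Pause after every full chunk, except after the very last key.
    if ((i + 1) % numPerChunk === 0 && i + 1 < keys.length) {
      await yieldToEventLoop();
    }
  }
}

forEachChunked({ a: 1, b: 2, c: 3 }, (key, val) => console.log(key, val), 2)
  .then(() => console.log('done'));
```

The returned promise also gives the caller a way to know when the whole iteration has finished, which the callback-only versions do not provide.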

別甾虛僞 2024-12-21 02:20:00

The following applies to [browser] JavaScript; it may be entirely irrelevant to Node.js.

Two options I know of:

  1. Use multiple timers to process the queue. They will interleave, which gives the net effect of "processing items more often" (this is also a good way to steal more CPU ;-), or
  2. Do more work per cycle, either count-based or time-based.

I am not sure if Web Workers are applicable/available.
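
As an illustration of the time-based variant of option 2, here is a minimal sketch. `processQueue` and its parameters are hypothetical names, and the 10 ms default budget is an arbitrary assumption. It uses only setTimeout and Date.now, which are available in both browsers and Node.js.

```javascript
// Hypothetical helper: handles items for at most `budgetMs` milliseconds per
// slice, then yields with setTimeout so other events can be processed.
function processQueue(items, handler, done, budgetMs = 10) {
  let index = 0;

  function runSlice() {
    const deadline = Date.now() + budgetMs;
    // Handle at least one item per slice so the queue always makes progress.
    do {
      handler(items[index], index);
      index++;
    } while (index < items.length && Date.now() < deadline);

    if (index < items.length) {
      setTimeout(runSlice, 0); // continue in a later slice
    } else if (done) {
      done();
    }
  }

  if (items.length > 0) {
    runSlice();
  } else if (done) {
    done();
  }
}

const doubled = [];
processQueue([1, 2, 3], (n) => doubled.push(n * 2), () => console.log(doubled));
```

A time budget adapts to how expensive each item is, whereas a fixed count can block for too long when individual items are slow.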

Happy coding.
