Large iterator: Maximum call stack size exceeded

Posted 2025-01-18 08:13:33


I'm trying to convert an iterator to an array. The iterator is the result of calling matchAll on a very long string. The iterator (I assume) has many matches within the string. First I tried it with the spread operator:

const array = [...myLongString.matchAll(/myregex/g)];

This gave me the error: RangeError: Maximum call stack size exceeded

So I tried iterating via next():

const safeIteratorToArray = (iterator) => {
    const result = [];
    let item = iterator.next();

    while (!item.done) {
        result.push(item.value);
        item = iterator.next();
    }

    return result;
};

But this gives me the same error, on the item = iterator.next() line. So I tried making it async in an effort to reset the call stack:

const safeIteratorToArray = async (iterator) => {
    const result = [];
    let item = iterator.next();

    while (!item.done) {
        result.push(item.value);
        item = await new Promise(resolve => setTimeout(() => resolve(iterator.next())));
    }

    return result;
};

But I still get the same error.

If you are curious about the actual use case:

The regex I'm actually using is:

/\[(.+?)\] \[DEBUG\] \[Item (.+?)\] Success with response: ((.|\n)+?)\n\[/g

And the contents of the text file (it's a log file) generally looks like:

[TIMESTAMP] [LOG_LEVEL] [Item ITEM_ID] Success with response: {
    ...put a giant json object here
}

Repeat that ad nauseam, with newlines between each log.


Answer by 秋千易, 2025-01-25 08:13:33


(V8 developer here.)

It's not about the iterator, it's about the RegExp.

[Update]
Looks like I was misled by a typo in my test, so my earlier explanation/suggestion doesn't fix the problem. With the test corrected, it turns out that only the end of the expression (which I called "fishy" before) needs to be fixed.

The massive consumption of stack memory is caused by the fact that (.|\n) is a capture group, and is matched very frequently. One idea would be to write it as [.\n] instead, but inside [...] character classes the . metacharacter loses its special meaning and matches only a literal dot.

Hat tip to @cachius for suggesting an elegant solution: use the s flag to make . match \n characters.
As an unrelated fix, prefer checking for the closing } instead of the next opening [ at the beginning of a line, so that there's no overlap between matched ranges (which would make you miss some matches).
So, in summary, replace ((.|\n)+?)\n\[/g with (.+?)\n}/gs.
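
To illustrate the summary fix, here is a minimal sketch of the corrected pattern used with matchAll; the sample log lines and the JSON handling are illustrative, not the asker's real data:

```javascript
// Corrected pattern: the 's' flag lets '.' match newlines, and anchoring
// on the closing '}' keeps matched ranges from overlapping.
const re = /\[(.+?)\] \[DEBUG\] \[Item (.+?)\] Success with response: (.+?)\n}/gs;

// Illustrative log content (not from the original question).
const log = [
  '[2025-01-18T08:13:33Z] [DEBUG] [Item 42] Success with response: {',
  '  "ok": true',
  '}',
  '',
  '[2025-01-18T08:13:34Z] [DEBUG] [Item 43] Success with response: {',
  '  "ok": false',
  '}',
].join('\n');

// Each match yields [full, timestamp, itemId, body]; the body capture
// stops just before the final '\n}', so re-append it before parsing.
const results = [...log.matchAll(re)].map(
  ([, timestamp, itemId, body]) => ({ itemId, ok: JSON.parse(body + '\n}').ok })
);
console.log(results);
```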

[/Update]

Here is a reproducible example. The following exhibits the stack overflow:

let lines = ["[TIMESTAMP] [DEBUG] [Item ITEM_ID] {"];
for (let i = 0; i < 1000000; i++) {
  lines.push("  [0]");  // Fake JSON, close enough for RegExp purposes.
}
lines.push("}");
lines.push("[TIMESTAMP]");
let myLongString = lines.join("\n");

const re = /\[(.+?)\] \[DEBUG\] \[Item (.+?)\] ((.|\n)+?)\n\[/g;
myLongString.match(re);

If you replace the const re = ... line with:

const re = /\[(.+?)\] \[DEBUG\] \[Item (.+?)\] (.+?)\n}/gs;

then the stack overflow disappears.

(It would be possible to reduce the simplified example even further, but then the connection with your original case wouldn't be as obvious.)
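
As a quick sanity check (my addition, scaled down so it runs fast), the same synthetic input can be generated and matched with the fixed pattern without overflowing:

```javascript
// Same synthetic log as above, just smaller; the fixed pattern finds
// the single response block without deep backtracking.
const lines = ['[TIMESTAMP] [DEBUG] [Item ITEM_ID] {'];
for (let i = 0; i < 10000; i++) {
  lines.push('  [0]');  // Fake JSON, close enough for RegExp purposes.
}
lines.push('}');
lines.push('[TIMESTAMP]');
const myLongString = lines.join('\n');

const re = /\[(.+?)\] \[DEBUG\] \[Item (.+?)\] (.+?)\n}/gs;
const matches = [...myLongString.matchAll(re)];
console.log(matches.length);  // one match expected
console.log(matches[0][2]);   // the captured item id
```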

[Original post below -- the mechanism I explained here is factually correct, and applying the suggested replacement indeed improves performance by 25% because it makes the RegExp simpler to match; it just isn't enough to fix the stack overflow.]
The problematic pattern is:

\[(.+?)\]

which, after all, means "a [, then any number of arbitrary characters, then a ]". While I understand that regular expressions might seem like magic, they're actually real algorithmic work under the hood, kind of like miniature programs in their own right. In particular, any time a ] is encountered in the string, the algorithm has to decide whether to count this as one of the "arbitrary characters", or as the one ] that ends this sequence. Since it can't magically know that, it has to keep both possibilities "in mind" (=on the stack), pick one, and backtrack if that turns out to be incorrect. Since this backtracking information is kept on the stack (where else?), if you put sufficiently many ] into your string, the stack will run out of space.

Luckily, the solution is simple: since what you actually mean is "a [, then any number of characters that aren't ], then a ]", you can just tell the RegExp engine that, replacing . with [^\]]:

\[([^\]]+?)\]
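
A small sketch (my own illustration) showing that both patterns extract the same field, with the negated class never having to branch on each ]:

```javascript
// The lazy-dot and negated-class patterns capture the same field here,
// but [^\]] rules out ']' up front instead of backtracking over it.
const lazy = /\[(.+?)\]/;
const negated = /\[([^\]]+?)\]/;

const line = '[TIMESTAMP] [DEBUG] [Item 42] Success with response: {';
console.log(line.match(lazy)[1]);    // "TIMESTAMP"
console.log(line.match(negated)[1]); // "TIMESTAMP"
```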

Note: ((.|\n)+?)\n\[ seems fishy for the same reason, but according to this test doesn't appear to be the problem, even if I further increase the input size. I'm not sure why; it might be due to how I created the test. If you see further problems with the real input, it may be worth reformulating this part as well.
[/Original post]
