大型迭代器超出了最大调用堆栈大小
我正在尝试将迭代器转换为数组。迭代器是在很长的字符串上调用 agetall
的结果。迭代器(我假设)在字符串中有许多匹配。首先,我尝试使用传播操作员尝试:
const array = [...myLongString.matchAll(/myregex/g)];
这给了我错误: grangeRor:最大呼叫堆栈大小超过
,
因此我尝试通过 next()
>:
const safeIteratorToArray = (iterator) => {
const result = [];
let item = iterator.next();
while (!item.done) {
result.push(item.value);
item = iterator.next();
}
return result;
};
但这给了我同样的错误,在 item = iterator.next()
line上。因此,我尝试将其异步用于重置呼叫堆栈:
const safeIteratorToArray = async (iterator) => {
const result = [];
let item = iterator.next();
while (!item.done) {
result.push(item.value);
item = await new Promise(resolve => setTimeout(() => resolve(iterator.next())));
}
return result;
};
但是我仍然遇到同样的错误。
如果您对实际用例感到好奇:
我实际使用的正则是::
/\[(.+?)\] \[DEBUG\] \[Item (.+?)\] Success with response: ((.|\n)+?)\n\[/g
文本文件的内容(是日志文件)通常看起来像:
[TIMESTAMP] [LOG_LEVEL] [Item ITEM_ID] Success with response: {
...put a giant json object here
}
重复每个日志之间的newlines。
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
(V8 developer here.)
It's not about the iterator, it's about the RegExp.
[update]
看来我在测试中被错别字误导了,所以我的早期解释/建议无法解决问题。通过纠正了测试,事实证明, 表达式的末端(我以前称为“钓鱼”)需要固定。
堆栈内存的大量消耗是由于
(。| \ n)
是一个捕获组而引起的,并且非常频繁地匹配。一个想法是将其写成[。\ n]
而不是。
metacharacter在[...]套。
@cachius提示建议一种优雅的解决方案:使用 s flag 进行
。
匹配\ n
targun。作为一个无关的修复程序,更喜欢检查关闭
}
而不是在行开始时的下一个开放 [ [,以便在匹配的范围之间没有重叠你错过了一些比赛)。因此,总而言之,用
替换
((。| \ n)+?)\ n \ [/g
用(。+?)\ n}/gs
。[/Update]
Here is a reproducible example. The following exhibits the stack overflow:
If you replace the
const re = ...
line with:then the stack overflow disappears.
(It would be possible to reduce the simplified example even further, but then the connection with your original case wouldn't be as obvious.)
[Original post below -- the mechanism I explained here is factually correct, and applying the suggested replacement indeed将绩效提高了25%,因为它使得正式更简单匹配,这还不足以修复堆栈溢出。]
The problematic pattern is:
which, after all, means "a
[
, then any number of arbitrary characters, then a]
".虽然我知道正则表达方式可能看起来像魔术,但它们实际上是在引擎盖下的真正算法工作,有点像微型程序。特别是,任何时候在字符串中遇到]
时,该算法必须决定是否将其算作“任意字符”之一,还是一个]
结束了这个序列。由于它不能神奇地知道,因此必须将这两种可能性“牢记”(=堆栈)牢记,如果结果不正确,请选择一个和回溯。由于此回溯信息保存在堆栈上(还有其他地方?),如果您将足够多的]
放入字符串中,则堆栈将耗尽空间。幸运的是,解决方案很简单:由于您实际上的意思是“ A
[
,然后任何数量的字符]
,然后a]
", you can just tell the RegExp engine that, replacing.
with[^\]]
:Note:
(( 。我不确定为什么;这可能是由于我如何创建测试所致。如果您看到真实意见的进一步问题,那么也可能值得重新制定这一部分。
[/原始帖子]
(V8 developer here.)
It's not about the iterator, it's about the RegExp.
[Update]
Looks like I was misled by a typo in my test, so my earlier explanation/suggestion doesn't fix the problem. With the test corrected, it turns out that only the end of the expression (which I called "fishy" before) needs to be fixed.
The massive consumption of stack memory is caused by the fact that
(.|\n)
is a capture group, and is matched very frequently. One idea would be to write it as[.\n]
instead, but the.
metacharacter is not valid inside[...]
character sets.Hat tip to @cachius for suggesting an elegant solution: use the
s
flag to make.
match\n
characters.As an unrelated fix, prefer checking for the closing
}
instead of the next opening[
at the beginning of a line, so that there's no overlap between matched ranges (which would make you miss some matches).So, in summary, replace
((.|\n)+?)\n\[/g
with(.+?)\n}/gs
.[/Update]
Here is a reproducible example. The following exhibits the stack overflow:
If you replace the
const re = ...
line with:then the stack overflow disappears.
(It would be possible to reduce the simplified example even further, but then the connection with your original case wouldn't be as obvious.)
[Original post below -- the mechanism I explained here is factually correct, and applying the suggested replacement indeed improves performance by 25% because it makes the RegExp simpler to match, it just isn't enough to fix the stack overflow.]
The problematic pattern is:
which, after all, means "a
[
, then any number of arbitrary characters, then a]
". While I understand that regular expressions might seem like magic, they're actually real algorithmic work under the hood, kind of like miniature programs in their own right. In particular, any time a]
is encountered in the string, the algorithm has to decide whether to count this as one of the "arbitrary characters", or as the one]
that ends this sequence. Since it can't magically know that, it has to keep both possibilities "in mind" (=on the stack), pick one, and backtrack if that turns out to be incorrect. Since this backtracking information is kept on the stack (where else?), if you put sufficiently many]
into your string, the stack will run out of space.Luckily, the solution is simple: since what you actually mean is "a
[
, then any number of characters that aren't]
, then a]
", you can just tell the RegExp engine that, replacing.
with[^\]]
:Note:
((.|\n)+?)\n\[
seems fishy for the same reason, but according to this test doesn't appear to be the problem, even if I further increase the input size. I'm not sure why; it might be due to how I created the test. If you see further problems with the real input, it may be worth reformulating this part as well.[/Original post]