Is there a theoretical expression size limit for the "or" operator in Regex.Replace?

Posted on 2024-12-06 07:09:15

Is there a theoretical limit on expression size for the "or" operator in Regex.Replace,
such as Regex.Replace("abc","(a|c|d|e...continue say 500000 elements here)","zzz")?

Can .NET's implementation throw a StackOverflowException in this case?

Thanks

Comments (2)

烟花易冷人易散 2024-12-13 07:09:16

There is no theoretical limit, though each regular expression engine will have its own implementation limits. In this case, since you are using .NET the limit is due to the amount of memory the .NET runtime can use.

A regular expression with one million alternations works fine for me:

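using System.Linq;
using System.Text.RegularExpressions;

string input = "a<142>c";
// Build one million alternatives: <0>|<1>|...|<999999>
var options = Enumerable.Range(0, 1000000).Select(x => "<" + x + ">");
string pattern = string.Join("|", options);
string result = Regex.Replace(input, pattern, "zzz");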

Result:

azzzc

It's very slow though. Increasing the number of options to 10 million gives me an OutOfMemoryException.

You would probably benefit from looking at another approach.
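
The answer does not say which other approach it has in mind. One possibility, offered only as a sketch, is to match the general shape of the tokens with a small pattern and check each match against a hash set, so the regex engine never sees the million alternatives. The class name TokenReplacer, the method ReplaceKnown, and the assumption that every alternative looks like `<digits>` are illustrative and not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class TokenReplacer
{
    // One small pattern that matches the *shape* of every alternative: <digits>.
    // Assumption: all alternatives share this shape, as in the example above.
    private static readonly Regex Token = new Regex(@"<\d+>");

    // Replaces only the tokens contained in 'known'; any other token is left as-is.
    public static string ReplaceKnown(string input, ISet<string> known, string replacement)
    {
        return Token.Replace(input, m => known.Contains(m.Value) ? replacement : m.Value);
    }

    public static void Main()
    {
        // Same one million options as above, stored in a set instead of a pattern.
        var known = new HashSet<string>(
            Enumerable.Range(0, 1000000).Select(x => "<" + x + ">"));

        Console.WriteLine(ReplaceKnown("a<142>c", known, "zzz")); // prints "azzzc"
    }
}
```

Each candidate token costs one hash lookup, so the work per match does not grow with the number of options. This only applies when the alternatives are plain literals with a common shape; it is not a general substitute for alternation.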

夏雨凉 2024-12-13 07:09:16

The way regular expressions work means that the memory requirements and performance for a simple a|b|c.....|x|y|z expression as described are not too bad, even for a very large number of variants.

However, if your expression is even slightly more complex than that, its performance could degrade exponentially and its memory footprint could grow massively, because a large number of "or" options like this can force a great deal of backtracking when other parts of the expression don't match immediately.

You may therefore want to exercise caution when doing this sort of thing. Even if it works now, it would only take a small and relatively innocent change to make the whole thing come to a grinding halt.
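
One way to guard against the runaway case described here is to give the match a time budget. The sketch below uses the Regex.Replace overload that takes a TimeSpan (available since .NET Framework 4.5) and catches RegexMatchTimeoutException. The class GuardedAlternation, its method names, and the one-second budget in the usage note are hypothetical choices for illustration; escaping each literal with Regex.Escape is an added precaution that the answer does not mention.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class GuardedAlternation
{
    // Builds "(?:a|b|c)" from literal strings, escaping regex metacharacters
    // so that a literal such as "a.b" cannot change the meaning of the pattern.
    public static string BuildPattern(IEnumerable<string> literals)
    {
        return "(?:" + string.Join("|", literals.Select(s => Regex.Escape(s))) + ")";
    }

    // Returns false (and leaves the input unchanged in 'result') when the
    // engine exceeds the time budget instead of letting it run unbounded.
    public static bool TryReplace(string input, string pattern, string replacement,
                                  TimeSpan timeout, out string result)
    {
        try
        {
            result = Regex.Replace(input, pattern, replacement, RegexOptions.None, timeout);
            return true;
        }
        catch (RegexMatchTimeoutException)
        {
            result = input;
            return false;
        }
    }
}
```

For example, BuildPattern(new[] { "a", "c", "d", "e" }) yields (?:a|c|d|e), and TryReplace("abc", thatPattern, "zzz", TimeSpan.FromSeconds(1), out var r) returns true with r set to zzzbzzz. A timeout does not make a pathological pattern fast; it only turns a hang into a failure the caller can handle.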
