There are times when being able to limit the pattern matching duration of regex operations could be useful. In particular, when working with user-supplied patterns to match data, the pattern might exhibit poor performance due to nested quantifiers and excessive backtracking (see catastrophic backtracking). One way to apply a timeout is to run the regex asynchronously, but this can be tedious and clutters the code.
According to What's New in the .NET Framework 4.5 Developer Preview, it looks like there's a new built-in approach to support this:

"Ability to limit how long the regular expression engine will attempt to resolve a regular expression before it times out."
How can I use this feature? Also, what do I need to be aware of when using it?
Note: I'm asking and answering this question since it's encouraged.
I recently researched this topic since it interested me and will cover the main points here. The relevant MSDN documentation covers the details, and you can check out the Regex class to see the new overloaded constructors and static methods. The code samples can be run with Visual Studio 11 Developer Preview.

The Regex class accepts a TimeSpan to specify the timeout duration. You can specify a timeout at a macro and micro level in your application, and they can be used together:

- "REGEX_DEFAULT_MATCH_TIMEOUT" property, set using the AppDomain.SetData method (macro, application-wide scope)
- matchTimeout parameter (micro, localized scope)

When the AppDomain property is set, all Regex operations will use that value as the default timeout. To override the application-wide default you simply pass a matchTimeout value to the regex constructor or static method. If an AppDomain default isn't set, and matchTimeout isn't specified, then pattern matching will not time out (i.e., original pre-.NET 4.5 behavior).

There are 2 main exceptions to handle:
- RegexMatchTimeoutException: thrown when a timeout occurs.
- ArgumentOutOfRangeException: thrown when "matchTimeout is negative or greater than approximately 24 days." In addition, a TimeSpan value of zero will cause this to be thrown.

Despite negative values not being allowed, there's one exception: a value of -1 ms is accepted. Internally the Regex class accepts -1 ms, which is the value of the Regex.InfiniteMatchTimeout field, to indicate that a match should not time out (i.e., original pre-.NET 4.5 behavior).

Using the matchTimeout parameter
In the following example I'll demonstrate both valid and invalid timeout scenarios and how to handle them:
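Here's a minimal sketch of such a demonstration. It is a reconstruction under assumptions, not the original sample: the TryMatch helper, the class name, the pattern ^(a+)+$ and the input strings are all hypothetical choices made for illustration. It exercises a match that completes, a match that times out, the two invalid values (zero and negative), and Regex.InfiniteMatchTimeout.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MatchTimeoutDemo
{
    // Hypothetical helper (not from the original post): runs one match
    // with the given timeout and reports how it ended.
    public static string TryMatch(string pattern, string input, TimeSpan matchTimeout)
    {
        try
        {
            // The constructor validates matchTimeout up front.
            var regex = new Regex(pattern, RegexOptions.None, matchTimeout);
            bool isMatch = regex.IsMatch(input);
            // MatchTimeout exposes the timeout this instance was created with.
            return "completed: IsMatch=" + isMatch + ", MatchTimeout=" + regex.MatchTimeout;
        }
        catch (RegexMatchTimeoutException ex)
        {
            return "timed out: limit was " + ex.MatchTimeout;
        }
        catch (ArgumentOutOfRangeException)
        {
            return "invalid timeout";
        }
    }

    public static void Main()
    {
        // Nested quantifiers plus a non-matching tail force heavy backtracking.
        string pattern = @"^(a+)+$";
        string slowInput = new string('a', 40) + "!";

        // Valid timeout, cheap input: the match completes.
        Console.WriteLine(TryMatch(pattern, "aaaa", TimeSpan.FromSeconds(1)));
        // Valid timeout, expensive input: RegexMatchTimeoutException.
        Console.WriteLine(TryMatch(pattern, slowInput, TimeSpan.FromMilliseconds(250)));
        // Invalid: zero.
        Console.WriteLine(TryMatch(pattern, "aaaa", TimeSpan.Zero));
        // Invalid: negative (other than -1 ms).
        Console.WriteLine(TryMatch(pattern, "aaaa", TimeSpan.FromSeconds(-5)));
        // Valid: -1 ms, meaning "never time out".
        Console.WriteLine(TryMatch(pattern, "aaaa", Regex.InfiniteMatchTimeout));
    }
}
```

Catching ArgumentOutOfRangeException around the constructor is mostly useful when the timeout itself comes from configuration or user input; with a hard-coded valid TimeSpan only the timeout exception matters.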
When using an instance of the Regex class you have access to the MatchTimeout property, which returns the timeout that instance uses.

Using the AppDomain property
The "REGEX_DEFAULT_MATCH_TIMEOUT" property is used to set an application-wide default. If this property is set to an invalid TimeSpan value or an invalid object, a TypeInitializationException will be thrown when attempting to use a regex.

Example with a valid property value:
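A minimal sketch of what this can look like follows. It is an illustration under assumptions rather than the original sample: the class and method names, the 2-second default, and the pattern and input are hypothetical. As far as I can tell the default is read when the Regex type is first initialized, so the sketch sets it before touching Regex at all.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AppDomainDefaultDemo
{
    // Hypothetical helper: sets the application-wide default, then runs an
    // expensive match without passing any matchTimeout.
    public static string Run(TimeSpan defaultTimeout)
    {
        // Set the default before the first use of Regex in this process.
        AppDomain.CurrentDomain.SetData("REGEX_DEFAULT_MATCH_TIMEOUT", defaultTimeout);

        try
        {
            // No matchTimeout argument, so the AppDomain default applies.
            var regex = new Regex(@"^(a+)+$");
            regex.IsMatch(new string('a', 40) + "!");
            return "completed";
        }
        catch (RegexMatchTimeoutException ex)
        {
            return "timed out: limit was " + ex.MatchTimeout;
        }
        catch (ArgumentOutOfRangeException ex)
        {
            // Not expected here; listed for completeness.
            return "ArgumentOutOfRangeException: " + ex.Message;
        }
        catch (TypeInitializationException ex)
        {
            // Raised when the AppDomain value is invalid.
            return "TypeInitializationException: " + ex.Message;
        }
    }

    public static void Main()
    {
        Console.WriteLine(Run(TimeSpan.FromSeconds(2)));
    }
}
```

Because the value is picked up once, changing it later in the same process should not be relied on to affect regexes that use the default.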
Using the above example with an invalid (negative) value would cause the exception to be thrown; the code that handles it then reports the exception on the console.
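The invalid case can be sketched the same way. Again this is a hypothetical illustration (the class name, the -2 second value and the trivial pattern are my own choices); it prints the exception types rather than assuming any particular message text, since the wording can differ between runtime versions.

```csharp
using System;
using System.Text.RegularExpressions;

public static class InvalidDefaultDemo
{
    // Hypothetical helper: stores a negative default, then tries to use Regex.
    public static string Run()
    {
        // SetData itself does not validate the value; the failure only
        // surfaces once the Regex class is used.
        AppDomain.CurrentDomain.SetData("REGEX_DEFAULT_MATCH_TIMEOUT", TimeSpan.FromSeconds(-2));

        try
        {
            Regex.IsMatch("abc", "b");
            return "no exception";
        }
        catch (TypeInitializationException ex)
        {
            // The inner exception carries the actual reason for the failure.
            string inner = ex.InnerException == null ? "none" : ex.InnerException.GetType().Name;
            return "TypeInitializationException (inner: " + inner + ")";
        }
    }

    public static void Main()
    {
        Console.WriteLine(Run());
    }
}
```

Once the type initializer has failed, every later use of Regex in that process fails the same way, so a bad default is best treated as a configuration error to fix rather than something to recover from at each call site.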
In both examples the ArgumentOutOfRangeException isn't thrown. For completeness the code shows all the exceptions you can handle when working with the new .NET 4.5 Regex timeout feature.

Overriding AppDomain default
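To make the override concrete, here's a minimal sketch. The names, pattern and input are hypothetical; it assumes a 5-second application-wide default and a regex that passes its own 2-second matchTimeout, and it returns the limit reported by the timeout exception so you can see which of the two values was applied.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OverrideDefaultDemo
{
    // Hypothetical helper: returns the timeout that actually fired,
    // or null if the match finished in time.
    public static TimeSpan? Run(TimeSpan defaultTimeout, TimeSpan overrideTimeout)
    {
        // Application-wide default, set before Regex is first used.
        AppDomain.CurrentDomain.SetData("REGEX_DEFAULT_MATCH_TIMEOUT", defaultTimeout);

        // An explicit matchTimeout takes precedence over the default.
        var regex = new Regex(@"^(a+)+$", RegexOptions.None, overrideTimeout);

        try
        {
            regex.IsMatch(new string('a', 40) + "!");
            return null;
        }
        catch (RegexMatchTimeoutException ex)
        {
            return ex.MatchTimeout;
        }
    }

    public static void Main()
    {
        TimeSpan? fired = Run(TimeSpan.FromSeconds(5), TimeSpan.FromSeconds(2));
        Console.WriteLine(fired.HasValue ? "timed out: limit was " + fired.Value : "completed");
    }
}
```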
Overriding the AppDomain default is done by specifying a matchTimeout value. For example, with a default of 5 seconds in place, a regex constructed with a matchTimeout of 2 seconds times out after 2 seconds instead.

Closing Remarks
MSDN recommends setting a time-out value in all regular expression pattern-matching operations. However, it doesn't draw your attention to the issues to be aware of when doing so. I don't recommend setting an AppDomain default and calling it a day. You need to know your input and know your patterns. If the input is large, or the pattern is complex, an appropriate timeout value should be used. This might also entail measuring your performance-critical regex usages to assign sane defaults. Arbitrarily assigning a timeout value to a regex that used to work fine may cause it to break if the value isn't long enough. Measure existing usages before assigning a value if you think it might abort the matching attempt too early.
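One rough way to do that measuring is sketched below. Everything in it is an assumption for illustration: the helper names, the sample pattern and inputs, and especially the "ten times the worst case, with a 100 ms floor" rule, which is just a placeholder policy and not a recommendation from MSDN or anywhere else.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class RegexTimingDemo
{
    // Returns the slowest observed IsMatch duration across the samples.
    public static TimeSpan MeasureWorstCase(Regex regex, string[] samples)
    {
        TimeSpan worst = TimeSpan.Zero;
        foreach (string sample in samples)
        {
            Stopwatch stopwatch = Stopwatch.StartNew();
            regex.IsMatch(sample);
            stopwatch.Stop();
            if (stopwatch.Elapsed > worst)
            {
                worst = stopwatch.Elapsed;
            }
        }
        return worst;
    }

    // Placeholder policy (an assumption): ten times the worst observed
    // duration, but never less than 100 ms.
    public static TimeSpan SuggestTimeout(TimeSpan worstCase)
    {
        double milliseconds = Math.Max(100.0, worstCase.TotalMilliseconds * 10);
        return TimeSpan.FromMilliseconds(milliseconds);
    }

    public static void Main()
    {
        var regex = new Regex(@"^\d{4}-\d{2}-\d{2}$");
        // Use inputs that resemble production data, including large ones.
        string[] samples = { "2012-01-31", "not a date", new string('9', 10000) };

        regex.IsMatch(samples[0]); // warm-up so one-time costs don't skew the numbers

        TimeSpan worst = MeasureWorstCase(regex, samples);
        Console.WriteLine("Worst case: " + worst);
        Console.WriteLine("Suggested timeout: " + SuggestTimeout(worst));
    }
}
```

Numbers from a developer machine only give a starting point; a loaded production server can be considerably slower, which is one more reason to leave generous headroom.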
Moreover, this feature is useful when handling user-supplied patterns. Yet, learning how to write proper patterns that perform well is important. Slapping a timeout on a pattern to make up for a lack of knowledge in proper pattern construction isn't good practice.