VB.NET Regex: improving performance with compilation and Shared variables

Posted 2024-12-07 18:37:44

I am writing some code that uses fixed regexes to search strings and match patterns.

It's simple stuff, but I want to improve regex performance by compiling (it's a high-traffic website).

I was thinking of compiling the regex and putting it in a Shared (static) variable inside a class.

Something like this:

Namespace Regexs

    Public Class UrlNickname

        Private Shared rgx As Regex = New Regex("^\/\w{4,20}$", RegexOptions.IgnoreCase Or RegexOptions.CultureInvariant Or RegexOptions.Compiled)

        ''' <summary>
        ''' Returns a Nickname string if pattern found in Url, otherwise returns Empty string.
        ''' </summary>
        ''' <param name="url">The Url string to search.</param>
        ''' <returns></returns>
        ''' <remarks></remarks>
        Public Shared Function ContainsNickname(url As String) As String
            If rgx.IsMatch(url) Then
                Return url.Substring(1, url.Length - 1)
            End If
            Return String.Empty
        End Function

    End Class

End Namespace

Then you could use the function like this:

Dim url As String = HttpContext.Current.Request.RawUrl

Dim nickname As String = Regexs.UrlNickname.ContainsNickname(url)
If Not String.IsNullOrEmpty(nickname) Then
    ' Nickname pattern match found:
    ' do something like RedirectToRoutePermanent
End If

Basically, I store the regex in a Shared (static) variable so that it is only compiled once.

The function would then be called to check if a username pattern match was found on a 404 error page.

Would this be the best approach for improving regex performance?

Note: I am not interested in a solution for the 404 error page problem above; it's just a simple example.


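A further improvement could then use a Shared generic list of regexes, like so:

' Created once up front so that Add() does not fail on a Nothing reference.
Private Shared ReadOnly _rgxList As New List(Of Regex)()

Public Sub New()
    ' Get list of regex expressions from database and populate:
    _rgxList.Add(New Regex("blah", RegexOptions.Compiled))
    _rgxList.Add(New Regex("blah2", RegexOptions.Compiled))
End Sub

' Returns True as soon as any regex in the list matches.
Public Shared Function IsMatch(str As String) As Boolean
    For Each reg As Regex In _rgxList
        If reg.IsMatch(str) Then Return True
    Next
    Return False
End Function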

New() would be called on Application.Start event.

Comments (1)

生生不灭 2024-12-14 18:37:44

Looks fine. In addition, I'd make the Shared variable ReadOnly, to avoid accidental changes.
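As a minimal sketch of that suggestion, here is the class from the question with only the `ReadOnly` modifier added. `Substring(1)` is used as a shorthand for the original `Substring(1, url.Length - 1)` and is assumed to be equivalent for this purpose.

```vbnet
Imports System.Text.RegularExpressions

Namespace Regexs

    Public Class UrlNickname

        ' ReadOnly: the field can only be assigned here or in a Shared constructor,
        ' so no other code in the class can replace the compiled regex by accident.
        Private Shared ReadOnly rgx As New Regex("^\/\w{4,20}$",
            RegexOptions.IgnoreCase Or RegexOptions.CultureInvariant Or RegexOptions.Compiled)

        ''' <summary>
        ''' Returns a Nickname string if pattern found in Url, otherwise returns Empty string.
        ''' </summary>
        Public Shared Function ContainsNickname(url As String) As String
            If rgx.IsMatch(url) Then
                Return url.Substring(1) ' drop the leading "/"
            End If
            Return String.Empty
        End Function

    End Class

End Namespace
```

A `Regex` instance is immutable and its matching methods are thread-safe, so one shared instance can serve concurrent requests.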

Note, however, that while compiling regular expressions improves runtime performance by about 30%, it also has downsides; in particular, it requires more memory. A nice comparison can be found here:

So, there's no general "A is better than B" answer here. It all depends on your exact requirements. You might have to do measurements to find out what performs better in your particular case.

General advice on performance tuning: Make sure that something is really the performance bottleneck before improving it. It doesn't matter whether your regex takes 0.01 or 0.02 seconds if, for example, looking up the nickname in the database takes 2 seconds. Use built-in tools from the .NET Framework (e.g. the Stopwatch class) or external tools (e.g. EQATEC Profiler) to find out where your bottlenecks are.
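A rough way to take such a measurement with `Stopwatch` might look like the sketch below. The module name `RegexTiming`, the helper `TimeMatches`, the sample URL and the iteration count are all made up for illustration; a realistic test would use your own inputs and load.

```vbnet
Imports System
Imports System.Diagnostics
Imports System.Text.RegularExpressions

Module RegexTiming

    ' Hypothetical helper: times `iterations` IsMatch calls and returns elapsed milliseconds.
    Function TimeMatches(rgx As Regex, input As String, iterations As Integer) As Long
        rgx.IsMatch(input) ' warm-up call so one-time setup cost is not measured
        Dim sw As Stopwatch = Stopwatch.StartNew()
        For i As Integer = 1 To iterations
            rgx.IsMatch(input)
        Next
        sw.Stop()
        Return sw.ElapsedMilliseconds
    End Function

    Sub Main()
        Const pattern As String = "^\/\w{4,20}$"
        Dim options As RegexOptions = RegexOptions.IgnoreCase Or RegexOptions.CultureInvariant

        ' Same pattern, with and without RegexOptions.Compiled.
        Dim interpreted As New Regex(pattern, options)
        Dim compiled As New Regex(pattern, options Or RegexOptions.Compiled)

        Const sampleUrl As String = "/some_nickname" ' assumed sample input
        Const iterations As Integer = 1000000        ' assumed workload size

        Console.WriteLine("Interpreted: {0} ms", TimeMatches(interpreted, sampleUrl, iterations))
        Console.WriteLine("Compiled:    {0} ms", TimeMatches(compiled, sampleUrl, iterations))
    End Sub

End Module
```

The numbers only mean something relative to the rest of the request, so compare them with the time spent elsewhere (for example, the database lookup) before deciding whether the regex is worth tuning.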
