How to handle never-ending matches from user-submitted regular expressions
Consider the following two lines of C# (using .NET Framework 3.5):
Regex regex = new Regex(@"^((E|e)t )?(M|m)oi (?<NewName>[A-Za-z]\.?\w*((\-|\s)?[A-Za-z]?\w{1,})+)$", RegexOptions.Compiled | RegexOptions.IgnoreCase);
Match m = regex.Match("moi aussi jaimerai etre un ordinateur pour pas m'énnerver ");
(Sorry, it's a French program :))
When they are executed, the process gets stuck in the Match() method and never returns. I guess there is some problem with the whitespace in the regex pattern, but I don't want to change the pattern (it is actually set outside the program, by the end users of my tool). What I want is to be able to stop the match, with a timeout for instance.
Does anyone know whether this is a well-known problem with .NET regular expressions, and whether there is an easy way to work around it? Or do I have to run these lines on a separate thread and abort it if needed (which I would definitely rather not do)?
Comments (5)
If I enter the expression in RegexBuddy, it presents the following message
Looking up catastrophic backtracking gives the following explanation
I assume you are going to have to handle it in code. I'd suggest contacting the author of RegexBuddy and asking what is needed to detect this scenario.
I think you should simply launch the Regex match on a separate thread and allow it to be aborted if a certain maximum time limit is reached.
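A minimal sketch of that approach, assuming .NET Framework (where Thread.Abort is supported; on .NET Core and later it throws PlatformNotSupportedException). The class and method names are made up for illustration:

```csharp
using System;
using System.Text.RegularExpressions;
using System.Threading;

// Hypothetical helper, not part of the original program.
public static class ThreadedRegex
{
    // Returns the Match, or null if matching did not finish within the timeout.
    public static Match MatchOrAbort(Regex regex, string input, TimeSpan timeout)
    {
        Match result = null;
        Thread worker = new Thread(delegate() { result = regex.Match(input); });
        worker.IsBackground = true; // a stuck match must not keep the process alive
        worker.Start();

        // Join returns true if the worker finished before the timeout elapsed.
        if (worker.Join(timeout))
        {
            return result;
        }

        // Assumption: running on .NET Framework, where Abort is supported.
        worker.Abort();
        return null;
    }
}
```

A caller would treat a null result as "pattern took too long" and report it to the user who supplied the pattern.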
In general, regular expressions can take longer than you expect. You should experiment with the regular expression using a tool like Regulator.
The problem is that you have nested "loops" in your Regex, which make it terribly inefficient (so that it basically takes forever due to the complexity of the expression).
If you say what you would like to match, I can try to figure out a more efficient Regex for that.
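To make the nested loops concrete: in the pattern from the question, the group `((\-|\s)?[A-Za-z]?\w{1,})+` has an optional separator, so a single run of word characters can be divided between iterations of the outer `+` in exponentially many ways, and the engine tries them all before failing at the apostrophe. The sketch below shows one possible flattening in which a separator is mandatory between runs. It is my own rewrite, intended to accept the same strings, and it guesses at what the pattern is meant to match, so it should be checked against real inputs. The class and member names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical illustration, not the original author's code.
public static class NewNamePatterns
{
    // Pattern from the question: a quantified group nested inside another "+".
    public const string Nested =
        @"^((E|e)t )?(M|m)oi (?<NewName>[A-Za-z]\.?\w*((\-|\s)?[A-Za-z]?\w{1,})+)$";

    // Assumed rewrite: word runs separated by exactly one '-' or whitespace
    // character, so each input can be consumed in only one way.
    public const string Flattened =
        @"^(et )?moi (?<NewName>[A-Za-z]\.?[-\s]?\w+(?:[-\s]\w+)*)$";

    // Returns the captured name, or null when the input does not match.
    public static string ExtractNewName(string input)
    {
        Match m = Regex.Match(input, Flattened, RegexOptions.IgnoreCase);
        return m.Success ? m.Groups["NewName"].Value : null;
    }
}
```

With the flattened form, the sentence from the question is rejected after a small amount of backtracking instead of hanging. This does not help when end users supply arbitrary patterns, which is why a timeout is still needed.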
It seems to me like a case of the regex match time growing exponentially. See the BCL blog.
The best solution is to set a timeout on the regex, with no messing about with threads.
See here for how to strip out strings with a timeout.
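A minimal sketch of the timeout approach, assuming .NET Framework 4.5 or later: the Regex constructor overload that takes a match timeout and RegexMatchTimeoutException do not exist in the .NET 3.5 runtime mentioned in the question. The helper name and signature are made up for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original program.
public static class SafeRegex
{
    // Returns true when matching completed within the timeout (match may still
    // be unsuccessful), false when the engine gave up because of the timeout.
    public static bool TryMatch(string pattern, string input, TimeSpan timeout, out Match match)
    {
        // An invalid user-supplied pattern throws ArgumentException here;
        // this sketch leaves that to the caller.
        Regex regex = new Regex(pattern, RegexOptions.IgnoreCase, timeout);
        try
        {
            match = regex.Match(input);
            return true;
        }
        catch (RegexMatchTimeoutException)
        {
            match = null;
            return false;
        }
    }
}
```

A caller would pass the end user's pattern with a timeout of, say, one second, and treat a false return as "this pattern is too expensive for this input".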