正则表达式中大十进制数发生灾难性回溯的原因是什么?
我是正则表达式的新手,我一直在使用Python的RE模块进行一个项目,并且需要使用嵌套的数学表达式。
到目前为止,这一直相对较好,但是我注意到使用Re.Search()工作时,带有9个或更多小数点(作为字符串)的浮动数字会引起灾难性的回溯。
例如,这一行不会引起任何问题。
match = re.search(r"\(((?:[^()]+|R)+)\)", "(1.00000000+cos(1.0-3.0))")
但是,这一行确实会引起问题。
match = re.search(r"\(((?:[^()]+|R)+)\)", "(1.000000000+cos(1.0-3.0))")
对于任何想知道为什么我不删除第一个括号以删除问题的人,请注意这是一组孤立的行:在程序中,您可能会遇到这样的事情:
expression = "cos(tan(1)+sin(2)+tan(3)+sin(4)+tan(5))"
# during parsing, this becomes "cos(1.5574077246549023+0.9092974268256817+tan(3)+sin(4)+tan(5))"
# re.search(r"\(((?:[^()]+|R)+)\)", expression) would then cause a catastrophic backtrack
I'm new to regular expressions and I've been working on a project using Python's re module and have needed to work with nested mathematical expressions.
This has been going relatively fine up until now, but I've noticed that floating numbers with nine or more decimal places (as strings) cause catastrophic backtracking when working with re.search().
For example, this line causes no issue.
match = re.search(r"\(((?:[^()]+|R)+)\)", "(1.00000000+cos(1.0-3.0))")
However, this line does cause an issue.
match = re.search(r"\(((?:[^()]+|R)+)\)", "(1.000000000+cos(1.0-3.0))")
For anyone wondering why I don't remove the first parentheses to remove the issue, note that this is an isolated set of lines: in the program, you may come across something like this:
expression = "cos(tan(1)+sin(2)+tan(3)+sin(4)+tan(5))"
# during parsing, this becomes "cos(1.5574077246549023+0.9092974268256817+tan(3)+sin(4)+tan(5))"
# re.search(r"\(((?:[^()]+|R)+)\)", expression) would then cause a catastrophic backtrack
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
python
re
不支持递归。但是,在您的模式下,您正在使用r
char中的嵌套量词。如果模式无法匹配,它将在导致灾难性回溯之前尝试所有排列。
如果您可以使用 python pyphon pypi模块您可以使用
(( ?R)
See a regex demo with pcre selected that does support recursion.
Python
re
does not support recursion. But in your pattern you are using nested quantifiers with anR
char in the alternation.If the pattern can not match, it will try all permutations before failing causing catastrophic backtracking.
If you can make use of the Python pypi module you could recurse the whole pattern using
(?R)
See a regex demo with pcre selected that does support recursion.