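Why is s/^\s+|\s+$//g; so much slower than two separate substitutions?
The Perl FAQ entry "How do I strip blank space from the beginning/end of a string?" states that using
s/^\s+|\s+$//g;
is slower than doing it in two steps:
s/^\s+//;
s/\s+$//;
Why is the combined statement noticeably slower than the separate ones (for any input string)?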
Since the two methods are logically equivalent, there is no inherent reason for them to differ in evaluation performance. In practice, however, some engines fail to spot optimizations in more complex regexes.
In this case, the combined regex as a whole is unanchored, so it could potentially match at any point in the string. By contrast, ^\s+ is anchored at the start, so it is trivial to match, and \s+$ is anchored at the end and applies a single character class to each character from the end backwards. A well-optimized engine will recognize that fact and match in reverse, which makes it as trivial as a ^\s+ match on the reverse of the input.
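To make the "match in reverse" remark concrete, here is a minimal sketch showing that stripping trailing whitespace gives the same result as stripping leading whitespace from the reversed string. The helper names `rtrim_direct` and `rtrim_via_reverse` are hypothetical and only illustrate the equivalence; this is not how the Perl engine implements anything internally.

```perl
use strict;
use warnings;

# Hypothetical helper: strip trailing whitespace with the end-anchored regex.
sub rtrim_direct {
    my ($s) = @_;
    $s =~ s/\s+$//;
    return $s;
}

# Hypothetical helper: reverse the string, strip leading whitespace with the
# start-anchored regex, then reverse back.
sub rtrim_via_reverse {
    my ($s) = @_;
    my $r = reverse $s;          # scalar context: reverses the characters
    $r =~ s/^\s+//;
    return scalar reverse $r;
}

# Compare both helpers on a few sample inputs (illustrative values only).
for my $input ("  foo bar \t ", "no padding", "   ", "") {
    my $direct   = rtrim_direct($input);
    my $reversed = rtrim_via_reverse($input);
    printf "%-16s %s\n", "[$input]", $direct eq $reversed ? "same" : "differs";
}
```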
If this is indeed the case, then it would be because the regex engine is able to optimize better for the individual regexes than for the combined one.
What do you mean by "noticeably slower"?
The Perl regex runtime runs much quicker when working with 'fixed' or 'anchored' substrings rather than 'floating' substrings. A substring is fixed when you can lock it to a certain place in the source string. Both '^' and '$' provide that anchoring. However, when you use alternation '|', the compiler doesn't recognize the choices as fixed, so it uses less optimized code to scan the whole string. In the end, looking for fixed strings twice is much, much faster than looking for a floating string once. On a related note, reading perl's regcomp.c will make you go blind.
Update:
Here are some additional details. If you've compiled perl with debugging support, you can run it with the '-Dr' flag and it will dump out the regex compilation data. Here's what you get:
Note the word 'anchored' in the first dump.
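If your perl was not built with debugging support, the `re` pragma should give a similar view. The sketch below assumes `use re 'debug'` is available in your installation; it compiles the three patterns from the question so their compilation data is written to STDERR. The variable names are illustrative, and the exact dump format varies between perl versions, so no output is reproduced here.

```perl
use strict;
use warnings;

# Enable regex debugging for the rest of this file. The compilation data
# for each pattern below is printed to STDERR when the pattern is compiled.
use re 'debug';

# The two anchored patterns from the question.
my $leading  = qr/^\s+/;
my $trailing = qr/\s+$/;

# The combined pattern using alternation.
my $combined = qr/^\s+|\s+$/;
```

Comparing the three dumps side by side is one way to check whether your perl version still treats the alternation differently from the two anchored patterns.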
Other answers have indicated that the fully anchored regexes allow the engine to optimize the search process, focusing on just the beginning or the end of the string. You can apparently see the effect of this optimization by comparing the speed of the two approaches on strings of various lengths. As the string gets longer, the "floating" regex (using alternation) suffers more and more.
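A comparison along these lines can be sketched with the core Benchmark module. The sub names, the input lengths, and the way the test strings are built are all assumptions chosen for illustration, not the setup used by the original poster. Timings depend on your perl version and machine, so no numbers are shown.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# One substitution using alternation (the form the FAQ describes as slower).
sub trim_combined {
    my ($s) = @_;
    $s =~ s/^\s+|\s+$//g;
    return $s;
}

# Two separate anchored substitutions.
sub trim_separate {
    my ($s) = @_;
    $s =~ s/^\s+//;
    $s =~ s/\s+$//;
    return $s;
}

# Illustrative input sizes: number of space-separated words in the payload.
for my $words (10, 100, 1000) {
    # Padded on both sides, with single spaces between the inner words.
    my $input = '   ' . join(' ', ('x') x $words) . '   ';

    # Sanity check: both approaches must produce the same trimmed string.
    die "results differ" unless trim_combined($input) eq trim_separate($input);

    print "words in payload: $words\n";
    # Run each variant for at least 0.5 CPU seconds and print a comparison.
    cmpthese(-0.5, {
        combined => sub { trim_combined($input) },
        separate => sub { trim_separate($input) },
    });
}
```

If the answers above hold for your perl, the gap between the `combined` and `separate` rows should widen as the payload grows.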