String.replaceAll() vs. Matcher.replaceAll() (performance differences)
Are there known differences between String.replaceAll() and Matcher.replaceAll() (on a Matcher object created from a java.util.regex.Pattern) in terms of performance?
Also, what are the high-level API differences between the two (immutability, handling nulls, handling empty strings, and so on)?
According to the documentation for String.replaceAll, an invocation of the form str.replaceAll(regex, repl) yields exactly the same result as the expression Pattern.compile(regex).matcher(str).replaceAll(repl).

Therefore, the performance of invoking String.replaceAll and of explicitly creating a Pattern and a Matcher can be expected to be the same.

Edit: As has been pointed out in the comments, that holds only for a single call to replaceAll, whether made through String or through Matcher. If you need to call replaceAll many times with the same regex, it pays to hold onto a compiled Pattern, so that the relatively expensive compilation of the regular expression does not have to be repeated every time.
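
To make the equivalence concrete, here is a minimal sketch that runs the same replacement both ways. The class name, method names, and sample input are made up for illustration; only String.replaceAll, Pattern.compile, Pattern.matcher, and Matcher.replaceAll come from the JDK.

```java
import java.util.regex.Pattern;

public class ReplaceAllEquivalence {
    // Convenience form: the regex is compiled inside the call.
    static String viaString(String input, String regex, String replacement) {
        return input.replaceAll(regex, replacement);
    }

    // Explicit form: the expression the String.replaceAll Javadoc
    // describes as yielding the same result.
    static String viaMatcher(String input, String regex, String replacement) {
        return Pattern.compile(regex).matcher(input).replaceAll(replacement);
    }

    public static void main(String[] args) {
        String input = "a1b22c333"; // hypothetical sample data
        System.out.println(viaString(input, "\\d+", "#"));  // a#b#c#
        System.out.println(viaMatcher(input, "\\d+", "#")); // a#b#c#
    }
}
```

Both forms compile the regex once per call, which is why a single call shows no meaningful difference.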
The source code of String.replaceAll() is a single statement that returns Pattern.compile(regex).matcher(this).replaceAll(replacement).
It has to compile the pattern first, so if you're going to run it many times with the same pattern on short strings, performance will be much better if you reuse one compiled Pattern.
The main difference is that if you hold onto the Pattern used to produce the Matcher, you can avoid recompiling the regex every time you use it. Going through String, you don't get the ability to "cache" like this.

If you have a different regex every time, using the String class's replaceAll is fine. If you are applying the same regex to many strings, create one Pattern and reuse it.
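
A minimal sketch of the "create one Pattern and reuse it" advice follows. The class, the constant, and the whitespace-collapsing task are assumptions chosen for illustration, not something taken from the answers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class WhitespaceNormalizer {
    // Compiled once when the class loads and then shared by every call.
    private static final Pattern WHITESPACE_RUN = Pattern.compile("\\s+");

    // Collapses each run of whitespace to a single space in every line.
    static List<String> normalizeAll(List<String> lines) {
        List<String> out = new ArrayList<>(lines.size());
        for (String line : lines) {
            // A new Matcher per input; only the compiled Pattern is reused.
            out.add(WHITESPACE_RUN.matcher(line).replaceAll(" "));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(normalizeAll(Arrays.asList("a   b", "c\t\td"))); // [a b, c d]
    }
}
```

Writing line.replaceAll("\\s+", " ") inside the loop would give the same output but compile the regex once per line.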
Immutability / thread safety: compiled Patterns are immutable, Matchers are not. (See "Is Java Regex Thread Safe?")
Handling empty strings: replaceAll handles an empty input string gracefully. It returns the empty string unchanged unless the pattern itself can match the empty string.
Making coffee, etc.: last I heard, neither String nor Pattern nor Matcher had any API features for that.
Edit: as for handling nulls, the documentation for String and Pattern doesn't explicitly say so, but I suspect they'd throw a NullPointerException, since they expect a String.
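
The empty-string and null cases above can be probed with a short sketch like the one below. The class and the throwsNpe helper are hypothetical, and the null results reflect what common JDK implementations do rather than a documented guarantee.

```java
import java.util.regex.Pattern;

public class ReplaceAllEdgeCases {
    // Hypothetical helper: reports whether the action throws NullPointerException.
    static boolean throwsNpe(Runnable action) {
        try {
            action.run();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Empty input, pattern needs at least one character: nothing to replace.
        System.out.println("[" + "".replaceAll("a+", "x") + "]"); // []
        // Empty input, pattern can match the empty string: one match at index 0.
        System.out.println("[" + "".replaceAll("a*", "x") + "]"); // [x]

        // Null regex passed through String.replaceAll.
        System.out.println(throwsNpe(() -> "abc".replaceAll(null, "x")));
        // Null input passed to Pattern.matcher.
        System.out.println(throwsNpe(() -> Pattern.compile("b").matcher(null)));
    }
}
```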
The implementation of String.replaceAll tells you everything you need to know: it calls Pattern.compile(regex), creates a Matcher for the string, and calls replaceAll on that Matcher. (And the docs say the same thing.)

While I haven't checked for caching, I'd certainly expect that compiling a pattern once and keeping a static reference to it would be more efficient than calling Pattern.compile with the same pattern each time. If there is a cache, the saving will be small; if there isn't, it could be large.
The difference is that String.replaceAll() compiles the regex each time it's called. There's no equivalent of .NET's static Regex.Replace() method, which automatically caches the compiled regex. Usually, replaceAll() is something you do only once, but if you're going to call it repeatedly with the same regex, especially in a loop, you should create a Pattern object and use the Matcher method.

You can create the Matcher ahead of time, too, and use its reset() method to retarget it for each use.

The performance benefit of reusing the Matcher, of course, is nowhere near as great as that of reusing the Pattern.
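
The Matcher-reuse idea might look like the sketch below. The class, the digit-masking task, and the names are assumptions for illustration; Matcher.reset(CharSequence) is the JDK method that retargets an existing Matcher.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReusedMatcher {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Replaces every run of digits with '#' in each input string.
    static List<String> maskDigits(List<String> inputs) {
        // One Matcher created up front with a placeholder input.
        // A Matcher is mutable, so it stays local to this call.
        Matcher matcher = DIGITS.matcher("");
        List<String> out = new ArrayList<>(inputs.size());
        for (String input : inputs) {
            // reset(CharSequence) points the same Matcher at the new input.
            out.add(matcher.reset(input).replaceAll("#"));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(maskDigits(Arrays.asList("a1", "b22c3"))); // [a#, b#c#]
    }
}
```

Because the Matcher carries state, this pattern suits single-threaded loops; a shared Pattern with a fresh Matcher per call is the safer default across threads.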
The other answers sufficiently cover the performance part of the question, but there is another difference between Matcher::replaceAll and String::replaceAll that is also a reason to compile your own Pattern. When you compile a Pattern yourself, you can pass options such as flags that modify how the regex is applied. The Matcher then applies all the flags you set when you call Matcher::replaceAll.

Several flags are available. Mostly I just wanted to point out that the Pattern and Matcher APIs have lots of options, and that's the primary reason to go beyond the simple String::replaceAll.
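
As one possible illustration of flags, the sketch below compiles a Pattern with Pattern.CASE_INSENSITIVE. The choice of flag, the class, and the sample text are assumptions; the original example is not preserved in this copy of the answer.

```java
import java.util.regex.Pattern;

public class FlaggedReplace {
    // The flag is passed as the second argument to Pattern.compile.
    private static final Pattern HELLO_ANY_CASE =
            Pattern.compile("hello", Pattern.CASE_INSENSITIVE);

    // Replaces "hello" in any letter case with "Hi".
    static String greet(String input) {
        return HELLO_ANY_CASE.matcher(input).replaceAll("Hi");
    }

    public static void main(String[] args) {
        System.out.println(greet("Hello HELLO hello")); // Hi Hi Hi
        // Without the flag, only the exact-case occurrence is replaced.
        System.out.println("Hello HELLO hello".replaceAll("hello", "Hi")); // Hello HELLO Hi
    }
}
```

Some flags can also be switched on inside the regex itself, for example with the embedded expression (?i), which works through String.replaceAll as well. The flags argument keeps such options out of the pattern text.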