Should I use Java's String.format() if performance is important?
We have to build Strings all the time for log output and so on. Over the JDK versions we have learned when to use StringBuffer (many appends, thread-safe) and StringBuilder (many appends, not thread-safe).
What's the advice on using String.format()? Is it efficient, or are we forced to stick with concatenation for one-liners where performance is important?
E.g. the ugly old style,
String s = "What do you get if you multiply " + varSix + " by " + varNine + "?";
vs. the tidy new style (String.format, which is possibly slower),
String s = String.format("What do you get if you multiply %d by %d?", varSix, varNine);
Note: my specific use case is the hundreds of 'one-liner' log strings throughout my code. They don't involve a loop, so StringBuilder is too heavyweight. I'm interested in String.format() specifically.
Comments (13)
I took hhafez's code and added a memory test:
I ran this separately for each approach (the '+' operator, String.format, and StringBuilder with a final toString() call), so the memory used by one approach is not affected by the others.
I also added more concatenations, making the string "Blah" + i + "Blah" + i + "Blah" + i + "Blah".
The results are as follows (average of 5 runs each):
[Results table with rows for the + operator, String.format, and StringBuilder; values not preserved.]
We can see that String + and StringBuilder are practically identical time-wise, but StringBuilder is much more efficient in memory use. This matters when there are many log calls (or any other statements involving strings) in a time interval short enough that the garbage collector does not get to clean up the many String instances produced by the + operator.
And a note, by the way: don't forget to check the logging level before constructing the message.
Conclusions:
StringBuilder.
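The advice above about checking the logging level before building the message could look roughly like the sketch below. It assumes java.util.logging; the logger name, the `built` counter, and the method names are made up for this illustration. The second variant relies on the `Supplier<String>` overload of `Logger.fine`, available since Java 8.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class LevelGuardSketch {
    // Hypothetical logger name, chosen for this illustration only.
    static final Logger LOG = Logger.getLogger("example.levelguard");

    // Counts how often the message string was actually built.
    static int built = 0;

    static String buildMessage(int six, int nine) {
        built++;
        return "What do you get if you multiply " + six + " by " + nine + "?";
    }

    // Explicit guard: the concatenation only runs if FINE is enabled.
    static void logGuarded(int six, int nine) {
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine(buildMessage(six, nine));
        }
    }

    // Lazy variant: the logger calls the supplier only if FINE is enabled.
    static void logLazily(int six, int nine) {
        LOG.fine(() -> buildMessage(six, nine));
    }

    public static void main(String[] args) {
        LOG.setLevel(Level.INFO); // FINE is below INFO, so nothing is built
        logGuarded(6, 9);
        logLazily(6, 9);
        System.out.println("messages built: " + built);
    }
}
```

Either form skips the string construction when the level is disabled; the lazy form still allocates a small lambda object per call.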
I wrote a small class to test which of the two performs better, and + comes out ahead of format by a factor of 5 to 6.
Try it yourself.
Running the above for different N shows that both behave linearly, but String.format is 5-30 times slower.
The reason is that in the current implementation String.format first parses the input with regular expressions and then fills in the parameters. Concatenation with plus, on the other hand, gets optimized by javac (not by the JIT) and uses StringBuilder.append directly.
All the benchmarks presented here have some flaws, so their results are not reliable.
I was surprised that nobody had used JMH for benchmarking, so I did.
Results:
Units are operations per second; higher is better. Benchmark source code. The OpenJDK IcedTea 2.5.4 Java Virtual Machine was used.
So, the old style (using +) is much faster.
Your old ugly style is automatically compiled by javac 1.6 as:
So there is absolutely no difference between this and using a StringBuilder.
String.format is a lot more heavyweight, since it creates a new Formatter, parses your input format string, creates a StringBuilder, appends everything to it, and calls toString().
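A hand-written sketch of the StringBuilder form this answer describes is shown below, next to the two styles from the question. The class and method names are illustrative only, and the exact bytecode depends on the compiler: on Java 9 and later, javac typically emits an invokedynamic call to StringConcatFactory for `+` rather than explicit StringBuilder calls.

```java
final class ConcatExpansionSketch {
    // The "ugly old style" from the question.
    static String withPlus(int varSix, int varNine) {
        return "What do you get if you multiply " + varSix + " by " + varNine + "?";
    }

    // Hand-written equivalent of what the answer says javac 1.6 generates.
    static String withBuilder(int varSix, int varNine) {
        StringBuilder sb = new StringBuilder();
        sb.append("What do you get if you multiply ");
        sb.append(varSix);
        sb.append(" by ");
        sb.append(varNine);
        sb.append("?");
        return sb.toString();
    }

    // The "tidy new style" from the question, for comparison.
    static String withFormat(int varSix, int varNine) {
        return String.format("What do you get if you multiply %d by %d?", varSix, varNine);
    }

    public static void main(String[] args) {
        System.out.println(withPlus(6, 9));
        System.out.println(withBuilder(6, 9));
        System.out.println(withFormat(6, 9));
    }
}
```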
Java's String.format works like so:
If the final destination for this data is a stream (e.g. rendering a webpage or writing to a file), you can assemble the format chunks directly into your stream:
I speculate that the optimizer will optimize away the format string processing. If so, you're left with equivalent amortized performance to manually unrolling your String.format into a StringBuilder.
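One way to format straight into a stream, as this answer suggests, is `PrintStream.format`. The sketch below is an assumption about what such code could look like, not the answer's original snippet; it writes into a `ByteArrayOutputStream` as a stand-in for a file or socket stream, and the class and method names are invented for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

final class FormatToStreamSketch {
    // Formats directly into the stream instead of building a String first.
    static byte[] render(int varSix, int varNine) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stand-in destination
        try (PrintStream out = new PrintStream(sink, true, "UTF-8")) {
            out.format("What do you get if you multiply %d by %d?", varSix, varNine);
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(new String(render(6, 9), StandardCharsets.UTF_8));
    }
}
```

The format string is still parsed on every call; what this avoids is the intermediate String that `String.format` would return.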
To expand on and correct the first answer above: it's not actually translation that String.format would help with.
What String.format helps with is printing a date/time (or a numeric format, etc.) where there are localization (l10n) differences (i.e., some countries will print 04Feb2009 and others will print Feb042009).
With translation, you're just talking about moving any externalizable strings (error messages and the like) into a property bundle so that you can use the right bundle for the right language, using ResourceBundle and MessageFormat.
Looking at all the above, I'd say that performance-wise, String.format vs. plain concatenation comes down to what you prefer. If you prefer looking at calls to .format over concatenation, then by all means go with that.
After all, code is read a lot more than it's written.
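The distinction drawn above can be illustrated with a small sketch: `String.format` with an explicit `Locale` handles locale-specific number formatting, while `MessageFormat` fills a pattern that, in a real application, would be looked up from a `ResourceBundle`. The class name, method names, and the hard-coded pattern are assumptions made for this example.

```java
import java.text.MessageFormat;
import java.util.Locale;

final class LocaleFormatSketch {
    // Localization: the same number rendered with locale-specific separators.
    static String amount(Locale locale, double value) {
        return String.format(locale, "%,.2f", value);
    }

    // Translation: the pattern would normally come from a ResourceBundle;
    // it is passed in directly here to keep the sketch self-contained.
    static String message(Locale locale, String pattern, Object... args) {
        return new MessageFormat(pattern, locale).format(args);
    }

    public static void main(String[] args) {
        System.out.println(amount(Locale.US, 1234567.891));
        System.out.println(amount(Locale.GERMANY, 1234567.891));
        System.out.println(message(Locale.US, "Multiply {0} by {1}", 6, 9));
    }
}
```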
In your example, performance probably isn't too different, but there are other issues to consider, namely memory fragmentation. Even a concatenation creates a new string, even if it's temporary (it takes time to GC it, and it's more work). String.format() is just more readable, and it involves less fragmentation.
Also, if you're using a particular format a lot, don't forget you can use the Formatter class directly (all String.format() does is instantiate a single-use Formatter instance).
Also, something else you should be aware of: be careful when using substring(). For example:
That large string is still in memory, because that's just how Java substrings work (in JDKs before 7u6, where a substring shared its parent's character array). A better version is:
or
The second form is probably more useful if you're doing other stuff at the same time.
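Using the Formatter class directly, as suggested above, might look like the following sketch. The class and method names are invented for illustration. Note the limits of the technique: the format string is still parsed on every call, so reuse only saves the per-call Formatter and buffer allocation, and a shared Formatter like this is not safe for concurrent use.

```java
import java.util.Formatter;
import java.util.Locale;

final class ReusableFormatterSketch {
    // One buffer and one Formatter, reused across calls (not thread-safe).
    private final StringBuilder buffer = new StringBuilder();
    private final Formatter formatter = new Formatter(buffer, Locale.ROOT);

    String question(int varSix, int varNine) {
        buffer.setLength(0); // discard the previous result
        formatter.format("What do you get if you multiply %d by %d?", varSix, varNine);
        return buffer.toString();
    }

    public static void main(String[] args) {
        ReusableFormatterSketch sketch = new ReusableFormatterSketch();
        System.out.println(sketch.question(6, 9));
        System.out.println(sketch.question(7, 8));
    }
}
```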
Generally you should use String.format, because it's relatively fast and it supports globalization (assuming you're actually trying to write something that is read by the user). It also makes globalization easier if you're translating one string rather than three or more per statement (especially for languages with drastically different grammatical structures).
Now, if you never plan on translating anything, then either rely on Java's built-in conversion of + operators into StringBuilder, or use Java's StringBuilder explicitly.
Another perspective, from the logging point of view only.
I see a lot of discussion related to logging in this thread, so I thought I'd add my experience as an answer. Maybe someone will find it useful.
I guess the motivation for logging with a formatter comes from avoiding string concatenation. Basically, you do not want the overhead of string concatenation if you are not going to log the result.
You do not really need to concatenate or format unless you actually log. Let's say I define a method like this:
With this approach, the concatenation/formatter is not called at all if it's a debug message and debugOn = false.
It would still be better to use StringBuilder instead of a formatter here, though. The main motivation is to avoid any of that work.
At the same time, I do not like adding an "if" block to each logging statement, since
Therefore I prefer to create a logging utility class with methods like the one above and use it everywhere, without worrying about the performance hit or any other related issues.
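A logging utility of the kind described in this answer could be sketched as below. Only the `debugOn` flag comes from the answer; the class name, the `debug` method signature, and the in-memory `lines` list (used here instead of a real log destination so the behavior is easy to observe) are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class DebugLogSketch {
    // Flag named after the answer's debugOn; everything else is hypothetical.
    static boolean debugOn = false;

    // Stand-in for a real log destination.
    static final List<String> lines = new ArrayList<>();

    // Takes the format and its arguments separately, so the formatting
    // work is skipped entirely when debugging is off.
    static void debug(String format, Object... args) {
        if (!debugOn) {
            return;
        }
        lines.add(String.format(format, args));
    }

    public static void main(String[] args) {
        debug("What do you get if you multiply %d by %d?", 6, 9); // skipped
        debugOn = true;
        debug("What do you get if you multiply %d by %d?", 6, 9); // formatted
        System.out.println(lines);
    }
}
```

The call site stays a one-liner with no `if` block. The arguments themselves are still evaluated, boxed, and packed into a varargs array on every call, so an expensive argument expression would still need a guard.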
I just modified hhafez's test to include StringBuilder. StringBuilder is 33 times faster than String.format using the jdk 1.6.0_10 client on XP. Using the -server switch lowers the factor to 20.
While this might sound drastic, I consider it relevant only in rare cases, because the absolute numbers are pretty low: 4 s for 1 million simple String.format calls is sort of OK, as long as I use them for logging or the like.
Update: As pointed out by sjbotha in the comments, the StringBuilder test is invalid, since it is missing a final .toString().
The correct speed-up factor from String.format(.) to StringBuilder is 23 on my machine (16 with the -server switch).
Here is a modified version of hhafez's entry. It includes a StringBuilder option.
Time after for loop 391
Time after for loop 4163
Time after for loop 227
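The test code itself is not shown here, so the following is only a rough sketch of what a three-way timing loop of this kind might look like. It is not hhafez's code: the class name, the iteration count, and the "Blah" + i + "Blah" string are assumptions, and a hand-rolled loop like this ignores JIT warm-up and dead-code elimination, so its numbers are indicative at best (a harness such as JMH is the more reliable route).

```java
import java.util.function.IntFunction;

final class NaiveConcatTiming {
    static String plus(int i) {
        return "Blah" + i + "Blah";
    }

    static String format(int i) {
        return String.format("Blah%dBlah", i);
    }

    static String builder(int i) {
        // Includes the final toString() call.
        return new StringBuilder("Blah").append(i).append("Blah").toString();
    }

    // Runs f for n iterations and returns the elapsed wall-clock milliseconds.
    static long timeMillis(IntFunction<String> f, int n) {
        long sink = 0; // consume results so the loop body is not trivially dead
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            sink += f.apply(i).length();
        }
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        if (sink < 0) {
            throw new IllegalStateException("unreachable: lengths are non-negative");
        }
        return elapsedMillis;
    }

    public static void main(String[] args) {
        int n = 1_000_000; // assumed iteration count
        System.out.println("+ operator:    " + timeMillis(NaiveConcatTiming::plus, n) + " ms");
        System.out.println("String.format: " + timeMillis(NaiveConcatTiming::format, n) + " ms");
        System.out.println("StringBuilder: " + timeMillis(NaiveConcatTiming::builder, n) + " ms");
    }
}
```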
The answer to this depends very much on how your specific Java compiler optimizes the bytecode it generates. Strings are immutable and, theoretically, each "+" operation can create a new one. But your compiler almost certainly optimizes away the interim steps in building long strings. It's entirely possible that both lines of code above generate the exact same bytecode.
The only real way to know is to test the code iteratively in your current environment. Write a quick-and-dirty (QD) app that concatenates strings both ways iteratively, and see how they time against each other.
Consider using "hello".concat( "world!" ) for a small number of strings in concatenation. It could be even better for performance than the other approaches.
If you have more than 3 strings, then consider using StringBuilder, or just String, depending on the compiler that you use.