Should I use Java's String.format() if performance is important?

Posted 2024-07-13 03:24:26

We have to build Strings all the time for log output and so on. Over the JDK versions we have learned when to use StringBuffer (many appends, thread safe) and StringBuilder (many appends, non-thread-safe).

What's the advice on using String.format()? Is it efficient, or are we forced to stick with concatenation for one-liners where performance is important?

e.g. ugly old style,

String s = "What do you get if you multiply " + varSix + " by " + varNine + "?";

vs. tidy new style (String.format, which is possibly slower),

String s = String.format("What do you get if you multiply %d by %d?", varSix, varNine);

Note: my specific use case is the hundreds of 'one-liner' log strings throughout my code. They don't involve a loop, so StringBuilder is too heavyweight. I'm interested in String.format() specifically.

Comments (13)

把昨日还给我 2024-07-20 03:24:26

I took hhafez's code and added a memory test:

private static void test() {
    Runtime runtime = Runtime.getRuntime();
    long memory;
    ...
    memory = runtime.freeMemory();
    // for loop code
    memory = memory-runtime.freeMemory();

I ran this separately for each approach (the '+' operator, String.format, and StringBuilder with a final toString() call), so the memory used by one approach is not affected by the others.
I also added more concatenations, making the string "Blah" + i + "Blah" + i + "Blah" + i + "Blah".

The results are as follows (average of 5 runs each):

Approach        Time (ms)   Memory allocated (bytes)
+ operator            747   320,504
String.format       16484   373,312
StringBuilder         769   57,344

We can see that '+' and StringBuilder are practically identical time-wise, but StringBuilder is much more efficient in memory use.
This matters when many log calls (or any other statements involving strings) occur in an interval too short for the garbage collector to clean up the many string instances produced by the + operator.

And a note: don't forget to check the logging level before constructing the message.
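The level check mentioned above could look roughly like this with java.util.logging. This is a minimal sketch rather than the answerer's code: the logger name "demo", the buildMessage helper, and the built counter are made up for illustration.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class LevelCheckDemo {
    private static final Logger LOG = Logger.getLogger("demo"); // hypothetical logger name
    static int built = 0; // counts how often the message was actually constructed

    // Hypothetical helper standing in for any costly message construction.
    static String buildMessage(int six, int nine) {
        built++;
        return "What do you get if you multiply " + six + " by " + nine + "?";
    }

    static void logBoth(int six, int nine) {
        // Explicit guard: the message is built only when FINE is enabled.
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine(buildMessage(six, nine));
        }
        // Supplier overload (Java 8+): the lambda runs only when FINE is enabled.
        LOG.fine(() -> buildMessage(six, nine));
    }

    public static void main(String[] args) {
        LOG.setLevel(Level.INFO); // FINE is below INFO, so no message is built
        logBoth(6, 9);
        System.out.println("messages built: " + built);
    }
}
```

With the level set to INFO, neither call constructs the string, so the cost of concatenation or formatting is only paid for messages that are actually logged.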

Conclusions:

  1. I'll keep on using StringBuilder.
  2. I have too much time or too little life.
木緿 2024-07-20 03:24:26

I wrote a small class to test which of the two performs better, and + comes out ahead of format by a factor of 5 to 6.
Try it yourself:

import java.io.*;
import java.util.Date;

public class StringTest{

    public static void main( String[] args ){
    int i = 0;
    long prev_time = System.currentTimeMillis();
    long time;

    for( i = 0; i< 100000; i++){
        String s = "Blah" + i + "Blah";
    }
    time = System.currentTimeMillis() - prev_time;

    System.out.println("Time after for loop " + time);

    prev_time = System.currentTimeMillis();
    for( i = 0; i<100000; i++){
        String s = String.format("Blah %d Blah", i);
    }
    time = System.currentTimeMillis() - prev_time;
    System.out.println("Time after for loop " + time);

    }
}

Running the above for different N shows that both behave linearly, but String.format is 5-30 times slower.

The reason is that in the current implementation String.format first parses the input with regular expressions and then fills in the parameters. Concatenation with plus, on the other hand, gets optimized by javac (not by the JIT) and uses StringBuilder.append directly.

[Figure: runtime comparison]

抚笙 2024-07-20 03:24:26

All the benchmarks presented here have some flaws, thus results are not reliable.

I was surprised that nobody used JMH for benchmarking, so I did.

Results:

Benchmark             Mode  Cnt     Score     Error  Units
MyBenchmark.testOld  thrpt   20  9645.834 ± 238.165  ops/s  // using +
MyBenchmark.testNew  thrpt   20   429.898 ±  10.551  ops/s  // using String.format

Units are operations per second, the more the better. Benchmark source code. OpenJDK IcedTea 2.5.4 Java Virtual Machine was used.

So, old style (using +) is much faster.

骷髅 2024-07-20 03:24:26

Your old ugly style is automatically compiled by javac 1.6 as:

StringBuilder sb = new StringBuilder("What do you get if you multiply ");
sb.append(varSix);
sb.append(" by ");
sb.append(varNine);
sb.append("?");
String s =  sb.toString();

So there is absolutely no difference between this and using a StringBuilder.

String.format is a lot more heavyweight, since it creates a new Formatter, parses your input format string, creates a StringBuilder, appends everything to it, and calls toString().
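To make the comparison concrete, the three variants below produce the same string; only the work done to get there differs. The class and method names are hypothetical, and Locale.ROOT is passed to String.format purely so the digits do not depend on the default locale.

```java
import java.util.Locale;

final class ConcatEquivalence {
    // Plain concatenation, as in the question.
    static String withPlus(int varSix, int varNine) {
        return "What do you get if you multiply " + varSix + " by " + varNine + "?";
    }

    // The hand-written StringBuilder form described in the answer.
    static String withBuilder(int varSix, int varNine) {
        return new StringBuilder("What do you get if you multiply ")
                .append(varSix)
                .append(" by ")
                .append(varNine)
                .append("?")
                .toString();
    }

    // String.format: same result, but the format string is parsed on every call.
    static String withFormat(int varSix, int varNine) {
        return String.format(Locale.ROOT,
                "What do you get if you multiply %d by %d?", varSix, varNine);
    }

    public static void main(String[] args) {
        System.out.println(withPlus(6, 9));
        System.out.println(withBuilder(6, 9));
        System.out.println(withFormat(6, 9));
    }
}
```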

爱的十字路口 2024-07-20 03:24:26

Java's String.format works like so:

  1. It parses the format string, exploding it into a list of format chunks.
  2. It iterates over the format chunks, rendering them into a StringBuilder, which is basically an array that resizes itself as necessary by copying into a new array. This is necessary because we don't yet know how large the final String will be.
  3. StringBuilder.toString() copies its internal buffer into a new String.

If the final destination for this data is a stream (e.g. rendering a webpage or writing to a file), you can assemble the format chunks directly into your stream:

new PrintStream(outputStream, autoFlush, encoding).format("hello %s", "world");

I speculate that the optimizer will optimize away the format string processing. If so, you're left with equivalent amortized performance to manually unrolling your String.format into a StringBuilder.
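A self-contained version of the stream idea might look like the sketch below. The ByteArrayOutputStream is only a stand-in for a real file or socket stream, and the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

final class StreamFormatDemo {
    // Formats straight into the stream, skipping the intermediate String
    // that String.format would create.
    static String render(String name) {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stand-in destination
            PrintStream out = new PrintStream(sink, true, "UTF-8");
            out.format("hello %s", name); // PrintStream.format uses %-style specifiers
            out.flush();
            return sink.toString("UTF-8"); // read back only to show the result
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(render("world"));
    }
}
```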

一桥轻雨一伞开 2024-07-20 03:24:26

To expand on and correct the first answer above: it's actually not translation that String.format helps with.
What String.format helps with is printing a date/time (or a number, etc.) where there are localization (l10n) differences (i.e., some countries will print 04Feb2009 and others will print Feb042009).
With translation, you're just talking about moving any externalizable strings (like error messages and whatnot) into a property bundle so that you can use the right bundle for the right language, using ResourceBundle and MessageFormat.

Looking at all the above, I'd say that performance-wise, String.format vs. plain concatenation comes down to what you prefer. If you prefer looking at calls to .format over concatenation, then by all means, go with that.
After all, code is read a lot more than it's written.
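The distinction between localization and translation can be sketched as follows. The class and method names are hypothetical, and the hard-coded pattern stands in for a string that would normally be looked up from a ResourceBundle.

```java
import java.text.MessageFormat;
import java.util.Locale;

final class LocaleFormatDemo {
    // Localization: the same number renders differently per locale.
    static String grouped(Locale locale, int value) {
        return String.format(locale, "%,d", value);
    }

    // Translation: the pattern is data, so a translator can reorder
    // the {0}/{1} arguments without touching the code.
    static String translated(String pattern, Object... args) {
        return MessageFormat.format(pattern, args);
    }

    public static void main(String[] args) {
        System.out.println(grouped(Locale.US, 1234567));
        System.out.println(grouped(Locale.GERMANY, 1234567));
        System.out.println(translated("{1} comes before {0}", "A", "B"));
    }
}
```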

╰ゝ天使的微笑 2024-07-20 03:24:26

In your example, performance probably isn't too different, but there are other issues to consider, namely memory fragmentation. Even a concatenation creates a new string, even if it's temporary (it takes time to GC it and it's more work). String.format() is just more readable and it involves less fragmentation.

Also, if you're using a particular format a lot, don't forget you can use the Formatter class directly (all String.format() does is instantiate a single-use Formatter instance).
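Reusing a Formatter could look roughly like this. It is a sketch with hypothetical names; note that it only avoids allocating a new Formatter and buffer per call, since the format string is still parsed each time, and that the instance is not safe to share between threads.

```java
import java.util.Formatter;
import java.util.Locale;

final class ReusableFormatterDemo {
    private final StringBuilder buffer = new StringBuilder();
    // The Formatter appends into the buffer it was constructed with.
    private final Formatter formatter = new Formatter(buffer, Locale.ROOT);

    String format(String pattern, Object... args) {
        buffer.setLength(0);             // discard the previous result
        formatter.format(pattern, args); // appends the formatted text to the buffer
        return buffer.toString();
    }

    public static void main(String[] args) {
        ReusableFormatterDemo demo = new ReusableFormatterDemo();
        System.out.println(demo.format("%d by %d", 6, 9));
        System.out.println(demo.format("%s!", "ok"));
    }
}
```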

Also, something else you should be aware of: be careful of using substring(). For example:

String getSmallString() {
  String largeString = loadLargeString(); // hypothetical helper: load from file; say 2M in size
  return largeString.substring(100, 300);
}

That large string is still in memory because that's how substring() works in JDKs before 7u6, where the substring shares the original's backing array. A better version is:

  return new String(largeString.substring(100, 300));

or

  return String.format("%s", largeString.substring(100, 300));

The second form is probably more useful if you're doing other stuff at the same time.

剩余の解释 2024-07-20 03:24:26

Generally you should use String.format because it's relatively fast and it supports globalization (assuming you're actually trying to write something that is read by the user). It also makes it easier to globalize if you're translating one string rather than 3 or more per statement (especially for languages that have drastically different grammatical structures).

Now, if you never plan on translating anything, then either rely on Java's built-in conversion of + operators into StringBuilder, or use Java's StringBuilder explicitly.

回忆凄美了谁 2024-07-20 03:24:26

Another perspective, from the logging point of view only.

I see a lot of discussion related to logging in this thread, so I thought I'd add my experience as an answer. Maybe someone will find it useful.

I guess the motivation for logging with a formatter comes from avoiding string concatenation. Basically, you do not want the overhead of string concatenation if you are not going to log the result.

You do not really need to concatenate or format unless you are going to log. Let's say I define a method like this:

public void logDebug(Throwable t, String... args) { // varargs must be the last parameter
    if (debugOn) {
        // call concat methods for all args
        // log the final debug message
    }
}

In this approach the concat/formatter is not called at all if it's a debug message and debugOn = false.

It would still be better to use StringBuilder instead of a formatter here, though. The main motivation is to avoid any of that.

At the same time, I do not like adding an "if" block for each logging statement, since

  • It affects readability.
  • It reduces coverage in my unit tests, which is confusing when you want to make sure every line is tested.

Therefore I prefer to create a logging utility class with methods like the one above and use it everywhere, without worrying about the performance hit or any other related issues.
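A filled-in version of such a utility might look like the sketch below. The class name, the lastMessage field (kept only so the behaviour can be observed), and the choice of System.err as the destination are assumptions. The arguments are still evaluated and the varargs array is still allocated at the call site; only the joining is skipped when debugging is off.

```java
final class DebugLog {
    static boolean debugOn = false;   // hypothetical global switch
    static String lastMessage = null; // kept only so the demo can inspect the result

    // Joins the parts with a StringBuilder, but only when debugging is enabled.
    static void logDebug(Throwable t, Object... parts) {
        if (!debugOn) {
            return; // no concatenation work is done for suppressed messages
        }
        StringBuilder sb = new StringBuilder();
        for (Object part : parts) {
            sb.append(part);
        }
        if (t != null) {
            sb.append(" (").append(t).append(')');
        }
        lastMessage = sb.toString();
        System.err.println(lastMessage);
    }

    public static void main(String[] args) {
        logDebug(null, "x=", 42); // suppressed: debugOn is false
        debugOn = true;
        logDebug(null, "x=", 42); // joined and written to System.err
    }
}
```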

我做我的改变 2024-07-20 03:24:26

I just modified hhafez's test to include StringBuilder. StringBuilder is 33 times faster than String.format using jdk 1.6.0_10 client on XP. Using the -server switch lowers the factor to 20.

public class StringTest {

   public static void main( String[] args ) {
      test();
      test();
   }

   private static void test() {
      int i = 0;
      long prev_time = System.currentTimeMillis();
      long time;

      for ( i = 0; i < 1000000; i++ ) {
         String s = "Blah" + i + "Blah";
      }
      time = System.currentTimeMillis() - prev_time;

      System.out.println("Time after for loop " + time);

      prev_time = System.currentTimeMillis();
      for ( i = 0; i < 1000000; i++ ) {
         String s = String.format("Blah %d Blah", i);
      }
      time = System.currentTimeMillis() - prev_time;
      System.out.println("Time after for loop " + time);

      prev_time = System.currentTimeMillis();
      for ( i = 0; i < 1000000; i++ ) {
         new StringBuilder("Blah").append(i).append("Blah");
      }
      time = System.currentTimeMillis() - prev_time;
      System.out.println("Time after for loop " + time);
   }
}

While this might sound drastic, I consider it to be relevant only in rare cases, because the absolute numbers are pretty low: 4 s for 1 million simple String.format calls is sort of ok - as long as I use them for logging or the like.

Update: As pointed out by sjbotha in the comments, the StringBuilder test is invalid, since it is missing a final .toString().

The correct speed-up factor from String.format(.) to StringBuilder is 23 on my machine (16 with the -server switch).
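One way to avoid the missing toString() problem is to make every variant return a String and to consume each result, as in the sketch below. The class name, the sink field, and the warm-up pass are illustrative choices, and a harness like this is still only a rough indication compared with a dedicated benchmarking tool.

```java
import java.util.function.IntFunction;

final class TimingSketch {
    static long sink; // accumulates lengths so every result is actually used

    // Times `iterations` calls of `maker`, consuming each produced string.
    static long timeMillis(IntFunction<String> maker, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += maker.apply(i).length();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        IntFunction<String> plus = i -> "Blah" + i + "Blah";
        IntFunction<String> format = i -> String.format("Blah %d Blah", i);
        IntFunction<String> builder =
                i -> new StringBuilder("Blah").append(i).append("Blah").toString();

        // Warm-up pass so the timed pass runs mostly compiled code.
        timeMillis(plus, n);
        timeMillis(format, n);
        timeMillis(builder, n);

        System.out.println("+ operator:    " + timeMillis(plus, n) + " ms");
        System.out.println("String.format: " + timeMillis(format, n) + " ms");
        System.out.println("StringBuilder: " + timeMillis(builder, n) + " ms");
    }
}
```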

帅的被狗咬 2024-07-20 03:24:26

Here is a modified version of hhafez's entry. It includes a StringBuilder option.

public class BLA
{
public static final String BLAH = "Blah ";
public static final String BLAH2 = " Blah";
public static final String BLAH3 = "Blah %d Blah";


public static void main(String[] args) {
    int i = 0;
    long prev_time = System.currentTimeMillis();
    long time;
    int numLoops = 1000000;

    for( i = 0; i< numLoops; i++){
        String s = BLAH + i + BLAH2;
    }
    time = System.currentTimeMillis() - prev_time;

    System.out.println("Time after for loop " + time);

    prev_time = System.currentTimeMillis();
    for( i = 0; i<numLoops; i++){
        String s = String.format(BLAH3, i);
    }
    time = System.currentTimeMillis() - prev_time;
    System.out.println("Time after for loop " + time);

    prev_time = System.currentTimeMillis();
    for( i = 0; i<numLoops; i++){
        StringBuilder sb = new StringBuilder();
        sb.append(BLAH);
        sb.append(i);
        sb.append(BLAH2);
        String s = sb.toString();
    }
    time = System.currentTimeMillis() - prev_time;
    System.out.println("Time after for loop " + time);

}

}

Time after for loop 391
Time after for loop 4163
Time after for loop 227

jJeQQOZ5 2024-07-20 03:24:26

The answer to this depends very much on how your specific Java compiler optimizes the bytecode it generates. Strings are immutable and, theoretically, each "+" operation can create a new one. But your compiler almost certainly optimizes away the interim steps in building long strings. It's entirely possible that both lines of code above generate the exact same bytecode.

The only real way to know is to test the code iteratively in your current environment. Write a quick-and-dirty app that concatenates strings both ways iteratively and see how the timings compare.

ˇ宁静的妩媚 2024-07-20 03:24:26

Consider using "hello".concat( "world!" ) when concatenating a small number of strings. It could perform even better than the other approaches.

If you have more than 3 strings, then consider using StringBuilder, or just String, depending on the compiler that you use.
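For reference, here is a small sketch of how concat differs from + in behaviour (the class and method names are hypothetical). concat only accepts a String and rejects null, whereas + converts any operand, including null, to text.

```java
final class ConcatDemo {
    // String.concat: String arguments only, throws NullPointerException on null.
    static String viaConcat(String a, String b) {
        return a.concat(b);
    }

    // The + operator: converts any operand to text, null becomes "null".
    static String viaPlus(String a, Object b) {
        return a + b;
    }

    public static void main(String[] args) {
        System.out.println(viaConcat("hello", "world!"));
        System.out.println(viaPlus("n=", 3));
        System.out.println(viaPlus("x", null));
    }
}
```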
