当前位置：文江博客话题详情

String.Join 与 StringBuilder：哪个更快？

发布于 2024-07-14 02:47:37 字数 341 浏览 4 评论 0原文

在关于格式化 double 的上一个问题中][] 为 CSV 格式，它有人建议使用StringBuilder会比String.Join更快。这是真的？

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

离旧人 2024-07-21 02:47:38

简短的回答：这取决于。

长答案：如果您已经有一个要连接在一起的字符串数组（带有分隔符），String.Join 是最快的方法。

String。 Join 可以查看所有字符串以计算出所需的确切长度，然后再次复制所有数据。这意味着不会涉及额外的复制。唯一的缺点是它必须遍历字符串两次，这意味着可能会超出必要的次数来破坏内存缓存。

如果您事先没有将字符串作为数组，那么使用StringBuilder可能会更快 - 但也有可能不是这样的情况t。如果使用 StringBuilder 意味着进行大量复制，那么构建一个数组然后调用 String.Join 可能会更快。

编辑：这是对 String.Join 的单次调用与对 StringBuilder.Append 的一堆调用。在最初的问题中，我们有两个不同级别的 String.Join 调用，因此每个嵌套调用都会创建一个中间字符串。换句话说，它更加复杂，更加难以猜测。我会惊讶地看到任何一种方式在典型数据上都显着“获胜”（就复杂性而言）。

编辑：当我在家时，我会编写一个对于 StringBuilder 来说尽可能痛苦的基准测试。基本上，如果您有一个数组，其中每个元素的大小约为前一个元素的两倍，并且您得到了正确的结果，那么您应该能够强制为每个追加（元素的副本，而不是分隔符的副本，尽管这需要也要考虑在内）。到那时，它几乎和简单的字符串连接一样糟糕 - 但 String.Join 不会有任何问题。

回复收藏 0 原文

笑红尘 2024-07-21 02:47:38

这是我的测试设备，为简单起见，使用 int[][] ；结果优先：（

Join: 9420ms (chk: 210710000
OneBuilder: 9021ms (chk: 210710000

更新双结果：）

Join: 11635ms (chk: 210710000
OneBuilder: 11385ms (chk: 210710000

（更新re 2048 * 64 * 150）

Join: 11620ms (chk: 206409600
OneBuilder: 11132ms (chk: 206409600

并且启用OptimizeForTesting：

Join: 11180ms (chk: 206409600
OneBuilder: 10784ms (chk: 206409600

速度更快，但速度不是很快；装备（在控制台、发布模式等下运行）：

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Text;

namespace ConsoleApplication2
{
    class Program
    {
        static void Collect()
        {
            GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);
            GC.WaitForPendingFinalizers();
            GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);
            GC.WaitForPendingFinalizers();
        }
        static void Main(string[] args)
        {
            const int ROWS = 500, COLS = 20, LOOPS = 2000;
            int[][] data = new int[ROWS][];
            Random rand = new Random(123456);
            for (int row = 0; row < ROWS; row++)
            {
                int[] cells = new int[COLS];
                for (int col = 0; col < COLS; col++)
                {
                    cells[col] = rand.Next();
                }
                data[row] = cells;
            }
            Collect();
            int chksum = 0;
            Stopwatch watch = Stopwatch.StartNew();
            for (int i = 0; i < LOOPS; i++)
            {
                chksum += Join(data).Length;
            }
            watch.Stop();
            Console.WriteLine("Join: {0}ms (chk: {1}", watch.ElapsedMilliseconds, chksum);

            Collect();
            chksum = 0;
            watch = Stopwatch.StartNew();
            for (int i = 0; i < LOOPS; i++)
            {
                chksum += OneBuilder(data).Length;
            }
            watch.Stop();
            Console.WriteLine("OneBuilder: {0}ms (chk: {1}", watch.ElapsedMilliseconds, chksum);

            Console.WriteLine("done");
            Console.ReadLine();
        }
        public static string Join(int[][] array)
        {
            return String.Join(Environment.NewLine,
                    Array.ConvertAll(array,
                      row => String.Join(",",
                        Array.ConvertAll(row, x => x.ToString()))));
        }
        public static string OneBuilder(IEnumerable<int[]> source)
        {
            StringBuilder sb = new StringBuilder();
            bool firstRow = true;
            foreach (var row in source)
            {
                if (firstRow)
                {
                    firstRow = false;
                }
                else
                {
                    sb.AppendLine();
                }
                if (row.Length > 0)
                {
                    sb.Append(row[0]);
                    for (int i = 1; i < row.Length; i++)
                    {
                        sb.Append(',').Append(row[i]);
                    }
                }
            }
            return sb.ToString();
        }
    }
}

Here's my test rig, using int[][] for simplicity; results first:

Join: 9420ms (chk: 210710000
OneBuilder: 9021ms (chk: 210710000

(update for double results:)

Join: 11635ms (chk: 210710000
OneBuilder: 11385ms (chk: 210710000

(update re 2048 * 64 * 150)

Join: 11620ms (chk: 206409600
OneBuilder: 11132ms (chk: 206409600

and with OptimizeForTesting enabled:

Join: 11180ms (chk: 206409600
OneBuilder: 10784ms (chk: 206409600

So faster, but not massively so; rig (run at console, in release mode, etc):

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Text;

namespace ConsoleApplication2
{
    class Program
    {
        static void Collect()
        {
            GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);
            GC.WaitForPendingFinalizers();
            GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);
            GC.WaitForPendingFinalizers();
        }
        static void Main(string[] args)
        {
            const int ROWS = 500, COLS = 20, LOOPS = 2000;
            int[][] data = new int[ROWS][];
            Random rand = new Random(123456);
            for (int row = 0; row < ROWS; row++)
            {
                int[] cells = new int[COLS];
                for (int col = 0; col < COLS; col++)
                {
                    cells[col] = rand.Next();
                }
                data[row] = cells;
            }
            Collect();
            int chksum = 0;
            Stopwatch watch = Stopwatch.StartNew();
            for (int i = 0; i < LOOPS; i++)
            {
                chksum += Join(data).Length;
            }
            watch.Stop();
            Console.WriteLine("Join: {0}ms (chk: {1}", watch.ElapsedMilliseconds, chksum);

            Collect();
            chksum = 0;
            watch = Stopwatch.StartNew();
            for (int i = 0; i < LOOPS; i++)
            {
                chksum += OneBuilder(data).Length;
            }
            watch.Stop();
            Console.WriteLine("OneBuilder: {0}ms (chk: {1}", watch.ElapsedMilliseconds, chksum);

            Console.WriteLine("done");
            Console.ReadLine();
        }
        public static string Join(int[][] array)
        {
            return String.Join(Environment.NewLine,
                    Array.ConvertAll(array,
                      row => String.Join(",",
                        Array.ConvertAll(row, x => x.ToString()))));
        }
        public static string OneBuilder(IEnumerable<int[]> source)
        {
            StringBuilder sb = new StringBuilder();
            bool firstRow = true;
            foreach (var row in source)
            {
                if (firstRow)
                {
                    firstRow = false;
                }
                else
                {
                    sb.AppendLine();
                }
                if (row.Length > 0)
                {
                    sb.Append(row[0]);
                    for (int i = 1; i < row.Length; i++)
                    {
                        sb.Append(',').Append(row[i]);
                    }
                }
            }
            return sb.ToString();
        }
    }
}

回复收藏 0 原文

○闲身 2024-07-21 02:47:38

我不这么认为。通过 Reflector 来看，String.Join 的实现看起来非常优化。它还具有额外的好处，即提前知道要创建的字符串的总大小，因此不需要任何重新分配。

我创建了两个测试方法来比较它们：

public static string TestStringJoin(double[][] array)
{
    return String.Join(Environment.NewLine,
        Array.ConvertAll(array,
            row => String.Join(",",
                       Array.ConvertAll(row, x => x.ToString()))));
}

public static string TestStringBuilder(double[][] source)
{
    // based on Marc Gravell's code

    StringBuilder sb = new StringBuilder();
    foreach (var row in source)
    {
        if (row.Length > 0)
        {
            sb.Append(row[0]);
            for (int i = 1; i < row.Length; i++)
            {
                sb.Append(',').Append(row[i]);
            }
        }
    }
    return sb.ToString();
}

我运行每个方法 50 次，并传入一个大小为 [2048][64] 的数组。我对两个数组执行了此操作；一个用零填充，另一个用随机值填充。我在我的机器上得到了以下结果（P4 3.0 GHz，单核，无 HT，从 CMD 运行发布模式）：

// with zeros:
TestStringJoin    took 00:00:02.2755280
TestStringBuilder took 00:00:02.3536041

// with random values:
TestStringJoin    took 00:00:05.6412147
TestStringBuilder took 00:00:05.8394650

将数组的大小增加到 [2048][512]，同时减少迭代次数达到 10 次后，我得到了以下结果：

// with zeros:
TestStringJoin    took 00:00:03.7146628
TestStringBuilder took 00:00:03.8886978

// with random values:
TestStringJoin    took 00:00:09.4991765
TestStringBuilder took 00:00:09.3033365

结果是可重复的（几乎；由不同的随机值引起的小波动）。显然，String.Join 在大多数情况下要快一些（尽管差距很小）。

这是我用于测试的代码：

const int Iterations = 50;
const int Rows = 2048;
const int Cols = 64; // 512

static void Main()
{
    OptimizeForTesting(); // set process priority to RealTime

    // test 1: zeros
    double[][] array = new double[Rows][];
    for (int i = 0; i < array.Length; ++i)
        array[i] = new double[Cols];

    CompareMethods(array);

    // test 2: random values
    Random random = new Random();
    double[] template = new double[Cols];
    for (int i = 0; i < template.Length; ++i)
        template[i] = random.NextDouble();

    for (int i = 0; i < array.Length; ++i)
        array[i] = template;

    CompareMethods(array);
}

static void CompareMethods(double[][] array)
{
    Stopwatch stopwatch = Stopwatch.StartNew();
    for (int i = 0; i < Iterations; ++i)
        TestStringJoin(array);
    stopwatch.Stop();
    Console.WriteLine("TestStringJoin    took " + stopwatch.Elapsed);

    stopwatch.Reset(); stopwatch.Start();
    for (int i = 0; i < Iterations; ++i)
        TestStringBuilder(array);
    stopwatch.Stop();
    Console.WriteLine("TestStringBuilder took " + stopwatch.Elapsed);

}

static void OptimizeForTesting()
{
    Thread.CurrentThread.Priority = ThreadPriority.Highest;
    Process currentProcess = Process.GetCurrentProcess();
    currentProcess.PriorityClass = ProcessPriorityClass.RealTime;
    if (Environment.ProcessorCount > 1) {
        // use last core only
        currentProcess.ProcessorAffinity
            = new IntPtr(1 << (Environment.ProcessorCount - 1));
    }
}

I don't think so. Looking through Reflector, the implementation of String.Join looks very optimized. It also has the added benefit of knowing the total size of the string to be created in advance, so it doesn't need any reallocation.

I have created two test methods to compare them:

public static string TestStringJoin(double[][] array)
{
    return String.Join(Environment.NewLine,
        Array.ConvertAll(array,
            row => String.Join(",",
                       Array.ConvertAll(row, x => x.ToString()))));
}

public static string TestStringBuilder(double[][] source)
{
    // based on Marc Gravell's code

    StringBuilder sb = new StringBuilder();
    foreach (var row in source)
    {
        if (row.Length > 0)
        {
            sb.Append(row[0]);
            for (int i = 1; i < row.Length; i++)
            {
                sb.Append(',').Append(row[i]);
            }
        }
    }
    return sb.ToString();
}

I ran each method 50 times, passing in an array of size [2048][64]. I did this for two arrays; one filled with zeros and another filled with random values. I got the following results on my machine (P4 3.0 GHz, single-core, no HT, running Release mode from CMD):

// with zeros:
TestStringJoin    took 00:00:02.2755280
TestStringBuilder took 00:00:02.3536041

// with random values:
TestStringJoin    took 00:00:05.6412147
TestStringBuilder took 00:00:05.8394650

Increasing the size of the array to [2048][512], while decreasing the number of iterations to 10 got me the following results:

// with zeros:
TestStringJoin    took 00:00:03.7146628
TestStringBuilder took 00:00:03.8886978

// with random values:
TestStringJoin    took 00:00:09.4991765
TestStringBuilder took 00:00:09.3033365

The results are repeatable (almost; with small fluctuations caused by different random values). Apparently String.Join is a little faster most of the time (although by a very small margin).

This is the code I used for testing:

const int Iterations = 50;
const int Rows = 2048;
const int Cols = 64; // 512

static void Main()
{
    OptimizeForTesting(); // set process priority to RealTime

    // test 1: zeros
    double[][] array = new double[Rows][];
    for (int i = 0; i < array.Length; ++i)
        array[i] = new double[Cols];

    CompareMethods(array);

    // test 2: random values
    Random random = new Random();
    double[] template = new double[Cols];
    for (int i = 0; i < template.Length; ++i)
        template[i] = random.NextDouble();

    for (int i = 0; i < array.Length; ++i)
        array[i] = template;

    CompareMethods(array);
}

static void CompareMethods(double[][] array)
{
    Stopwatch stopwatch = Stopwatch.StartNew();
    for (int i = 0; i < Iterations; ++i)
        TestStringJoin(array);
    stopwatch.Stop();
    Console.WriteLine("TestStringJoin    took " + stopwatch.Elapsed);

    stopwatch.Reset(); stopwatch.Start();
    for (int i = 0; i < Iterations; ++i)
        TestStringBuilder(array);
    stopwatch.Stop();
    Console.WriteLine("TestStringBuilder took " + stopwatch.Elapsed);

}

static void OptimizeForTesting()
{
    Thread.CurrentThread.Priority = ThreadPriority.Highest;
    Process currentProcess = Process.GetCurrentProcess();
    currentProcess.PriorityClass = ProcessPriorityClass.RealTime;
    if (Environment.ProcessorCount > 1) {
        // use last core only
        currentProcess.ProcessorAffinity
            = new IntPtr(1 << (Environment.ProcessorCount - 1));
    }
}

回复收藏 0 原文