String.Join 与 StringBuilder:哪个更快?

发布于 2024-07-14 02:47:37 字数 341 浏览 4 评论 0原文

在关于格式化 double 的上一个问题中][] 为 CSV 格式,它有人建议使用StringBuilder会比String.Join更快。 这是真的?

In a previous question about formatting a double[][] to CSV format, it was suggested that using StringBuilder would be faster than String.Join. Is this true?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(5

离旧人 2024-07-21 02:47:38

简短的回答:这取决于。

长答案:如果您已经有一个要连接在一起的字符串数组(带有分隔符),String.Join 是最快的方法。

String。 Join 可以查看所有字符串以计算出所需的确切长度,然后再次复制所有数据。 这意味着不会涉及额外的复制。 唯一的缺点是它必须遍历字符串两次,这意味着可能会超出必要的次数来破坏内存缓存。

如果您事先没有将字符串作为数组,那么使用StringBuilder可能会更快 - 但也有可能不是这样的情况t。 如果使用 StringBuilder 意味着进行大量复制,那么构建一个数组然后调用 String.Join 可能会更快。

编辑:这是对 String.Join 的单次调用与对 StringBuilder.Append 的一堆调用。 在最初的问题中,我们有两个不同级别的 String.Join 调用,因此每个嵌套调用都会创建一个中间字符串。 换句话说,它更加复杂,更加难以猜测。 我会惊讶地看到任何一种方式在典型数据上都显着“获胜”(就复杂性而言)。

编辑:当我在家时,我会编写一个对于 StringBuilder 来说尽可能痛苦的基准测试。 基本上,如果您有一个数组,其中每个元素的大小约为前一个元素的两倍,并且您得到了正确的结果,那么您应该能够强制为每个追加(元素的副本,而不是分隔符的副本,尽管这需要也要考虑在内)。 到那时,它几乎和简单的字符串连接一样糟糕 - 但 String.Join 不会有任何问题。

Short answer: it depends.

Long answer: if you already have an array of strings to concatenate together (with a delimiter), String.Join is the fastest way of doing it.

String.Join can look through all of the strings to work out the exact length it needs, then go again and copy all the data. This means there will be no extra copying involved. The only downside is that it has to go through the strings twice, which means potentially blowing the memory cache more times than necessary.

If you don't have the strings as an array beforehand, it's probably faster to use StringBuilder - but there will be situations where it isn't. If using a StringBuilder means doing lots and lots of copies, then building an array and then calling String.Join may well be faster.

EDIT: This is in terms of a single call to String.Join vs a bunch of calls to StringBuilder.Append. In the original question, we had two different levels of String.Join calls, so each of the nested calls would have created an intermediate string. In other words, it's even more complex and harder to guess about. I would be surprised to see either way "win" significantly (in complexity terms) with typical data.

EDIT: When I'm at home, I'll write up a benchmark which is as painful as possibly for StringBuilder. Basically if you have an array where each element is about twice the size of the previous one, and you get it just right, you should be able to force a copy for every append (of elements, not of the delimiter, although that needs to be taken into account too). At that point it's nearly as bad as simple string concatenation - but String.Join will have no problems.

笑红尘 2024-07-21 02:47:38

这是我的测试设备,为简单起见,使用 int[][] ; 结果优先:(

Join: 9420ms (chk: 210710000
OneBuilder: 9021ms (chk: 210710000

更新结果:)

Join: 11635ms (chk: 210710000
OneBuilder: 11385ms (chk: 210710000

(更新re 2048 * 64 * 150)

Join: 11620ms (chk: 206409600
OneBuilder: 11132ms (chk: 206409600

并且启用OptimizeForTesting:

Join: 11180ms (chk: 206409600
OneBuilder: 10784ms (chk: 206409600

速度更快,但速度不是很快; 装备(在控制台、发布模式等下运行):

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Text;

namespace ConsoleApplication2
{
    class Program
    {
        static void Collect()
        {
            GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);
            GC.WaitForPendingFinalizers();
            GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);
            GC.WaitForPendingFinalizers();
        }
        static void Main(string[] args)
        {
            const int ROWS = 500, COLS = 20, LOOPS = 2000;
            int[][] data = new int[ROWS][];
            Random rand = new Random(123456);
            for (int row = 0; row < ROWS; row++)
            {
                int[] cells = new int[COLS];
                for (int col = 0; col < COLS; col++)
                {
                    cells[col] = rand.Next();
                }
                data[row] = cells;
            }
            Collect();
            int chksum = 0;
            Stopwatch watch = Stopwatch.StartNew();
            for (int i = 0; i < LOOPS; i++)
            {
                chksum += Join(data).Length;
            }
            watch.Stop();
            Console.WriteLine("Join: {0}ms (chk: {1}", watch.ElapsedMilliseconds, chksum);

            Collect();
            chksum = 0;
            watch = Stopwatch.StartNew();
            for (int i = 0; i < LOOPS; i++)
            {
                chksum += OneBuilder(data).Length;
            }
            watch.Stop();
            Console.WriteLine("OneBuilder: {0}ms (chk: {1}", watch.ElapsedMilliseconds, chksum);

            Console.WriteLine("done");
            Console.ReadLine();
        }
        public static string Join(int[][] array)
        {
            return String.Join(Environment.NewLine,
                    Array.ConvertAll(array,
                      row => String.Join(",",
                        Array.ConvertAll(row, x => x.ToString()))));
        }
        public static string OneBuilder(IEnumerable<int[]> source)
        {
            StringBuilder sb = new StringBuilder();
            bool firstRow = true;
            foreach (var row in source)
            {
                if (firstRow)
                {
                    firstRow = false;
                }
                else
                {
                    sb.AppendLine();
                }
                if (row.Length > 0)
                {
                    sb.Append(row[0]);
                    for (int i = 1; i < row.Length; i++)
                    {
                        sb.Append(',').Append(row[i]);
                    }
                }
            }
            return sb.ToString();
        }
    }
}

Here's my test rig, using int[][] for simplicity; results first:

Join: 9420ms (chk: 210710000
OneBuilder: 9021ms (chk: 210710000

(update for double results:)

Join: 11635ms (chk: 210710000
OneBuilder: 11385ms (chk: 210710000

(update re 2048 * 64 * 150)

Join: 11620ms (chk: 206409600
OneBuilder: 11132ms (chk: 206409600

and with OptimizeForTesting enabled:

Join: 11180ms (chk: 206409600
OneBuilder: 10784ms (chk: 206409600

So faster, but not massively so; rig (run at console, in release mode, etc):

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Text;

namespace ConsoleApplication2
{
    class Program
    {
        static void Collect()
        {
            GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);
            GC.WaitForPendingFinalizers();
            GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);
            GC.WaitForPendingFinalizers();
        }
        static void Main(string[] args)
        {
            const int ROWS = 500, COLS = 20, LOOPS = 2000;
            int[][] data = new int[ROWS][];
            Random rand = new Random(123456);
            for (int row = 0; row < ROWS; row++)
            {
                int[] cells = new int[COLS];
                for (int col = 0; col < COLS; col++)
                {
                    cells[col] = rand.Next();
                }
                data[row] = cells;
            }
            Collect();
            int chksum = 0;
            Stopwatch watch = Stopwatch.StartNew();
            for (int i = 0; i < LOOPS; i++)
            {
                chksum += Join(data).Length;
            }
            watch.Stop();
            Console.WriteLine("Join: {0}ms (chk: {1}", watch.ElapsedMilliseconds, chksum);

            Collect();
            chksum = 0;
            watch = Stopwatch.StartNew();
            for (int i = 0; i < LOOPS; i++)
            {
                chksum += OneBuilder(data).Length;
            }
            watch.Stop();
            Console.WriteLine("OneBuilder: {0}ms (chk: {1}", watch.ElapsedMilliseconds, chksum);

            Console.WriteLine("done");
            Console.ReadLine();
        }
        public static string Join(int[][] array)
        {
            return String.Join(Environment.NewLine,
                    Array.ConvertAll(array,
                      row => String.Join(",",
                        Array.ConvertAll(row, x => x.ToString()))));
        }
        public static string OneBuilder(IEnumerable<int[]> source)
        {
            StringBuilder sb = new StringBuilder();
            bool firstRow = true;
            foreach (var row in source)
            {
                if (firstRow)
                {
                    firstRow = false;
                }
                else
                {
                    sb.AppendLine();
                }
                if (row.Length > 0)
                {
                    sb.Append(row[0]);
                    for (int i = 1; i < row.Length; i++)
                    {
                        sb.Append(',').Append(row[i]);
                    }
                }
            }
            return sb.ToString();
        }
    }
}
○闲身 2024-07-21 02:47:38

我不这么认为。 通过 Reflector 来看,String.Join 的实现看起来非常优化。 它还具有额外的好处,即提前知道要创建的字符串的总大小,因此不需要任何重新分配。

我创建了两个测试方法来比较它们:

public static string TestStringJoin(double[][] array)
{
    return String.Join(Environment.NewLine,
        Array.ConvertAll(array,
            row => String.Join(",",
                       Array.ConvertAll(row, x => x.ToString()))));
}

public static string TestStringBuilder(double[][] source)
{
    // based on Marc Gravell's code

    StringBuilder sb = new StringBuilder();
    foreach (var row in source)
    {
        if (row.Length > 0)
        {
            sb.Append(row[0]);
            for (int i = 1; i < row.Length; i++)
            {
                sb.Append(',').Append(row[i]);
            }
        }
    }
    return sb.ToString();
}

我运行每个方法 50 次,并传入一个大小为 [2048][64] 的数组。 我对两个数组执行了此操作; 一个用零填充,另一个用随机值填充。 我在我的机器上得到了以下结果(P4 3.0 GHz,单核,无 HT,从 CMD 运行发布模式):

// with zeros:
TestStringJoin    took 00:00:02.2755280
TestStringBuilder took 00:00:02.3536041

// with random values:
TestStringJoin    took 00:00:05.6412147
TestStringBuilder took 00:00:05.8394650

将数组的大小增加到 [2048][512],同时减少迭代次数达到 10 次后,我得到了以下结果:

// with zeros:
TestStringJoin    took 00:00:03.7146628
TestStringBuilder took 00:00:03.8886978

// with random values:
TestStringJoin    took 00:00:09.4991765
TestStringBuilder took 00:00:09.3033365

结果是可重复的(几乎;由不同的随机值引起的小波动)。 显然,String.Join 在大多数情况下要快一些(尽管差距很小)。

这是我用于测试的代码:

const int Iterations = 50;
const int Rows = 2048;
const int Cols = 64; // 512

static void Main()
{
    OptimizeForTesting(); // set process priority to RealTime

    // test 1: zeros
    double[][] array = new double[Rows][];
    for (int i = 0; i < array.Length; ++i)
        array[i] = new double[Cols];

    CompareMethods(array);

    // test 2: random values
    Random random = new Random();
    double[] template = new double[Cols];
    for (int i = 0; i < template.Length; ++i)
        template[i] = random.NextDouble();

    for (int i = 0; i < array.Length; ++i)
        array[i] = template;

    CompareMethods(array);
}

static void CompareMethods(double[][] array)
{
    Stopwatch stopwatch = Stopwatch.StartNew();
    for (int i = 0; i < Iterations; ++i)
        TestStringJoin(array);
    stopwatch.Stop();
    Console.WriteLine("TestStringJoin    took " + stopwatch.Elapsed);

    stopwatch.Reset(); stopwatch.Start();
    for (int i = 0; i < Iterations; ++i)
        TestStringBuilder(array);
    stopwatch.Stop();
    Console.WriteLine("TestStringBuilder took " + stopwatch.Elapsed);

}

static void OptimizeForTesting()
{
    Thread.CurrentThread.Priority = ThreadPriority.Highest;
    Process currentProcess = Process.GetCurrentProcess();
    currentProcess.PriorityClass = ProcessPriorityClass.RealTime;
    if (Environment.ProcessorCount > 1) {
        // use last core only
        currentProcess.ProcessorAffinity
            = new IntPtr(1 << (Environment.ProcessorCount - 1));
    }
}

I don't think so. Looking through Reflector, the implementation of String.Join looks very optimized. It also has the added benefit of knowing the total size of the string to be created in advance, so it doesn't need any reallocation.

I have created two test methods to compare them:

public static string TestStringJoin(double[][] array)
{
    return String.Join(Environment.NewLine,
        Array.ConvertAll(array,
            row => String.Join(",",
                       Array.ConvertAll(row, x => x.ToString()))));
}

public static string TestStringBuilder(double[][] source)
{
    // based on Marc Gravell's code

    StringBuilder sb = new StringBuilder();
    foreach (var row in source)
    {
        if (row.Length > 0)
        {
            sb.Append(row[0]);
            for (int i = 1; i < row.Length; i++)
            {
                sb.Append(',').Append(row[i]);
            }
        }
    }
    return sb.ToString();
}

I ran each method 50 times, passing in an array of size [2048][64]. I did this for two arrays; one filled with zeros and another filled with random values. I got the following results on my machine (P4 3.0 GHz, single-core, no HT, running Release mode from CMD):

// with zeros:
TestStringJoin    took 00:00:02.2755280
TestStringBuilder took 00:00:02.3536041

// with random values:
TestStringJoin    took 00:00:05.6412147
TestStringBuilder took 00:00:05.8394650

Increasing the size of the array to [2048][512], while decreasing the number of iterations to 10 got me the following results:

// with zeros:
TestStringJoin    took 00:00:03.7146628
TestStringBuilder took 00:00:03.8886978

// with random values:
TestStringJoin    took 00:00:09.4991765
TestStringBuilder took 00:00:09.3033365

The results are repeatable (almost; with small fluctuations caused by different random values). Apparently String.Join is a little faster most of the time (although by a very small margin).

This is the code I used for testing:

const int Iterations = 50;
const int Rows = 2048;
const int Cols = 64; // 512

static void Main()
{
    OptimizeForTesting(); // set process priority to RealTime

    // test 1: zeros
    double[][] array = new double[Rows][];
    for (int i = 0; i < array.Length; ++i)
        array[i] = new double[Cols];

    CompareMethods(array);

    // test 2: random values
    Random random = new Random();
    double[] template = new double[Cols];
    for (int i = 0; i < template.Length; ++i)
        template[i] = random.NextDouble();

    for (int i = 0; i < array.Length; ++i)
        array[i] = template;

    CompareMethods(array);
}

static void CompareMethods(double[][] array)
{
    Stopwatch stopwatch = Stopwatch.StartNew();
    for (int i = 0; i < Iterations; ++i)
        TestStringJoin(array);
    stopwatch.Stop();
    Console.WriteLine("TestStringJoin    took " + stopwatch.Elapsed);

    stopwatch.Reset(); stopwatch.Start();
    for (int i = 0; i < Iterations; ++i)
        TestStringBuilder(array);
    stopwatch.Stop();
    Console.WriteLine("TestStringBuilder took " + stopwatch.Elapsed);

}

static void OptimizeForTesting()
{
    Thread.CurrentThread.Priority = ThreadPriority.Highest;
    Process currentProcess = Process.GetCurrentProcess();
    currentProcess.PriorityClass = ProcessPriorityClass.RealTime;
    if (Environment.ProcessorCount > 1) {
        // use last core only
        currentProcess.ProcessorAffinity
            = new IntPtr(1 << (Environment.ProcessorCount - 1));
    }
}
鸢与 2024-07-21 02:47:38

除非 1% 的差异对于整个程序的运行时间而言非常显着,否则这看起来就像是微优化。 我会编写最具可读性/最容易理解的代码,而不用担心 1% 的性能差异。

Unless the 1% difference turns into something significant in terms of the time the entire program takes to run, this looks like micro-optimization. I'd write the code that's the most readable/understandable and not worry about the 1% performance difference.

淤浪 2024-07-21 02:47:38

是的。 如果您执行多次连接,速度会快很多

当您执行 string.join 时,运行时必须:

  1. 为结果字符串分配内存
  2. 将第一个字符串的内容复制到输出字符串的开头
  3. 将第二个字符串的内容复制到输出字符串的结尾。

如果进行两次联接,则必须复制数据两次,依此类推。

StringBuilder 分配一个缓冲区以供备用,因此可以附加数据而无需复制原始字符串。 由于缓冲区中还有剩余空间,因此可以将附加的字符串直接写入缓冲区中。
然后它只需要在最后复制整个字符串一次。

yes. If you do more than a couple of joins, it will be a lot faster.

When you do a string.join, the runtime has to:

  1. Allocate memory for the resulting string
  2. copy the contents of the first string to the beginning of the output string
  3. copy the contents of the second string to the end of the output string.

If you do two joins, it has to copy the data twice, and so on.

StringBuilder allocates one buffer with space to spare, so data can be appended without having to copy the original string. As there is space left over in the buffer, the appended string can be written into the buffer directly.
Then it just has to copy the entire string once, at the end.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文