What causes the differences between array copy methods at different array lengths in C#?
In C#, there are several ways to copy the elements of one array to another. To the best of my knowledge, they are a "for" loop, Array.CopyTo, Span<T>.CopyTo, T[].CopyTo and Buffer.BlockCopy.
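For reference, here is a minimal sketch of the four calls compared in this question, each wrapped in a small helper so the results can be checked against each other. The class and method names (CopyWays, ViaArrayCopyTo and so on) are hypothetical and exist only for this illustration; the copy calls themselves are the ones used in the benchmark.

```csharp
using System;
using System.Linq;

public static class CopyWays
{
    // Array.CopyTo: the instance method every array inherits from System.Array.
    public static int[] ViaArrayCopyTo(int[] src)
    {
        var dst = new int[src.Length];
        src.CopyTo(dst, 0);
        return dst;
    }

    // T[].CopyTo(Span<T>): the extension overload that takes a span as destination.
    public static int[] ViaArrayToSpan(int[] src)
    {
        var dst = new int[src.Length];
        src.CopyTo(dst.AsSpan());
        return dst;
    }

    // Span<T>.CopyTo: both source and destination are spans.
    public static int[] ViaSpanCopyTo(int[] src)
    {
        var dst = new int[src.Length];
        src.AsSpan().CopyTo(dst.AsSpan());
        return dst;
    }

    // Buffer.BlockCopy: offsets and count are expressed in bytes, not elements.
    public static int[] ViaBlockCopy(int[] src)
    {
        var dst = new int[src.Length];
        Buffer.BlockCopy(src, 0, dst, 0, sizeof(int) * src.Length);
        return dst;
    }

    public static void Main()
    {
        int[] src = { 1, 2, 3, 4, 5, 6, 7, 8 };
        // Each line prints True when the copy matches the source.
        Console.WriteLine(ViaArrayCopyTo(src).SequenceEqual(src));
        Console.WriteLine(ViaArrayToSpan(src).SequenceEqual(src));
        Console.WriteLine(ViaSpanCopyTo(src).SequenceEqual(src));
        Console.WriteLine(ViaBlockCopy(src).SequenceEqual(src));
    }
}
```

All four produce the same destination contents, so any timing difference comes from how each call gets to the underlying memory copy, not from what is copied.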
Since looping to copy the elements is always the slowest way, I skipped it and benchmarked the other four methods. However, their speed seems to depend on the length of the array, which really confuses me.
My benchmark code is shown below. My environment is Windows 11, .NET 6, an Intel 12700 CPU, 64-bit, with BenchmarkDotNet as the benchmark framework.
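
    using System;
    using BenchmarkDotNet.Attributes;

    public class UnitTest1
    {
        static readonly int times = 1000;     // copies per benchmark invocation
        static readonly int arrayLength = 8;  // changed by hand for each run

        // Allocated once per benchmark instance; every copy reuses the same pair.
        int[] src = GetRandomArray(arrayLength);
        int[] dst = new int[arrayLength];

        public static int[] GetRandomArray(int length)
        {
            int[] array = new int[length];
            // One generator for the whole array: re-seeding inside the loop
            // with the same millisecond would repeat the same value.
            var random = new Random(DateTime.Now.Millisecond);
            for (int i = 0; i < length; i++)
            {
                array[i] = random.Next(int.MinValue, int.MaxValue);
            }
            // Pause during setup; this runs outside the measured methods.
            System.Threading.Thread.Sleep(2000);
            return array;
        }

        // Array.CopyTo(Array, int)
        [Benchmark]
        public void TestArrayCopy()
        {
            for (var j = 0; j < times; j++)
            {
                src.CopyTo(dst, 0);
            }
        }

        // T[].CopyTo(Span<T>): array source, span destination
        [Benchmark]
        public void TestSingleSpanCopy()
        {
            var dstSpan = dst.AsSpan();
            for (var j = 0; j < times; j++)
            {
                src.CopyTo(dstSpan);
            }
        }

        // Span<T>.CopyTo(Span<T>): span source, span destination
        [Benchmark]
        public void TestDoubleSpanCopy()
        {
            var srcSpan = src.AsSpan();
            var dstSpan = dst.AsSpan();
            for (var j = 0; j < times; j++)
            {
                srcSpan.CopyTo(dstSpan);
            }
        }

        // Buffer.BlockCopy: the count argument is in bytes
        [Benchmark]
        public void BufferCopy()
        {
            for (var j = 0; j < times; j++)
            {
                System.Buffer.BlockCopy(src, 0, dst, 0, sizeof(int) * src.Length);
            }
        }
    }
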
Here are the test results.
times = 1000, arrayLength = 8
| Method | Mean | Error | StdDev |
|------------------- |---------:|----------:|----------:|
| TestArrayCopy | 3.061 us | 0.0370 us | 0.0543 us |
| TestSingleSpanCopy | 1.297 us | 0.0041 us | 0.0038 us |
| TestDoubleSpanCopy | 1.113 us | 0.0190 us | 0.0203 us |
| BufferCopy | 7.162 us | 0.1250 us | 0.1044 us |
times = 1000, arrayLength = 16
| Method | Mean | Error | StdDev |
|------------------- |---------:|----------:|----------:|
| TestArrayCopy | 3.426 us | 0.0677 us | 0.0806 us |
| TestSingleSpanCopy | 1.609 us | 0.0264 us | 0.0206 us |
| TestDoubleSpanCopy | 1.478 us | 0.0228 us | 0.0202 us |
| BufferCopy | 7.465 us | 0.0866 us | 0.0723 us |
times = 1000, arrayLength = 32
| Method | Mean | Error | StdDev | Median |
|------------------- |----------:|----------:|----------:|----------:|
| TestArrayCopy | 4.063 us | 0.0417 us | 0.0390 us | 4.076 us |
| TestSingleSpanCopy | 4.115 us | 0.3552 us | 1.0473 us | 4.334 us |
| TestDoubleSpanCopy | 3.576 us | 0.3391 us | 0.9998 us | 3.601 us |
| BufferCopy | 12.922 us | 0.7339 us | 2.1640 us | 13.814 us |
times = 1000, arrayLength = 128
| Method | Mean | Error | StdDev | Median |
|------------------- |----------:|----------:|----------:|----------:|
| TestArrayCopy | 7.865 us | 0.0919 us | 0.0815 us | 7.842 us |
| TestSingleSpanCopy | 7.036 us | 0.2694 us | 0.7900 us | 7.256 us |
| TestDoubleSpanCopy | 7.351 us | 0.0914 us | 0.0855 us | 7.382 us |
| BufferCopy | 10.955 us | 0.1157 us | 0.1083 us | 10.947 us |
times = 1000, arrayLength = 1024
| Method | Mean | Error | StdDev | Median |
|------------------- |---------:|---------:|----------:|---------:|
| TestArrayCopy | 45.16 us | 3.619 us | 10.670 us | 48.95 us |
| TestSingleSpanCopy | 36.85 us | 3.608 us | 10.638 us | 34.77 us |
| TestDoubleSpanCopy | 38.88 us | 3.378 us | 9.960 us | 39.91 us |
| BufferCopy | 48.83 us | 4.352 us | 12.833 us | 53.65 us |
times = 1000, arrayLength = 16384
| Method | Mean | Error | StdDev |
|------------------- |---------:|----------:|----------:|
| TestArrayCopy | 1.417 ms | 0.1096 ms | 0.3233 ms |
| TestSingleSpanCopy | 1.487 ms | 0.1012 ms | 0.2983 ms |
| TestDoubleSpanCopy | 1.438 ms | 0.1115 ms | 0.3287 ms |
| BufferCopy | 1.423 ms | 0.1147 ms | 0.3383 ms |
times = 100, arrayLength = 65536
| Method | Mean | Error | StdDev |
|------------------- |---------:|---------:|----------:|
| TestArrayCopy | 630.9 us | 47.01 us | 138.61 us |
| TestSingleSpanCopy | 629.5 us | 46.83 us | 138.08 us |
| TestDoubleSpanCopy | 655.4 us | 47.23 us | 139.25 us |
| BufferCopy | 419.0 us | 3.31 us | 2.93 us |
When arrayLength is 8 or 16, Span<T>.CopyTo is the fastest. When arrayLength is 32 or 128, the first three ways are almost the same and all faster than Buffer.BlockCopy. When arrayLength is 1024, however, Span<T>.CopyTo and T[].CopyTo are again faster than the other two ways. When arrayLength is 16384, the four ways are almost the same. But when arrayLength is 65536, Buffer.BlockCopy is the fastest! Besides, Span<T>.CopyTo here is a bit slower than the first two ways.
I really can't understand the results. At first I guessed that the CPU cache was what mattered. However, the L1 cache of my CPU is 960 KB, which is larger than the arrays in any test case. Maybe the different implementations cause this?
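To make the cache comparison concrete, here is a small sketch that computes how many bytes one copy touches for each tested arrayLength: one source plus one destination int[] of the same length. It ignores array object headers, which is an assumption made for simplicity, and the class name WorkingSet is hypothetical. Note also that L1 caches are per core, so if the 960 KB figure is a total summed over all cores (an assumption; system tools often report it that way), the L1 data cache available to the single thread doing the copy would be considerably smaller.

```csharp
using System;

public static class WorkingSet
{
    // Bytes touched by one copy: a source and a destination int[] of equal length.
    // Array object headers are ignored (a simplifying assumption).
    public static long BytesFor(int arrayLength) => 2L * arrayLength * sizeof(int);

    public static void Main()
    {
        // The array lengths used in the benchmark runs above.
        int[] lengths = { 8, 16, 32, 128, 1024, 16384, 65536 };
        foreach (int n in lengths)
        {
            Console.WriteLine($"arrayLength = {n,6}: {BytesFor(n) / 1024.0:F2} KiB");
        }
    }
}
```

The largest case, arrayLength = 65536, works out to 512 KiB for the source and destination together, and arrayLength = 16384 to 128 KiB.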
I would appreciate it if you could explain this or discuss it with me. I will also keep thinking about it and update the question if I get an idea.
As @Ralf mentioned, the source and destination arrays are the same in every iteration, which could affect the results. I modified my code and ran the test again, as shown below. To save time, I just allocate a new array each time instead of filling it with random values.

    using System;
    using System.Buffers;
    using BenchmarkDotNet.Attributes;
    using BenchmarkDotNet.Running;

    public class Program
    {
        public static void Main(string[] args)
        {
            var summary = BenchmarkRunner.Run(typeof(Program).Assembly);
            Console.WriteLine(summary);
        }
    }

    public class UnitTest1
    {
        static readonly int times = 1000;     // copies per benchmark invocation
        static readonly int arrayLength = 8;  // changed by hand for each run

        // Returns a fresh zero-filled array; the random fill is disabled to save time.
        public static int[] GetRandomArray(int length)
        {
            int[] array = new int[length];
            //for (int i = 0; i < length; i++)
            //{
            //    array[i] = new Random(DateTime.Now.Millisecond).Next(int.MinValue, int.MaxValue);
            //}
            return array;
        }

        // Each method now allocates a new source and destination on every
        // iteration, so the allocations are part of the measured time.
        [Benchmark]
        public void ArrayCopy()
        {
            for (var j = 0; j < times; j++)
            {
                int[] src = GetRandomArray(arrayLength);
                int[] dst = new int[arrayLength];
                src.CopyTo(dst, 0);
            }
        }

        [Benchmark]
        public void SingleSpanCopy()
        {
            for (var j = 0; j < times; j++)
            {
                int[] src = GetRandomArray(arrayLength);
                int[] dst = new int[arrayLength];
                src.CopyTo(dst.AsSpan());
            }
        }

        [Benchmark]
        public void DoubleSpanCopy()
        {
            for (var j = 0; j < times; j++)
            {
                int[] src = GetRandomArray(arrayLength);
                int[] dst = new int[arrayLength];
                src.AsSpan().CopyTo(dst.AsSpan());
            }
        }

        [Benchmark]
        public void BufferCopy()
        {
            for (var j = 0; j < times; j++)
            {
                int[] src = GetRandomArray(arrayLength);
                int[] dst = new int[arrayLength];
                System.Buffer.BlockCopy(src, 0, dst, 0, sizeof(int) * src.Length);
            }
        }
    }

times = 1000, arrayLength = 8
| Method | Mean | Error | StdDev | Median |
|--------------- |----------:|----------:|----------:|----------:|
| ArrayCopy | 8.843 us | 0.1762 us | 0.3040 us | 8.843 us |
| SingleSpanCopy | 6.864 us | 0.1366 us | 0.1519 us | 6.880 us |
| DoubleSpanCopy | 10.543 us | 0.9496 us | 2.7999 us | 10.689 us |
| BufferCopy | 21.270 us | 1.3477 us | 3.9738 us | 22.630 us |
times = 1000, arrayLength = 16
| Method | Mean | Error | StdDev | Median |
|--------------- |---------:|---------:|---------:|---------:|
| ArrayCopy | 16.94 us | 0.952 us | 2.808 us | 17.27 us |
| SingleSpanCopy | 12.54 us | 1.054 us | 3.109 us | 12.32 us |
| DoubleSpanCopy | 13.23 us | 0.930 us | 2.741 us | 13.25 us |
| BufferCopy | 23.43 us | 1.218 us | 3.591 us | 24.99 us |
times = 1000, arrayLength = 32
| Method | Mean | Error | StdDev | Median |
|--------------- |---------:|---------:|---------:|---------:|
| ArrayCopy | 24.35 us | 1.774 us | 5.229 us | 26.23 us |
| SingleSpanCopy | 20.64 us | 1.726 us | 5.089 us | 21.09 us |
| DoubleSpanCopy | 19.97 us | 1.915 us | 5.646 us | 20.08 us |
| BufferCopy | 26.24 us | 2.547 us | 7.511 us | 24.59 us |
times = 1000, arrayLength = 128
| Method | Mean | Error | StdDev |
|--------------- |---------:|---------:|---------:|
| ArrayCopy | 39.11 us | 0.529 us | 0.495 us |
| SingleSpanCopy | 39.14 us | 0.782 us | 1.070 us |
| DoubleSpanCopy | 40.24 us | 0.798 us | 1.398 us |
| BufferCopy | 42.20 us | 0.480 us | 0.426 us |
times = 1000, arrayLength = 1024
| Method | Mean | Error | StdDev |
|--------------- |---------:|--------:|--------:|
| ArrayCopy | 254.6 us | 4.92 us | 8.87 us |
| SingleSpanCopy | 241.4 us | 2.98 us | 2.78 us |
| DoubleSpanCopy | 243.7 us | 4.75 us | 4.66 us |
| BufferCopy | 243.0 us | 2.85 us | 2.66 us |
times = 1000, arrayLength = 16384
| Method | Mean | Error | StdDev |
|--------------- |---------:|----------:|----------:|
| ArrayCopy | 4.325 ms | 0.0268 ms | 0.0250 ms |
| SingleSpanCopy | 4.300 ms | 0.0120 ms | 0.0112 ms |
| DoubleSpanCopy | 4.307 ms | 0.0348 ms | 0.0325 ms |
| BufferCopy | 4.293 ms | 0.0238 ms | 0.0222 ms |
times = 100, arrayLength = 65536
| Method | Mean | Error | StdDev | Median |
|--------------- |---------:|---------:|---------:|---------:|
| ArrayCopy | 153.6 ms | 1.46 ms | 1.29 ms | 153.1 ms |
| SingleSpanCopy | 213.4 ms | 8.78 ms | 25.87 ms | 218.2 ms |
| DoubleSpanCopy | 221.2 ms | 9.51 ms | 28.04 ms | 229.7 ms |
| BufferCopy | 203.1 ms | 10.92 ms | 32.18 ms | 205.6 ms |
@Ralf is right, there are indeed some differences. The most significant one is that when arrayLength = 65536, Array.CopyTo rather than Buffer.BlockCopy is the fastest.
But still, the results are very confusing.
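One way to judge how much of the confusion might be measurement noise is to express each StdDev as a fraction of its Mean. The sketch below does that for a few rows copied from the tables above; the class name Noise and the choice of rows are only for illustration, and this is a rough reading aid, not a statistical test.

```csharp
using System;

public static class Noise
{
    // Relative spread: standard deviation as a fraction of the mean.
    public static double RelativeStdDev(double mean, double stdDev) => stdDev / mean;

    public static void Main()
    {
        // (label, Mean, StdDev) taken from the result tables; units cancel out.
        var rows = new (string Label, double Mean, double StdDev)[]
        {
            ("first run, length 8, TestArrayCopy", 3.061, 0.0543),
            ("first run, length 1024, TestArrayCopy", 45.16, 10.670),
            ("first run, length 65536, BufferCopy", 419.0, 2.93),
            ("second run, length 65536, SingleSpanCopy", 213.4, 25.87),
        };

        foreach (var r in rows)
        {
            // "P1" formats the ratio as a percentage with one decimal place.
            Console.WriteLine($"{r.Label}: {RelativeStdDev(r.Mean, r.StdDev):P1}");
        }
    }
}
```

For example, the first-run row for arrayLength = 1024 has a StdDev of roughly 24% of its Mean, so gaps of a few microseconds between methods in that table may well fall inside the noise.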
Comments (1)
Are you sure you can repeat the same benchmark and get the same results? Perhaps it was just a one-time occurrence, maybe caused by heat issues or another app taking processor time. When I try it on my machine, the values I get are more in line with what you'd expect.
It says Windows 10 for some reason, but I'm using 11 too.
Just before posting this, I realized: your CPU, the 12700, has performance and efficiency cores. What if most of the benchmark ran on efficiency cores and the BufferCopy part just happened to run on performance cores? Can you try disabling the efficiency cores in the BIOS?
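A software alternative to the BIOS change might be to restrict the process to a fixed set of logical processors before running the benchmark. The sketch below uses Process.ProcessorAffinity, which is available on Windows and Linux. Which logical processors map to performance cores is an assumption here (indices 0 to 15, that is 8 performance cores with two threads each) and should be checked on the actual machine; the class name CorePinning is hypothetical. Whether processes launched by a benchmark harness pick up the setting is also worth verifying.

```csharp
using System;
using System.Diagnostics;

public static class CorePinning
{
    // Builds a bitmask with `count` consecutive bits set, starting at bit `first`.
    // Bit i stands for logical processor i.
    public static long BuildMask(int first, int count)
    {
        long mask = 0;
        for (int i = first; i < first + count; i++)
        {
            mask |= 1L << i;
        }
        return mask;
    }

    // Restricts the current process to the requested logical processors,
    // limited to those the process is currently allowed to use.
    public static void PinCurrentProcess(long mask)
    {
        using Process self = Process.GetCurrentProcess();
        long allowed = (long)self.ProcessorAffinity;
        long effective = mask & allowed;
        if (effective == 0)
        {
            throw new ArgumentException("Mask selects no available processor.", nameof(mask));
        }
        self.ProcessorAffinity = (IntPtr)effective;
    }

    public static void Main()
    {
        // ASSUMPTION: logical processors 0..15 are the performance-core threads.
        long pCoreMask = BuildMask(0, 16);
        Console.WriteLine($"mask = 0x{pCoreMask:X}"); // prints "mask = 0xFFFF"
        PinCurrentProcess(pCoreMask);
    }
}
```

If the timings stay noisy after pinning, core scheduling is probably not the main cause and heat or background load becomes the more likely suspect.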