A faster way to convert a byte array to an int

Posted on 2024-10-05 09:21:34

Is there a faster way than BitConverter.ToInt32 to convert a byte array to an int value?

Comments (6)

北斗星光 2024-10-12 09:21:34

I actually tried several different ways to convert four bytes to an int:

  1. BitConverter.ToInt32(new byte[] { w, x, y, z }, 0);
  2. BitConverter.ToUInt32(new byte[] { w, x, y, z }, 0);
  3. b = new byte[] { w, x, y, z };
    BitConverter.ToInt32(b, 0);
  4. b = new byte[] { 1, 2, 3, 4, 5, 6, 7, w, x, y, z };
    BitConverter.ToInt32(b, 7);
  5. w | (x << 8) | (y << 16) | (z << 24);
  6. b[0] | (b[1] << 8) | (b[2] << 16) | (b[3] << 24);

I ran 10^9 iterations of each one in a Release (x86) build, not under a debugger, on a 2.5 GHz Core i7 laptop. Here are my results (note that the methods that don't use BitConverter are substantially faster):

test1: 00:00:15.5287282 67305985
test2: 00:00:15.1334457 67305985
test3: 00:00:08.0648586 67305985
test4: 00:00:11.2307059 67305985
test5: 00:00:02.0219417 67305985
test6: 00:00:01.6275684 67305985

Some conclusions you can draw:

  • test1 shows that on my laptop it's hard to make the conversion go slower than 15ns, which I hate to say should be fast enough for anyone. (Do you need to call it more than 60M times per second?)
  • test2 shows that using uint instead of int saves a small amount of time. I'm not sure why, but I think it's small enough to be experimental error.
  • test3 shows that the overhead of creating a new byte array (7ns) is nearly as much as calling the function, but is still faster than making a new array out of the old array.
  • test4 shows that making unaligned array accesses from ToInt32 adds overhead (3ns).
  • test5 shows that pulling the 4 bytes from local variables and combining them yourself is several times faster than calling ToInt32.
  • test6 shows that it's actually slightly faster to pull the 4 bytes from an array than from function arguments! I suspect this is due to CPU pipelining or cache effects.

The fastest, test6, took only twice as long to run as an empty loop (not shown). In other words, it took less than 1ns to perform each conversion. Good luck getting any useful calculation to go faster than that!
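
As a side note on the shift expression used in tests 5 and 6: it treats the first byte as the least significant one (little-endian), which is why the bytes 1, 2, 3, 4 come out as 67305985 (0x04030201). A minimal sketch of both byte orders follows; the class and method names are made up for illustration and are not part of the original test program.

```csharp
using System;

// Hypothetical helper names, for illustration only.
static class ByteOrderSketch
{
    // Little-endian: b[offset] is the least significant byte.
    // This is the same expression as in test6, with an offset added.
    public static int ToInt32LittleEndian(byte[] b, int offset)
    {
        return b[offset]
             | (b[offset + 1] << 8)
             | (b[offset + 2] << 16)
             | (b[offset + 3] << 24);
    }

    // Big-endian (network order): b[offset] is the most significant byte.
    public static int ToInt32BigEndian(byte[] b, int offset)
    {
        return (b[offset] << 24)
             | (b[offset + 1] << 16)
             | (b[offset + 2] << 8)
             | b[offset + 3];
    }
}
```

Unlike BitConverter.ToInt32, which follows the byte order of the machine it runs on, these two helpers give the same answer on every platform, so the choice between them should follow the data format rather than the hardware.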

Here's my test program:

using System;

namespace BitConverterTest
{
    class Program
    {
        const int iters = 1000000000;
        static void Main(string[] args)
        {
            test1(1, 2, 3, 4);
            test2(1, 2, 3, 4);
            test3(1, 2, 3, 4);
            test4(1, 2, 3, 4);
            test5(1, 2, 3, 4);
            test6(1, 2, 3, 4);
        }

        static void test1(byte w, byte x, byte y, byte z)
        {
            int res = 0;
            var timer = System.Diagnostics.Stopwatch.StartNew();
            for (int i = 0; i < iters; i++)
                res = BitConverter.ToInt32(new byte[] { w, x, y, z }, 0);
            Console.WriteLine("test1: " + timer.Elapsed + " " + res);
        }

        static void test2(byte w, byte x, byte y, byte z)
        {
            uint res = 0;
            var timer = System.Diagnostics.Stopwatch.StartNew();
            for (int i = 0; i < iters; i++)
                res = BitConverter.ToUInt32(new byte[] { w, x, y, z }, 0);
            Console.WriteLine("test2: " + timer.Elapsed + " " + res);
        }

        static void test3(byte w, byte x, byte y, byte z)
        {
            int res = 0;
            var timer = System.Diagnostics.Stopwatch.StartNew();
            var b = new byte[] { w, x, y, z };
            for (int i = 0; i < iters; i++)
                res = BitConverter.ToInt32(b, 0);
            Console.WriteLine("test3: " + timer.Elapsed + " " + res);
        }

        static void test4(byte w, byte x, byte y, byte z)
        {
            int res = 0;
            var timer = System.Diagnostics.Stopwatch.StartNew();
            var b = new byte[] { 1, 2, 3, 4, 5, 6, 7, w, x, y, z };
            for (int i = 0; i < iters; i++)
                res = BitConverter.ToInt32(b, 7);
            Console.WriteLine("test4: " + timer.Elapsed + " " + res);
        }

        static void test5(byte w, byte x, byte y, byte z)
        {
            int res = 0;
            var timer = System.Diagnostics.Stopwatch.StartNew();
            var b = new byte[] { w, x, y, z };
            for (int i = 0; i < iters; i++)
                res = w | (x << 8) | (y << 16) | (z << 24);
            Console.WriteLine("test5: " + timer.Elapsed + " " + res);
        }

        static void test6(byte w, byte x, byte y, byte z)
        {
            int res = 0;
            var timer = System.Diagnostics.Stopwatch.StartNew();
            var b = new byte[] { w, x, y, z };
            for (int i = 0; i < iters; i++)
                res = b[0] | (b[1] << 8) | (b[2] << 16) | (b[3] << 24);
            Console.WriteLine("test6: " + timer.Elapsed + " " + res);
        }
    }
}
傻比既视感 2024-10-12 09:21:34

If I remember correctly, that implementation uses unsafe code (treating a byte* as an int*), so it will be hard to beat, but the other alternative is shifting.

However, from lots of work in this area, this is so unlikely to be a genuine bottleneck as to be irrelevant. I/O is the main issue, typically.

GetBytes(int), however, is more expensive (in high volume) due to array / heap allocation.
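
To illustrate the allocation point: on runtimes that ship Span<T> and System.Buffers.Binary.BinaryPrimitives (an assumption here: .NET Core 2.1 or later, or the System.Memory package), the bytes can be written into a buffer the caller already owns instead of a freshly allocated array. This is only a sketch; the class and method names are hypothetical.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical helper names, for illustration only.
static class NoAllocSketch
{
    // Writes the four little-endian bytes of value into a caller-owned buffer,
    // so no new array is allocated per call (unlike BitConverter.GetBytes).
    public static void WriteInt32(Span<byte> destination, int value)
    {
        BinaryPrimitives.WriteInt32LittleEndian(destination, value);
    }

    // Reads four little-endian bytes back as an int.
    public static int ReadInt32(ReadOnlySpan<byte> source)
    {
        return BinaryPrimitives.ReadInt32LittleEndian(source);
    }

    // Round trip through a stack buffer: no heap allocation at all.
    public static int RoundTrip(int value)
    {
        Span<byte> scratch = stackalloc byte[4];
        WriteInt32(scratch, value);
        return ReadInt32(scratch);
    }
}
```

Whether this matters in practice depends on call volume; as the answer says, I/O usually dominates.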

好听的两个字的网名 2024-10-12 09:21:34

Followup to Gabe's performance tests:

Changes:

  • Eliminate tests 1 and 2, because the inline array creation made these tests of the GC (as can be seen from the Gen 0 GC performance counter).
  • Eliminate test 4 (non-aligned array) to keep things simpler.
  • Add tests 7 and 8 which do conversions from a large array (256 MB) via BitConverter and bit fiddling respectively.
  • Add a running total to the tests to try to avoid common sub-expression elimination, which clearly led to the low times in Gabe's tests 5 and 6.

Results:

  • 32-bit option:

    test3: 00:00:06.9230577
    test5: 00:00:03.8349386
    test6: 00:00:03.8238272
    test7: 00:00:07.3898489
    test8: 00:00:04.6807391
    
  • 64-bit option:

    test3: 00:00:05.8794322
    test5: 00:00:00.4384600
    test6: 00:00:00.4069573
    test7: 00:00:06.2279365
    test8: 00:00:03.5472486
    

Analysis

  1. Still getting common sub-expression elimination in 5 and 6 on 64-bit.
  2. For this, 64-bit is a win. But such a micro-benchmark shouldn't be used to choose where to optimise an application.
  3. It looks like about a 50% improvement when converting 256 MB of random data into ints. As the test does it 16 times, that's less than 0.2s, which is unlikely to make a real difference outside a very narrow subset of applications, and then you have the additional maintenance cost of ensuring that someone doesn't break the code over the application lifetime.
  4. I wonder how much of the BitConverter overhead is the parameter checks it does?
  5. Test 6 is only a little faster than 5. Clearly array bounds checks are being eliminated.

The Code

using System;

namespace BitConverterTest {
    class Program {
        const int iters = 1024*1024*1024;
        const int arrayLen = iters/4;
        static byte[] array = new byte[arrayLen];

        static void Main(string[] args) {
            //test1(1, 2, 3, 4);
            //test2(1, 2, 3, 4);
            test3(1, 2, 3, 4);
            //test4(1, 2, 3, 4);
            test5(1, 2, 3, 4);
            test6(1, 2, 3, 4);

            // Fill array with good PRNG data
            var rng = new System.Security.Cryptography.RNGCryptoServiceProvider();
            rng.GetBytes(array);

            test7();
            test8();
        }

        // BitConverter with aligned input
        static void test3(byte w, byte x, byte y, byte z) {
            int res = 0;
            var timer = System.Diagnostics.Stopwatch.StartNew();
            var b = new byte[] { w, x, y, z };
            for (int i = 0; i < iters; i++)
                res = BitConverter.ToInt32(b, 0);
            Console.WriteLine("test3: " + timer.Elapsed + " " + res);
        }

        // Inline bitfiddling with separate variables.
        static void test5(byte w, byte x, byte y, byte z) {
            long res = 0;
            var timer = System.Diagnostics.Stopwatch.StartNew();
            var b = new byte[] { w, x, y, z };
            for (int i = 0; i < iters; i++) {
                int a = w | (x << 8) | (y << 16) | (z << 24);
                res += a;
            }
            Console.WriteLine("test5: " + timer.Elapsed + " " + res);
        }

        // Inline bitfiddling with array elements.
        static void test6(byte w, byte x, byte y, byte z) {
            long res = 0;
            var timer = System.Diagnostics.Stopwatch.StartNew();
            var b = new byte[] { w, x, y, z };
            for (int i = 0; i < iters; i++) {
                int a = b[0] | (b[1] << 8) | (b[2] << 16) | (b[3] << 24);
                res += a;
            }
            Console.WriteLine("test6: " + timer.Elapsed + " " + res);
        }

        // BitConvert from large array...
        static void test7() {
            var its = iters/arrayLen * 4; // *4 to remove arrayLen/4 factor.
            var timer = System.Diagnostics.Stopwatch.StartNew();
            long res = 0;
            for (var outer = 0; outer < its; outer++) {
                for (var pos = 0; pos < arrayLen; pos += 4) {
                    var x = BitConverter.ToInt32(array, pos);
                    res += x;
                }
            }
            Console.WriteLine("test7: " + timer.Elapsed + " " + res);
        }

        // Bitfiddle from large array...
        static void test8() {
            var its = iters/arrayLen * 4;
            var timer = System.Diagnostics.Stopwatch.StartNew();
            long res = 0;
            for (var outer = 0; outer < its; outer++) {
                for (var pos = 0; pos < arrayLen; pos += 4) {
                    int x = array[pos] | (array[pos+1] << 8) | (array[pos+2] << 16) | (array[pos+3] << 24);
                    res += x;
                }
            }
            Console.WriteLine("test8: " + timer.Elapsed + " " + res);
        }
    }
}
一口甜 2024-10-12 09:21:34

Based on a quick review of the implementation of BitConverter.ToInt32 in .NET Reflector I would say "No".

It optimises for the case where the array is aligned and directly casts the bytes, otherwise it performs a bitwise merge.
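
The shape described above can be sketched roughly as follows. This is an approximation written from the description in this answer, not the actual framework source: the class name is hypothetical, the argument checks are my assumption of what such a method would validate, and MemoryMarshal.Read stands in for the pointer cast so the sketch compiles without unsafe code.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical sketch, NOT the real BitConverter implementation.
static class ToInt32Sketch
{
    public static int ToInt32(byte[] value, int startIndex)
    {
        // Argument checks (assumed): null array, index out of range, too few bytes.
        if (value == null) throw new ArgumentNullException(nameof(value));
        if ((uint)startIndex >= (uint)value.Length) throw new ArgumentOutOfRangeException(nameof(startIndex));
        if (startIndex > value.Length - 4) throw new ArgumentException("Not enough bytes after startIndex.", nameof(value));

        if (startIndex % 4 == 0)
        {
            // "Aligned" path: reinterpret the four bytes as one int in native byte order.
            return MemoryMarshal.Read<int>(value.AsSpan(startIndex, 4));
        }

        // Otherwise merge the bytes one at a time, respecting the machine's byte order.
        if (BitConverter.IsLittleEndian)
        {
            return value[startIndex]
                 | (value[startIndex + 1] << 8)
                 | (value[startIndex + 2] << 16)
                 | (value[startIndex + 3] << 24);
        }
        return (value[startIndex] << 24)
             | (value[startIndex + 1] << 16)
             | (value[startIndex + 2] << 8)
             | value[startIndex + 3];
    }
}
```

Seen this way, the only work a hand-written shift can skip is the argument checking and the call itself, which is consistent with the "No" above.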

空城之時有危險 2024-10-12 09:21:34

I summarized all of the above, added a Span variant, and used a benchmark framework.

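using System;
using System.Runtime.InteropServices;
using BenchmarkDotNet.Attributes; // assumption: the [Benchmark] attribute comes from BenchmarkDotNet

public class ByteArrayToIntBench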
{
    private readonly byte[] _array = new byte[4 * 10_000];

    public ByteArrayToIntBench()
    {
        Random r = new Random();
        for (int i = 0; i < _array.Length; i++)
        {
            _array[i] = (byte)r.Next(byte.MinValue, byte.MaxValue + 1); // upper bound is exclusive, so + 1 allows 255
        }
    }

    [Benchmark]
    public double Bitconverter()
    {
        double res = 0;
        for (int i = 0; i < _array.Length; i += 4)
        {
            res += BitConverter.ToInt32(_array, i);
        }
        return res;
    }

    [Benchmark]
    public unsafe double Unsafe()
    {
        double res = 0;
        for (int i = 0; i < _array.Length; i += 4)
        {
            fixed (byte* pData = &_array[i])
            {
                res += *(int*)pData;
            }
        }
        return res;
    }

    [Benchmark]
    public double Shift()
    {
        double res = 0;
        for (int i = 0; i < _array.Length; i += 4)
        {
            res += _array[i] | (_array[i + 1] << 8) | (_array[i + 2] << 16) | (_array[i + 3] << 24);
        }
        return res;
    }

    [Benchmark]
    public double Span()
    {
        double res = 0;
        for (int i = 0; i < _array.Length; i += 4)
        {
            res += MemoryMarshal.Cast<byte, int>(_array.AsSpan(i, 4))[0];
        }
        return res;
    }
}
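
One variation that might be worth measuring alongside the Span benchmark above: instead of slicing and casting four bytes on every iteration, the whole buffer can be reinterpreted once and then read as ints. This is a sketch only, with a hypothetical class name; it has not been benchmarked here and it reads in the machine's native byte order, like BitConverter.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical variant, for illustration only.
static class CastOnceSketch
{
    public static double SumInts(byte[] data)
    {
        // Reinterpret the byte buffer as ints a single time.
        // Any trailing bytes beyond a multiple of four are ignored by Cast.
        ReadOnlySpan<int> ints = MemoryMarshal.Cast<byte, int>(data.AsSpan());

        double res = 0;
        foreach (int v in ints)
        {
            res += v;
        }
        return res;
    }
}
```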

Results

我的鱼塘能养鲲 2024-10-12 09:21:34

I have also fiddled with similar issues.

In my case it was how to convert to single-precision floats when the data is stored as double-precision byte[]s, or just between the double representation and the byte[] representation, etc. If you want the best performance on large sets of data, it is best not to go through too many API layers, and to embed as much information as you can into the algorithm without making it too brittle or incomprehensible.

So, to follow up on Richard's tests, I add another test below (test9), which is the approach I've taken in my own work and which answers point 4 in his Analysis section:

Use unsafe memory pointer access to achieve the most performant result. This comes naturally if you use C++, but not necessarily in C#. It is similar to what BitConverter is doing under the hood, but without the parameter and safety checks (as, of course, we know what we are doing... ;)
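
For readers who would rather stay in safe code, the double-to-single case mentioned above can be sketched with MemoryMarshal.Cast instead of raw pointers. To be clear, this is a swapped-in technique and not the pointer code this answer uses; the class and method names are hypothetical, and the sketch assumes the bytes are in the machine's native byte order.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical safe-code analogue, for illustration only.
static class DoubleBytesSketch
{
    // Interprets raw as a packed sequence of 8-byte doubles and narrows each to float.
    public static float[] ToSingles(byte[] raw)
    {
        // View the bytes as doubles without copying; trailing bytes
        // beyond a multiple of eight are ignored by Cast.
        ReadOnlySpan<double> doubles = MemoryMarshal.Cast<byte, double>(raw.AsSpan());

        var result = new float[doubles.Length];
        for (int i = 0; i < doubles.Length; i++)
        {
            result[i] = (float)doubles[i];
        }
        return result;
    }
}
```

The view is created once for the whole buffer, which follows the same idea as above: do the reinterpretation in one place and keep the inner loop free of per-element API calls.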

Results:

  • 32-bit option:

    test3: 00:00:06.2373138
    test5: 00:00:03.1193338
    test6: 00:00:03.1609287
    test7: 00:00:07.7328020
    test8: 00:00:06.4192130
    test9: 00:00:03.9590307
    
  • 64-bit option:

    test3: 00:00:06.2209098
    test5: 00:00:00.5563930
    test6: 00:00:01.5486780
    test7: 00:00:08.4858474
    test8: 00:00:05.4991740
    test9: 00:00:02.2928944
    

Here is the same code, including the new test9:

using System;

namespace BitConverterTest
{
    class Program
    {
        const int iters = 1024 * 1024 * 1024;
        const int arrayLen = iters / 4;
        static byte[] array = new byte[arrayLen];

        static void Main(string[] args)
        {
            //test1(1, 2, 3, 4);
            //test2(1, 2, 3, 4);
            test3(1, 2, 3, 4);
            //test4(1, 2, 3, 4);
            test5(1, 2, 3, 4);
            test6(1, 2, 3, 4);

            // Fill array with good PRNG data
            var rng = new System.Security.Cryptography.RNGCryptoServiceProvider();
            rng.GetBytes(array);

            test7();
            test8();
            test9();
        }

        // BitConverter with aligned input
        static void test3(byte w, byte x, byte y, byte z)
        {
            int res = 0;
            var timer = System.Diagnostics.Stopwatch.StartNew();
            var b = new byte[] { w, x, y, z };
            for (int i = 0; i < iters; i++)
                res = BitConverter.ToInt32(b, 0);
            Console.WriteLine("test3: " + timer.Elapsed + " " + res);
        }

        // Inline bitfiddling with separate variables.
        static void test5(byte w, byte x, byte y, byte z)
        {
            long res = 0;
            var timer = System.Diagnostics.Stopwatch.StartNew();
            var b = new byte[] { w, x, y, z };
            for (int i = 0; i < iters; i++)
            {
                int a = w | (x << 8) | (y << 16) | (z << 24);
                res += a;
            }
            Console.WriteLine("test5: " + timer.Elapsed + " " + res);
        }

        // Inline bitfiddling with array elements.
        static void test6(byte w, byte x, byte y, byte z)
        {
            long res = 0;
            var timer = System.Diagnostics.Stopwatch.StartNew();
            var b = new byte[] { w, x, y, z };
            for (int i = 0; i < iters; i++)
            {
                int a = b[0] | (b[1] << 8) | (b[2] << 16) | (b[3] << 24);
                res += a;
            }
            Console.WriteLine("test6: " + timer.Elapsed + " " + res);
        }

        // BitConvert from large array...
        static void test7()
        {
            var its = iters / arrayLen * 4; // *4 to remove arrayLen/4 factor.
            var timer = System.Diagnostics.Stopwatch.StartNew();
            long res = 0;
            for (var outer = 0; outer < its; outer++)
            {
                for (var pos = 0; pos < arrayLen; pos += 4)
                {
                    var x = BitConverter.ToInt32(array, pos);
                    res += x;
                }
            }
            Console.WriteLine("test7: " + timer.Elapsed + " " + res);
        }

        // Bitfiddle from large array...
        static void test8()
        {
            var its = iters / arrayLen * 4;
            var timer = System.Diagnostics.Stopwatch.StartNew();
            long res = 0;
            for (var outer = 0; outer < its; outer++)
            {
                for (var pos = 0; pos < arrayLen; pos += 4)
                {
                    int x = array[pos] | (array[pos + 1] << 8) | (array[pos + 2] << 16) | (array[pos + 3] << 24);
                    res += x;
                }
            }
            Console.WriteLine("test8: " + timer.Elapsed + " " + res);
        }

        // unsafe memory operations from large array...
        // (essentially internals of BitConverter without param checks, etc)
        static unsafe void test9()
        {
            var its = iters / arrayLen * 4;
            var timer = System.Diagnostics.Stopwatch.StartNew();
            long res = 0;
            int value = 0;
            for (var outer = 0; outer < its; outer++)
            {
                for (var pos = 0; pos < arrayLen; pos += 4)
                {
                    fixed (byte* numPtr = &array[pos])
                    {
                        value = *(int*)numPtr;
                    }
                    int x = *(int*)&value;
                    res += x;
                }
            }
            Console.WriteLine("test9: " + timer.Elapsed + " " + res);
        }

    }
}