Managed to unmanaged overhead

Posted 2024-11-29 22:00:33

In .NET there are several places where you must leave managed code and enter the realm of unmanaged (a.k.a. native) code. To name a few:

  • extern dll functions
  • COM invocation

There are always comments about the overhead that jumping from one side to the other causes. My question is whether anybody has MEASURED the exact overhead, and can explain how it can be calculated. For example, maybe a byte[] can be converted to an IntPtr, or even to a byte*, in .NET to help the marshaller save some CPU cycles.

述情 2024-12-06 22:00:33

Getting the address of a managed array is indeed possible.

First, you have to pin the array using System.Runtime.InteropServices.GCHandle so that the garbage collector doesn't move the array around. You must keep this handle allocated as long as the unmanaged code has access to the managed array, and release it with pin.Free() once it no longer does.

byte[] the_array = ... ;
GCHandle pin = GCHandle.Alloc(the_array, GCHandleType.Pinned);

Then, you should be able to use System.Runtime.InteropServices.Marshal.UnsafeAddrOfPinnedArrayElement to get an IntPtr to any element in the array.

IntPtr p = Marshal.UnsafeAddrOfPinnedArrayElement(the_array,0);
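Putting those two calls together, a minimal sketch of the whole pin/use/unpin lifecycle might look like the following. `PinnedBuffer` and `WithPinnedAddress` are hypothetical names. The `useAddress` callback stands in for whatever native call would consume the pointer. The `finally` block makes sure the handle is freed even if that call throws.

```csharp
using System;
using System.Runtime.InteropServices;

public static class PinnedBuffer
{
    // Pins `data`, passes the address of element 0 to `useAddress`, and always unpins afterwards.
    // `useAddress` is a placeholder for a (hypothetical) native call; it must not keep the pointer
    // after it returns, because the GC is free to move the array once the handle is freed.
    public static T WithPinnedAddress<T>(byte[] data, Func<IntPtr, T> useAddress)
    {
        if (data == null) throw new ArgumentNullException(nameof(data));
        if (useAddress == null) throw new ArgumentNullException(nameof(useAddress));

        GCHandle pin = GCHandle.Alloc(data, GCHandleType.Pinned);
        try
        {
            // For element 0 this is the same address as pin.AddrOfPinnedObject().
            IntPtr p = Marshal.UnsafeAddrOfPinnedArrayElement(data, 0);
            return useAddress(p);
        }
        finally
        {
            pin.Free(); // the array is movable again from here on
        }
    }
}
```

Because the pointer refers to the array itself rather than to a copy, anything written through it is immediately visible in the managed byte[].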

Important: Pinning objects seriously disrupts GC operation. Being able to move objects around in the heap is one of the reasons why modern GCs can (somewhat) keep up with manual memory management. By pinning objects in the managed heap, the GC loses its one performance advantage over manual memory management: a relatively unfragmented heap.

So if you plan to keep these arrays "on the unmanaged side" for some time, consider making a copy of the array instead. Copying memory is surprisingly fast. Use the Marshal.Copy overloads to copy from managed to unmanaged memory and vice versa.
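Here is a minimal sketch of the copy-instead-of-pin approach. It assumes the data is a byte[] and that some native code (not shown) will read or write the unmanaged block. `UnmanagedCopy` is a hypothetical name. A production version would probably wrap the pointer in a SafeHandle or add a finalizer, so that the block is freed even if Dispose is never called.

```csharp
using System;
using System.Runtime.InteropServices;

// Keeps a long-lived copy of the data in unmanaged memory, so the managed array never needs pinning.
public sealed class UnmanagedCopy : IDisposable
{
    public IntPtr Pointer { get; private set; }
    public int Length { get; }

    public UnmanagedCopy(byte[] source)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        Length = source.Length;
        Pointer = Marshal.AllocHGlobal(Math.Max(Length, 1)); // avoid asking for a zero-byte block
        Marshal.Copy(source, 0, Pointer, Length);            // managed -> unmanaged
    }

    // unmanaged -> managed, e.g. after native code has filled or modified the block
    public byte[] ToArray()
    {
        if (Pointer == IntPtr.Zero) throw new ObjectDisposedException(nameof(UnmanagedCopy));
        var result = new byte[Length];
        Marshal.Copy(Pointer, result, 0, Length);
        return result;
    }

    public void Dispose()
    {
        if (Pointer != IntPtr.Zero)
        {
            Marshal.FreeHGlobal(Pointer);
            Pointer = IntPtr.Zero;
        }
    }
}
```

The managed array and the unmanaged block are independent after construction. Changes on one side only show up on the other after an explicit Marshal.Copy.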

请爱~陌生人 2024-12-06 22:00:33

[I see I didn't really answer the question about how you would measure. The best way to measure is with some instrumentation, either with the instrumentation classes (see: http://msdn.microsoft.com/en-us/library/aa645516(v=vs.71).aspx) or even with something as simple as placing timers around whatever calls you're interested in. In the crudest form, when we were trying to find the performance hit taken, for instance, in our calls between C# and ATL COM, we would just place timers around an empty function call. We would start a timer, run a tight loop calling the empty ATL COM function from C#, do enough iterations to get reasonably consistent answers between runs, and then do the exact same thing in C++. The difference between those two numbers is the overhead of making the call across that boundary.]
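Here is a rough sketch of that crude timing approach in C#. The hypothetical `CallTimer` helper runs a call in a tight loop after a warm-up pass and reports the average time per call. Unlike the comparison described above, it uses an empty *managed* method as the baseline instead of a separate C++ program. That substitution is an assumption of this sketch, not the author's setup.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;

public static class CallTimer
{
    // Average wall-clock nanoseconds per invocation of `call` over `iterations` calls,
    // measured after a short warm-up so JIT compilation and first-call setup are excluded.
    public static double NanosecondsPerCall(Action call, int iterations)
    {
        if (call == null) throw new ArgumentNullException(nameof(call));
        if (iterations <= 0) throw new ArgumentOutOfRangeException(nameof(iterations));

        for (int i = 0; i < 1000; i++) call(); // warm-up

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) call();
        sw.Stop();

        return sw.Elapsed.TotalMilliseconds * 1_000_000.0 / iterations;
    }

    // Baseline: an empty managed method that the JIT is asked not to inline.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static void ManagedNoOp() { }
}
```

To estimate the transition cost, time `CallTimer.ManagedNoOp` and then time an empty P/Invoke or COM method with the same iteration count. A hypothetical declaration would be `[DllImport("MyNativeLib")] static extern void NativeNoOp();`, where `MyNativeLib` is your own library. Both measurements go through the same `Action` delegate, so its cost largely cancels out in the difference. Repeat the runs until the numbers settle, as described above.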

I don't really have any hard numbers, but I can answer from previous experience that as long as you use things in an efficient way, C# performs with very little, if any, overhead beyond what one might expect in C++, depending on the exact nature of what you are trying to do.

I worked on several applications that collected very large amounts of ultrasonic data from very-high-frequency A/D boards (100 MHz to 3 GHz). In them we did the kind of thing you suggest: byte[] arrays were allocated in managed code, then locked down as pointers and passed as buffers for the data, and large amounts of data were transferred and processed to image various parts.

Way back when, we connected C++ code to VB6: we would wrap the C++ in ATL Simple COM objects and pass pointers back and forth when needed for data and imaging. We practiced similar techniques much later with C# in VS.NET 2003. I have also written a library, for a question here, that allows for massive unmanaged data storage. It supports very large arrays and array operations, and even a lot of LINQ-type functionality: "Using array fields instead of massive number of objects". (Note: there are some issues with the reference counting in the latest version that I have not yet tracked down.)

Also, I have used ATL COM to interface with the FFTW library to perform high-performance DSP, to good effect. That library is not quite ready for prime time, but it was the basis of the spike solution I created for the link above. The spike gave me most of the information I needed to complete a much more full-blown unmanaged memory allocator and fast-array processing layer. It supports both externally allocated memory and allocations from the unmanaged heap, and it will eventually replace the processing that currently exists in the FFTW C# library.

So, the point is, I believe the performance penalty is greatly overblown, especially with the processing power we have these days. In fact, you can get very good performance using C# by itself if you just take care to avoid some of the pitfalls, like calling lots of little cross-boundary functions instead of passing buffers, or allocating strings repeatedly. And even when it comes to high-speed processing, C# can still fit the bill for all of the scenarios I've mentioned. Does it take a little forethought? Yes, sometimes. But given the advantages in development speed, maintainability, and understandability, the time I spent figuring out how to get the performance I needed has always been far less than the time it would have taken to develop primarily or completely in C++.

My two bits. (Oh, one caveat: I mention ATL COM specifically because the performance hit you took when using MFC was not worth it. As I recall, calling out via an MFC COM object was about two orders of magnitude slower than calling the interface on the ATL one, and it did not meet our needs. ATL, on the other hand, was only a smidge slower than calling the equivalent function directly in C++. Sorry, I do not recall any particular numbers offhand. Even with the large amounts of ultrasonic data we were collecting and moving around, though, we did not find it to be a bottleneck.)


Oh, I found this: http://msdn.microsoft.com/en-us/library/ms973839.aspx "Performance Tips and Tricks in .NET Applications". I found this quote very interesting:

"To speed up transition time, try to make use of P/Invoke when you can. The overhead is as little as 31 instructions plus the cost of marshalling if data marshalling is required, and only 8 otherwise. COM interop is much more expensive, taking upwards of 65 instructions."

Sample section titles: "Make Chunky Calls", "Use For Loops for String Iteration", "Be on the Lookout for Asynchronous IO Opportunities".
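The "Make Chunky Calls" advice, like the warning about calling lots of little cross-boundary functions instead of passing buffers, comes down to paying the fixed per-call transition cost fewer times. As a worked example, processing 1,000,000 samples one per call means 1,000,000 crossings, while batches of 4,096 need only ceil(1,000,000 / 4,096) = 245. The sketch below computes that count and shows a batching loop. The `processBuffer` delegate is an assumption: it stands in for a hypothetical native function that takes a buffer, an offset, and a count, so the batching logic can run without a native library.

```csharp
using System;

public static class ChunkyCalls
{
    // Number of boundary crossings needed for `itemCount` items when each call
    // handles up to `batchSize` items (batchSize = 1 is the "chatty" case).
    public static long CrossingCount(long itemCount, int batchSize)
    {
        if (itemCount < 0) throw new ArgumentOutOfRangeException(nameof(itemCount));
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));
        return (itemCount + batchSize - 1) / batchSize; // ceiling division
    }

    // Hands `data` to `processBuffer` in slices of at most `batchSize` elements and
    // returns how many calls (i.e. crossings, if processBuffer were a P/Invoke) were made.
    public static int ForEachBatch(double[] data, int batchSize, Action<double[], int, int> processBuffer)
    {
        if (data == null) throw new ArgumentNullException(nameof(data));
        if (processBuffer == null) throw new ArgumentNullException(nameof(processBuffer));
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        int calls = 0;
        for (int offset = 0; offset < data.Length; )
        {
            int count = Math.Min(batchSize, data.Length - offset);
            processBuffer(data, offset, count); // one crossing per batch instead of per element
            offset += count;
            calls++;
        }
        return calls;
    }
}
```

Multiplying the crossing count by a per-call figure such as the quoted 31 instructions gives a rough feel for how quickly chatty interfaces add up. Real costs depend on the runtime and on what has to be marshalled.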


Some snippets from the referenced Fast Memory Library:

In MemoryArray.cs:

    public MemoryArray(int parElementCount, int parElementSize_bytes)
    {
        Descriptor =
            new MemoryArrayDescriptor
                (
                    Marshal.AllocHGlobal(checked(parElementCount * parElementSize_bytes)), // checked: fail loudly instead of wrapping on overflow
                    parElementSize_bytes,
                    parElementCount
                );
    }

    protected override void OnDispose()
    {
        if (Descriptor.StartPointer != IntPtr.Zero)
            Marshal.FreeHGlobal(Descriptor.StartPointer);

        base.OnDispose();
    }

    // This really should only be used for random access to the items; if you want sequential access,
    // use the enumerator, which uses pointer math via the array descriptor's TryMoveNext call.
    //
    // I haven't figured out exactly where it would go, but you could also do something like
    // having a member MemoryArrayItem that gets updated here rather than creating a new one each
    // time; that would break anything that was trying to hold on to a reference to the item, because
    // it would no longer be immutable.
    //
    // That could be remedied by something like a call that returns a new copy of the item if it
    // is to be held onto. I would definitely need to see that I needed the performance boost, and
    // that it was significant enough, before I would contradict the users' expectations on that one.

    public MemoryArrayItem this[int i]
    {
        get
        {
            return new MemoryArrayItem(this, Descriptor.GetElementPointer(i), Descriptor.ElementSize_bytes);
        }
    }

    // You could also do multi-dimensional indexing; to do so you would have to pass the dimensions in
    // somehow in the constructor and store them.
    //
    // There's all sorts of stuff you could do with this: take various slices, switch between
    // last-to-first/first-to-last/custom dimension ordering, etc., but I didn't tackle that for the example.
    //
    // If you don't need to error check here, you could always just do something like:
    public MemoryArrayItem this[int x, int y]
    {
        get
        {
            if (myDimensions == null)
                throw new ArrayTypeMismatchException("attempted to index two dimensional array without calling SetDimensions()");

            if (myDimensions.Length != 2)
                throw new ArrayTypeMismatchException("currently set dimensions do not provide a two dimensional array. [dimension: " + myDimensions.Length + "]");

            int RowSize_bytes = myDimensions[0] * Descriptor.ElementSize_bytes;

            return new MemoryArrayItem(this, Descriptor.StartPointer + (y * RowSize_bytes) + x * Descriptor.ElementSize_bytes, Descriptor.ElementSize_bytes);
        }
    }

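    public void SetDimensions(int[] parDimensions)
    {
        if (parDimensions == null)
            throw new ArgumentNullException(nameof(parDimensions));

        if (parDimensions.Length == 0)
            throw new ArgumentException("unable to set array to dimension of zero.", nameof(parDimensions));

        // use the (paramName, message) overload: the single-string constructor treats its argument as the parameter name
        for (int i = 0; i < parDimensions.Length; ++i)
            if (parDimensions[i] <= 0)
                throw new ArgumentOutOfRangeException(nameof(parDimensions), "unable to set dimension at index " + i + " to " + parDimensions[i] + ".");

        myDimensions = new int[parDimensions.Length];
        parDimensions.CopyTo(myDimensions, 0);
    }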
    private int[] myDimensions = null;
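MemoryArrayDescriptor and MemoryArrayItem are not shown, so here is a self-contained sketch of the same idea instead. It uses one Marshal.AllocHGlobal block addressed in row-major order, with the byte offset computed as y * rowSize + x * elementSize, just as the two-dimensional indexer above does. `UnmanagedGrid` is a hypothetical name, and it is fixed to `int` elements for simplicity. Memory from AllocHGlobal is not zeroed, so cells must be written before they are read.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical fixed-size 2-D grid of ints stored in a single unmanaged block.
public sealed class UnmanagedGrid : IDisposable
{
    private IntPtr start;
    public int Width { get; }
    public int Height { get; }

    public UnmanagedGrid(int width, int height)
    {
        if (width <= 0) throw new ArgumentOutOfRangeException(nameof(width));
        if (height <= 0) throw new ArgumentOutOfRangeException(nameof(height));
        Width = width;
        Height = height;
        start = Marshal.AllocHGlobal(checked(width * height * sizeof(int))); // contents are uninitialized
    }

    // Byte address of (x, y): y whole rows, plus x elements into that row.
    private IntPtr Address(int x, int y)
    {
        if (start == IntPtr.Zero) throw new ObjectDisposedException(nameof(UnmanagedGrid));
        if ((uint)x >= (uint)Width) throw new ArgumentOutOfRangeException(nameof(x));
        if ((uint)y >= (uint)Height) throw new ArgumentOutOfRangeException(nameof(y));
        int rowSizeBytes = Width * sizeof(int);
        return IntPtr.Add(start, y * rowSizeBytes + x * sizeof(int));
    }

    public int this[int x, int y]
    {
        get { return Marshal.ReadInt32(Address(x, y)); }
        set { Marshal.WriteInt32(Address(x, y), value); }
    }

    public void Dispose()
    {
        if (start != IntPtr.Zero)
        {
            Marshal.FreeHGlobal(start);
            start = IntPtr.Zero;
        }
    }
}
```

This sketch reads and writes values directly instead of returning item wrapper objects. That avoids the per-access allocation discussed in the indexer comments above, at the cost of handling only one element type.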

From MemoryArrayEnumerator.cs:

using System;
using System.Collections.Generic;

public class MemoryArrayEnumerator :
    IEnumerator<MemoryArrayItem>
{
    // handles reference counting for the main array 
    private AutoReference<MemoryArray> myArray;
    private MemoryArray Array { get { return myArray; } }

    private IntPtr myCurrentPosition = IntPtr.Zero;

    public MemoryArrayEnumerator(MemoryArray parArray)
    {
        myArray = AutoReference<MemoryArray>.CreateFromExisting(parArray);
    }

    //---------------------------------------------------------------------------------------------------------------
    #region IEnumerator<MemoryArrayItem> implementation
    //---------------------------------------------------------------------------------------------------------------
    public MemoryArrayItem Current
    {
        get 
        {
            if (Array.Descriptor.CheckPointer(myCurrentPosition))
                return new MemoryArrayItem(myArray, myCurrentPosition, Array.Descriptor.ElementSize_bytes);
            else
                throw new IndexOutOfRangeException("Enumerator Error: Current() was out of range");
        }
    }

    public void Dispose()
    {
        myArray.Dispose();
    }

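    // non-generic IEnumerator.Current: forward to the typed Current so non-generic consumers (and foreach over IEnumerable) work too
    object System.Collections.IEnumerator.Current
    {
        get { return Current; }
    }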

    public bool MoveNext()
    {
        bool RetVal = true;

        if (myCurrentPosition == IntPtr.Zero)
            myCurrentPosition = Array.Descriptor.StartPointer;
        else
            RetVal = Array.Descriptor.TryMoveNext(ref myCurrentPosition);

        return RetVal;
    }

    public void Reset()
    {
        myCurrentPosition = IntPtr.Zero;
    }
    //---------------------------------------------------------------------------------------------------------------
    #endregion IEnumerator<MemoryArrayItem> implementation
    //---------------------------------------------------------------------------------------------------------------
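For comparison, here is a sketch of the same sequential pointer walk written as a C# iterator. This lets the compiler generate MoveNext and both the generic and non-generic Current properties. `UnmanagedWalk.ReadInt32s` is a hypothetical helper for `int` elements only, and the caller is assumed to keep the unmanaged block alive until enumeration ends. The public method checks its arguments eagerly and then delegates to a private iterator, because checks inside an iterator body would only run on the first MoveNext. Compiler-generated iterators throw NotSupportedException from Reset, so restarting means calling ReadInt32s again.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

public static class UnmanagedWalk
{
    // Enumerates `count` Int32 values starting at `start` by advancing a pointer,
    // the same idea as MemoryArrayEnumerator.MoveNext/TryMoveNext.
    public static IEnumerable<int> ReadInt32s(IntPtr start, int count)
    {
        if (start == IntPtr.Zero) throw new ArgumentException("pointer must not be null", nameof(start));
        if (count < 0) throw new ArgumentOutOfRangeException(nameof(count));
        return Walk(start, count);
    }

    private static IEnumerable<int> Walk(IntPtr start, int count)
    {
        IntPtr current = start;
        IntPtr end = IntPtr.Add(start, checked(count * sizeof(int))); // one past the last element
        while (current != end)
        {
            yield return Marshal.ReadInt32(current);
            current = IntPtr.Add(current, sizeof(int));
        }
    }
}
```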