CUDA global memory deallocation issue in .NET

Posted on 2024-08-04 19:12:28


I have a class (see the example below) that acts as a .NET wrapper for a CUDA memory structure, allocated using cudaMalloc() and referenced through a member field of type IntPtr. (The class uses DllImport on a native C DLL that wraps various CUDA functionality.)

The Dispose method checks whether the pointer is IntPtr.Zero and, if it is not, calls cudaFree(), which successfully deallocates the memory (returns CUDA success), and then sets the pointer to IntPtr.Zero.

The finalizer calls the Dispose method.

The problem is that if the finalizer runs without Dispose having been called previously, cudaFree() sets an error code of "invalid device pointer".

I checked: the address cudaFree() receives is the same address returned by cudaMalloc(), and Dispose() has not been called previously.

When I add an explicit call to Dispose(), the same address is freed successfully.

The only workaround I found was to not call Dispose from the finalizer; however, this could cause a memory leak if Dispose() is not always called.

Any ideas why this happens? I encountered the same issue with CUDA 2.2 and 2.3, under .NET 3.5 SP1, on Windows Vista 64-bit + GeForce 8800 and on Windows XP 32-bit + Quadro FX (not sure which model).

class CudaEntity : IDisposable
{
    private IntPtr dataPointer;

    public CudaEntity()
    {
        // Calls cudaMalloc() via DllImport,
        // receives the error code and throws an exception if it is not 0,
        // then assigns the returned address to this.dataPointer
    }

    public void Dispose()
    {
        if (this.dataPointer != IntPtr.Zero)
        {
            // Calls cudaFree() via DllImport,
            // receives the error code and throws an exception if it is not 0

            this.dataPointer = IntPtr.Zero;
        }
    }

    ~CudaEntity()
    {
        Dispose();
    }
}

{
    // This code works
    var myEntity = new CudaEntity();
    myEntity.Dispose();
}

{
    // This code causes an "invalid device pointer"
    // error on the finalizer's call to cudaFree()
    var myEntity = new CudaEntity();
}


Comments (1)

佼人 2024-08-11 19:12:28

The problem is that finalizers are executed on the GC's finalizer thread, and CUDA resources allocated in one thread can't be used in another one. An excerpt from the CUDA Programming Guide:

Several host threads can execute device code on the same device, but by design, a host thread can execute device code on only one device. As a consequence, multiple host threads are required to execute device code on multiple devices. Also, any CUDA resources created through the runtime in one host thread cannot be used by the runtime from another host thread.
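
To see the thread switch for yourself, a minimal sketch like the following can help. It does not touch CUDA at all; ThreadProbe, Program and CreateAndDrop are hypothetical names introduced only for this illustration. The object records the managed thread ID at construction and again inside its finalizer, so the two can be compared.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading;

// Hypothetical stand-in for a CUDA-backed object: it only records thread IDs.
class ThreadProbe
{
    // Stays at -1 until a finalizer has actually run.
    public static int FinalizerThreadId = -1;

    public readonly int CreatorThreadId = Thread.CurrentThread.ManagedThreadId;

    ~ThreadProbe()
    {
        // Executed by the runtime's finalizer thread, not by the creator.
        FinalizerThreadId = Thread.CurrentThread.ManagedThreadId;
    }
}

static class Program
{
    // Allocate in a separate, non-inlined method so that no live reference
    // to the probe remains on the caller's stack afterwards.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int CreateAndDrop()
    {
        return new ThreadProbe().CreatorThreadId;
    }

    static void Main()
    {
        int creatorId = CreateAndDrop();

        // Request a collection and wait for the finalizer queue to drain.
        GC.Collect();
        GC.WaitForPendingFinalizers();

        Console.WriteLine("created on thread {0}, finalized on thread {1}",
            creatorId, ThreadProbe.FinalizerThreadId);
    }
}
```

If the finalizer ran, the two IDs should differ, which is exactly the situation in which the cudaFree() call from the question fails.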

Your best bet is to use the using statement, which ensures that Dispose() is always called at the end of the 'protected' code block, on the thread that executes the block:

using (CudaEntity ent = new CudaEntity())
{
    // Work with ent here; Dispose() runs when the block exits.
}
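
For callers that forget the using block, one possible arrangement (a sketch, not necessarily what the asker's wrapper should do) is to keep the finalizer but stop it from calling the native free, so it only reports the leak. FakeCuda below is a hypothetical stand-in for the native wrapper DLL; it imitates the per-thread ownership rule in plain C# so the example runs without a GPU, and its return code 1 is an arbitrary placeholder rather than a real CUDA error value. IsDisposed and Demo are likewise illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical stand-in for the native DLL: remembers which thread allocated
// each pointer and refuses to free it from any other thread.
static class FakeCuda
{
    private static readonly Dictionary<IntPtr, int> owners = new Dictionary<IntPtr, int>();
    private static long next = 0x1000;

    public static int Malloc(out IntPtr ptr)
    {
        lock (owners)
        {
            ptr = new IntPtr(next);
            next += 0x100;
            owners[ptr] = Thread.CurrentThread.ManagedThreadId;
        }
        return 0;
    }

    public static int Free(IntPtr ptr)
    {
        lock (owners)
        {
            int owner;
            if (!owners.TryGetValue(ptr, out owner) ||
                owner != Thread.CurrentThread.ManagedThreadId)
            {
                return 1; // placeholder for "invalid device pointer"
            }
            owners.Remove(ptr);
            return 0;
        }
    }
}

sealed class CudaEntity : IDisposable
{
    private IntPtr dataPointer;

    public CudaEntity()
    {
        int status = FakeCuda.Malloc(out dataPointer);
        if (status != 0)
            throw new InvalidOperationException("allocation failed: " + status);
    }

    public bool IsDisposed
    {
        get { return dataPointer == IntPtr.Zero; }
    }

    public void Dispose()
    {
        if (dataPointer != IntPtr.Zero)
        {
            int status = FakeCuda.Free(dataPointer);
            if (status != 0)
                throw new InvalidOperationException("free failed: " + status);
            dataPointer = IntPtr.Zero;
        }
        // Nothing is left for the finalizer to report.
        GC.SuppressFinalize(this);
    }

    ~CudaEntity()
    {
        // Wrong thread for freeing, so only report the leak.
        if (dataPointer != IntPtr.Zero)
            Console.Error.WriteLine("CudaEntity leaked: Dispose() was not called");
    }
}

static class Demo
{
    public static bool DisposeOnCreatorThread()
    {
        var ent = new CudaEntity();
        using (ent)
        {
            // Work with ent here.
        }
        return ent.IsDisposed;
    }

    static void Main()
    {
        Console.WriteLine(DisposeOnCreatorThread());
    }
}
```

The trade-off is the one the question already names: an object that is never disposed still leaks its device memory, but the leak is reported instead of surfacing as a confusing error from the finalizer.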