C# P/Invoke on a CUDA DLL eventually causes an AccessViolationException

Posted 2024-09-18 19:16:34

This is driving me crazy. I've looked all over, but I'm not sure I understand exactly what's causing this error.

I'm making a call to a DLL (which I've coded as a separate project) that runs a CUDA kernel on some data I'm using. However, I suspect the issue isn't being caused by CUDA, since the code has been tested and works at least once, and usually 64-100 times, before causing an AccessViolationException.

The issue is, I'm passing in three public static arrays:

public static float[] neuronInputs;
public static float[] connectionOutputs;
public static int[] calcOrder;

The data from neuronInputs gets copied onto the GPU, operated on, then copied back to connectionOutputs (calcOrder is only read, never written). I perform a bunch of operations using the connectionOutputs array. Then I overwrite the neuronInputs array and send it back to the GPU, repeating until it fails. And it always fails.

I'm calling this function:

[DllImport("CUDANeural.dll")]
static extern void GenerateSubstrateConnections(
    [In, Out] [MarshalAs(UnmanagedType.LPArray)] float[] neuronInputs,
    [In, Out] [MarshalAs(UnmanagedType.LPArray)] int[] calcOrder,
    [In, Out] [MarshalAs(UnmanagedType.LPArray)] float[] outWeights
);

I only allocate the memory for the three arrays once, and I allocate a large chunk for each. I've tested it on the managed side, and there is no way I would be indexing outside of the arrays inside the CUDA code.

I guess my question is, what is causing this AccessViolationException? Assuming it isn't the CUDA code.

EDIT:
Here's the declaration on the unmanaged side:

extern "C" __declspec(dllexport) void GenerateSubstrateConnections(
    float* neuronInputs, int* calcOrder, float* outWeights);

It seems I might have been wrong about the CUDA side of things. I've added a cudaThreadExit() call at the end of GenerateSubstrateConnections, and this seems to have corrected the issue. However, for clarification, I'm calling a different function:

[DllImport("CUDANeural.dll")]
static extern void DebugSubstrateConnections(
    [In, Out] IntPtr neuronInputs,
    [In, Out] IntPtr calcOrder,
    [In, Out] IntPtr outWeights
);

And before I call GenerateSubstrateConnections in managed code, I pin the arrays with GCHandles:

SubstrateDescription.inputHandle = GCHandle.Alloc(SubstrateDescription.neuronInputs, GCHandleType.Pinned);
SubstrateDescription.connectionHandle = GCHandle.Alloc(SubstrateDescription.outputConnections, GCHandleType.Pinned);
calcHandle = GCHandle.Alloc(calcOrder, GCHandleType.Pinned);

Then call

GenerateSubstrateConnections(
    SubstrateDescription.inputHandle.AddrOfPinnedObject(),
    calcHandle.AddrOfPinnedObject(),
    SubstrateDescription.connectionHandle.AddrOfPinnedObject());

I'm not entirely sure if this is necessary, but I know that it works (currently). Thank you for all the comments; they helped me squeeze out the issue.
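On whether the explicit pinning is necessary: for arrays of blittable types such as float[] and int[], the interop marshaler already pins the array for the duration of a P/Invoke call, so manual pinning mostly matters when native code keeps the pointer after the call returns. If the handles are kept, they should be freed again. Below is a minimal sketch of that pattern, not the poster's actual code. The `PinnedCall.Invoke` helper and the `nativeCall` delegate are hypothetical names; the delegate stands in for the real DllImport function so the example runs without CUDANeural.dll.

```csharp
using System;
using System.Runtime.InteropServices;

public static class PinnedCall
{
    // Hypothetical helper: pins three arrays, passes their addresses to
    // nativeCall (a stand-in for the IntPtr-based DllImport function),
    // and always releases the pins afterwards.
    public static void Invoke(float[] inputs, int[] order, float[] outputs,
                              Action<IntPtr, IntPtr, IntPtr> nativeCall)
    {
        GCHandle inputHandle = default(GCHandle);
        GCHandle orderHandle = default(GCHandle);
        GCHandle outputHandle = default(GCHandle);
        try
        {
            inputHandle = GCHandle.Alloc(inputs, GCHandleType.Pinned);
            orderHandle = GCHandle.Alloc(order, GCHandleType.Pinned);
            outputHandle = GCHandle.Alloc(outputs, GCHandleType.Pinned);

            nativeCall(inputHandle.AddrOfPinnedObject(),
                       orderHandle.AddrOfPinnedObject(),
                       outputHandle.AddrOfPinnedObject());
        }
        finally
        {
            // Free only the handles that were actually allocated.
            if (inputHandle.IsAllocated) inputHandle.Free();
            if (orderHandle.IsAllocated) orderHandle.Free();
            if (outputHandle.IsAllocated) outputHandle.Free();
        }
    }
}

public static class PinnedCallDemo
{
    public static void Main()
    {
        float[] inputs = { 1f, 2f, 3f };
        int[] order = { 0, 1, 2 };
        float[] outputs = new float[inputs.Length];

        PinnedCall.Invoke(inputs, order, outputs, (pIn, pOrder, pOut) =>
        {
            // Stand-in for the native side: copy the inputs to the outputs
            // through the raw pointers.
            float[] tmp = new float[inputs.Length];
            Marshal.Copy(pIn, tmp, 0, tmp.Length);
            Marshal.Copy(tmp, 0, pOut, tmp.Length);
        });

        Console.WriteLine(string.Join(",", outputs));
    }
}
```

Pinning once and keeping the handles for the lifetime of the arrays, as in the post, is also possible; the try/finally form just makes it harder to leak a pinned handle.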


Comments (2)

陌生 2024-09-25 19:17:21

I'm not even sure you can do a simple P/Invoke on CUDA functions, as they don't run on the main processor. The best option for using the native CUDA API directly might be C++/CLI, and NVIDIA just released a support package for that.
Other, simpler options include OpenCL, which has a .NET library called OpenTK that provides managed wrappers for most uses.

我的痛♀有谁懂 2024-09-25 19:17:12

Maybe it's a thread-safety issue. Since you are using static memory, you should lock the object or use some other synchronization option, unless you are absolutely sure the code is single-threaded.
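The locking suggested here could look roughly like the sketch below. It assumes that every call into the DLL goes through one wrapper; `SerializedNativeCalls`, `NativeLock`, and `Run` are hypothetical names, and the `Action` passed in stands in for the real native call so the example runs on its own.

```csharp
using System;
using System.Threading.Tasks;

public static class SerializedNativeCalls
{
    // One process-wide lock guarding the shared static arrays and the native call.
    private static readonly object NativeLock = new object();

    // Hypothetical wrapper: runs nativeCall while holding the lock, so that
    // only one thread at a time can touch the shared buffers.
    public static void Run(Action nativeCall)
    {
        lock (NativeLock)
        {
            nativeCall();
        }
    }
}

public static class LockDemo
{
    public static int Counter;

    public static void Main()
    {
        // Stand-in workload: a non-atomic increment from many threads.
        // Without the lock, some increments could be lost.
        Parallel.For(0, 1000, i => SerializedNativeCalls.Run(() => { Counter++; }));
        Console.WriteLine(Counter); // prints 1000
    }
}
```

If the managed code really is single-threaded, the lock is uncontended and costs very little, so it is a cheap way to rule this cause out.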
