C# P/Invoke on a CUDA DLL eventually causes an AccessViolationException

Posted on 2024-09-18 19:16:34


This is driving me crazy. I've looked all over, but I'm not sure I understand exactly what's causing this error.

I'm calling into a DLL (which I wrote as a separate project) that runs a CUDA kernel on some of my data. However, I suspect the issue isn't caused by CUDA, since the code has been tested and works at least once, usually 64 to 100 times, before throwing an AccessViolationException.

I'm passing in three public static arrays:

```csharp
public static float[] neuronInputs;
public static float[] connectionOutputs;
public static int[] calcOrder;
```

The data from neuronInputs is copied to the GPU, operated on, and then copied back into connectionOutputs (calcOrder is only read, never written). I perform a number of operations on the connectionOutputs array, then overwrite the neuronInputs array and send it back to the GPU. This repeats until it fails, and it always fails.

I'm calling this function:

```csharp
[DllImport("CUDANeural.dll")]
static extern void GenerateSubstrateConnections(
    [In, Out] [MarshalAs(UnmanagedType.LPArray)] float[] neuronInputs,
    [In, Out] [MarshalAs(UnmanagedType.LPArray)] int[] calcOrder,
    [In, Out] [MarshalAs(UnmanagedType.LPArray)] float[] outWeights
);
```
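One thing worth ruling out with this declaration (the thread itself does not identify it as the cause): the native export in the EDIT is declared `extern "C" __declspec(dllexport)` with no calling-convention keyword, which MSVC compiles as `__cdecl` by default, whereas `DllImport` defaults to `CallingConvention.Winapi`, which means `StdCall` on 32-bit Windows. In a 32-bit process that mismatch leaves the stack pointer wrong after every call; 64-bit Windows has a single calling convention, so it does not apply there. The sketch below assumes the native signature shown in the EDIT, states the convention explicitly, and uses directional attributes that follow the data flow described above. `[MarshalAs(UnmanagedType.LPArray)]` is omitted because that is already the default for array parameters. The `NativeMethods` class name is hypothetical.

```csharp
using System.Runtime.InteropServices;

// Hypothetical container class for the P/Invoke declaration.
internal static class NativeMethods
{
    // The export is extern "C" __declspec(dllexport) with no convention keyword,
    // which MSVC treats as __cdecl; say so instead of relying on the Winapi default.
    [DllImport("CUDANeural.dll", CallingConvention = CallingConvention.Cdecl)]
    internal static extern void GenerateSubstrateConnections(
        [In] float[] neuronInputs,  // copied to the GPU; only read by native code
        [In] int[] calcOrder,       // only read, never written
        [Out] float[] outWeights);  // written by native code with the kernel results
}
```

Declaring the method does not load `CUDANeural.dll`; the DLL is only resolved on the first call, so a mismatch like this shows up at run time rather than at compile time.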

I allocate the memory for the three arrays only once, as one large chunk each. I've checked the indexing on the managed side, and there is no way the CUDA code is indexing outside the arrays.

So my question is: what is causing this AccessViolationException, assuming it isn't the CUDA code?

EDIT: Here's the declaration on the unmanaged side:

```cpp
extern "C" __declspec(dllexport) void GenerateSubstrateConnections(
    float* neuronInputs, int* calcOrder, float* outWeights);
```

It seems I was wrong about the CUDA side after all. I added a cudaThreadExit() call at the end of GenerateSubstrateConnections, and that seems to have fixed the issue. However, for clarification, I'm now calling a different function:

```csharp
[DllImport("CUDANeural.dll")]
static extern void DebugSubstrateConnections(
    [In, Out] IntPtr neuronInputs,
    [In, Out] IntPtr calcOrder,
    [In, Out] IntPtr outWeights
);
```

Before calling GenerateSubstrateConnections from managed code, I pin the arrays with GCHandles:

```csharp
SubstrateDescription.inputHandle = GCHandle.Alloc(SubstrateDescription.neuronInputs, GCHandleType.Pinned);
SubstrateDescription.connectionHandle = GCHandle.Alloc(SubstrateDescription.outputConnections, GCHandleType.Pinned);
calcHandle = GCHandle.Alloc(calcOrder, GCHandleType.Pinned);
```

Then I make the call:

```csharp
GenerateSubstrateConnections(
    SubstrateDescription.inputHandle.AddrOfPinnedObject(),
    calcHandle.AddrOfPinnedObject(),
    SubstrateDescription.connectionHandle.AddrOfPinnedObject());
```
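On whether the explicit pinning is necessary: `float[]` and `int[]` are blittable, so for a synchronous call the default marshaller already pins them and passes pointers into the managed arrays for the duration of the call. Explicit `GCHandle` pinning matters when native code keeps those pointers after it returns. In either case, a handle allocated with `GCHandleType.Pinned` stays allocated, and the array stays pinned, until `Free()` is called; the snippet above does not show the handles being freed, and repeating it on every iteration without freeing would accumulate pinned handles. The following is a minimal sketch, not the author's code, that pairs each `Alloc` with a `Free` in a `finally` block. `Pinning.WithPinned` is a hypothetical helper, and its delegate parameter stands in for the P/Invoke call so the sketch runs without `CUDANeural.dll`.

```csharp
using System;
using System.Runtime.InteropServices;

public static class Pinning
{
    // Pins the three arrays, hands their addresses to `call`, and always unpins them,
    // even if `call` (or one of the Alloc calls) throws.
    public static void WithPinned(float[] neuronInputs, int[] calcOrder, float[] outWeights,
                                  Action<IntPtr, IntPtr, IntPtr> call)
    {
        GCHandle inputs = default, order = default, outputs = default;
        try
        {
            inputs = GCHandle.Alloc(neuronInputs, GCHandleType.Pinned);
            order = GCHandle.Alloc(calcOrder, GCHandleType.Pinned);
            outputs = GCHandle.Alloc(outWeights, GCHandleType.Pinned);
            call(inputs.AddrOfPinnedObject(), order.AddrOfPinnedObject(), outputs.AddrOfPinnedObject());
        }
        finally
        {
            // default(GCHandle).IsAllocated is false, so only handles that were created get freed.
            if (outputs.IsAllocated) outputs.Free();
            if (order.IsAllocated) order.Free();
            if (inputs.IsAllocated) inputs.Free();
        }
    }
}
```

With an `IntPtr`-based declaration such as `DebugSubstrateConnections`, the call would look like `Pinning.WithPinned(SubstrateDescription.neuronInputs, calcOrder, SubstrateDescription.outputConnections, DebugSubstrateConnections);`. Pinning per call lets the GC move the arrays between calls; if the native code retains the pointers across calls, the arrays would instead need to stay pinned for as long as native code may use them.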

I'm not entirely sure whether this is necessary, but I know it works (for now). Thanks for all the comments; they helped me track down the issue.


Answer by 陌生, 2024-09-25 19:17:21

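I'm not even sure you can do a simple P/Invoke to CUDA functions, since they don't run on the main processor. Your best bet for using the native CUDA API directly would probably be C++/CLI; NVIDIA just released a support package for this.
Other, simpler options include using OpenCL, which has a .NET library called OpenTK that provides managed wrappers for most uses.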
