Context migration in CUDA.NET
I'm currently using the CUDA.NET library by GASS.
I need to initialize CUDA arrays (actually CUBLAS vectors, but that doesn't matter) in one CPU thread and use them in another CPU thread. But the CUDA context, which holds all the initialized arrays and loaded functions, can be attached to only one CPU thread at a time.
There is a mechanism called the context migration API for detaching a context from one thread and attaching it to another, but I don't know how to use it properly in CUDA.NET.
I tried something like this:
class Program
{
    private static float[] vector1, vector2;
    private static CUDA cuda;
    private static CUBLAS cublas;
    private static CUdeviceptr ptr;

    static void Main(string[] args)
    {
        cuda = new CUDA(false);
        cublas = new CUBLAS(cuda);
        cuda.Init();
        cuda.CreateContext(0);
        AllocateVectors();
        cuda.DetachContext();
        CUcontext context = cuda.PopCurrentContext();
        GetVectorFromDeviceAsync(context);
    }

    private static void AllocateVectors()
    {
        vector1 = new float[] { 1f, 2f, 3f, 4f, 5f };
        ptr = cublas.Allocate(vector1.Length, sizeof(float));
        cublas.SetVector(vector1, ptr);
        vector2 = new float[5];
    }

    private static void GetVectorFromDevice(object objContext)
    {
        CUcontext localContext = (CUcontext)objContext;
        cuda.PushCurrentContext(localContext);
        cuda.AttachContext(localContext);
        // change vector somehow
        vector1[0] = -1;
        // copy changed vector to device
        cublas.SetVector(vector1, ptr);
        cublas.GetVector(ptr, vector2);
        CUDADriver.cuCtxPopCurrent(ref localContext);
    }

    private static void GetVectorFromDeviceAsync(CUcontext cUcontext)
    {
        Thread thread = new Thread(GetVectorFromDevice);
        thread.IsBackground = false;
        thread.Start(cUcontext);
    }
}
But execution fails when it tries to copy the changed vector to the device, because the context is not attached. Other causes are unlikely, since the same code works fine in single-threaded mode. Any ideas how I can get this to work?
Comments (2)
I still haven't found a solution to this problem, but I did come up with a workaround.
The point is to execute every function that has anything to do with CUDA in a single CPU thread.
For example, you can do it like this:
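The code that went with this answer is not reproduced here, so what follows is only a sketch of the idea, not the answerer's implementation. It assumes a hypothetical helper, SingleThreadExecutor (not part of CUDA.NET), that owns one worker thread and runs every queued delegate on it. In a real program the delegates would wrap the cuda.CreateContext, cublas.SetVector and cublas.GetVector calls from the question; the demo uses the managed thread ID instead so that it runs without a GPU.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not part of CUDA.NET): owns one thread and runs
// every queued delegate on it, so a thread-bound resource such as a
// CUDA context is only ever touched from that one thread.
public sealed class SingleThreadExecutor : IDisposable
{
    private readonly BlockingCollection<Action> queue =
        new BlockingCollection<Action>();
    private readonly Thread worker;

    public SingleThreadExecutor()
    {
        worker = new Thread(() =>
        {
            // Blocks until work arrives; the loop ends after CompleteAdding().
            foreach (Action job in queue.GetConsumingEnumerable())
                job();
        });
        worker.IsBackground = false;
        worker.Start();
    }

    // Queues func for the worker thread. The returned task completes with
    // func's result, or faults with the exception func threw.
    public Task<T> Run<T>(Func<T> func)
    {
        // Keep continuations off the worker thread so that only queued
        // jobs ever execute there.
        var tcs = new TaskCompletionSource<T>(
            TaskCreationOptions.RunContinuationsAsynchronously);
        queue.Add(() =>
        {
            try { tcs.SetResult(func()); }
            catch (Exception e) { tcs.SetException(e); }
        });
        return tcs.Task;
    }

    // Must be called from a thread other than the worker itself.
    public void Dispose()
    {
        queue.CompleteAdding(); // let the worker drain the queue and exit
        worker.Join();
        queue.Dispose();
    }
}

public static class Demo
{
    public static void Main()
    {
        using (var gpu = new SingleThreadExecutor())
        {
            // Submitted from the main thread.
            int first = gpu.Run(() => Thread.CurrentThread.ManagedThreadId).Result;
            // Submitted from a thread-pool thread.
            int second = Task.Run(() =>
                gpu.Run(() => Thread.CurrentThread.ManagedThreadId).Result).Result;
            // Both jobs ran on the executor's single worker thread.
            Console.WriteLine(first == second);
        }
    }
}
```

With this shape, the first job submitted would create the context and allocate the vectors, and any other thread that needs the device would go through Run instead of touching cuda or cublas directly, so the context never has to migrate.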
I hope it helps someone.
Check out the CUDAContextSynchronizer class in the GASS documentation.