Strengthening the relationship between managed threads and OS threads (CUDA use case)
Problem
I'm trying to create a CUDA application that is well integrated with .NET. The design goal is to have several CUDA functions that can be called from managed code. Data should also be able to persist on the device between function calls, so that it can be passed to multiple CUDA functions.
It is important that each individual piece of data is only accessed by a single OS thread (as required by CUDA).
My Strategy
I'm wrapping the CUDA functionality and device pointers in Managed C++ code. A CUDA device pointer can be wrapped in a DevicePointer class written in MC++. If the class keeps track of which thread is using it, it can enforce that only a single thread can access the CUDA device pointer.
I'll then design the program so that only a single thread would attempt to access any given piece of data.
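A minimal sketch of what such a wrapper might look like, assuming C++/CLI and the CUDA runtime API; the class name, fields, and error handling are illustrative, not the actual implementation:

```cpp
#include <cuda_runtime.h>

// Hypothetical managed wrapper: records the managed thread that created the
// allocation and rejects access from any other managed thread.
public ref class DevicePointer
{
    void* devPtr_;          // raw CUDA device pointer
    int   ownerThreadId_;   // managed thread that owns this allocation

    void CheckThread()
    {
        if (System::Threading::Thread::CurrentThread->ManagedThreadId != ownerThreadId_)
            throw gcnew System::InvalidOperationException(
                "DevicePointer accessed from a different managed thread.");
    }

public:
    DevicePointer(size_t bytes)
    {
        ownerThreadId_ = System::Threading::Thread::CurrentThread->ManagedThreadId;
        void* p = nullptr;  // local variable: the address of a ref-class member can't be passed straight to native code
        if (cudaMalloc(&p, bytes) != cudaSuccess)
            throw gcnew System::OutOfMemoryException("cudaMalloc failed.");
        devPtr_ = p;
    }

    // Expose the raw pointer only to the owning managed thread.
    property void* Raw
    {
        void* get() { CheckThread(); return devPtr_; }
    }

    ~DevicePointer() { this->!DevicePointer(); }  // deterministic cleanup
    !DevicePointer()                              // finalizer as a safety net
    {
        if (devPtr_) { cudaFree(devPtr_); devPtr_ = nullptr; }
    }
};
```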
Where I need help
I've done some research and read about the distinction between managed threads and OS threads. It seems that, in general, there is a many-to-many relationship between the two.
This means that even though I'm only using a single managed thread, it could switch OS threads, and I'll lose access to a device pointer.
Is there any way to force the CLR to not move a managed thread between OS threads?
Answers (2)
Use the Thread.BeginThreadAffinity and Thread.EndThreadAffinity methods.
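As an illustration of that suggestion, here is a minimal sketch assuming C++/CLI; DoCudaWork is a hypothetical placeholder for the native calls that depend on the physical OS thread:

```cpp
void DoCudaWork();   // hypothetical native entry point that relies on the OS thread

void CallCudaWithAffinity()
{
    // Tell the CLR host that the code between these calls depends on the
    // identity of the physical OS thread, so the managed thread should not
    // be migrated to another OS thread in the meantime.
    System::Threading::Thread::BeginThreadAffinity();
    try
    {
        DoCudaWork();
    }
    finally
    {
        System::Threading::Thread::EndThreadAffinity();
    }
}
```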
I doubt that you need to do anything.
IIRC, the "OS thread switch" means that the OS can move the thread from one processor core to another (or even to another processor in multi-socket systems) when in it's alledged wisdom it thinks that would improve performance.
But Cuda doesn't really care which processor core/"OS thread" is running the code. As long as only one managed thread at a time can access the data there shouldn't be any race condition.
The thread affinity APIs are generally only used when someone gets totally anal about the difference in performance in accessing CPU memory locatations from different cores. But since your persistent data is (I assume) in GPU texture buffers and not in CPU memory, even that is irrelevant.