How much memory can I actually allocate on a CUDA card?
I'm writing a server process that performs calculations on a GPU using CUDA. I want to queue up incoming requests until enough memory is available on the device to run the job, but I'm having a hard time figuring out how much memory I can allocate on the device. I have a pretty good estimate of how much memory a job requires (at least how much will be allocated by cudaMalloc()), but I run out of device memory long before I've allocated the total amount of global memory available.
Is there some kind of formula to compute, from the total global memory, the amount I can allocate? I can play with it until I get an estimate that works empirically, but I'm concerned my customers will deploy different cards at some point and my jerry-rigged numbers won't work very well.
The size of your GPU's DRAM is an upper bound on the amount of memory you can allocate through cudaMalloc, but there's no guarantee that the CUDA runtime can satisfy a request for all of it in a single large allocation, or even in a series of small allocations.

The constraints on memory allocation vary with the details of the operating system's underlying driver model. For example, if the GPU in question is the primary display device, the OS may have reserved a portion of the GPU's memory for graphics. Other implicit state the runtime uses (such as the heap) also consumes memory resources. It's also possible that the memory has become fragmented, and that no contiguous block large enough to satisfy the request exists.
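One piece of that implicit state is queryable: the runtime exposes the sizes of its device-side malloc heap, per-thread stack, and printf FIFO through cudaDeviceGetLimit. A minimal sketch (device 0 assumed; these limits are only part of the runtime's overhead, not a full accounting of everything reserved):

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// Query the implicit runtime reservations (device-side malloc heap,
// per-thread stack, printf FIFO) that consume device memory on top of
// your own cudaMalloc allocations. Sketch assumes device 0 and omits
// error handling for brevity.
int main() {
    size_t heapSize = 0, stackSize = 0, printfFifo = 0;
    cudaSetDevice(0);
    cudaDeviceGetLimit(&heapSize,   cudaLimitMallocHeapSize);
    cudaDeviceGetLimit(&stackSize,  cudaLimitStackSize);   // per thread
    cudaDeviceGetLimit(&printfFifo, cudaLimitPrintfFifoSize);
    printf("device malloc heap: %zu bytes\n", heapSize);
    printf("per-thread stack:   %zu bytes\n", stackSize);
    printf("printf FIFO:        %zu bytes\n", printfFifo);
    return 0;
}
```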
The CUDART API function cudaMemGetInfo reports the free and total amount of memory available. As far as I know, there's no similar API call which can report the size of the largest satisfiable allocation request.
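Given that, one practical option is to pair cudaMemGetInfo with an empirical probe: binary-search for the largest single block cudaMalloc will actually grant. A sketch, with the caveat that the result is only a snapshot and can be invalidated immediately if another context allocates concurrently:

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// Returns the largest single block (in bytes) that cudaMalloc will grant
// right now, found by binary search over [0, free]. Each trial allocation
// is released immediately; the answer is a snapshot, not a guarantee.
static size_t largestAllocatableBlock() {
    size_t freeBytes = 0, totalBytes = 0;
    cudaMemGetInfo(&freeBytes, &totalBytes);

    size_t lo = 0, hi = freeBytes;   // invariant: lo is grantable
    while (lo < hi) {
        size_t mid = lo + (hi - lo + 1) / 2;
        void *p = nullptr;
        if (cudaMalloc(&p, mid) == cudaSuccess) {
            cudaFree(p);
            lo = mid;                // mid bytes were grantable
        } else {
            cudaGetLastError();      // clear the out-of-memory error state
            hi = mid - 1;            // mid bytes were not grantable
        }
    }
    return lo;
}

int main() {
    size_t freeBytes = 0, totalBytes = 0;
    cudaMemGetInfo(&freeBytes, &totalBytes);
    printf("total: %zu  free: %zu  largest block: %zu\n",
           totalBytes, freeBytes, largestAllocatableBlock());
    return 0;
}
```

For the queueing scheme in the question, you could gate job dispatch on the free figure from cudaMemGetInfo minus a safety margin, rather than on the total global memory, which avoids baking per-card constants into the server.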