用于多个 GPU 的 cudaDeviceReset
我目前正在开发一个具有 4 个 Tesla T10 GPU 的 GPU 服务器。当我不断测试内核并且必须经常使用 ctrl-C 终止进程时,我在简单的设备查询代码的末尾添加了几行。代码如下:
#include <stdio.h>
// Print device properties
void printDevProp(cudaDeviceProp devProp)
{
printf("Major revision number: %d\n", devProp.major);
printf("Minor revision number: %d\n", devProp.minor);
printf("Name: %s\n", devProp.name);
printf("Total global memory: %u\n", devProp.totalGlobalMem);
printf("Total shared memory per block: %u\n", devProp.sharedMemPerBlock);
printf("Total registers per block: %d\n", devProp.regsPerBlock);
printf("Warp size: %d\n", devProp.warpSize);
printf("Maximum memory pitch: %u\n", devProp.memPitch);
printf("Maximum threads per block: %d\n", devProp.maxThreadsPerBlock);
for (int i = 0; i < 3; ++i)
printf("Maximum dimension %d of block: %d\n", i, devProp.maxThreadsDim[i]);
for (int i = 0; i < 3; ++i)
printf("Maximum dimension %d of grid: %d\n", i, devProp.maxGridSize[i]);
printf("Clock rate: %d\n", devProp.clockRate);
printf("Total constant memory: %u\n", devProp.totalConstMem);
printf("Texture alignment: %u\n", devProp.textureAlignment);
printf("Concurrent copy and execution: %s\n", (devProp.deviceOverlap ? "Yes" : "No"));
printf("Number of multiprocessors: %d\n", devProp.multiProcessorCount);
printf("Kernel execution timeout: %s\n", (devProp.kernelExecTimeoutEnabled ? "Yes" : "No"));
return;
}
int main()
{
// Number of CUDA devices
int devCount;
cudaGetDeviceCount(&devCount);
printf("CUDA Device Query...\n");
printf("There are %d CUDA devices.\n", devCount);
// Iterate through devices
for (int i = 0; i < devCount; ++i)
{
// Get device properties
printf("\nCUDA Device #%d\n", i);
cudaDeviceProp devProp;
cudaGetDeviceProperties(&devProp, i);
printDevProp(devProp);
}
printf("\nPress any key to exit...");
char c;
scanf("%c", &c);
**for (int i = 0; i < devCount; i++) {
cudaSetDevice(i);
cudaDeviceReset();
}**
return 0;
}
我的查询与 main() 结束之前的 for 循环相关,其中我一一设置每个设备,然后使用 cudaResetDevice 命令。我有一种奇怪的感觉,这段代码虽然没有产生任何错误,但我无法重置所有设备。相反,程序每次仅重置默认设备,即设备 0。谁能告诉我应该怎么做才能重置这 4 个设备。
谢谢
I am currently working on a gpu server which has 4 Tesla T10 gpu's. While I keep testing the kernels and have to frequently kill the processes using ctrl-C, I added a few lines to the end of a simple device query code. The code is given below :
#include <stdio.h>
// Print device properties
void printDevProp(cudaDeviceProp devProp)
{
printf("Major revision number: %d\n", devProp.major);
printf("Minor revision number: %d\n", devProp.minor);
printf("Name: %s\n", devProp.name);
printf("Total global memory: %u\n", devProp.totalGlobalMem);
printf("Total shared memory per block: %u\n", devProp.sharedMemPerBlock);
printf("Total registers per block: %d\n", devProp.regsPerBlock);
printf("Warp size: %d\n", devProp.warpSize);
printf("Maximum memory pitch: %u\n", devProp.memPitch);
printf("Maximum threads per block: %d\n", devProp.maxThreadsPerBlock);
for (int i = 0; i < 3; ++i)
printf("Maximum dimension %d of block: %d\n", i, devProp.maxThreadsDim[i]);
for (int i = 0; i < 3; ++i)
printf("Maximum dimension %d of grid: %d\n", i, devProp.maxGridSize[i]);
printf("Clock rate: %d\n", devProp.clockRate);
printf("Total constant memory: %u\n", devProp.totalConstMem);
printf("Texture alignment: %u\n", devProp.textureAlignment);
printf("Concurrent copy and execution: %s\n", (devProp.deviceOverlap ? "Yes" : "No"));
printf("Number of multiprocessors: %d\n", devProp.multiProcessorCount);
printf("Kernel execution timeout: %s\n", (devProp.kernelExecTimeoutEnabled ? "Yes" : "No"));
return;
}
int main()
{
// Number of CUDA devices
int devCount;
cudaGetDeviceCount(&devCount);
printf("CUDA Device Query...\n");
printf("There are %d CUDA devices.\n", devCount);
// Iterate through devices
for (int i = 0; i < devCount; ++i)
{
// Get device properties
printf("\nCUDA Device #%d\n", i);
cudaDeviceProp devProp;
cudaGetDeviceProperties(&devProp, i);
printDevProp(devProp);
}
printf("\nPress any key to exit...");
char c;
scanf("%c", &c);
**for (int i = 0; i < devCount; i++) {
cudaSetDevice(i);
cudaDeviceReset();
}**
return 0;
}
My query is related to the for loop just before the main() ends in which I set each device one by one and then use cudaResetDevice command. I get a strange feeling that this code, although doesnt produce any error but I am not able to reset all the devices. Instead, the program is resetting only the default device i.e device 0 each time. Can anyone tell me what should I do to reset each of the 4 devices.
Thanks
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
看起来您可以向 GPU 程序添加一个函数来捕获 ctrl+c 信号 (SIGINT),并为程序使用的每个设备调用 cudaDeviceReset() 函数。
捕获 SIGINT 时调用函数的示例代码可以在这里找到:
https://stackoverflow.com/a/482725
为您编写的每个 GPU 程序包含这样的代码似乎是一个很好的做法,我也会这样做:-)
我没有时间写出完整详细的答案,所以请阅读其他答案及其评论。
It looks like you can add a function to your GPU programs to catch the ctrl+c signal (SIGINT) and call the cudaDeviceReset() function for each device that was used by the program.
The example code to call a function when SIGINT is caught can be found here:
https://stackoverflow.com/a/482725
It seems like a good practice to include code like this for every GPU program you write, and I will do the same :-)
I don't have time to write up a full detailed answer, so read the other answer and it's comments also.
这可能已经太晚了,但是如果您编写一个信号处理函数,您可以消除内存泄漏并以某种方式重置设备:
...
如果您使用此代码(您甚至可以将第一个片段包含在外部标头,它可以工作,您可以对 ctrl+c 进行 2 级控制:第一次按会停止模拟并正常退出,但应用程序会完成渲染该步骤,如果您按 ctrl+,则可以优雅地停止并获得正确的结果。 c 再次关闭应用程序,释放所有内容 记忆。
This is probably too late but if you write a signal-handler function you can get rid of the memory leaks and reset the device in a sure way:
....
If you use this code (you can even include the first snippet in an external header, it works. You can have 2 levels of control of ctrl+c: the first press stops your simulation and exits normally but the application finishes rendering the step which is great to stop gracefully and have correct results, if you press ctrl+c again it closes the application freeing all memory.
cudaDeviceReset
旨在销毁与运行进程中给定 GPU 上下文关联的资源。一个 CUDA 进程无法重置或以其他方式影响另一进程的上下文。因此,当您修改后的设备查询调用cudaDeviceReset
时,它只会释放它分配的资源,而不是任何其他进程正在使用的资源。cudaDeviceReset
is intended for destroying resources associated with a given GPU context within the process in which it is run. One CUDA process can't reset or otherwise effect the context of another process. So when your modified device query callscudaDeviceReset
, it is only releases resources that it allocated, not those in use by any other process.