How do I measure the execution time of each block when using CUDA?

Posted on 2024-09-15 21:19:29

clock() is not accurate enough.

Comments (3)

一枫情书 2024-09-22 21:19:29

Use CUDA events to measure the time of kernels or CUDA operations (memcpy, etc.):

// Prepare
cudaEvent_t start, stop;
cudaEventCreate(&start);
cudaEventCreate(&stop);
// Start record
cudaEventRecord(start, 0);
// Do something on GPU
MyKernel<<<dimGrid, dimBlock>>>(input_data, output_data);
// Stop event
cudaEventRecord(stop, 0);
cudaEventSynchronize(stop);
float elapsedTime;
cudaEventElapsedTime(&elapsedTime, start, stop); // elapsed time in milliseconds
// Clean up:
cudaEventDestroy(start);
cudaEventDestroy(stop);

See the CUDA Programming Guide, section 3.2.7.6.

死开点丶别碍眼 2024-09-22 21:19:29

How about using the clock() function in every CUDA thread to record start and end times, and storing them in an array so that you can tell from the array indices which thread started and stopped at which time, like the following:

__global__ void kclock(unsigned int *ts) {
    unsigned int start_time = 0, stop_time = 0;

    start_time = clock();

    // Code we need to measure should go here.

    stop_time = clock();

    ts[(blockIdx.x * blockDim.x + threadIdx.x) * 2] = start_time;
    ts[(blockIdx.x * blockDim.x + threadIdx.x) * 2 + 1] = stop_time; 
} 

Then use this array to find the minimum start time and the maximum stop time for the block you are considering. For example, you can calculate the range of indices in the time array that corresponds to block (0, 0) and take the min/max over that range to get the execution time.
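The min/max step can be done on the host after copying ts back. Below is a minimal sketch, assuming a 1D grid and the layout written by kclock above (start at index 2*i, stop at 2*i + 1 for global thread index i). The helper name block_cycles is hypothetical and not part of any CUDA API, and the result is in clock cycles, not seconds.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: per-block cycle counts from the ts array filled by kclock.
// Assumed layout: ts[2*i] = start, ts[2*i + 1] = stop, where
// i = blockIdx.x * blockDim.x + threadIdx.x (1D grid, 1D blocks).
std::vector<unsigned int> block_cycles(const std::vector<unsigned int>& ts,
                                       std::size_t num_blocks,
                                       std::size_t threads_per_block) {
    std::vector<unsigned int> cycles(num_blocks, 0u);
    // Return zeros if the array is too small for the stated launch shape.
    if (threads_per_block == 0 ||
        ts.size() < 2 * num_blocks * threads_per_block) {
        return cycles;
    }
    for (std::size_t b = 0; b < num_blocks; ++b) {
        const std::size_t first = b * threads_per_block;
        unsigned int min_start = ts[2 * first];
        unsigned int max_stop = ts[2 * first + 1];
        for (std::size_t t = 1; t < threads_per_block; ++t) {
            const std::size_t i = first + t;
            min_start = std::min(min_start, ts[2 * i]);
            max_stop = std::max(max_stop, ts[2 * i + 1]);
        }
        // Assumes the 32-bit counter did not wrap while this block was running.
        cycles[b] = max_stop - min_start;
    }
    return cycles;
}
```

For a launch of kclock<<<2, 2>>>, calling block_cycles(ts, 2, 2) returns one cycle count per block. Only compare timestamps within a block: the counter is kept per multiprocessor, so values taken in different blocks are not necessarily on the same time base.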

⒈起吃苦の倖褔 2024-09-22 21:19:29

I think long long int clock64() is what you are looking for?

See the CUDA Programming Guide, C Language Extensions, B.11.
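clock64() returns a cycle count, so a per-block difference still has to be converted to time on the host. Below is a minimal sketch, assuming the clock rate in kHz is known (for instance the value reported by cudaDeviceGetAttribute with cudaDevAttrClockRate). The helper name cycles_to_ms is hypothetical, and the result is only approximate because the actual clock can differ from the reported peak rate.

```cpp
#include <cstdint>

// Hypothetical helper: convert a clock64() cycle delta to milliseconds.
// clock_khz is the clock rate in kHz, i.e. cycles per millisecond; it is
// assumed to be supplied by the caller (e.g. queried from the device).
double cycles_to_ms(std::int64_t start, std::int64_t stop, int clock_khz) {
    if (clock_khz <= 0) {
        return 0.0;  // no valid clock rate, nothing meaningful to report
    }
    return static_cast<double>(stop - start) / static_cast<double>(clock_khz);
}
```

With a 1,500,000 kHz (1.5 GHz) clock, a delta of 3,000,000 cycles corresponds to 2 ms.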
