当前位置：文江博客话题详情

使用CUDA时如何测量每个块的执行时间？

发布于 2024-09-15 21:19:29 字数 20 浏览 3 评论 0原文

Clock() 不够准确。

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

一枫情书 2024-09-22 21:19:29

使用 CUDA 事件来测量内核或 CUDA 操作（memcpy 等）的时间：

// Prepare
cudaEvent_t start, stop;
cudaEventCreate(&start);
cudaEventCreate(&stop);
// Start record
cudaEventRecord(start, 0);
// Do something on GPU
MyKernel<<<dimGrid, dimBlock>>>(input_data, output_data);
// Stop event
cudaEventRecord(stop, 0);
cudaEventSynchronize(stop);
float elapsedTime;
cudaEventElapsedTime(&elapsedTime, start, stop); // that's our time!
// Clean up:
cudaEventDestroy(start);
cudaEventDestroy(stop);

请参阅 CUDA 编程指南，第 3.2.7.6 节

Use CUDA events for measure time of kernels or CUDA operations (memcpy etc):

// Prepare
cudaEvent_t start, stop;
cudaEventCreate(&start);
cudaEventCreate(&stop);
// Start record
cudaEventRecord(start, 0);
// Do something on GPU
MyKernel<<<dimGrid, dimBlock>>>(input_data, output_data);
// Stop event
cudaEventRecord(stop, 0);
cudaEventSynchronize(stop);
float elapsedTime;
cudaEventElapsedTime(&elapsedTime, start, stop); // that's our time!
// Clean up:
cudaEventDestroy(start);
cudaEventDestroy(stop);

See CUDA Programming Guide, section 3.2.7.6

回复收藏 0 原文

死开点丶别碍眼 2024-09-22 21:19:29

如何在每个 CUDA 线程中使用 Clock() 函数来计算开始和结束时间。并将其存储在数组中，这样您就可以根据数组索引确定哪个线程在何时启动/停止，如下所示：

__global__ void kclock(unsigned int *ts) {
    unsigned int start_time = 0, stop_time = 0;

    start_time = clock();

    // Code we need to measure should go here.

    stop_time = clock();

    ts[(blockIdx.x * blockDim.x + threadIdx.x) * 2] = start_time;
    ts[(blockIdx.x * blockDim.x + threadIdx.x) * 2 + 1] = stop_time; 
}

然后使用此数组计算出您正在考虑的块的最小启动时间和最大停止时间。例如，您可以计算与 CUDA 中的 (0, 0) 块相对应的时间数组的索引范围，并使用最小/最大来计算执行时间。

How about using clock() function in every CUDA thread to calculate start and end times. And store it in a array such a way that you can figure out which thread start/stop at which time based on array indices like following:

__global__ void kclock(unsigned int *ts) {
    unsigned int start_time = 0, stop_time = 0;

    start_time = clock();

    // Code we need to measure should go here.

    stop_time = clock();

    ts[(blockIdx.x * blockDim.x + threadIdx.x) * 2] = start_time;
    ts[(blockIdx.x * blockDim.x + threadIdx.x) * 2 + 1] = stop_time; 
}

Then use this array to figure out minimal start time and maximum stop time for block you are considering. For example you can calculate range of indices of time array which corresponds to the (0, 0) block in CUDA and use min/max to calculate the execution time.

回复收藏 0 原文