Why does the Nvidia SDK's OpenCL vector addition sample use asynchronous writes?

Posted on 2024-09-28 11:09:00

The vector addition example has this code:

// Asynchronous write of data to GPU device
ciErr1 = clEnqueueWriteBuffer(cqCommandQueue, cmDevSrcA, CL_FALSE, 0, sizeof(cl_float) * szGlobalWorkSize, srcA, 0, NULL, NULL);
ciErr1 |= clEnqueueWriteBuffer(cqCommandQueue, cmDevSrcB, CL_FALSE, 0, sizeof(cl_float) * szGlobalWorkSize, srcB, 0, NULL, NULL);
shrLog("clEnqueueWriteBuffer (SrcA and SrcB)...\n"); 
if (ciErr1 != CL_SUCCESS)
{
    shrLog("Error in clEnqueueWriteBuffer, Line %u in file %s !!!\n\n", __LINE__, __FILE__);
    Cleanup(EXIT_FAILURE);
}

// Launch kernel
ciErr1 = clEnqueueNDRangeKernel(cqCommandQueue, ckKernel, 1, NULL, &szGlobalWorkSize, &szLocalWorkSize, 0, NULL, NULL);
shrLog("clEnqueueNDRangeKernel (VectorAdd)...\n"); 
if (ciErr1 != CL_SUCCESS)

It launches the kernel immediately afterwards. How does this not cause problems? Nothing here guarantees that the device memory buffers have been fully written by the time the kernel launches, right?

Comments (1)

朦胧时间 2024-10-05 11:09:00

Although the writes are asynchronous from the host's point of view, they are not necessarily asynchronous from the device's point of view. I assume the command queue was created without CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE, which makes it an in-order command queue.

The OpenCL specification says the following about in-order execution:

In-order Execution: Commands are launched in the order they appear in the command-queue and completed in order. In other words, a prior command on the queue completes before the following command begins. This serializes the execution order of commands in a queue.

Therefore, on an in-order queue the writes should complete before the kernel is executed on the device.
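
To make the distinction between "asynchronous for the host" and "ordered on the device" concrete, here is a small toy model in plain C. It does not use the OpenCL API at all: `in_order_queue`, `enqueue`, `finish`, and `demo_in_order` are hypothetical names invented for this sketch, and the static arrays only stand in for device buffers such as cmDevSrcA and cmDevSrcB. The point is just that enqueueing returns right away, while execution still happens strictly in FIFO order.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define N 4
#define MAX_CMDS 8

/* Toy "device" memory: stand-ins for the two input buffers and the output. */
static float dev_a[N], dev_b[N], dev_c[N];

/* A queued command is a function plus one argument. */
typedef void (*command_fn)(const void *arg);
typedef struct { command_fn fn; const void *arg; } command;

/* Hypothetical in-order queue: a fixed-size FIFO of commands. */
typedef struct {
    command cmds[MAX_CMDS];
    size_t head, tail;
} in_order_queue;

/* Non-blocking enqueue: only records the command, then returns to the host. */
static void enqueue(in_order_queue *q, command_fn fn, const void *arg) {
    assert(q->tail < MAX_CMDS);
    q->cmds[q->tail].fn = fn;
    q->cmds[q->tail].arg = arg;
    q->tail++;
}

/* The "device" drains the queue in FIFO order; each command finishes
 * before the next one begins, which is the in-order guarantee. */
static void finish(in_order_queue *q) {
    while (q->head < q->tail) {
        command c = q->cmds[q->head++];
        c.fn(c.arg);
    }
}

static void write_a(const void *src) { memcpy(dev_a, src, sizeof dev_a); }
static void write_b(const void *src) { memcpy(dev_b, src, sizeof dev_b); }

static void vector_add(const void *unused) {
    (void)unused;
    for (size_t i = 0; i < N; i++)
        dev_c[i] = dev_a[i] + dev_b[i];
}

/* Returns 1 if nothing ran at enqueue time and the kernel later saw
 * fully written inputs; returns 0 otherwise. */
int demo_in_order(void) {
    static const float src_a[N] = {1.0f, 2.0f, 3.0f, 4.0f};
    static const float src_b[N] = {10.0f, 20.0f, 30.0f, 40.0f};
    in_order_queue q;

    memset(&q, 0, sizeof q);
    memset(dev_a, 0, sizeof dev_a);
    memset(dev_b, 0, sizeof dev_b);
    memset(dev_c, 0, sizeof dev_c);

    /* Same shape as the sample: two writes, then the kernel. */
    enqueue(&q, write_a, src_a);
    enqueue(&q, write_b, src_b);
    enqueue(&q, vector_add, NULL);

    /* The host got control back, but no command has executed yet. */
    int nothing_ran_yet = (dev_a[0] == 0.0f && dev_c[0] == 0.0f);

    finish(&q);

    int result_ok = 1;
    for (size_t i = 0; i < N; i++)
        if (dev_c[i] != src_a[i] + src_b[i])
            result_ok = 0;

    return nothing_ran_yet && result_ok;
}
```

Calling `demo_in_order()` from a `main` function should return 1 in this model. A real out-of-order queue gives no such ordering for free, so there the kernel launch would have to wait explicitly on the writes, for example through events.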
