What is the canonical way to compare CPU and GPU memory ranges?

Posted 2025-01-14 15:45:24

I have two contiguous ranges (pointer + size), one on the GPU and one on the CPU, and I want to compare them for equality.

What is the canonical way to compare these ranges for equality?

my_cpu_type cpu;  // cpu.data() returns double*
my_gpu_type gpu;  // gpu.data() returns thrust::cuda::pointer<double>

thrust::equal(cpu.data(), cpu.data() + cpu.size(), gpu.data());

gives an illegal memory access. I also tried:

thrust::equal(
   thrust::cuda::par // also thrust::host
   , cpu.data(), cpu.data() + cpu.size(), gpu.data()
);

2 Answers

第几種人 2025-01-21 15:45:24

You can't do it the way you are imagining in the general case with thrust. Thrust does not execute algorithms in a mixed backend. You must either use the device backend, in which case all data needs to be on the device (or accessible from device code, see below), or else the host backend in which case all data needs to be on the host.

Therefore you will be forced to copy the data from one side to the other. The cost should be similar (copy host array to device, or device array to host) so we prefer to copy to the device, since the device comparison can be faster.
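
As a quick illustration of the "either direction" point, the host-side variant would look like this (a minimal sketch under the same assumptions as the snippet further below, i.e. cpu is a thrust::host_vector<double> and gpu a thrust::device_vector<double> of the same size; thrust::host is declared in <thrust/execution_policy.h>):

// copy the device range to the host, then compare both ranges on the host backend
thrust::host_vector<double> h_gpu = gpu;
bool are_equal = thrust::equal(thrust::host, cpu.begin(), cpu.end(), h_gpu.begin());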

If you have the luxury of having the host array be in a pinned buffer, then it will be possible to do something like what you are suggesting.
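
One way the pinned-buffer case could look (a sketch, not part of the original answer): allocate the host array as mapped pinned memory, obtain a device-visible pointer to it, and let the device backend read the host data in place, so no bulk copy is needed. Here gpu is assumed to be a thrust::device_vector<double> holding size elements; on non-UVA systems cudaSetDeviceFlags(cudaDeviceMapHost) must be called before the allocation.

#include <thrust/device_vector.h>
#include <thrust/device_ptr.h>
#include <thrust/equal.h>
#include <thrust/execution_policy.h>
#include <cuda_runtime.h>

double* h_data = nullptr;   // pinned, device-mappable host buffer
cudaHostAlloc(reinterpret_cast<void**>(&h_data), size * sizeof(double), cudaHostAllocMapped);

double* d_view = nullptr;   // device-side alias of the same host buffer
cudaHostGetDevicePointer(reinterpret_cast<void**>(&d_view), h_data, 0);

// ... fill h_data and gpu ...

// the device backend reads the pinned host buffer directly, no staging copy
bool are_equal = thrust::equal(thrust::device,
                               thrust::device_pointer_cast(d_view),
                               thrust::device_pointer_cast(d_view) + size,
                               gpu.begin());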

For the general case, something like this should work:

thrust::host_vector<double>   cpu(size);
thrust::device_vector<double> gpu(size);

// copy the host range to the device, then run the comparison entirely on the device backend
thrust::device_vector<double> d_cpu = cpu;
bool are_equal = thrust::equal(d_cpu.begin(), d_cpu.end(), gpu.begin());
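
Applied to the question's own containers, the same idea might look like the sketch below; my_cpu_type / my_gpu_type are the asker's (unspecified) types, with cpu.data() returning double* and gpu.data() returning thrust::cuda::pointer<double>, as stated in the question:

#include <thrust/device_vector.h>
#include <thrust/equal.h>
#include <thrust/execution_policy.h>

// stage the host range on the device, then compare on the device backend
thrust::device_vector<double> d_cpu(cpu.data(), cpu.data() + cpu.size());
bool are_equal = thrust::equal(thrust::device,
                               d_cpu.begin(), d_cpu.end(),
                               gpu.data());

The comparison then runs entirely on the device, matching the "copy to the device" preference above.
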
童话里做英雄 2025-01-21 15:45:24

In addition to Robert's valid answer, I would claim you are following the wrong path in trying to employ C++-STL-like code where GPU computation is involved.

The issue is not merely that of where pointers point to. Something like std::equal is inherently sequential. Even if its implementation involves parallelism, the assumption is still of a computation which is to start ASAP, blocking the calling thread, and returning a result to that calling thread to continue its work. While it's possible this is what you want, I would guess that in most cases, it probably isn't. I believe thrust's approach, of making developers feel as though they're writing "C++ STL code, but with the GPU" is (mostly) misguided.

If there had been some integration of GPU task graphs, the C++ future/async/promise mechanism, and perhaps something like taskflow or other frameworks, that might have somehow become more of a "canonical" way to do this.
