1D problems in CUDA and HPC
I'm looking for some 1D problems in CUDA and HPC, e.g. Black-Scholes.
By 1D problems, I mean problems in which all the work is done on 1D arrays. Although matrix multiplication can be expressed in this way, I want problems in which the basic problem is just 1D.
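To make "all the work is done on 1D arrays" concrete, here is a minimal CPU-side sketch of Black-Scholes European call pricing, where each output element depends only on the inputs at the same index. The function names (`norm_cdf`, `black_scholes_call`) and the choice of a single shared rate and volatility are assumptions made for illustration, not part of any existing library.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Standard normal cumulative distribution function, expressed via erfc.
inline double norm_cdf(double x) {
    return 0.5 * std::erfc(-x / std::sqrt(2.0));
}

// Prices one European call option per array element (hypothetical helper).
// Every element is independent, so on a GPU this maps to one thread per index.
std::vector<double> black_scholes_call(const std::vector<double>& spot,
                                       const std::vector<double>& strike,
                                       const std::vector<double>& years,
                                       double rate, double vol) {
    assert(spot.size() == strike.size() && spot.size() == years.size());
    std::vector<double> price(spot.size());
    for (std::size_t i = 0; i < spot.size(); ++i) {
        const double sqrt_t = std::sqrt(years[i]);
        const double d1 = (std::log(spot[i] / strike[i]) +
                           (rate + 0.5 * vol * vol) * years[i]) / (vol * sqrt_t);
        const double d2 = d1 - vol * sqrt_t;
        price[i] = spot[i] * norm_cdf(d1) -
                   strike[i] * std::exp(-rate * years[i]) * norm_cdf(d2);
    }
    return price;
}
```

A loop like this can serve as the reference result that a GPU version is compared against.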
I am trying to develop a 1D library for CUDA and need some benchmark problems to test it. I realize that a lot of real-world problems are expressed in 2D, but I would really like to see some real-world 1D problems.
Thanks.
EDIT: Thanks for all the answers. It would be great if the answers contained more HPC problems, e.g. Black-Scholes, rather than just generic algorithms.
Thanks.
Comments (4)
A common problem in parallel programming is the scan, a close relative of reduction: you are given an array of numbers and have to compute a "prefix sum", that is, every element stores the sum of all preceding elements (plus itself or not; I prefer the inclusive version).
It is a fairly simple problem, but since it is often repeated many times inside more complex algorithms, making it efficient is crucial.
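As a rough illustration of the inclusive prefix sum described above, here is a CPU sketch with a sequential reference and a Hillis-Steele style formulation, in which every update within a pass is independent (the property a GPU version would exploit). The function names are hypothetical, and a production GPU scan would normally use a more work-efficient scheme.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential inclusive prefix sum: out[i] = in[0] + ... + in[i].
std::vector<long long> inclusive_scan_reference(const std::vector<long long>& in) {
    std::vector<long long> out(in.size());
    long long running = 0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        running += in[i];
        out[i] = running;
    }
    return out;
}

// Hillis-Steele formulation: about log2(n) passes with doubling offsets.
// Inside one pass each next[i] reads only from `data`, so all i could run in parallel.
std::vector<long long> inclusive_scan_hillis_steele(std::vector<long long> data) {
    std::vector<long long> next(data.size());
    for (std::size_t offset = 1; offset < data.size(); offset *= 2) {
        for (std::size_t i = 0; i < data.size(); ++i) {
            next[i] = (i >= offset) ? data[i] + data[i - offset] : data[i];
        }
        data.swap(next);
    }
    return data;
}

// Small usage example: both versions must agree on the same input.
void check_scans_agree() {
    const std::vector<long long> in{3, 1, 4, 1, 5};
    assert(inclusive_scan_hillis_steele(in) == inclusive_scan_reference(in));
}
```

The sequential version is the natural correctness check for a benchmark; the pass-based version shows the data dependencies a parallel implementation has to respect.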
Another common problem is sorting.
There are already some papers on that topic.
I think it is a good problem to start with, since bigger problems can be solved on top of it.
A simple problem you can use in 1 to 3 dimensions is the heat equation. There are several different numerical methods for solving it, and some of them can be implemented in parallel.
A method that works at least with OpenMP and MPI is the finite difference method. I suppose that if you combine it with a clever stencil, you should be able to implement it efficiently in CUDA C.
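For illustration, one explicit finite-difference time step for the 1D heat equation can be sketched on the CPU as below. This assumes the forward-time, centered-space scheme with fixed boundary values, which is only one of the possible methods; `heat_step` and the parameter `r` (meaning alpha * dt / dx^2) are names chosen for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One explicit time step of u_t = alpha * u_xx on a uniform 1D grid.
// r = alpha * dt / (dx * dx); the explicit scheme is stable only for r <= 0.5.
// Boundary values are held fixed (Dirichlet conditions, an assumption of this sketch).
void heat_step(const std::vector<double>& u, std::vector<double>& u_next, double r) {
    assert(u.size() == u_next.size() && u.size() >= 2);
    assert(r > 0.0 && r <= 0.5);
    u_next.front() = u.front();
    u_next.back() = u.back();
    // Three-point stencil: each interior point needs only its two neighbours,
    // so all interior points can be updated independently of each other.
    for (std::size_t i = 1; i + 1 < u.size(); ++i) {
        u_next[i] = u[i] + r * (u[i - 1] - 2.0 * u[i] + u[i + 1]);
    }
}
```

Calling `heat_step` repeatedly while swapping the two buffers advances the solution in time; on a GPU the loop body would become the per-thread work.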
A classic 1D example is provided by the heat equation.
Below, I'm posting a concrete, fully worked CPU/GPU example on this topic that uses the Jacobi solution scheme. Please note that two time-step kernels are provided, one not using shared memory and one using shared memory.
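As a rough illustration of the Jacobi scheme mentioned above (a CPU-only sketch, not the answer's own listing), the iteration can be written as follows. It assumes the steady-state 1D heat equation with fixed end temperatures; the name `jacobi_steady_heat` and its parameters are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Jacobi iteration for the steady-state 1D heat equation u_xx = 0
// with fixed end temperatures `left` and `right` (assumed setup).
// Each sweep replaces every interior value by the average of its neighbours,
// always reading from the previous iterate, so a sweep is fully parallel.
std::vector<double> jacobi_steady_heat(std::size_t n, double left, double right,
                                       int iterations) {
    assert(n >= 2);
    std::vector<double> u(n, 0.0), u_next(n, 0.0);
    u.front() = u_next.front() = left;
    u.back() = u_next.back() = right;
    for (int k = 0; k < iterations; ++k) {
        for (std::size_t i = 1; i + 1 < n; ++i) {
            u_next[i] = 0.5 * (u[i - 1] + u[i + 1]);
        }
        u.swap(u_next);  // the new iterate becomes the input of the next sweep
    }
    return u;
}
```

In a GPU version the inner loop would become the kernel body, and a shared-memory variant would stage each block's slice of `u` plus one halo cell per side before averaging.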
Reduction (finding the min, max, or sum of an array) and sorting are prime examples of 1D problems. There are many variants of these algorithms, such as sorting arrays of structures.
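A minimal sketch of a tree-style reduction, written on the CPU but structured the way a GPU block typically reduces: each pass folds the upper half of the active range into the lower half. The name `tree_reduce` is hypothetical, and the operator is assumed to be associative and commutative (true for sum, min, and max).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Tree-style reduction over a non-empty array.
// Within one pass, element i (i < half) combines with element i + half only,
// so all combinations in that pass are independent of each other.
template <typename T, typename Op>
T tree_reduce(std::vector<T> data, Op op) {
    assert(!data.empty());
    std::size_t active = data.size();
    while (active > 1) {
        const std::size_t half = (active + 1) / 2;  // rounds up for odd sizes
        for (std::size_t i = 0; i + half < active; ++i) {
            data[i] = op(data[i], data[i + half]);
        }
        active = half;
    }
    return data[0];
}
```

For example, passing `[](long long a, long long b) { return a + b; }` gives a sum, while a lambda wrapping `std::min` or `std::max` gives the minimum or maximum.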