OpenCL distribution

Posted 2024-12-09 14:21:49


I'm currently developing an OpenCL application for a very heterogeneous set of computers (using JavaCL, to be specific). To maximize performance I want to use a GPU if one is available; otherwise I want to fall back to the CPU and use SIMD instructions. My plan is to write the OpenCL code with vector types, because my understanding is that this allows CPUs to vectorize the code and use SIMD instructions.

My question, however, is which OpenCL implementation to use. For example, if the computer has an Nvidia GPU I assume it's best to use Nvidia's library, but if no GPU is available I want to use Intel's library to get the SIMD instructions.

How do I achieve this? Is it handled automatically, or do I have to include all the libraries and implement some logic to pick the right one? It feels like a problem that more people than just me are facing.

Update
After testing the different OpenCL drivers, this is my experience so far:

Intel: Crashed the JVM when JavaCL tried to call it. After a restart it no longer crashed the JVM, but it also didn't return any usable devices (I was using an Intel i7 CPU). When I compiled the OpenCL code offline it seemed to be able to do some auto-vectorization, so Intel's compiler seems quite nice.

Nvidia: Refused to install their WHQL drivers because the installer claimed I didn't have an Nvidia card (that computer has a GeForce GT 330M). When I tried it on a different computer I got all the way to creating a kernel, but the first execution crashed the drivers (the screen flickered for a while and Windows 7 said it had to restart the drivers). The second execution caused a blue screen of death.

AMD/ATI: Refused to install the 32-bit SDK (I tried that since I will be using a 32-bit JVM), but the 64-bit SDK worked well. This is the only driver on which I've managed to execute the code (after a restart, because at first it gave a cryptic error message when compiling). However, it doesn't seem to be able to do any implicit vectorization, and since I don't have an ATI GPU I didn't get any performance increase compared to the Java implementation. If I use vector types I might see some improvement, though.

TL;DR: None of the drivers seems ready for commercial use. I'm probably better off creating a JNI module with C code compiled to use SSE instructions.


戏舞 2024-12-16 14:21:49

First, try to understand hosts and devices: http://www.streamcomputing.eu/blog/2011-07-14/basic-concept-hosts-and-devices/

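Basically, you can do exactly what you described: check whether a certain driver is available and, if it is not, try the next one. Which one you try first is entirely up to your own preference. I would pick the device on which my kernel tested best. In JavaCL you can pick the fastest device with JavaCL.createBestContext and CLPlatform.getBestDevice; check the host code here: http://ochafik.com/blog/?p=501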

Be aware that NVidia does not support CPUs through its driver; only AMD and Intel do. Also, targeting multiple devices (say, 2 GPUs and a CPU) is a bit harder.
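
The "try one, then fall back to the next" order can be sketched in plain Java without touching any OpenCL binding. This is only an illustration: the `Device` class, its `usable` flag, and the `pickDevice` helper are hypothetical stand-ins for whatever your binding reports when you enumerate platforms and devices; none of them is part of JavaCL.

```java
import java.util.Arrays;
import java.util.List;

class DeviceFallback {
    enum Type { GPU, CPU }

    // Hypothetical description of one OpenCL device as reported by a driver.
    static final class Device {
        final String platform;
        final Type type;
        final boolean usable; // assumption: false if creating a context on it failed

        Device(String platform, Type type, boolean usable) {
            this.platform = platform;
            this.type = type;
            this.usable = usable;
        }
    }

    // Walk the preferred types in order and return the first usable device,
    // or null when no device of any preferred type is usable.
    static Device pickDevice(List<Device> devices, Type... preference) {
        for (Type wanted : preference) {
            for (Device d : devices) {
                if (d.type == wanted && d.usable) {
                    return d;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Made-up inventory: a GPU whose driver is broken and a working CPU device.
        List<Device> found = Arrays.asList(
                new Device("NVIDIA", Type.GPU, false),
                new Device("Intel", Type.CPU, true));
        Device chosen = pickDevice(found, Type.GPU, Type.CPU);
        System.out.println(chosen == null ? "no device" : chosen.platform + " " + chosen.type);
    }
}
```

Passing the preference as an ordered argument list keeps the policy (GPU first, or whichever device your kernel tested best on) separate from the enumeration code.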


你如我软肋 2024-12-16 14:21:49

There is no API that gives you what you want. However, you can do the following:

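I suggest you iterate over the platforms returned by clGetPlatformIDs, query the devices of each one (clGetDeviceIDs) together with each device's type, and prefer a platform that offers both types.
Then build a map in your code that maps each device type to the list of platforms supporting it, ordered in some fashion.
Finally, just take the first item in the list for CL_DEVICE_TYPE_CPU and the first item in the list for CL_DEVICE_TYPE_GPU.
If the two results are equal (platform_cpu == platform_gpu), pick that platform and use it for both.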

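If one platform supports both types, you will get a match as described above, because the lists are ordered. If you like, you can then also do load balancing, as you would on a single platform (e.g. Intel's).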
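
A minimal sketch of that map-based selection, written in plain Java so it runs without an OpenCL runtime. The platform names, the `index` and `first` helpers, and the ordering rule (platforms offering more device types come first) are assumptions made for illustration; in a real program the input map would be filled from clGetPlatformIDs and clGetDeviceIDs, or from the equivalent calls in your binding.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PlatformChooser {
    enum Type { CPU, GPU }

    // Build "device type -> ordered list of platforms supporting it".
    // Assumed ordering: platforms that offer more device types come first,
    // so a platform with both CPU and GPU heads both lists.
    static Map<Type, List<String>> index(Map<String, EnumSet<Type>> platforms) {
        List<String> names = new ArrayList<>(platforms.keySet());
        // List.sort is stable, so ties keep their enumeration order.
        names.sort((a, b) -> platforms.get(b).size() - platforms.get(a).size());

        Map<Type, List<String>> byType = new EnumMap<>(Type.class);
        for (Type t : Type.values()) {
            byType.put(t, new ArrayList<>());
        }
        for (String name : names) {
            for (Type t : platforms.get(name)) {
                byType.get(t).add(name);
            }
        }
        return byType;
    }

    // First platform for a type, or null if no platform supports it.
    static String first(Map<Type, List<String>> byType, Type t) {
        List<String> list = byType.get(t);
        return list.isEmpty() ? null : list.get(0);
    }

    public static void main(String[] args) {
        // Hypothetical enumeration result; names are made up.
        Map<String, EnumSet<Type>> platforms = new LinkedHashMap<>();
        platforms.put("Vendor A", EnumSet.of(Type.GPU));
        platforms.put("Vendor B", EnumSet.of(Type.CPU, Type.GPU));
        platforms.put("Vendor C", EnumSet.of(Type.CPU));

        Map<Type, List<String>> byType = index(platforms);
        String cpu = first(byType, Type.CPU);
        String gpu = first(byType, Type.GPU);
        if (cpu != null && cpu.equals(gpu)) {
            System.out.println("single platform for both: " + cpu);
        } else {
            System.out.println("CPU platform: " + cpu + ", GPU platform: " + gpu);
        }
    }
}
```

With this ordering, a platform that exposes both device types ends up first in both lists, which is what makes the platform_cpu == platform_gpu check succeed whenever such a platform exists.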