I'm currently developing an OpenCL application for a very heterogeneous set of computers (using JavaCL, to be specific). To maximize performance, I want to use a GPU if one is available; otherwise I want to fall back to the CPU and use SIMD instructions. My plan is to implement the OpenCL code using vector types, because my understanding is that this allows CPUs to vectorize the instructions and use SIMD instructions.
My question, however, is which OpenCL implementation to use. E.g. if the computer has an Nvidia GPU, I assume it's best to use Nvidia's library, but if no GPU is available I want to use Intel's library to get the SIMD instructions.
How do I achieve this? Is this handled automatically, or do I have to include all the libraries and implement some logic to pick the right one? It feels like a problem that more people than just me are facing.
Update
After testing the different OpenCL drivers, this is my experience so far:
- Intel: Crashed the JVM when JavaCL tried to call it. After a restart it didn't crash the JVM, but it also didn't return any usable devices (I was using an Intel Core i7 CPU). When I compiled the OpenCL code offline, it seemed to be able to do some auto-vectorization, so Intel's compiler seems quite nice.
- Nvidia: Refused to install their WHQL drivers because it claimed I didn't have an Nvidia card (that computer has a GeForce GT 330M). When I tried it on a different computer, I managed to get all the way to creating a kernel, but at the first execution it crashed the drivers (the screen flickered for a while and Windows 7 said it had to restart the drivers). The second execution caused a blue screen of death.
- AMD/ATI: Refused to install the 32-bit SDK (I tried that since I will be using a 32-bit JVM), but the 64-bit SDK worked well. This is the only driver on which I've managed to execute the code (after a restart, because at first it gave a cryptic error message when compiling). However, it doesn't seem to be able to do any implicit vectorization, and since I don't have an ATI GPU I didn't get any performance increase compared to the Java implementation. If I use vector types I might see some improvements, though.
TL;DR: None of the drivers seem ready for commercial use. I'm probably better off creating a JNI module with C code compiled to use SSE instructions.
First try to understand hosts & devices: http://www.streamcomputing.eu/blog/2011-07-14/basic-concept-hosts-and-devices/
Basically, you can do exactly what you described: check whether a certain driver is available and, if not, try the next one. Which one you try first depends entirely on your own preference; I would pick the device I have tested my kernel on most thoroughly. In JavaCL you can pick the fastest device with JavaCL.createBestContext and CLPlatform.getBestDevice; see the host code here: http://ochafik.com/blog/?p=501
Be aware that NVIDIA does not support CPUs via its driver; only AMD and Intel do. Also, targeting multiple devices (say, 2 GPUs and a CPU) is a bit more difficult.
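The "try one driver, then the next" fallback can be sketched in plain Java. This is a minimal sketch under stated assumptions. It does not call JavaCL or OpenCL at all. The `Device` and `Preference` records and the `pick` method are hypothetical stand-ins for whatever your binding reports about each device (platform vendor string and device type), and the vendor strings are only examples. It assumes Java 16+ for records. The preference list encodes your own ordering. It has no NVIDIA CPU entry, because NVIDIA's driver exposes no CPU device.

```java
import java.util.List;
import java.util.Optional;

/**
 * Sketch of "check if a driver is available, otherwise try the next one".
 * No device discovery happens here: Device records are hypothetical stand-ins
 * for what an OpenCL binding (e.g. JavaCL) reports per platform/device.
 */
public class DriverFallback {

    /** Hypothetical description of one discovered device. */
    record Device(String platformVendor, String type) {}

    /** One preference entry: a vendor substring plus a device type ("CPU"/"GPU"). */
    record Preference(String vendorContains, String type) {
        boolean matches(Device d) {
            return d.platformVendor().toLowerCase().contains(vendorContains.toLowerCase())
                    && d.type().equals(type);
        }
    }

    /** Returns the first discovered device matching the earliest preference, if any. */
    static Optional<Device> pick(List<Device> discovered, List<Preference> preferences) {
        for (Preference p : preferences) {
            for (Device d : discovered) {
                if (p.matches(d)) {
                    return Optional.of(d);
                }
            }
        }
        return Optional.empty(); // nothing usable: fall back to the non-OpenCL path
    }

    public static void main(String[] args) {
        // Your own preference order (example vendor strings, assumed).
        List<Preference> prefs = List.of(
                new Preference("NVIDIA", "GPU"),
                new Preference("Advanced Micro Devices", "GPU"),
                new Preference("Intel", "CPU"),
                new Preference("Advanced Micro Devices", "CPU"));

        // Example machine without a GPU driver but with two CPU runtimes installed.
        List<Device> discovered = List.of(
                new Device("Advanced Micro Devices, Inc.", "CPU"),
                new Device("Intel(R) Corporation", "CPU"));

        System.out.println(pick(discovered, prefs).orElse(null));
    }
}
```

If `pick` returns an empty `Optional`, the application can fall back to the plain Java (or JNI) implementation.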
There is no API that provides what you want. However, you can do the following:
I suggest you iterate over the platforms returned by clGetPlatformIDs and, for each one, query the number of devices (clGetDeviceIDs) and the type of each device (clGetDeviceInfo with CL_DEVICE_TYPE),
and pick the platform that has both types.
Then build a map in your code that maps each device type to the list of platforms supporting it, ordered in some manner.
Finally, just take the first item in the list for CL_DEVICE_TYPE_CPU and the first item in the list for CL_DEVICE_TYPE_GPU.
If both results are equal (platform_cpu == platform_gpu), pick one of them and use it for both.
If there is a platform supporting both, you will get a match as above, since the lists are ordered. You can then also do load balancing on that single platform if you like, as with Intel's.
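The map-based selection described above could look roughly like this. It is a sketch under assumptions. The `Platform` record, the `DeviceType` enum and the `buildMap`/`first` helpers are hypothetical. In practice the platform data would come from `clGetPlatformIDs`, `clGetDeviceIDs` and `clGetDeviceInfo(CL_DEVICE_TYPE)`, or from your binding's equivalents. The ordering used here puts platforms exposing more device types first, then sorts by your own rank. That is only one possible choice, but it is the one that makes a platform supporting both types come out first in both lists.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Sketch: map each device type to an ordered list of platforms supporting it. */
public class PlatformMap {

    enum DeviceType { CPU, GPU }

    /** Hypothetical platform summary: name, your preference rank (lower = better), device types. */
    record Platform(String name, int rank, Set<DeviceType> deviceTypes) {}

    /** One possible ordering: platforms with more device types first, then by rank. */
    static final Comparator<Platform> ORDER =
            Comparator.comparingInt((Platform p) -> -p.deviceTypes().size())
                      .thenComparingInt(Platform::rank);

    /** Builds type -> platforms supporting that type, each list sorted with ORDER. */
    static Map<DeviceType, List<Platform>> buildMap(List<Platform> platforms) {
        Map<DeviceType, List<Platform>> map = new EnumMap<>(DeviceType.class);
        for (DeviceType t : DeviceType.values()) {
            map.put(t, new ArrayList<>());
        }
        for (Platform p : platforms) {
            for (DeviceType t : p.deviceTypes()) {
                map.get(t).add(p);
            }
        }
        map.values().forEach(list -> list.sort(ORDER));
        return map;
    }

    /** First (preferred) platform for a type, or null if no platform supports it. */
    static Platform first(Map<DeviceType, List<Platform>> map, DeviceType type) {
        List<Platform> list = map.get(type);
        return list.isEmpty() ? null : list.get(0);
    }

    public static void main(String[] args) {
        // Hypothetical machine: a GPU-only platform, a CPU+GPU platform, a CPU-only platform.
        List<Platform> platforms = List.of(
                new Platform("NVIDIA", 0, EnumSet.of(DeviceType.GPU)),
                new Platform("Intel", 1, EnumSet.of(DeviceType.CPU, DeviceType.GPU)),
                new Platform("AMD", 2, EnumSet.of(DeviceType.CPU)));

        Map<DeviceType, List<Platform>> map = buildMap(platforms);
        Platform cpu = first(map, DeviceType.CPU);
        Platform gpu = first(map, DeviceType.GPU);

        if (cpu != null && cpu.equals(gpu)) {
            // Same platform: CPU and GPU devices can share one context, so load balancing is possible.
            System.out.println("Use " + cpu.name() + " for both CPU and GPU");
        } else {
            System.out.println("CPU: " + (cpu == null ? "none" : cpu.name())
                    + ", GPU: " + (gpu == null ? "none" : gpu.name()));
        }
    }
}
```

With a different ordering (e.g. pure vendor preference), the CPU and GPU picks can land on different platforms. In that case you need one context per platform, since an OpenCL context cannot span platforms.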
Sorry for being late to the party, but regarding the behaviour of Intel's implementation under JavaCL, I'm afraid you've been bitten by a JavaCL bug:
https://github.com/ochafik/nativelibs4java/issues/297
Fixed in JavaCL 1.0.0-RC2!
Cheers