OpenGL vs. OpenCL: which to choose and why?
What features make OpenCL the better choice over OpenGL with GLSL for calculations? Setting aside the graphics-related terminology and the impractical data types, is there any real caveat to OpenGL?
For example, parallel function evaluation can be done by rendering to a texture using other textures as input. Reduction operations can be done by iteratively rendering to smaller and smaller textures. On the other hand, random write access is not possible in any efficient manner (the only way to do it is to render triangles from texture-driven vertex data). Is this possible with OpenCL? What else is not possible with OpenGL?
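The question's reduction-by-shrinking-textures approach can be simulated on the CPU to show its structure. In this sketch, plain Python stands in for fragment-shader passes, and the texture is assumed to be square with a power-of-two side. Each call to `reduce_pass` plays the role of one draw call into a half-size render target, so an N×N texture takes log2(N) passes. Every output texel only gathers from a fixed 2×2 footprint of the previous texture and never writes anywhere but its own location. That is exactly the limitation behind the question about random writes.

```python
def reduce_pass(tex):
    """One simulated draw call: each output texel gathers a 2x2 block of the input.

    Assumes a square texture (list of rows) whose side is a power of two.
    """
    half = len(tex) // 2
    return [
        [
            tex[2 * y][2 * x] + tex[2 * y][2 * x + 1]
            + tex[2 * y + 1][2 * x] + tex[2 * y + 1][2 * x + 1]
            for x in range(half)
        ]
        for y in range(half)
    ]


def reduce_texture(tex):
    """Ping-pong between ever smaller textures until a single texel remains.

    Returns (sum, number_of_passes).
    """
    passes = 0
    while len(tex) > 1:
        tex = reduce_pass(tex)
        passes += 1
    return tex[0][0], passes


if __name__ == "__main__":
    size = 8
    tex = [[float(y * size + x) for x in range(size)] for y in range(size)]
    print(reduce_texture(tex))  # (2016.0, 3)
```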
In addition to the existing answers: OpenCL/CUDA not only fits the computational domain better, but also doesn't abstract away the underlying hardware too much. This lets you profit more directly from things like shared memory or coalesced memory access, which would otherwise be buried in the actual implementation of the shader (which itself is nothing more than a special OpenCL/CUDA kernel, if you like).
To profit from such things you also need to be a bit more aware of the specific hardware your kernel will run on. With a shader, by contrast, you wouldn't even try to take those things into account explicitly (if that is possible at all).
Once you do something more complex than simple level 1 BLAS routines, you will surely appreciate the flexibility and genericity of OpenCL/CUDA.
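Tiling a matrix multiply (a level 3 BLAS operation) is the textbook case where explicit shared memory pays off. The sketch below simulates, in plain Python, the loop structure of an OpenCL kernel that stages tiles in `__local` memory. Each outer `(gy, gx)` block stands for one work-group, and the comments mark where a real kernel would place `barrier(CLK_LOCAL_MEM_FENCE)`. `TILE` and the requirement that `n` be divisible by it are assumptions of this sketch. It illustrates the memory-reuse pattern, not GPU performance.

```python
TILE = 2  # assumed work-group tile size (hypothetical)


def matmul_tiled(a, b, n):
    """Return C = A x B for n x n row-major lists (n divisible by TILE),
    iterating the way a tiled OpenCL kernel with __local tiles would."""
    if n % TILE:
        raise ValueError("n must be divisible by TILE")
    c = [[0.0] * n for _ in range(n)]
    for gy in range(0, n, TILE):  # each (gy, gx) pair stands for one work-group
        for gx in range(0, n, TILE):
            acc = [[0.0] * TILE for _ in range(TILE)]  # private accumulator per work-item
            for k0 in range(0, n, TILE):
                # Work-items cooperatively copy one tile of A and one of B into
                # "local memory", so each global element is loaded once per work-group.
                a_tile = [[a[gy + ly][k0 + lx] for lx in range(TILE)] for ly in range(TILE)]
                b_tile = [[b[k0 + ly][gx + lx] for lx in range(TILE)] for ly in range(TILE)]
                # A real kernel needs barrier(CLK_LOCAL_MEM_FENCE) here so the tiles are complete.
                for ly in range(TILE):
                    for lx in range(TILE):
                        acc[ly][lx] += sum(a_tile[ly][k] * b_tile[k][lx] for k in range(TILE))
                # And another barrier here, before the next iteration overwrites the tiles.
            for ly in range(TILE):
                for lx in range(TILE):
                    c[gy + ly][gx + lx] = acc[ly][lx]
    return c
```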
The "feature" is that OpenCL is designed for general-purpose computation, while OpenGL is designed for graphics. You can do anything in GL (it is Turing-complete), but then you are driving in a nail using the handle of a screwdriver as a hammer.
Also, OpenCL can run not just on GPUs, but also on CPUs and various dedicated accelerators.
OpenCL (as of version 2.0) describes a heterogeneous computing environment in which every component of the system can both produce and consume tasks generated by other system components. The notions of CPU, GPU, etc. are no longer needed; you just have a Host and one or more Devices.
OpenGL, by contrast, has a strict division between the CPU, which is the task producer, and the GPU, which is the task consumer. That's not bad, as less flexibility allows greater performance. OpenGL is just a narrower-scope instrument.
One thought is to write your program in both and test them with respect to your priorities.
For example, if you're processing a pipeline of images, your OpenGL implementation may turn out to be faster than your OpenCL one, or vice versa.
Good luck.
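Here is a minimal harness for that comparison. It assumes each backend is wrapped in a host-side Python callable; the names `gl_path` and `cl_path` in the usage below are hypothetical placeholders for such wrappers. The harness checks that both produce the same answer and records the best of several wall-clock timings. GPU backends rarely agree bit for bit on floating-point output, so a tolerance-based comparison can be passed in via `same`.

```python
import math
import time


def compare(implementations, data, repeat=5, same=lambda x, y: x == y):
    """Time each implementation on the same input and check that the results agree.

    Returns {name: best wall-clock time in seconds}. Raises ValueError if an
    implementation disagrees with the first one according to `same`.
    """
    have_reference = False
    reference = None
    timings = {}
    for name, fn in implementations.items():
        best = math.inf
        result = None
        for _ in range(repeat):
            start = time.perf_counter()
            result = fn(data)
            best = min(best, time.perf_counter() - start)
        if not have_reference:
            reference, have_reference = result, True
        elif not same(result, reference):
            raise ValueError(f"{name} disagrees with the first implementation")
        timings[name] = best
    return timings


if __name__ == "__main__":
    # Hypothetical stand-ins for wrappers around an OpenGL and an OpenCL implementation.
    gl_path = lambda xs: [x * 2.0 for x in xs]
    cl_path = lambda xs: [x + x for x in xs]
    close = lambda a, b: len(a) == len(b) and all(math.isclose(p, q) for p, q in zip(a, b))
    print(compare({"gl": gl_path, "cl": cl_path}, [float(i) for i in range(100_000)], same=close))
```

Only the relative timings on your own hardware and data are meaningful. A fair comparison should also include upload/download time if that is part of the real pipeline.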
OpenCL is created specifically for computing. When you do scientific computing using OpenGL you always have to think about how to map your computing problem to the graphics context (i.e. talk in terms of textures and geometric primitives like triangles etc.) in order to get your computation going.
In OpenCL you just formulate your computation as a kernel operating on a memory buffer and you are good to go. This is actually a BIG win (I say that having thought through and implemented both variants).
The memory access patterns are still the same, though (your calculation is still happening on a GPU, but GPUs are getting more and more flexible these days).
But what else would you expect than using more than a dozen parallel "CPUs" without breaking your head about how to translate - e.g. (silly example) Fourier to Triangles and Quads...?
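To make the "mapping to the graphics context" point concrete, here is the same element-wise operation written twice. The first version is an OpenCL-style loop over a flat buffer. The second is GPGPU-style: the buffer has to be wrapped into a 2D texture, because texture dimensions are limited (cf. `GL_MAX_TEXTURE_SIZE`) and long arrays cannot stay 1D. Each fragment must then recover its element index from its texel coordinates and skip the padding. The fixed `width` and the helper names are illustrative assumptions. This is a CPU simulation, not GPU code.

```python
def kernel_over_buffer(buf, f):
    """OpenCL style: one work-item per buffer element; gid plays get_global_id(0)."""
    return [f(buf[gid]) for gid in range(len(buf))]


def to_texture(buf, width, fill=0.0):
    """Pack a 1D buffer row-major into a width-wide 2D texture, padding the last row."""
    height = -(-len(buf) // width)  # ceiling division
    padded = list(buf) + [fill] * (width * height - len(buf))
    return [padded[y * width:(y + 1) * width] for y in range(height)]


def fragment_over_texture(tex, f, n):
    """GL GPGPU style: each 'fragment' at (x, y) maps back to a linear index.

    Texels whose index is >= n are padding and produce no output.
    """
    if not tex:
        return []
    width = len(tex[0])
    out = []
    for y, row in enumerate(tex):
        for x, texel in enumerate(row):
            index = y * width + x
            if index < n:
                out.append(f(texel))
    return out
```

Usage: `fragment_over_texture(to_texture(buf, 4), f, len(buf))` yields the same list as `kernel_over_buffer(buf, f)`. The GL variant needs the extra bookkeeping purely because of the texture layout.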
Something that hasn't been mentioned in any answer so far is speed of execution. If your algorithm can be expressed in OpenGL graphics (e.g. no scattered writes, no local memory, no workgroups, etc.), it will very often run faster than its OpenCL counterpart. My specific experience here has been with image filter (gather) kernels across AMD, nVidia, IMG and Qualcomm GPUs. The OpenGL implementations invariably ran faster, even after hardcore OpenCL kernel optimization. (Aside: I suspect this is due to years of hardware and drivers being specifically tuned to graphics-oriented workloads.)
My advice would be: if your compute program feels like it maps nicely to the graphics domain, use OpenGL. If not, OpenCL is more general and makes compute problems simpler to express.
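Here is a gather kernel of the kind mentioned above, sketched in plain Python: a 3×3 box blur. Every output pixel is an independent function of a neighbourhood read with clamp-to-edge addressing, the behaviour a `GL_CLAMP_TO_EDGE` sampler provides. Nothing is written outside the output pixel itself, which is why this pattern maps directly onto a fragment shader. The function name and the 3×3 footprint are illustrative choices.

```python
def box_filter_3x3(img):
    """Gather-style 3x3 box blur over a 2D list of floats.

    Each output pixel only reads its neighbours, with coordinates clamped to the
    image bounds (like a clamp-to-edge texture fetch).
    """
    h, w = len(img), len(img[0])

    def fetch(x, y):
        # Clamp coordinates to the image, as a clamp-to-edge sampler would.
        return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    return [
        [
            sum(fetch(x + dx, y + dy) for dy in (-1, 0, 1) for dx in (-1, 0, 1)) / 9.0
            for x in range(w)
        ]
        for y in range(h)
    ]
```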
Another point to mention (or to ask) is whether you are writing as a hobbyist (i.e. for yourself) or commercially (i.e. for distribution to others). While OpenGL is supported pretty much everywhere, OpenCL is totally lacking support on mobile devices and, imho, is highly unlikely to appear on Android or iOS in the next few years. If wide cross platform compatibility from a single code base is a goal then OpenGL may be forced upon you.
Yes: it's a graphics API. Therefore, everything you do in it has to be formulated along those terms. You have to package your data as some form of "rendering". You have to figure out how to deal with your data in terms of attributes, uniform buffers, and textures.
With OpenGL 4.3 and OpenGL ES 3.1 compute shaders, things become a bit more muddled. A compute shader is able to access memory via SSBOs/Image Load/Store in similar ways to OpenCL compute operations (though OpenCL offers actual pointers, while GLSL does not). Their interop with OpenGL is also much faster than OpenCL/GL interop.
Even so, compute shaders do not change one fact: OpenCL compute operations operate at a very different precision than OpenGL's compute shaders. GLSL's floating-point precision requirements are not very strict, and OpenGL ES's are even less strict. So if floating-point accuracy is important to your calculations, OpenGL will not be the most effective way of computing what you need to compute.
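One way to get a feel for the precision gap is to emulate reduced-precision accumulation on the CPU. Python's `struct` module can round a value to IEEE 754 binary16 (`'e'`) or binary32 (`'f'`). Here binary16 stands in for GLSL ES `mediump`, whose minimum guarantee (relative precision of about 2^-10) is roughly what half floats provide. That equivalence is an assumption for illustration only; real `mediump` hardware may well be more precise.

```python
import struct


def to_half(x):
    """Round a Python float to IEEE 754 binary16 and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_single(x):
    """Round a Python float to IEEE 754 binary32 and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def accumulate(values, rounding):
    """Sum values, rounding after every addition like a fixed-precision register."""
    total = 0.0
    for v in values:
        total = rounding(total + v)
    return total


if __name__ == "__main__":
    ones = [1.0] * 3000
    print(accumulate(ones, to_half), accumulate(ones, to_single))  # 2048.0 3000.0
```

With `to_half`, summing 3000 ones stalls at 2048.0. At that magnitude binary16 values are spaced 2 apart, so 2048 + 1 rounds back to 2048 (round half to even). With `to_single` the sum is exactly 3000.0.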
Also, OpenGL compute shaders require 4.x-capable hardware, while OpenCL can run on much less capable hardware.
Furthermore, if you're doing compute by co-opting the rendering pipeline, OpenGL drivers will still assume that you're doing rendering. So it's going to make optimization decisions based on that assumption. It will optimize the assignment of shader resources assuming you're drawing a picture.
For example, if you're rendering to a floating-point framebuffer, the driver might just decide to give you an R11F_G11F_B10F framebuffer, because it detects that you aren't doing anything with the alpha and your algorithm could tolerate the lower precision. If you use image load/store instead of a framebuffer, however, you're much less likely to get this effect.
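The precision loss described above can be estimated by rounding a value to the packed-float layout. This is a simplified sketch that assumes only the mantissa width matters: 6 explicit mantissa bits for the R and G channels of `R11F_G11F_B10F`, and 5 for B. It ignores the limited exponent range, denormals and Inf/NaN. Because these formats carry no sign bit, the sketch simply clamps negative inputs to zero. It models only the representable values, not any driver behaviour.

```python
import math


def quantize_unsigned_float(v, mantissa_bits):
    """Round v to the nearest value with `mantissa_bits` explicit mantissa bits.

    Simplified model of the unsigned 11/10-bit floats (6 or 5 mantissa bits):
    exponent range limits, denormals and Inf/NaN are ignored.
    """
    if v <= 0.0:
        return 0.0  # no sign bit: this sketch clamps negatives to zero
    m, e = math.frexp(v)  # v == m * 2**e with 0.5 <= m < 1
    scale = 2 ** (mantissa_bits + 1)  # explicit bits plus the implicit leading 1
    return round(m * scale) / scale * 2.0 ** e
```

For example, `quantize_unsigned_float(0.1, 6)` gives 0.099609375, a relative error of about 0.4%.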
OpenCL is not a graphics API; it's a computation API.
Also, OpenCL simply gives you access to more things. It gives you access to memory levels that are implicit in GL. Certain memory can be shared between threads, whereas separate shader instances in GL are unable to directly affect one another (outside of Image Load/Store, but OpenCL also runs on hardware that doesn't support that).
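A work-group reduction relies on exactly the kind of cooperation that the answer says GL shader invocations lack. Work-items sum into a shared `__local` array, with a barrier between phases so that every partial sum is visible before the next step. The Python below only simulates that schedule sequentially; the comment marks where `barrier(CLK_LOCAL_MEM_FENCE)` would go in OpenCL C. The power-of-two work-group size is an assumption of this sketch.

```python
def work_group_reduce_sum(values):
    """Sum one work-group's values with a tree reduction in simulated __local memory."""
    local = list(values)  # stands in for a __local array shared by the work-group
    size = len(local)
    if size == 0 or size & (size - 1):
        raise ValueError("this sketch assumes a non-empty, power-of-two work-group")
    stride = size // 2
    while stride > 0:
        # Work-item i (for i < stride) adds its partner's partial sum into its own slot.
        # Reads at i + stride never overlap writes at i, so a phase needs no copy.
        for i in range(stride):
            local[i] += local[i + stride]
        # In OpenCL C: barrier(CLK_LOCAL_MEM_FENCE) before the next, halved phase.
        stride //= 2
    return local[0]
```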
OpenGL hides what the hardware is doing behind an abstraction. OpenCL exposes you to almost exactly what's going on.
You can use OpenGL to do arbitrary computations. But you don't want to; not while there's a perfectly viable alternative. Compute in OpenGL lives to service the graphics pipeline.
The only reason to pick OpenGL for any kind of non-rendering compute operation is to support hardware that can't run OpenCL. At the present time, this includes a lot of mobile hardware.
One notable feature would be scattered writes, another would be the absence of "Windows 7 smartness". Windows 7 will, as you probably know, kill the display driver if OpenGL does not flush for 2 seconds or so (don't nail me down on the exact time, but I think it's 2 secs). This may be annoying if you have a lengthy operation.
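A histogram shows why scattered writes matter. In the scatter formulation, each input element increments the bin it selects. That is a data-dependent write, which on a GPU needs scattered stores plus atomics (for instance OpenCL C's `atomic_inc`). Without scatter, as in a classic fragment-shader pipeline, each output bin has to gather by scanning every input, costing `bins × len(data)` reads. Both functions below are sequential Python sketches of the two formulations, and they assume the integer bin indices are already computed.

```python
def histogram_scatter(data, bins):
    """Scatter formulation: each input element writes to the bin it selects.

    On a GPU this needs scattered writes plus atomic increments.
    """
    hist = [0] * bins
    for v in data:
        hist[v] += 1  # data-dependent write address
    return hist


def histogram_gather(data, bins):
    """Gather formulation: each output bin reads every input element.

    Fits a pure fragment-shader model but costs bins * len(data) reads.
    """
    return [sum(1 for v in data if v == b) for b in range(bins)]
```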
Also, OpenCL obviously works with a much greater variety of hardware than just the graphics card, and it does not have a rigid graphics-oriented pipeline with "artificial constraints". It is easier (trivial) to run several concurrent command streams too.
Although currently OpenGL would be the better choice for graphics, this is not permanent.
It could be practical for OpenGL to eventually merge as an extension of OpenCL. The two platforms are about 80% the same, but have different syntax quirks and different nomenclature for roughly the same hardware components. That means two languages to learn and two APIs to figure out. Graphics driver developers would prefer a merge because they would no longer have to develop for two separate platforms. That leaves more time and resources for driver debugging. ;)
Another thing to consider is that OpenGL and OpenCL have different origins. OpenGL began and gained momentum during the early fixed-pipeline-over-a-network days, and was slowly extended and deprecated as the technology evolved. OpenCL is in some ways an evolution of OpenGL, in the sense that OpenGL started being used for numerical processing once the (unplanned) flexibility of GPUs allowed it. "Graphics vs. computing" is really more of a semantic argument. In both cases you're always trying to map your math operations to hardware with the highest performance possible. There are parts of GPU hardware that vanilla CL won't use, but that won't keep a separate extension from doing so.
So how could OpenGL work under CL? Speculatively, triangle rasterizers could be enqueued as a special CL task. Special GLSL functions could be implemented in vanilla OpenCL, then replaced with hardware-accelerated instructions by the driver during kernel compilation. Writing a shader in OpenCL, provided the library extensions were supplied, doesn't sound like a painful experience at all.
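Many GLSL built-ins are plain arithmetic, so the idea that they could be written as ordinary compute code is easy to illustrate. The sketch below reimplements `clamp`, `mix` and `smoothstep` in Python from their GLSL definitions. OpenCL C in fact already provides functions with these names among its common built-ins, which fits the point about the overlap between the two languages. Hardware-specific pieces such as the rasterizer are a different matter and are not covered here.

```python
def clamp(x, lo, hi):
    """GLSL clamp(): min(max(x, lo), hi)."""
    return min(max(x, lo), hi)


def mix(x, y, a):
    """GLSL mix(): linear interpolation x * (1 - a) + y * a."""
    return x * (1.0 - a) + y * a


def smoothstep(edge0, edge1, x):
    """GLSL smoothstep(): Hermite interpolation from 0 to 1 between the edges."""
    t = clamp((x - edge0) / (edge1 - edge0), 0.0, 1.0)
    return t * t * (3.0 - 2.0 * t)
```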
Saying that one has more features than the other doesn't make much sense, as both are gaining roughly 80% of the same features, just under different nomenclature. Claiming that OpenCL is not good for graphics because it is designed for computing doesn't make sense either, because graphics processing is computing.
Another major reason is that OpenGL/GLSL are supported only on graphics cards. Although multi-core computing started on graphics hardware, many hardware vendors are working on multi-core platforms targeted at computation; see, for example, Intel's Knights Corner.
Developing code for computation using OpenGL/GLSL will prevent you from using any hardware that is not a graphics card.
As of OpenGL 4.5, these are the features OpenCL 2.0 has that OpenGL 4.5 doesn't, as far as I can tell (this does not cover the features that OpenGL has that OpenCL doesn't):
- Events
- Better atomics
- Blocks
- Work-group functions:
  - work_group_all and work_group_any
  - work_group_broadcast
  - work_group_reduce_<op>
  - work_group_scan_inclusive_<op> / work_group_scan_exclusive_<op>
- Enqueueing kernels from a kernel
- Pointers (though if you are executing on the GPU this probably doesn't matter)
- A few math functions that OpenGL doesn't have (though you could construct them yourself in OpenGL)
- Shared virtual memory
- (More) compiler options for kernels
- Easy selection of a particular GPU (or other device)
- Can run on the CPU when there is no GPU
- More support for niche hardware platforms (e.g. FPGAs)
- On some (all?) platforms you do not need a window (and its context binding) to do calculations
- Slightly more control over the precision of calculations (including some through those compiler options)
A lot of the above are mostly for better CPU - GPU interaction: Events, Shared Virtual Memory, Pointers (although these could potentially benefit other stuff too).
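The work-group scan functions in the list return, to each work-item, the running combination of the values held by all work-items up to it (inclusive) or strictly before it (exclusive). The sketch below shows those semantics for addition in sequential Python, using the Hillis-Steele scheme a hand-written kernel might use. Each loop iteration corresponds to one barrier phase. Copying the array models the fact that all work-items read the previous phase's values before anyone overwrites them. It illustrates the result, not how any particular driver implements the built-ins.

```python
def scan_inclusive_add(values):
    """Hillis-Steele inclusive prefix sum; one loop iteration per barrier phase."""
    x = list(values)
    offset = 1
    while offset < len(x):
        prev = x[:]  # snapshot = values visible after the previous barrier
        for i in range(offset, len(x)):
            x[i] = prev[i] + prev[i - offset]
        offset *= 2
    return x


def scan_exclusive_add(values):
    """Exclusive prefix sum: shift the inclusive result right, starting from 0."""
    inclusive = scan_inclusive_add(values)
    if not inclusive:
        return []
    return [0] + inclusive[:-1]  # 0 is the identity element for addition
```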
Since many of the other posts here were written, OpenGL has gained the ability to sort things into different areas of client and server memory.
OpenGL now has better memory barrier and atomics support, and allows you to allocate things to different registers within the GPU (to about the same degree OpenCL can). For example, you can now share registers within the local compute group in OpenGL (using something like AMD GPUs' LDS, the local data share), though this particular feature currently only works with OpenGL compute shaders.
OpenGL has stronger, better-performing implementations on some platforms (such as the open-source Linux drivers).
OpenGL has access to more fixed function hardware (like other answers have said). While it is true that sometimes fixed function hardware can be avoided (e.g. Crytek uses a "software" implementation of a depth buffer) fixed function hardware can manage memory just fine (and usually a lot better than someone who isn't working for a GPU hardware company could) and is just vastly superior in most cases. I must admit OpenCL has pretty good fixed function texture support which is one of the major OpenGL fixed function areas.
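Bilinear filtering is a good example of work the fixed-function texture unit does for free on every fetch; OpenCL can also reach it through its image samplers. Written out by hand, one sample costs four texel reads and three interpolations. This Python sketch assumes normalized coordinates, clamp-to-edge addressing, and the OpenGL convention that texel centres lie at `(i + 0.5) / size`.

```python
import math


def sample_bilinear(tex, u, v):
    """Bilinear lookup in a 2D list `tex` at normalized coordinates (u, v),
    with clamp-to-edge addressing."""
    h, w = len(tex), len(tex[0])
    x = u * w - 0.5  # shift so that integer x lands on a texel centre
    y = v * h - 0.5
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0

    def texel(i, j):
        # Clamp-to-edge: out-of-range coordinates reuse the border texel.
        return tex[min(max(j, 0), h - 1)][min(max(i, 0), w - 1)]

    top = texel(x0, y0) * (1 - fx) + texel(x0 + 1, y0) * fx
    bottom = texel(x0, y0 + 1) * (1 - fx) + texel(x0 + 1, y0 + 1) * fx
    return top * (1 - fy) + bottom * fy
```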
I would argue that Intel's Knights Corner is an x86 GPU that controls itself.
I would also argue that OpenCL 2.0, with its texture functions (which are actually also available in earlier versions of OpenCL), can reach much the same performance that user2746401 described for OpenGL.