GPU YUV to RGB. Is it worth the effort?
I have to convert several full PAL videos (720x576@25) from YUV 4:2:2 to RGB in real time, and probably apply a custom resize to each.
I have thought of using the GPU, as I have seen an example that does just this (except that it's 4:4:4, so the bpp is the same in source and destination): http://www.fourcc.org/source/YUV420P-OpenGL-GLSLang.c
However, I don't have any experience with GPUs and I'm not sure what can be done. The example, as I understand it, just converts the video frame from YUV and displays it on the screen.
Is it possible to get the processed frame back instead? Would it be worth the effort to send the frame to the GPU, have it transformed, and send it back to main memory, or would that kill performance?
To be a bit platform-specific: assuming I work on Windows, is it possible to get an OpenGL or DirectDraw surface from a window so the GPU can draw directly to it?
Comments (2)
The real question is, what do you hope to get out of this?
At the frame rate you are receiving video, you could use something like Intel Integrated Performance Primitives (IPP) on the CPU to do the couple of operations you need and easily keep up with the stream.
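To make the CPU route concrete, here is a minimal sketch of the per-pixel arithmetic such a conversion performs, written in plain Python for readability rather than speed. It is not IPP's API. It assumes the frames are packed UYVY (byte order U, Y0, V, Y1) with BT.601 limited-range values, which is common for SD video but is not stated in the question. The names `uyvy_to_rgb24` and `clamp8` are hypothetical, and the integer coefficients are a widely used 8-bit approximation of the BT.601 matrix.

```python
def clamp8(x):
    """Clamp an integer to the 0..255 range of an 8-bit channel."""
    return 0 if x < 0 else 255 if x > 255 else x


def uyvy_to_rgb24(src, width, height):
    """Convert one packed UYVY (4:2:2) frame to packed RGB24.

    Assumption: BT.601 limited range (Y in 16..235, U/V in 16..240).
    """
    if width % 2:
        raise ValueError("4:2:2 needs an even width")
    if len(src) != width * height * 2:
        raise ValueError("unexpected frame size")
    dst = bytearray(width * height * 3)
    o = 0
    # Every 4 input bytes (U, Y0, V, Y1) describe 2 pixels sharing one chroma pair.
    for i in range(0, len(src), 4):
        d = src[i] - 128       # U offset
        e = src[i + 2] - 128   # V offset
        r_off = 409 * e + 128
        g_off = -100 * d - 208 * e + 128
        b_off = 516 * d + 128
        for y in (src[i + 1], src[i + 3]):
            c = 298 * (y - 16)
            dst[o] = clamp8((c + r_off) >> 8)
            dst[o + 1] = clamp8((c + g_off) >> 8)
            dst[o + 2] = clamp8((c + b_off) >> 8)
            o += 3
    return bytes(dst)


# Two pixels: video black (Y=16) and video white (Y=235) with neutral chroma.
print(list(uyvy_to_rgb24(bytes([128, 16, 128, 235]), 2, 1)))  # [0, 0, 0, 255, 255, 255]
```

A real-time version would do the same arithmetic in C or with SIMD instructions. The resize would be a separate pass and is not shown here.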
If you want to learn how to do GPU programming, this is a nice, easy problem that you could implement.
It is possible to get the processed frame by doing a readback from the GPU to main memory. The actual mechanism will vary depending on which API you use (OpenGL, DirectX, CUDA, OpenCL). I've done it with much higher resolution video and still kept up with a 25fps stream. However, this all depends on the hardware that you will be using.
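For a rough sense of what the round trip costs, the arithmetic below estimates the traffic for one stream at the resolution in the question. It assumes 8-bit 4:2:2 input (2 bytes per pixel) and a 4-byte RGBA output at the same size. The output format is an assumption: RGB24 would be 3 bytes per pixel, and a resize changes the readback size. `stream_traffic` is a hypothetical helper.

```python
WIDTH, HEIGHT, FPS = 720, 576, 25  # full PAL, from the question


def stream_traffic(bytes_in_per_pixel=2, bytes_out_per_pixel=4):
    """Return (upload, readback) in bytes per second for one stream."""
    pixels_per_second = WIDTH * HEIGHT * FPS
    return (pixels_per_second * bytes_in_per_pixel,
            pixels_per_second * bytes_out_per_pixel)


up, down = stream_traffic()
print(f"upload   {up / 1e6:.1f} MB/s")    # upload   20.7 MB/s
print(f"readback {down / 1e6:.1f} MB/s")  # readback 41.5 MB/s
```

Tens of megabytes per second per stream is modest, but the readback throughput you actually get depends on the API, the driver, and the hardware, so it is worth measuring on the target machine.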
DirectX and OpenGL both have great tutorials on using window surfaces as render targets.
I have actually programmed this in C for CUDA, and also wrote a pthreads version in C (just for fun, mind you). I found that the GPU works so fast that you spend 50-80% of your time sending data back and forth, even if you completely fill the GPU's memory every time. Because of this, the CPU did the work pretty much as fast as the GPU could. This problem is extremely thread-friendly, as you may have figured out, so with modern hardware, memory bandwidth is the biggest issue.
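The thread-friendliness comes from every row being independent, so a frame can be cut into bands of rows and each band handed to a worker. The sketch below shows only that partitioning. It is not the pthreads program described above. `row_bands` and `convert_in_bands` are hypothetical names, and `convert_rows` stands for any per-band converter taking `(src, width, rows)`. In CPython the GIL prevents pure-Python pixel loops from speeding up this way, so treat it as an illustration of the structure.

```python
from concurrent.futures import ThreadPoolExecutor


def row_bands(height, workers):
    """Split `height` rows into up to `workers` contiguous (start, stop) bands."""
    base, extra = divmod(height, workers)
    bands, start = [], 0
    for i in range(workers):
        stop = start + base + (1 if i < extra else 0)
        if stop > start:
            bands.append((start, stop))
        start = stop
    return bands


def convert_in_bands(frame, width, height, convert_rows, workers=4):
    """Run `convert_rows` on each band of a packed 4:2:2 frame and join the results."""
    stride = width * 2  # 2 bytes per pixel in packed 4:2:2

    def work(band):
        start, stop = band
        return convert_rows(frame[start * stride:stop * stride], width, stop - start)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in band order, so the rows stay in sequence.
        return b"".join(pool.map(work, row_bands(height, workers)))


# Stand-in converter: keep only the luma bytes of a UYVY band (assumed layout).
luma_only = lambda src, width, rows: bytes(src[1::2])
print(row_bands(576, 4))  # [(0, 144), (144, 288), (288, 432), (432, 576)]
```

Because no band reads another band's data, the workers need no locking. The shared cost is reading the source and writing the result, which fits the point about memory bandwidth.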
I tested this with a Core i7 as the CPU and a GeForce 8800GT/GTX 285 as the graphics card. The GTX 285 processed, as far as I know, 1500fps of 1920x1080 video, so no matter what you choose, things will be blazingly fast.
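To put that figure next to the question's workload, the arithmetic below compares raw pixel rates. It takes the 1500fps number at face value and ignores the resize and the transfer time discussed above, so it is an upper bound on headroom, not a prediction.

```python
pal_pixels_per_second = 720 * 576 * 25         # one full PAL stream
quoted_pixels_per_second = 1920 * 1080 * 1500  # GTX 285 figure quoted above
print(quoted_pixels_per_second // pal_pixels_per_second)  # 300
```

In other words, the quoted conversion rate corresponds to about 300 PAL streams' worth of pixels per second.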