Severe performance loss when using an OpenGL FBO

Published 2024-11-05 13:59:43

I have successfully implemented a simple 2D game using LWJGL (OpenGL) in which objects fade away as they get farther from the player. This fading was initially implemented by computing the distance from the player to each object's origin and using it to scale the object's alpha/opacity.
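
A minimal sketch of that per-object fade, assuming a linear falloff between two hypothetical distances, `fadeStart` and `fadeEnd` (the post does not say which curve was used). Every pixel of an object receives the same alpha, which is why the effect looks coarse on large objects.

```java
/**
 * Per-object fade: one alpha value per object, computed on the CPU from the
 * distance between the player and the object's origin.
 * fadeStart, fadeEnd and the linear curve are illustrative assumptions.
 */
final class PerObjectFade {
    private PerObjectFade() {}

    /** Returns alpha in [0, 1]: 1 up to fadeStart, 0 beyond fadeEnd, linear in between. */
    static float alphaForObject(float playerX, float playerY,
                                float objectX, float objectY,
                                float fadeStart, float fadeEnd) {
        float dx = objectX - playerX;
        float dy = objectY - playerY;
        float distance = (float) Math.sqrt(dx * dx + dy * dy);
        float t = (distance - fadeStart) / (fadeEnd - fadeStart);
        // Clamp to [0, 1] and invert so that nearby objects stay opaque.
        return 1.0f - Math.max(0.0f, Math.min(1.0f, t));
    }
}
```

The returned value could then be applied when drawing the object, for example with `GL11.glColor4f(1f, 1f, 1f, alpha)` under ordinary alpha blending.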

However, with larger objects this approach looks a bit too rough. My solution was to implement alpha/opacity scaling for every pixel of the object. Not only would this look better, it would also move the computation from the CPU to the GPU.
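
Per-pixel scaling needs a fade value for every screen pixel. A sketch of building such a precomputed distance map as an RGBA8 buffer, assuming the player sits at the centre of the window and using the same hypothetical linear falloff as the per-object version; the buffer layout matches `GL_RGBA` / `GL_UNSIGNED_BYTE`, so it could be passed to `glTexImage2D`.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/**
 * Builds a width x height RGBA8 "distance map": white texels whose alpha
 * falls off with distance from the window centre (assumed player position).
 */
final class DistanceMap {
    private DistanceMap() {}

    static ByteBuffer build(int width, int height, float fadeStart, float fadeEnd) {
        // Direct buffer in native byte order, like LWJGL's BufferUtils.createByteBuffer.
        ByteBuffer data = ByteBuffer.allocateDirect(4 * width * height)
                                    .order(ByteOrder.nativeOrder());
        float cx = width / 2.0f;
        float cy = height / 2.0f;
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                // Distance from the pixel centre to the assumed player position.
                float dx = x + 0.5f - cx;
                float dy = y + 0.5f - cy;
                float d = (float) Math.sqrt(dx * dx + dy * dy);
                float t = Math.max(0f, Math.min(1f, (d - fadeStart) / (fadeEnd - fadeStart)));
                int alpha = Math.round((1f - t) * 255f);
                int i = 4 * (y * width + x);
                data.put(i, (byte) 255);       // R
                data.put(i + 1, (byte) 255);   // G
                data.put(i + 2, (byte) 255);   // B
                data.put(i + 3, (byte) alpha); // A: the per-pixel fade factor
            }
        }
        // Absolute puts leave position at 0 and limit at capacity, ready for upload.
        return data;
    }
}
```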

I figured I could implement it using an FBO and a temporary texture: by drawing to the FBO and masking it with a precomputed distance map (a texture) using a special blend mode, I intended to achieve the effect.
The algorithm is as follows:

0) Initialize OpenGL and set up the FBO
1) Render the background to the standard buffer
2) Switch to the custom FBO and clear it
3) Render the objects (to the FBO)
4) Mask the FBO using the distance texture
5) Switch back to the standard buffer
6) Render the FBO's temporary texture (to the standard buffer)
7) Render the HUD elements
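
The per-frame order of steps 1-7 can be outlined in Java as below. The `FboFrameSteps` interface, its method names and the `RecordingSteps` test double are hypothetical; the comments name the LWJGL calls each step would typically wrap (for example `GL30.glBindFramebuffer`, assuming the FBO comes from core OpenGL 3.0 rather than an extension). This is a sketch of the ordering, not the poster's code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical abstraction over the GL work done in each step. */
interface FboFrameSteps {
    void renderBackground();       // 1) draw to the default framebuffer
    void bindFboAndClear();        // 2) e.g. glBindFramebuffer(GL_FRAMEBUFFER, fbo); glClear(GL_COLOR_BUFFER_BIT)
    void renderObjects();          // 3) objects land in the FBO's colour texture
    void applyDistanceMask();      // 4) glBlendFunc(GL_ZERO, GL_SRC_ALPHA), full-screen quad with the distance texture
    void bindDefaultFramebuffer(); // 5) e.g. glBindFramebuffer(GL_FRAMEBUFFER, 0)
    void drawFboTexture();         // 6) full-screen quad textured with the FBO's texture
    void renderHud();              // 7) HUD on top, not masked
}

final class FboFrame {
    private FboFrame() {}

    /** Runs one frame; step 0 (setup) is assumed to have happened once beforehand. */
    static void render(FboFrameSteps s) {
        s.renderBackground();
        s.bindFboAndClear();
        s.renderObjects();
        s.applyDistanceMask();
        s.bindDefaultFramebuffer();
        s.drawFboTexture();
        s.renderHud();
    }
}

/** Test double that records the call order instead of touching OpenGL. */
final class RecordingSteps implements FboFrameSteps {
    final List<String> calls = new ArrayList<>();
    public void renderBackground() { calls.add("background"); }
    public void bindFboAndClear() { calls.add("bindFbo"); }
    public void renderObjects() { calls.add("objects"); }
    public void applyDistanceMask() { calls.add("mask"); }
    public void bindDefaultFramebuffer() { calls.add("bindDefault"); }
    public void drawFboTexture() { calls.add("drawFboTexture"); }
    public void renderHud() { calls.add("hud"); }
}
```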

A bit of extra info:

  • The temporary texture has the same size as the window (and thus as the standard buffer)
  • Step 4 uses a special blend mode to achieve the desired effect:
    GL11.glBlendFunc( GL11.GL_ZERO, GL11.GL_SRC_ALPHA );
  • My temporary texture is created with min/mag filters: GL11.GL_NEAREST
  • The data is allocated using: org.lwjgl.BufferUtils.createByteBuffer(4 * width * height);
  • The texture is initialized using:
    GL11.glTexImage2D( GL11.GL_TEXTURE_2D, 0, GL11.GL_RGBA, width, height, 0, GL11.GL_RGBA, GL11.GL_UNSIGNED_BYTE, dataBuffer);
  • There are no GL errors in my code.
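
To see what the step 4 blend mode does to each pixel: with the default `GL_FUNC_ADD` equation, `glBlendFunc(GL_ZERO, GL_SRC_ALPHA)` computes `result = src * 0 + dst * src.a` for all four channels, so whatever step 3 left in the FBO is multiplied by the distance map's alpha. A small CPU model in floats (real hardware works in 8-bit fixed point here):

```java
/**
 * One pixel under glBlendFunc(GL_ZERO, GL_SRC_ALPHA) with GL_FUNC_ADD:
 *   result = src * 0 + dst * src.a   (R, G, B and A alike)
 * src is the distance-map texel being drawn, dst is the FBO pixel from step 3.
 */
final class MaskBlend {
    private MaskBlend() {}

    static float[] blend(float[] src, float[] dst) {
        float srcAlpha = src[3];
        float[] out = new float[4];
        for (int c = 0; c < 4; c++) {
            // Source factor GL_ZERO discards the mask colour; only its alpha is used.
            out[c] = src[c] * 0.0f + dst[c] * srcAlpha;
        }
        return out;
    }
}
```

Because the colour channels are scaled along with alpha, the blend mode chosen for step 6 decides how the masked texture combines with the background.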

This does indeed achieve the desired results.
However, performance testing showed that my FBO approach cripples performance. I tested by requesting 1000 successive renders and measuring the total time. The results were as follows:

At 512x512 resolution:

  • Normal: ~1.7s
  • FBO: ~2.5s
  • (FBO without step 6: ~1.7s)
  • (FBO without step 4: ~1.7s)

At 1680x1050 resolution:

  • Normal: ~1.7s
  • FBO: ~7s
  • (FBO without step 6: ~3.5s)
  • (FBO without step 4: ~6.0s)
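
A sketch of the kind of timing loop described above. `renderFrame` is a hypothetical callback that draws one frame. OpenGL commands are queued and executed asynchronously, so for GPU-bound measurements the frame generally needs to end in something that waits for the GPU (a buffer swap, or `GL11.glFinish()`); otherwise the loop mostly measures how fast commands are submitted.

```java
/** Times a number of successive frames using a monotonic clock. */
final class FrameBenchmark {
    private FrameBenchmark() {}

    /**
     * Returns the total wall-clock time in seconds for the given number of frames.
     * renderFrame is assumed to block until the GPU has finished (see lead-in).
     */
    static double timeFrames(Runnable renderFrame, int frames) {
        long start = System.nanoTime();
        for (int i = 0; i < frames; i++) {
            renderFrame.run();
        }
        long elapsedNanos = System.nanoTime() - start;
        return elapsedNanos / 1e9;
    }
}
```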

As you can see, this scales really badly. To make matters worse, I intend to add a second pass of this type. The machine I tested on should be high-end relative to my target audience, so I expect many players to get far below 60 fps with this approach, which is hardly acceptable for a game this simple.
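
Putting the measurements in per-frame terms helps compare them with the 60 fps budget. The helper below converts a total for N frames into milliseconds per frame and counts the pixels touched by one window-sized pass; steps 4 and 6 presumably each cover the whole window, since both textures are window-sized.

```java
/** Arithmetic on the reported timings and window sizes. */
final class FrameBudget {
    private FrameBudget() {}

    static final double BUDGET_60_FPS_MS = 1000.0 / 60.0; // about 16.7 ms per frame

    /** Converts a total time for `frames` frames into milliseconds per frame. */
    static double msPerFrame(double totalSeconds, int frames) {
        return totalSeconds * 1000.0 / frames;
    }

    /** Pixels covered by one full-window pass. */
    static long pixels(int width, int height) {
        return (long) width * height;
    }

    /** Bytes in one RGBA8 colour buffer of that size (4 bytes per pixel). */
    static long rgba8Bytes(int width, int height) {
        return 4L * pixels(width, height);
    }
}
```

With the numbers above, the FBO path at 1680x1050 costs about 7 ms of the roughly 16.7 ms frame budget on the test machine, before the planned second pass. Each full-window pass at that size covers 1,764,000 pixels (7,056,000 bytes of RGBA8 data) versus 262,144 pixels at 512x512, about 6.7 times as many, which is consistent with the FBO cost growing with resolution while the normal path stays flat.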

What can I do to salvage my performance?

Comments (1)

梦毁影碎の 2024-11-12 13:59:43

As suggested by Damon and sidewinderguy, I successfully implemented a similar solution using a fragment shader (and a vertex shader). Its performance is a little better than my initial CPU-side, per-object computation, which makes it much faster than my FBO approach. At the same time, it gives visual results much closer to the FBO approach (overlapping objects behave a bit differently).
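
One way to see why overlapping objects differ: in the FBO path the objects are composited first and the combined layer is faded once, while a per-fragment shader fades each object before it is blended. The one-pixel, one-channel model below assumes (none of this is stated in the post) that both objects are opaque, are drawn back-to-front with ordinary alpha blending, that the FBO layer ends up composited as `m * layer + (1 - m) * background`, and that `m` is the distance-map value at this pixel.

```java
/** Compares the FBO path and the per-fragment shader path for one overlapping pixel. */
final class OverlapComparison {
    private OverlapComparison() {}

    /** Ordinary "over" blend: src * a + dst * (1 - a). */
    static float over(float src, float alpha, float dst) {
        return src * alpha + dst * (1f - alpha);
    }

    /** FBO path: the opaque front object hides the back one, then the layer is faded once. */
    static float fboPath(float background, float back, float front, float m) {
        float layer = over(front, 1f, over(back, 1f, 0f));
        return over(layer, m, background);
    }

    /** Shader path: each object's fragments are faded individually before blending. */
    static float shaderPath(float background, float back, float front, float m) {
        float afterBack = over(back, m, background);
        return over(front, m, afterBack);
    }
}
```

With `m = 0.5`, a black background, a white back object and a black front object, the FBO path gives 0.0 while the shader path gives 0.25: the faded front object lets the back object show through. With `m = 1` the two paths agree.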

For anyone interested: the fragment shader basically transforms gl_FragCoord.xy and does a texture lookup. I am not sure this gives the best performance, but with only one other texture active, I do not expect omitting the lookup and computing the value directly to improve performance. Also, I no longer have a performance bottleneck, so further optimizations should wait until they turn out to be necessary.
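
A CPU model of that shader idea, as a sketch: the fragment's window position (`gl_FragCoord.xy`, a pixel centre) is divided by the screen size to get texture coordinates in [0, 1], the distance map is sampled there with nearest filtering, and the fragment's alpha is scaled by the result. The uniform names `distanceMap` and `screenSize` in the GLSL comment are hypothetical, not taken from the post.

```java
/**
 * Models, per fragment, roughly this GLSL (hypothetical uniform names):
 *   float fade = texture2D(distanceMap, gl_FragCoord.xy / screenSize).a;
 *   gl_FragColor = vec4(color.rgb, color.a * fade);
 * Mask row 0 is the bottom row, matching OpenGL texture and window conventions.
 */
final class ShaderFadeModel {
    private final float[] mask; // row-major fade values, width * height
    private final int width;
    private final int height;

    ShaderFadeModel(float[] mask, int width, int height) {
        if (mask.length != width * height) {
            throw new IllegalArgumentException("mask size does not match width * height");
        }
        this.mask = mask;
        this.width = width;
        this.height = height;
    }

    /** Nearest-neighbour lookup with clamped coordinates, like GL_NEAREST + GL_CLAMP_TO_EDGE. */
    float sample(float u, float v) {
        int x = Math.min(width - 1, Math.max(0, (int) Math.floor(u * width)));
        int y = Math.min(height - 1, Math.max(0, (int) Math.floor(v * height)));
        return mask[y * width + x];
    }

    /** fragX/fragY are window coordinates of the pixel centre, as in gl_FragCoord.xy. */
    float fadedAlpha(float fragX, float fragY, float fragmentAlpha,
                     float screenWidth, float screenHeight) {
        float u = fragX / screenWidth;
        float v = fragY / screenHeight;
        return fragmentAlpha * sample(u, v);
    }
}
```

Since the mask is only sampled, it does not need to match the window size; a smaller map is stretched over the screen by the division.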

Also, I am very grateful for all the help, suggestions and comments I received :-)