DirectX9, DirectDraw, optimization?
First off, I'm programming a game. Currently the render function makes two calls to two different functions: one renders some text, the other renders sprites.
On my computer (AMD Phenom(tm) II X4 955 Processor (4 CPUs), ~3.2GHz, 4096MB RAM DDR2, NVIDIA GeForce GTX 285) I have a render speed of ~2200 FPS when rendering around 200 sprites and about 100 FPS when rendering about 14,500.
I'm using a vector to store the information of each object I'm rendering and using one sprite with many draw calls.
I'm building in VS2008 release mode with full optimization for C++. I know I've heard "don't optimize prematurely" left and right, but at this point the game runs great for me and not so well on certain other computers.
I can't imagine swapping the vector out for an array, since I'm pushing and pulling things from the vector every frame in an unpredictable order, nearly at random.
I've tried floats and doubles and the speed is no different.
Would it be different using DirectDraw rather than DirectX and the Sprite Render method? Since I have no idea what the differences between DirectDraw and DirectX are, I'm not 100% sure what I should be thinking about that.
The game runs fine on average computers, but what I'm comparing my game to is Touhou. Touhou runs at 60 FPS on the weakest computer I've tried, but my game won't run faster than 36~42 FPS. I can't imagine what I'm doing wrong, being so new to DirectX and C++.
Any assistance in this matter would be great. Unfortunately, I won't be around for a while to add information or answer questions.
Comments (4)
You need a profiler.
There's some good performance advice in the responses, but it doesn't matter. Trying to optimize a program without a profiler is like trying to write a program without a compiler. Do not guess, measure.
Now with that said, profiling graphics code is an infamous pain in the neck, and there aren't (to my knowledge) any good, free tools to help with it. So never mind that for now: start with an ordinary CPU profiler, and find out which of your calls is really taking up all your time.
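If no profiler is at hand, a rough first measurement can come from timing sections of the frame by hand. The sketch below is only an illustration of "measure, don't guess": `SectionTimer` and the section names are hypothetical, and it uses `std::chrono`, which needs a C++11 compiler (on VS2008 something like `QueryPerformanceCounter` would have to stand in for it). It measures CPU-side wall-clock time only.

```cpp
#include <chrono>
#include <map>
#include <string>

// Hypothetical helper: accumulates wall-clock time per named section of the frame.
class SectionTimer {
public:
    typedef std::chrono::steady_clock Clock;

    // RAII guard: starts timing when constructed, records elapsed time when destroyed.
    class Scope {
    public:
        Scope(SectionTimer& owner, const std::string& name)
            : owner_(owner), name_(name), start_(Clock::now()) {}
        ~Scope() {
            const std::chrono::duration<double, std::milli> elapsed = Clock::now() - start_;
            owner_.totalsMs_[name_] += elapsed.count();
            owner_.calls_[name_] += 1;
        }
    private:
        Scope(const Scope&);            // non-copyable
        Scope& operator=(const Scope&);
        SectionTimer& owner_;
        std::string name_;
        Clock::time_point start_;
    };

    // Number of times a section was timed (0 if it was never entered).
    long calls(const std::string& name) const {
        std::map<std::string, long>::const_iterator it = calls_.find(name);
        return it == calls_.end() ? 0 : it->second;
    }

    // Total milliseconds spent in a section across all calls.
    double totalMs(const std::string& name) const {
        std::map<std::string, double>::const_iterator it = totalsMs_.find(name);
        return it == totalsMs_.end() ? 0.0 : it->second;
    }

    // Average milliseconds per call, or 0 if the section was never entered.
    double averageMs(const std::string& name) const {
        const long n = calls(name);
        return n == 0 ? 0.0 : totalMs(name) / static_cast<double>(n);
    }

private:
    std::map<std::string, double> totalsMs_;
    std::map<std::string, long> calls_;
};
```

In a frame loop, each part would be wrapped in its own block, for example `{ SectionTimer::Scope s(timer, "update"); /* update objects */ }`, and `averageMs("update")` compared against the other sections after a few hundred frames.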
I'm not sure I understand what you're saying, but this sounds like you're drawing essentially the same object in a lot of different places. If that's the case, you probably want to look up DirectX Instancing. The basic idea is you specify 1) the geometry to draw, and 2) a number of places to draw it. This saves re-specifying the geometry every time you draw the object, so it can improve speed considerably.
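To make the "geometry once, many places" idea concrete, here is a CPU-side sketch of the data layout only. It does not call Direct3D: in Direct3D 9 the actual mechanism is `IDirect3DDevice9::SetStreamSourceFreq` on hardware that supports it, and the per-corner math below would run in a vertex shader. `Vec2`, `SpriteInstance`, `kUnitQuad` and `transformCorner` are hypothetical names chosen for illustration.

```cpp
#include <cstddef>

struct Vec2 { float x, y; };

// Shared geometry: one unit quad, specified a single time for every sprite.
static const Vec2 kUnitQuad[4] = {
    { 0.0f, 0.0f }, { 1.0f, 0.0f }, { 0.0f, 1.0f }, { 1.0f, 1.0f }
};

// Per-instance data: the only thing that differs from one sprite to the next.
struct SpriteInstance {
    Vec2 position;  // top-left corner on screen
    Vec2 size;      // width and height in pixels
};

// The work done per (instance, corner) pair: scale the shared corner by the
// instance size and move it to the instance position.
Vec2 transformCorner(const SpriteInstance& inst, const Vec2& corner) {
    Vec2 out = { inst.position.x + corner.x * inst.size.x,
                 inst.position.y + corner.y * inst.size.y };
    return out;
}
```

With this split, drawing 14,500 sprites means filling an array of 14,500 `SpriteInstance` records rather than re-specifying four corners for each one.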
Are you inserting and/or removing things from positions other than the back of the vector? In a vector, insertions and removals from the middle take O(n) time, that is, the time taken is proportional to the size of your vector.
If that is the case, then consider using a std::list instead. Note that with 10k+ objects this could easily be causing your performance issues, depending on how often you do it.
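A minimal sketch of the difference, assuming a hypothetical `Bullet` struct with an `alive` flag (neither is from the question). Both functions remove dead objects while iterating; the vector version shifts every later element on each erase, while the list version only relinks nodes.

```cpp
#include <list>
#include <vector>

// Hypothetical game object, for illustration only.
struct Bullet {
    float x, y;
    bool alive;
};

// std::vector: each erase from the middle moves all elements after it, so one
// erase is O(n) and removing many objects per frame can approach O(n^2).
void removeDeadFromVector(std::vector<Bullet>& bullets) {
    for (std::vector<Bullet>::iterator it = bullets.begin(); it != bullets.end(); ) {
        if (!it->alive) {
            it = bullets.erase(it);  // returns the iterator to the next element
        } else {
            ++it;
        }
    }
}

// std::list: each erase is O(1) and leaves the other elements where they are.
void removeDead(std::list<Bullet>& bullets) {
    for (std::list<Bullet>::iterator it = bullets.begin(); it != bullets.end(); ) {
        if (!it->alive) {
            it = bullets.erase(it);
        } else {
            ++it;
        }
    }
}
```

A list gives up contiguous storage, so iterating over it each frame is usually slower than iterating over a vector. Whether the swap helps overall is worth measuring rather than assuming.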
Profile your application and determine whether your bottleneck is the CPU, the GPU, or the transfer bus between the two.
Once you know, you have a few choices:
1) If it's the CPU, you can try instancing to reduce the number of draw calls. Or, if your target machine does not support hardware instancing, try some kind of batching. To instance or batch a sprite you have to use a quad (two triangles), as the default interface does.
2) If it's the GPU, try to work out whether a shader is causing the slowdown. If so, try to optimize it. If it's not the shader, try to reduce overdraw: if some of your objects are not transparent, draw them front to back.
3) If it's the bus, do the same as for the CPU: batching reduces the number of Locks/Unlocks you need to transfer the data. (With instancing you would not need to update the buffer at all.)
That's all. :P
P.S. A warning: DO NOT TRY TO PROFILE DirectX calls with a CPU profiler. (Use PerfHUD from NVIDIA, GPU PerfStudio from ATI, or GPA from Intel instead.)
It's just wasted time: DirectX has a command buffer, and there is no guarantee that a call made now is executed right then. Most of the time the call returns immediately without having done anything yet.
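The batching idea from option 1 could be sketched roughly as follows. This is CPU-side only and assumes every sprite shares one texture. `Sprite`, `Vertex`, `SpriteBatch` and `buildBatch` are hypothetical names. In a real Direct3D 9 renderer the resulting arrays would be copied into dynamic vertex and index buffers with one Lock/Unlock pair each and drawn with a single `DrawIndexedPrimitive` call, instead of one draw call per sprite.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical types for illustration; not part of any DirectX interface.
struct Sprite { float x, y, w, h; };
struct Vertex { float x, y, u, v; };

struct SpriteBatch {
    std::vector<Vertex> vertices;       // 4 per sprite
    std::vector<unsigned int> indices;  // 6 per sprite (2 triangles)
};

// Packs every sprite into one vertex array and one index array so the whole
// set can be submitted in a single draw call.
SpriteBatch buildBatch(const std::vector<Sprite>& sprites) {
    SpriteBatch batch;
    batch.vertices.reserve(sprites.size() * 4);
    batch.indices.reserve(sprites.size() * 6);

    for (std::size_t i = 0; i < sprites.size(); ++i) {
        const Sprite& s = sprites[i];
        // Index of this quad's first vertex within the shared vertex array.
        const unsigned int base = static_cast<unsigned int>(batch.vertices.size());

        // Four corners: top-left, top-right, bottom-left, bottom-right.
        const Vertex quad[4] = {
            { s.x,       s.y,       0.0f, 0.0f },
            { s.x + s.w, s.y,       1.0f, 0.0f },
            { s.x,       s.y + s.h, 0.0f, 1.0f },
            { s.x + s.w, s.y + s.h, 1.0f, 1.0f }
        };
        batch.vertices.insert(batch.vertices.end(), quad, quad + 4);

        // Two triangles that together cover the quad.
        const unsigned int tris[6] = {
            base, base + 1, base + 2,
            base + 2, base + 1, base + 3
        };
        batch.indices.insert(batch.indices.end(), tris, tris + 6);
    }
    return batch;
}
```

The batch is rebuilt each frame from whatever objects are alive, so per-frame CPU cost becomes one pass over the sprites plus one buffer upload.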