Computational overhead in C# - using getters/setters vs. modifying arrays directly, and casting speed
I was going to write a long-winded post, but I'll boil it down here:
I'm trying to emulate the old-school graphical style of the NES via XNA. However, my FPS is SLOW when I try to modify 65K pixels per frame. If I just loop through all 65K pixels and set them to some arbitrary color, I get 64 FPS. With the code I wrote to look up which colors should be placed where, I get 1 FPS.
I think it is because of my object-oriented code.
Right now, I have things divided into about six classes, with getters/setters. I'm guessing that I'm calling at least 360K getters per frame, which I think is a lot of overhead. Each class contains 1D and/or 2D arrays of custom enumerations, int, Color, Vector2D, or byte values.
What if I combined all of the classes into just one, and accessed the contents of each array directly? The code would look a mess, and ditch the concepts of object-oriented coding, but the speed might be much faster.
I'm also not concerned about access violations, as any attempts to get/set the data in the arrays will be done in blocks. E.g., all writing to arrays will take place before any data is read from them.
As for casting, I stated that I'm using custom enumerations, int, Color, Vector2D, and bytes. Which data types are fastest to use and access in the .NET Framework, XNA, Xbox, C#? I think that constant casting might be a cause of slowdown here.
Also, instead of using math to figure out which indexes data should be placed in, I've used precomputed lookup tables so I don't have to use constant multiplication, addition, subtraction, division per frame. :)
Comments (4)
There's a terrific presentation from GDC 2008 that is worth reading if you are an XNA developer. It's called Understanding XNA Framework Performance.
For your current architecture - you haven't really described it well enough to give a definite answer - you probably are doing too much unnecessary "stuff" in a tight loop. If I had to guess, I'd suggest that your current method is thrashing the cache - you need to fix your data layout.
In the ideal case you should have a nice big array of small-as-possible value types (structs not classes), and a heavily inlined loop that shoves data into it linearly.
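As a rough sketch of that layout (not the asker's actual code): the class name `FlatFrameBuffer`, the `Render` method, and the idea of storing one palette index per pixel are all assumptions made for illustration. Colors are packed into plain `uint` values so the example does not depend on XNA types.

```csharp
using System;

// Hypothetical example: every name here is illustrative, not taken from the question.
public static class FlatFrameBuffer
{
    // Fills a flat, row-major pixel array from one palette index per pixel.
    // paletteIndices: width * height bytes, each an index into palette.
    // palette: packed 32-bit colors (the byte order is up to the caller).
    public static uint[] Render(byte[] paletteIndices, uint[] palette, int width, int height)
    {
        if (paletteIndices.Length != width * height)
            throw new ArgumentException("Expected one palette index per pixel.");

        var pixels = new uint[width * height];

        // One linear pass over contiguous memory: no per-pixel method calls,
        // no object hopping, just sequential reads and writes.
        for (int i = 0; i < pixels.Length; i++)
        {
            pixels[i] = palette[paletteIndices[i]];
        }

        return pixels;
    }
}
```

In an XNA project the resulting array could then be handed to the texture in one call per frame (for example via `Texture2D.SetData`), instead of touching pixels through several layers of objects.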
(Aside: regarding what is fast: Integer and floating point maths is very fast - in general, you shouldn't use lookup tables. Function calls are pretty fast - to the point that copying large structs when you pass them will be more significant. The JIT will inline simple getters and setters - although you shouldn't depend on it to inline anything else in very tight loops - like your blitter.)
HOWEVER - even if optimised - your current architecture sucks. What you are doing flies in the face of how a modern GPU works. You should be loading your sprites onto your GPU and letting it composite your scene.
If you want to manipulate your sprites at a pixel level (for example: palette swapping as you have mentioned) then you should be using pixel shaders. The CPU on the 360 (and on PCs) is fast, but the GPU is so much faster when you're doing something like this!
The Sprite Effects XNA sample is a good place to get started.
Have you profiled your code to determine where the slowdown is? Before you go rewriting your application, you ought to at least know which parts need to be rewritten.
I strongly suspect that the overhead of the accessors and data conversions is trivial. It's much more likely that your algorithms are doing unnecessary work, recomputing values that they could cache, and other things that can be addressed without blowing up your object design.
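If no profiler is at hand, a crude first step is to time each stage of the frame separately. The helper below is only a sketch; `FrameTimer` and `AverageMilliseconds` are made-up names, and a real profiler will give far more detail.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper for rough timing; not a substitute for a real profiler.
public static class FrameTimer
{
    // Runs the action once as a warm-up (so JIT compilation is not measured),
    // then 'iterations' more times, and returns the average milliseconds per run.
    public static double AverageMilliseconds(Action action, int iterations)
    {
        if (iterations <= 0)
            throw new ArgumentOutOfRangeException("iterations");

        action(); // warm-up call, excluded from the measurement

        Stopwatch stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        stopwatch.Stop();

        return stopwatch.Elapsed.TotalMilliseconds / iterations;
    }
}
```

Wrapping the color lookup, the array writes, and the texture upload in separate calls such as `FrameTimer.AverageMilliseconds(() => LookUpColors(), 100)` (where `LookUpColors` stands in for your own method) should show which stage actually dominates the frame.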
Are you specifying a color and such for each pixel? If that is the case, I think you should really think about the architecture some more. Start using sprites; that will speed things up.
EDIT
Okay, I think your solution could be to load several sprites with different colours (each sprite only a few pixels) and reuse those. It is faster to point to the same sprite than to assign a different colour to each pixel, as the sprite has already been loaded into memory.
As with any performance problem, you should profile the application to identify the bottlenecks rather than trying to guess. I seriously doubt that getters and setters are at the root of your problem. The compiler almost always inlines these sorts of functions. I'm also curious what you have against math. Multiplying two integers, for instance, is one of the fastest things the computer can do.
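To make the point about math concrete, here is a small sketch comparing plain index arithmetic with a precomputed row-offset table. The names `PixelIndex`, `FromCoordinates`, and `BuildRowOffsets` are invented for this example, and the row-major layout is an assumption about how the pixel array is organised.

```csharp
// Hypothetical example: computing a 1D index for pixel (x, y) in a row-major array.
public static class PixelIndex
{
    // Direct arithmetic: one multiply and one add per lookup.
    public static int FromCoordinates(int x, int y, int width)
    {
        return y * width + x;
    }

    // The lookup-table alternative: precompute the start offset of every row.
    // Each lookup then costs an extra memory read instead of a multiply.
    public static int[] BuildRowOffsets(int width, int height)
    {
        var offsets = new int[height];
        for (int y = 0; y < height; y++)
        {
            offsets[y] = y * width;
        }
        return offsets;
    }
}
```

Both approaches give the same index, `rowOffsets[y] + x` versus `y * width + x`, so the table only pays off if the memory read is cheaper than the multiply, which is worth measuring rather than assuming.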