I have implemented a 2D particle system based on the ideas and concepts outlined in "Building an Advanced Particle System" (John van der Burg, Game Developer Magazine, March 2000).
Now I am wondering what performance I should expect from this system. I am currently testing it within the context of my simple (unfinished) SDL/OpenGL platformer, where all particles are updated every frame. Drawing is done as follows:
// Bind Texture
glBindTexture(GL_TEXTURE_2D, *texture);
// for all particles
glBegin(GL_QUADS);
glTexCoord2d(0,0); glVertex2f(x,y);
glTexCoord2d(1,0); glVertex2f(x+w,y);
glTexCoord2d(1,1); glVertex2f(x+w,y+h);
glTexCoord2d(0,1); glVertex2f(x,y+h);
glEnd();
where one texture is used for all particles.
It runs smoothly up to about 3000 particles. To be honest I was expecting a lot more, particularly since this is meant to be used with more than one system on screen. What number of particles should I expect to be displayed smoothly?
PS: I am relatively new to both C++ and OpenGL, so I may well have messed up somewhere!?
EDIT: Using POINT_SPRITE
glEnable(GL_POINT_SPRITE);
glBindTexture(GL_TEXTURE_2D, *texture);
glTexEnvi(GL_POINT_SPRITE, GL_COORD_REPLACE, GL_TRUE);
// for all particles
glBegin(GL_POINTS);
glPointSize(size);
glVertex2f(x,y);
glEnd();
glDisable( GL_POINT_SPRITE );
I can't see any performance difference compared to using GL_QUADS at all!?
EDIT: Using VERTEX_ARRAY
// Setup
glEnable (GL_POINT_SPRITE);
glTexEnvi(GL_POINT_SPRITE, GL_COORD_REPLACE, GL_TRUE);
glPointSize(20);
// A big array to hold all the points
const int NumPoints = 2000;
Vector2 ArrayOfPoints[NumPoints];
for (int i = 0; i < NumPoints; i++) {
ArrayOfPoints[i].x = 350 + rand()%201;
ArrayOfPoints[i].y = 350 + rand()%201;
}
// Rendering
glEnableClientState(GL_VERTEX_ARRAY); // Enable vertex arrays
glVertexPointer(2, GL_FLOAT, 0, ArrayOfPoints); // Specify data
glDrawArrays(GL_POINTS, 0, NumPoints); // Draw as points, starting at index 0 of the array and drawing exactly NumPoints points
Using VAs made a performance difference compared to the above. I then tried VBOs, but I don't really see a performance difference there?
Consider using vertex arrays instead of immediate mode (glBegin/End): http://www.songho.ca/opengl/gl_vertexarray.html
If you are willing to get into shaders, you could also search for "vertex shader" and consider using that approach for your project.
I can't say how much you can expect from that solution, but there are some ways to improve it.
Firstly, by using glBegin() and glEnd() you are using immediate mode, which is, as far as I know, the slowest way of doing things. Furthermore, it was deprecated in OpenGL 3.0 and is no longer part of the core profile in current versions of the standard.
For OpenGL 2.1
Point Sprites:
You might want to use point sprites. I implemented a particle system using them and got good performance (for my knowledge back then, at least). With point sprites you make fewer OpenGL calls per frame and send less data to the graphics card (or the data may even be stored on the graphics card, I'm not sure about that). A short Google search should turn up some implementations to look at.
Vertex Arrays:
If using point sprites doesn't help, you should consider using vertex arrays in combination with point sprites (to save a bit of memory). Basically, you store the vertex data of the particles in an array. You then enable vertex array support by calling glEnableClientState() with GL_VERTEX_ARRAY as the parameter. After that, you call glVertexPointer() (the parameters are explained in the OpenGL documentation) and then glDrawArrays() to draw the particles. This reduces your OpenGL calls to only a handful instead of thousands of calls per frame.
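As a rough sketch of the "store the vertex data in an array" step: the snippet below packs the positions of all live particles into one contiguous float array each frame. The `Particle` struct, its `life` field, and the `packLivePositions` helper are assumptions made up for illustration; they are not taken from the question. The OpenGL calls are shown only as comments because they need a current GL context.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical particle type; the question's real particle class may differ.
struct Particle {
    float x, y;   // position
    float life;   // remaining lifetime, <= 0 means dead
};

// Packs x,y of every live particle into `out` as tightly packed floats
// (x0, y0, x1, y1, ...) and returns the number of points written.
std::size_t packLivePositions(const std::vector<Particle>& particles,
                              std::vector<float>& out) {
    out.clear();                        // keeps capacity, so no per-frame allocation
    out.reserve(particles.size() * 2);
    for (const Particle& p : particles) {
        if (p.life <= 0.0f) continue;   // skip dead particles
        out.push_back(p.x);
        out.push_back(p.y);
    }
    assert(out.size() % 2 == 0);
    return out.size() / 2;
}

// Per frame, with a GL context current (OpenGL 2.1 fixed-function path):
//   std::size_t n = packLivePositions(particles, scratch);
//   glEnableClientState(GL_VERTEX_ARRAY);
//   glVertexPointer(2, GL_FLOAT, 0, scratch.data());
//   glDrawArrays(GL_POINTS, 0, static_cast<GLsizei>(n));
//   glDisableClientState(GL_VERTEX_ARRAY);
```

Reusing the same `scratch` vector across frames avoids reallocating it, and the whole system is then submitted with a single draw call.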
For OpenGL 3.3 and above
Instancing:
If you are programming against OpenGL 3.3 or above, you can even consider using instancing to draw your particles, which should speed that up even further. Again, a short google search will let you look at some code about that.
In General:
Using SSE:
In addition, some time might be lost while updating your vertex positions. So, if you want to speed that up, you can take a look at using SSE for updating them. If done correctly, you can gain a lot of performance (at least with a large number of particles).
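A minimal sketch of what an SSE position update could look like, assuming each attribute is kept in its own float array. The `ParticleArrays` struct and `updatePositions` function are hypothetical names, and the update rule (position += velocity * dt) is just a simple Euler step chosen for illustration. On non-x86 targets the code falls back to the plain scalar loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#if defined(__SSE__) || defined(_M_X64) || defined(_M_AMD64)
#include <xmmintrin.h>   // SSE intrinsics
#define PARTICLES_USE_SSE 1
#endif

// Hypothetical storage: one contiguous array per attribute.
struct ParticleArrays {
    std::vector<float> x, y;     // positions
    std::vector<float> vx, vy;   // velocities
};

// Advances every position by velocity * dt.
void updatePositions(ParticleArrays& p, float dt) {
    const std::size_t n = p.x.size();
    assert(p.y.size() == n && p.vx.size() == n && p.vy.size() == n);
    std::size_t i = 0;
#ifdef PARTICLES_USE_SSE
    const __m128 vdt = _mm_set1_ps(dt);           // dt broadcast to 4 lanes
    for (; i + 4 <= n; i += 4) {                  // 4 particles per iteration
        __m128 px = _mm_loadu_ps(&p.x[i]);        // unaligned loads are fine here
        __m128 py = _mm_loadu_ps(&p.y[i]);
        px = _mm_add_ps(px, _mm_mul_ps(_mm_loadu_ps(&p.vx[i]), vdt));
        py = _mm_add_ps(py, _mm_mul_ps(_mm_loadu_ps(&p.vy[i]), vdt));
        _mm_storeu_ps(&p.x[i], px);
        _mm_storeu_ps(&p.y[i], py);
    }
#endif
    for (; i < n; ++i) {                          // scalar tail (or full scalar fallback)
        p.x[i] += p.vx[i] * dt;
        p.y[i] += p.vy[i] * dt;
    }
}
```

Modern compilers often auto-vectorize a loop this simple at higher optimization levels, so it is worth measuring before and after rather than assuming a gain.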
Data Layout:
Finally, I recently found a link (divergentcoder.com/programming/aos-soa-explorations-part-1, thanks Ben) about structures of arrays (SoA) and arrays of structures (AoS). It compares how the two layouts affect performance, using a particle system as the example.
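To make the two layouts concrete, here is a small sketch of the same particle data stored both ways, with an identical update written for each. All type and function names (`Particle`, `ParticlesAoS`, `ParticlesSoA`, `integrateAoS`, `integrateSoA`, `toSoA`) are invented for this illustration and are not taken from the linked article.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Array of structures (AoS): all fields of one particle sit next to each other.
struct Particle {
    float x, y;
    float vx, vy;
    float life;
};
using ParticlesAoS = std::vector<Particle>;

// Structure of arrays (SoA): each field lives in its own contiguous array.
struct ParticlesSoA {
    std::vector<float> x, y, vx, vy, life;
    std::size_t size() const { return x.size(); }
};

// AoS update: touches x, y, vx, vy but also drags `life` through the cache.
void integrateAoS(ParticlesAoS& ps, float dt) {
    for (Particle& p : ps) {
        p.x += p.vx * dt;
        p.y += p.vy * dt;
    }
}

// SoA update: reads and writes only the four arrays it actually needs.
void integrateSoA(ParticlesSoA& ps, float dt) {
    const std::size_t n = ps.size();
    assert(ps.y.size() == n && ps.vx.size() == n && ps.vy.size() == n);
    for (std::size_t i = 0; i < n; ++i) {
        ps.x[i] += ps.vx[i] * dt;
        ps.y[i] += ps.vy[i] * dt;
    }
}

// Converts an AoS container into the equivalent SoA container.
ParticlesSoA toSoA(const ParticlesAoS& in) {
    ParticlesSoA out;
    for (const Particle& p : in) {
        out.x.push_back(p.x);
        out.y.push_back(p.y);
        out.vx.push_back(p.vx);
        out.vy.push_back(p.vy);
        out.life.push_back(p.life);
    }
    return out;
}
```

Which layout is faster depends on the access pattern: SoA tends to suit loops that touch only a few fields of many particles (and is easier to vectorize), while AoS can be more convenient when whole particles are handled one at a time.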