CUDA kernel optimization
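I created a simple particle system. I have a device with compute capability 2.1. What could I change to optimize the kernel?
I assume that the variables tPos and tVel are stored in registers.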
__global__ void particles_kernel(float4 *vbo, float4 *pos, float4 *vel)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    float4 tPos = pos[tid];
    float4 tVel = vel[tid];
    tPos.x += tVel.x;
    tPos.y += tVel.y;
    tPos.z += tVel.z;
    if(tPos.x < -2.0f)
    {
        tVel.x = -tVel.x;
    }
    else if(tPos.x > 2.0f)
    {
        tVel.x = -tVel.x;
    }
    if(tPos.y < -2.0f)
    {
        tVel.y = -tVel.y;
    }
    else if(tPos.y > 2.0f)
    {
        tVel.y = -tVel.y;
    }
    if(tPos.z < -2.0f)
    {
        tVel.z = -tVel.z;
    }
    else if(tPos.z > 2.0f)
    {
        tVel.z = -tVel.z;
    }
    pos[tid] = tPos;
    vel[tid] = tVel;
    vbo[tid] = make_float4(tPos.x, tPos.y, tPos.z, tPos.w);
}
Comments (1)
Unless I am missing something, your clamping code can be simplified like this:
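A minimal sketch of one such simplification, offered as an assumption rather than the exact code the answer had in mind. Both branches of each if / else-if pair negate the velocity, so the two tests collapse into a single fabsf comparison. The helper name bounce is hypothetical, and the macro guard is there only so the helper also compiles as plain host C++ for a quick check.

```cpp
#include <math.h>

// Let the helper compile under nvcc and as plain host C++ (assumption:
// added for host-side testing only, not needed in a CUDA-only build).
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Hypothetical helper, not from the original post: returns the velocity
// component negated when the position component lies outside
// [-bound, bound], and unchanged otherwise.
__host__ __device__ inline float bounce(float p, float v, float bound)
{
    return (fabsf(p) > bound) ? -v : v;
}

#ifdef __CUDACC__
__global__ void particles_kernel(float4 *vbo, float4 *pos, float4 *vel)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    float4 tPos = pos[tid];
    float4 tVel = vel[tid];

    tPos.x += tVel.x;
    tPos.y += tVel.y;
    tPos.z += tVel.z;

    // Same result as the original if / else-if pairs: either side flips.
    tVel.x = bounce(tPos.x, tVel.x, 2.0f);
    tVel.y = bounce(tPos.y, tVel.y, 2.0f);
    tVel.z = bounce(tPos.z, tVel.z, 2.0f);

    pos[tid] = tPos;
    vel[tid] = tVel;
    vbo[tid] = make_float4(tPos.x, tPos.y, tPos.z, tPos.w);
}
#endif
```

The conditional expression usually compiles to a select instead of a branch, although the compiler may already do the same for the original form.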
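However, given the relatively small amount of computation, this change will probably not improve performance, because the code appears to be memory bound (you are streaming through the data). Maybe there is additional computation elsewhere in your application that you could combine with this kernel to increase the computational density?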
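A rough per-thread tally supports the memory-bound reading. The byte counts come straight from the kernel in the question, taking each float4 as 16 bytes. The operation count is only an approximation, since the exact instruction mix depends on the compiler.

```latex
\begin{align*}
\text{bytes moved} &= \underbrace{2 \times 16}_{\text{loads: pos, vel}}
                    + \underbrace{3 \times 16}_{\text{stores: pos, vel, vbo}}
                    = 80~\text{B} \\
\text{arithmetic}  &\approx 3~\text{adds} + 3~\text{bound tests} + 3~\text{negations}
                    \approx 9~\text{ops} \\
\text{intensity}   &\approx \frac{9~\text{ops}}{80~\text{B}} \approx 0.11~\text{ops/B}
\end{align*}
```

At roughly a tenth of an operation per byte, run time is likely set by memory traffic, so removing a few comparisons is unlikely to make a measurable difference.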