Dispatching more than 65535 threads
I'm attempting to skin vertices using DirectCompute. The skinning method allows a variable number of weights to influence each vertex (MD5 meshes, for example, are defined this way).
The inputs to the compute shader are:
JointsBuffer { float4 orientation, float4 position } Structured buffer SRV
WeightsBuffer { float3 normal, float4 position, float bias, uint jointIndex } Structured buffer SRV
VerticesBuffer { float2 texcoords, uint weightIndex, uint numWeights } Structured buffer SRV
and the output is:
SkinnedVerticesBuffer { float3 normal, float4 position, float2 texcoord } Structured buffer UAV
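For reference, the four element types above could be mirrored on the CPU side roughly as follows. This is only a sketch: the struct names are hypothetical, and it assumes structured buffer elements are tightly packed 4-byte scalars (so a float3 occupies 12 bytes, without the 16-byte padding rules of constant buffers). The resulting sizes are what would be passed as the structure byte stride when creating each buffer.

```cpp
#include <cstdint>

// Hypothetical CPU-side mirrors of the structured buffer elements.
// Assumption: elements are tightly packed 4-byte scalars, so a float3
// takes 12 bytes and is not padded out to 16.
struct Joint {
    float orientation[4];   // float4 orientation
    float position[4];      // float4 position
};

struct Weight {
    float normal[3];        // float3 normal
    float position[4];      // float4 position
    float bias;             // float bias
    std::uint32_t jointIndex;
};

struct Vertex {
    float texcoords[2];     // float2 texcoords
    std::uint32_t weightIndex;
    std::uint32_t numWeights;
};

struct SkinnedVertex {
    float normal[3];        // float3 normal
    float position[4];      // float4 position
    float texcoord[2];      // float2 texcoord
};

// Byte strides implied by the layouts above.
static_assert(sizeof(Joint) == 32, "Joint stride");
static_assert(sizeof(Weight) == 36, "Weight stride");
static_assert(sizeof(Vertex) == 16, "Vertex stride");
static_assert(sizeof(SkinnedVertex) == 36, "SkinnedVertex stride");
```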
The compute shader should run once per element in the vertex buffer. Using SV_DispatchThreadID, it populates the SkinnedVertex in SkinnedVerticesBuffer that corresponds to each Vertex in VerticesBuffer (1:1 correspondence).
The problem is that many meshes have more than 65535 vertices, and the Dispatch command only allows dispatching that many threads per dimension. In theory I could write something that splits most numbers into a combination of three factors, each less than 65535, but I can't possibly do that for prime numbers.
For example, when a mesh with 71993 vertices (a prime number) comes up, I can't think of a way to handle it.
I can't over-dispatch, say, 72000 threads with context->Dispatch( 36000, 2, 1 ), because then DispatchThreadID will run past the bounds of my buffer.
Right now I'm leaning towards a constant buffer holding the number of vertices, over-dispatching to the nearest power of 2, and then simply doing
if( SV_DispatchThreadID >= numVertices ) return;
Is this my only option? Has anyone else run into this snag?
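The bounds-check idea can be tried out on the CPU before touching the shader. The sketch below is not D3D code: `groupsFor` and `simulateDispatch` are hypothetical helpers, and the group size of 64 assumes the shader is declared with [numthreads(64, 1, 1)]. It rounds the vertex count up to whole groups and shows that the guard keeps the surplus threads from writing anything. Note that the comparison needs to be `>=`, since valid indices run from 0 to numVertices - 1.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumed to match [numthreads(64, 1, 1)] in the compute shader.
constexpr std::uint32_t kThreadsPerGroup = 64;

// Number of thread groups needed to cover `count` items (ceiling division).
constexpr std::uint32_t groupsFor(std::uint32_t count, std::uint32_t groupSize) {
    return (count + groupSize - 1) / groupSize;
}

// CPU stand-in for one dispatch: runs the "shader body" once per thread and
// returns how many times each output element was written.
std::vector<int> simulateDispatch(std::uint32_t numVertices) {
    std::vector<int> writes(numVertices, 0);
    const std::uint32_t groups = groupsFor(numVertices, kThreadsPerGroup);
    for (std::uint32_t group = 0; group < groups; ++group) {
        for (std::uint32_t local = 0; local < kThreadsPerGroup; ++local) {
            // Plays the role of SV_DispatchThreadID.x.
            const std::uint32_t id = group * kThreadsPerGroup + local;
            if (id >= numVertices) continue;  // the guard; `return` in HLSL
            assert(id < writes.size());
            ++writes[id];
        }
    }
    return writes;
}
```

With 71993 vertices this needs 1125 groups (72000 threads); the 7 surplus threads hit the guard and exit. Rounding up to a multiple of the group size is enough here, so rounding to a power of 2 would not be required under these assumptions.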
Comments (1)
I've never run into this, but 65000 threads seems like an awful lot.
Looking at the documentation, it seems that the values you pass are not thread counts but thread-group counts. Someone on gamedev seems to have had performance issues when passing a number as large as 768, so it seems to me that you will have to reduce that huge number.
I'm not sure, but I got the feeling you're misinterpreting these parameters. Try to read again what these values actually mean. (Just a layman's gut feeling, though.)
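To make the threads-versus-thread-groups distinction concrete, here is a rough sketch of how the Dispatch arguments could be derived from a thread count. Direct3D 11 allows up to 65535 groups per dimension and up to 1024 threads per group in cs_5_0, so a single dimension already covers tens of millions of threads. `DispatchDims`, `dispatchDimsFor`, and `flatThreadIndex` are hypothetical names; the spill into the Y dimension assumes the shader rebuilds a linear index from its group ID, with the X group count supplied to it (for example through a constant buffer).

```cpp
#include <cassert>
#include <cstdint>

// Direct3D 11 limits: groups per Dispatch dimension, threads per cs_5_0 group.
constexpr std::uint64_t kMaxGroupsPerDim = 65535;
constexpr std::uint64_t kMaxThreadsPerGroup = 1024;

struct DispatchDims {
    std::uint32_t x, y, z;
};

// Hypothetical helper: picks Dispatch() arguments that cover `count` threads
// for a shader assumed to be declared with [numthreads(groupSize, 1, 1)].
DispatchDims dispatchDimsFor(std::uint64_t count, std::uint32_t groupSize) {
    assert(groupSize > 0 && groupSize <= kMaxThreadsPerGroup);
    const std::uint64_t groups = (count + groupSize - 1) / groupSize;
    if (groups <= kMaxGroupsPerDim) {
        return {static_cast<std::uint32_t>(groups), 1, 1};
    }
    // Too many groups for one dimension: fill X and spill the rest into Y.
    const std::uint64_t rows = (groups + kMaxGroupsPerDim - 1) / kMaxGroupsPerDim;
    assert(rows <= kMaxGroupsPerDim);
    return {static_cast<std::uint32_t>(kMaxGroupsPerDim),
            static_cast<std::uint32_t>(rows), 1};
}

// Linear thread index the shader would compute from its group ID (x, y),
// the X group count, and its index within the group.
std::uint64_t flatThreadIndex(std::uint32_t groupX, std::uint32_t groupY,
                              std::uint32_t groupsX, std::uint32_t localIndex,
                              std::uint32_t groupSize) {
    return (static_cast<std::uint64_t>(groupY) * groupsX + groupX) * groupSize
           + localIndex;
}
```

For the 71993-vertex mesh and a group size of 64, this yields Dispatch(1125, 1, 1). The 2D path only matters once the group count itself exceeds 65535, and it still needs a bounds check in the shader because the last row is usually only partly used.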