How much math is done on the CPU when using Direct3D?
Context: I'm just starting out. I'm not even touching the Direct3D 11 API yet; instead I'm trying to understand the pipeline, etc.
From looking at documentation and information floating around the web, it seems like some calculations are being handled by the application. That is, instead of simply handing the matrices to the GPU to multiply, the calculations are being done by a math library that runs on the CPU. I don't have any particular resources to point to, although I guess I can point to the XNA Math Library or the samples shipped in the February DX SDK. When you see code like `mViewProj = mView * mProj;`, that projection is being calculated on the CPU. Or am I wrong?
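As a rough illustration of what a line like `mViewProj = mView * mProj;` amounts to, here is a plain scalar C++ sketch. `Mat4` is a hypothetical stand-in for the XNA Math matrix type (which uses SIMD internally); the only point is that the product is ordinary arithmetic executed by the CPU, typically once per frame or whenever the camera changes.

```cpp
#include <array>

// Hypothetical minimal 4x4 matrix, row-major, standing in for XNA Math's XMMATRIX.
struct Mat4 {
    std::array<std::array<float, 4>, 4> m{};
};

// Identity matrix: ones on the diagonal, zeros elsewhere.
Mat4 identity() {
    Mat4 r;
    for (int i = 0; i < 4; ++i) r.m[i][i] = 1.0f;
    return r;
}

// Scalar 4x4 product. Each of the 16 entries is a 4-term dot product,
// so the whole product costs 64 multiplies and 48 additions on the CPU.
Mat4 operator*(const Mat4& a, const Mat4& b) {
    Mat4 r;
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            float sum = a.m[i][0] * b.m[0][j];
            for (int k = 1; k < 4; ++k) sum += a.m[i][k] * b.m[k][j];
            r.m[i][j] = sum;
        }
    }
    return r;
}
```

With these definitions, `Mat4 mViewProj = mView * mProj;` compiles and runs as written. Matrix products do not commute, so the order of `mView` and `mProj` matters.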
If you were writing a program where you can have 10 cubes on the screen, and you can move or rotate the cubes as well as the viewpoint, what calculations would you do on the CPU? I think I would store the geometry for a single cube, and then transform matrices representing the actual instances. Then it seems I would use the XNA Math Library, or another library of my choosing, to transform each cube in model space, get the coordinates in world space, and then push the information to the GPU.
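The per-vertex step described in the question can be sketched as follows. This is a hypothetical scalar version using the row-vector convention of D3DX and XNA Math (v' = v × M, with the translation in the fourth row); `Float4`, `Mat4`, `translation`, `transform` and `transformCube` are names made up for illustration. In the usual Direct3D 10/11 pipeline this same arithmetic runs per vertex in a vertex shader on the GPU rather than in application code.

```cpp
#include <array>
#include <cstddef>

// Hypothetical minimal types. Row-major matrix, row-vector convention
// (v' = v * M) as used by D3DX and XNA Math, so translation sits in row 3.
struct Float4 { float x, y, z, w; };
struct Mat4 { std::array<std::array<float, 4>, 4> m{}; };

// Translation matrix: identity with the offset in the fourth row.
Mat4 translation(float tx, float ty, float tz) {
    Mat4 r;
    for (int i = 0; i < 4; ++i) r.m[i][i] = 1.0f;
    r.m[3][0] = tx; r.m[3][1] = ty; r.m[3][2] = tz;
    return r;
}

// Row vector times matrix: 16 multiplies and 12 additions per vertex.
Float4 transform(const Float4& v, const Mat4& a) {
    const float in[4] = {v.x, v.y, v.z, v.w};
    float out[4];
    for (int j = 0; j < 4; ++j)
        out[j] = in[0] * a.m[0][j] + in[1] * a.m[1][j] + in[2] * a.m[2][j] + in[3] * a.m[3][j];
    return {out[0], out[1], out[2], out[3]};
}

// The CPU-side approach from the question: transform every corner of the
// shared cube geometry by one instance's world matrix before uploading it.
std::array<Float4, 8> transformCube(const std::array<Float4, 8>& corners, const Mat4& world) {
    std::array<Float4, 8> out{};
    for (std::size_t i = 0; i < corners.size(); ++i) out[i] = transform(corners[i], world);
    return out;
}
```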
That's quite a bit of calculation on the CPU. Am I wrong?
- Am I reaching conclusions based on too little information and understanding?
- What terms should I Google for, if the answer is STFW?
- Or if I am right, why aren't these calculations being pushed to the GPU as well?
EDIT: By the way, I am not using XNA, but the documentation notes that the XNA Math Library replaces the previous DX math library. (I see the XNA Math Library in the SDK as a pure template library.)
"Am I reaching conclusions based on too little information and understanding?"
That's not a bad thing, since we all do it, but in a word: yes.
What the GPU does is generally dependent on the GPU driver and your method of access. Most of the time you don't really care or need to know (other than out of curiosity and for general understanding).
For `mViewProj = mView * mProj;` this is most likely happening on the CPU. But it is not much of a burden (hundreds of cycles at most). The real trick is applying the new view matrix to the "world". Every vertex needs to be transformed, more or less, along with shading, texturing, lighting, etc. All of this work will be done on the GPU (if it were done on the CPU, things would slow down really fast).
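To put rough numbers on this, count the scalar operations. The counts follow directly from the matrix dimensions; V is a hypothetical per-frame vertex count, and operation counts are only a proxy for measured cycle timings.

$$
\begin{aligned}
\text{one } 4\times4 \text{ product:}\quad & 4\cdot4\cdot4 = 64 \text{ multiplies},\quad 4\cdot4\cdot3 = 48 \text{ additions} \;\Rightarrow\; 112 \\
\text{one vertex, } (1\times4)(4\times4)\text{:}\quad & 4\cdot4 = 16 \text{ multiplies},\quad 4\cdot3 = 12 \text{ additions} \;\Rightarrow\; 28 \\
V \text{ vertices:}\quad & 28V, \text{ e.g. } V = 10^5 \Rightarrow 2.8\times10^{6}
\end{aligned}
$$

So the single CPU-side product is about 112 operations, while transforming 100,000 vertices is 25,000 times that, before any shading, texturing or lighting.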
Generally you make high-level changes to the world, maybe 20 CPU-bound calculations, and the GPU takes care of the millions or billions of calculations needed to render the world based on those changes.
In your 10-cube example: you supply a transform for each cube, and any math needed to create that transform is CPU-bound (with exceptions). You also supply a transform for the view; again, creating that matrix might be CPU-bound. Once you have your 11 new matrices, you apply them to the world. From a hardware point of view, the 11 matrices need to be copied to the GPU... that will happen very, very fast... Once they are copied, the CPU is done, and the GPU recalculates the world based on the new data, renders it to a buffer, and puts it on the screen. So for your 10 cubes the CPU-bound calculations are trivial.
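A minimal sketch of the per-frame CPU work in the 10-cube example, assuming a hypothetical `CubeInstance` that holds a yaw angle and a position, and the same row-vector conventions as XNA Math. It builds 10 world matrices plus one combined view-projection matrix, which is the 11-matrix payload described above. The actual copy to the GPU (in Direct3D 11, typically into a constant buffer, for example with `ID3D11DeviceContext::UpdateSubresource`) is not shown.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical minimal row-major matrix (row-vector convention, as in XNA Math).
struct Mat4 { std::array<std::array<float, 4>, 4> m{}; };

Mat4 identity() {
    Mat4 r;
    for (int i = 0; i < 4; ++i) r.m[i][i] = 1.0f;
    return r;
}

// Scalar 4x4 product, accumulating into the zero-initialized result.
Mat4 operator*(const Mat4& a, const Mat4& b) {
    Mat4 r;
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k) r.m[i][j] += a.m[i][k] * b.m[k][j];
    return r;
}

// Rotation about Y followed by a translation: one cube's world matrix.
Mat4 worldMatrix(float yaw, float x, float y, float z) {
    Mat4 r = identity();
    const float c = std::cos(yaw), s = std::sin(yaw);
    r.m[0][0] = c; r.m[0][2] = -s;
    r.m[2][0] = s; r.m[2][2] = c;
    r.m[3][0] = x; r.m[3][1] = y; r.m[3][2] = z;
    return r;
}

struct CubeInstance { float yaw, x, y, z; };  // hypothetical per-cube state

struct FrameMatrices {
    std::vector<Mat4> world;  // one world matrix per cube
    Mat4 viewProj;            // shared camera transform
};

// All the per-frame math the CPU does in the 10-cube example.
FrameMatrices buildFrame(const std::vector<CubeInstance>& cubes, const Mat4& view, const Mat4& proj) {
    FrameMatrices f;
    f.viewProj = view * proj;
    f.world.reserve(cubes.size());
    for (const CubeInstance& c : cubes) f.world.push_back(worldMatrix(c.yaw, c.x, c.y, c.z));
    return f;
}

// Bytes that would be copied to the GPU: every matrix is 16 floats.
std::size_t uploadBytes(const FrameMatrices& f) {
    return (f.world.size() + 1) * 16 * sizeof(float);
}
```

For 10 cubes the payload is (10 + 1) × 16 floats × 4 bytes = 704 bytes per frame, which is tiny compared with the per-vertex and per-pixel work left to the GPU.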
Look at some reflected (decompiled) code for an XNA project and you will see where your calculations end and XNA begins (XNA will do everything it possibly can on the GPU).