The idea behind world-to-view projection?

Posted 2025-01-18 15:10:24 · 426 characters · 0 views · 0 comments

I have a rather vague understanding of how rasterization is supposed to work.
So I totally understand how vertices make up a 3D image. I also ventured into model-to-world projection, even though I don't understand the math behind it (I use helper libraries to multiply the matrices and have a chart denoting how to apply different transformations: rotate, scale, translate, etc.).
So it's very easy for me to build some 3D model using Blender and apply that logic to build a world matrix for each object.

But I've hit a brick wall trying to envision how the camera matrix is supposed to "look at" a specific cluster of vertices, and what exactly happens to the object's world coordinates after the camera matrix is applied to the world matrix. What does a camera matrix look like, and how does the camera's "view axis" affect its matrix (the camera could be looking along the z, x, or y axis respectively)?

I've managed to render a couple of 3D objects with various rendering engines (OpenGL, XNA, etc.), but most of it was due to having followed some guide on the internet or trying to interpret what some guy on YouTube is trying to teach, and I'm still struggling to get an "intuitive" sense of how the matrices are supposed to work given the camera parameters, and how the camera is supposed to alter the object's world matrix.


Comments (2)

深爱成瘾 2025-01-25 15:10:24


There are 5 steps in going from "world space" (Wx,Wy,Wz) to "screen space" (Sx,Sy): View, Clipping, Projection, Perspective Divide, Viewport. This is described pretty well here but some details are glossed over. I will try to explain the steps conceptually.

Imagine you have some vertices (what we want to render), a camera (with a position and orientation - which direction it is pointing), and a screen (a rectangular grid of WIDTHxHEIGHT pixels).

The Model Matrix I think you already understand: it scales, rotates, and translates each vertex into world coordinates (Wx,Wy,Wz,1.0). The last "1.0" (sometimes called the w component) allows us to represent translation and projection (as well as scaling and rotation) as a single 4x4 matrix.

The View Matrix (aka camera matrix) moves all the vertices to the point of view of the camera. I think of it as working in 2 steps: First it translates the entire scene (all vertices including the camera) such that in the new coordinate system the camera is at the origin. Second it rotates the entire scene such that the camera is looking from the origin in the direction of the -Z axis. There is a good description of this here. (Mathematically the rotation happens first, but I find it easier to visualize if I do the translation first.) At this point each vertex is in View coordinates (Vx,Vy,Vz,1.0). A good way to visualize this is to imagine the entire scene is embedded in ice; grab the block of ice and move it so the camera is at the origin pointing along the -z axis (and all the other objects in the world move along with the ice they are embedded in).
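The two-step view transform described above can be sketched in pure Python with row-major 4x4 matrices. The camera position, yaw angle, and helper names here are made up for illustration, not taken from any real engine:

```python
import math

def mat_mul(a, b):
    # 4x4 matrix product (row-major, column-vector convention)
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def translation(tx, ty, tz):
    return [[1, 0, 0, tx],
            [0, 1, 0, ty],
            [0, 0, 1, tz],
            [0, 0, 0, 1]]

def rotation_y(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[ c, 0, s, 0],
            [ 0, 1, 0, 0],
            [-s, 0, c, 0],
            [ 0, 0, 0, 1]]

def transform(m, v):
    # multiply 4x4 matrix by column vector (x, y, z, w)
    return tuple(sum(m[i][k] * v[k] for k in range(4)) for i in range(4))

# Hypothetical camera: sitting at (0, 0, 5), not rotated, looking down -Z.
cam_pos = (0.0, 0.0, 5.0)
cam_yaw = 0.0

# View matrix: translate the world by -camera_position, then rotate by
# -camera_yaw, so the camera ends up at the origin looking along -Z.
view = mat_mul(rotation_y(-cam_yaw),
               translation(-cam_pos[0], -cam_pos[1], -cam_pos[2]))

# A vertex at the world origin ends up 5 units in front of the camera.
print(transform(view, (0.0, 0.0, 0.0, 1.0)))  # (0.0, 0.0, -5.0, 1.0)
```

The composition order matters: `rotation_y(-cam_yaw)` is on the left, so the translation is applied to each vertex first, matching the "grab the block of ice and move it" intuition.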

Next, the projection matrix encodes what kind of lens (wide angle vs telephoto) the camera has; in other words how much of the world will be visible on the screen. This is described well here but here is how it is calculated:

[    near/width   ][        0        ][         0              ][        0       ]
[        0        ][    near/height  ][         0              ][        0       ]
[        0        ][        0        ][(far+near)/(far-near)   ][        1       ]
[        0        ][        0        ][-(2*near*far)/(far-near)][        0       ]

near = near plane distance (everything closer to the camera than this is clipped).
far  = far plane distance (everything farther from the camera than this is clipped).
width  = the widest object we can see if it is at the near plane.
height = the tallest object we can see if it is at the near plane.

It results in "clip coordinates" (Cx,Cy,Cz,Cw=Vz). Note that the view-space z coordinate (Vz) ends up in the w coordinate of the clip coordinates (Cw) (more on this below). This matrix stretches the world so that the camera's field of view is now 45 degrees up, down, left, and right. In other words, in this coordinate system, if you look from the origin (the camera position) straight along the -z axis (the direction the camera is pointing) you will see what is in the center of the screen, and if you rotate your head up {down, left, right} you will see what will be at the top {bottom, left, right} of the screen. You can visualize this as a pyramid shape where the camera is at the apex looking straight down into the pyramid. (This shape is called a "frustum" once you clip the top and bottom of the pyramid off with the near and far planes - see next paragraph.) The Cz calculation makes vertices at the near plane have Cz=-Cw and vertices at the far plane have Cz=Cw.
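Here is a small sketch that builds the matrix exactly as written above and checks the near/far behaviour. It assumes the answer's conventions: vertices multiply as row vectors, so Cw picks up Vz; the near/far/width/height values are arbitrary examples:

```python
def projection(near, far, width, height):
    # The projection matrix from the table above.
    a = (far + near) / (far - near)
    b = -(2.0 * near * far) / (far - near)
    return [[near / width, 0.0,           0.0, 0.0],
            [0.0,          near / height, 0.0, 0.0],
            [0.0,          0.0,           a,   1.0],
            [0.0,          0.0,           b,   0.0]]

def row_transform(v, m):
    # Row-vector convention: (Cx,Cy,Cz,Cw) = (Vx,Vy,Vz,1) * P,
    # which is why Cw ends up equal to Vz (column 4 is [0,0,1,0]).
    return tuple(sum(v[k] * m[k][j] for k in range(4)) for j in range(4))

p = projection(near=1.0, far=100.0, width=1.0, height=1.0)

# A vertex on the near plane: Cz == -Cw.
print(row_transform((0.0, 0.0, 1.0, 1.0), p))
# A vertex on the far plane: Cz == Cw.
print(row_transform((0.0, 0.0, 100.0, 1.0), p))
```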

Clipping takes place in clip coordinates (which is why they are called that). Clipping means you take some scissors and clip away anything that is outside that pyramid shape. You also clip everything that is too close to the camera (the "near plane") and everything that is too far away from the camera (the "far plane"). See here for details.

Next comes the perspective divide. Remember that Cw == Vz? This is the distance from the camera to the vertex along the z axis (the direction the camera is pointing). We divide each component by this Cw value to get Normalized Projection Coordinates (NPC): Nx=Cx/Cw, Ny=Cy/Cw, Nz=Cz/Cw, Nw=Cw/Cw=1.0. All these values (Nx, Ny and Nz) will be between -1 and 1, because we clipped away anything where Cx > Cw or Cx < -Cw or Cy > Cw or Cy < -Cw or Cz > Cw or Cz < -Cw. Again, see here for lots of details on this. The perspective divide is what makes things that are farther away appear smaller: the farther away from the camera something is, the larger Cw (Vz) is, and the more its X and Y coordinates are reduced when we divide.
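The divide itself is trivially small in code; the example vertices below are made up to show the "farther is smaller" effect:

```python
def perspective_divide(cx, cy, cz, cw):
    # Divide by Cw (= Vz) to get normalized coordinates in [-1, 1].
    return (cx / cw, cy / cw, cz / cw)

# Two vertices with the same sideways offset (x = 2), one twice as far away:
# after the divide, the farther one lands closer to the screen centre.
near_pt = perspective_divide(2.0, 0.0, 5.0, 5.0)
far_pt  = perspective_divide(2.0, 0.0, 10.0, 10.0)
print(near_pt[0], far_pt[0])  # 0.4 0.2
```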

The final step is the viewport transform. Nx, Ny and Nz (each ranging from -1 to 1) are converted to pixel coordinates. For example Nx=-1 is at the left of the screen and Nx=1 is at the right of the screen, so we get Sx = (Nx * WIDTH/2) + (WIDTH/2), or equivalently Sx = (Nx+1) * WIDTH/2. Similar for Sy. You can think of Sz as the value that will be used in a depth buffer, so it needs to range from 0 for vertices at the near plane (Vz=near) to the maximum value the depth buffer can hold (e.g. 2^24-1 = 16777215 for a 24-bit z buffer) for vertices at the far plane (Vz=far).
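A sketch of that mapping, assuming an 800x600 screen and assuming Ny=1 corresponds to the top pixel row (APIs differ on the Y direction):

```python
WIDTH, HEIGHT = 800, 600  # assumed screen size, for illustration only

def viewport(nx, ny):
    # Map normalized [-1, 1] coordinates to pixel coordinates:
    # Sx = (Nx * WIDTH/2) + WIDTH/2, equivalently (Nx + 1) * WIDTH/2.
    sx = (nx + 1.0) * WIDTH / 2.0
    # Assume Ny = 1 is the top of the screen, so flip Y for pixel rows.
    sy = (1.0 - ny) * HEIGHT / 2.0
    return (sx, sy)

print(viewport(-1.0, 1.0))   # (0.0, 0.0)     -> top-left pixel
print(viewport(0.0, 0.0))    # (400.0, 300.0) -> screen centre
print(viewport(1.0, -1.0))   # (800.0, 600.0) -> bottom-right corner
```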

最近可好 2025-01-25 15:10:24


The "camera matrix" as you called it sounds like a combination of two matrices: the view matrix and the projection matrix. It's possible you're only talking about one of these, but it's not clear.


View matrix: The view matrix is the inverse of what the camera's model matrix would be if you drew it in the world. In order to draw different camera angles, we actually move the entire world in the opposite direction - so there is only one camera angle.

Usually in OpenGL, the camera "really" stays at (0,0,0) and looks along the Z axis in the negative direction (towards 0,0,-∞). You can apply a rotation to the projection matrix to get a different direction, but why would you? Do all the rotation in the view matrix and your life is simpler.

So if you want your camera to be at (0,3,0) for example, instead of moving the camera up 3 units, we leave it at (0,0,0) and move the entire world down 3 units. If you want it to rotate 90 degrees, we actually rotate the world 90 degrees in the opposite direction. The world doesn't mind - it's just numbers in a computer - it doesn't get dizzy.
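That "move the world the opposite way" idea is just matrix inversion. A minimal sketch with the (0,3,0) example, using made-up helper names:

```python
def translation(tx, ty, tz):
    return [[1, 0, 0, tx],
            [0, 1, 0, ty],
            [0, 0, 1, tz],
            [0, 0, 0, 1]]

def transform(m, v):
    # multiply 4x4 matrix by column vector (x, y, z, w)
    return tuple(sum(m[i][k] * v[k] for k in range(4)) for i in range(4))

# The camera's model matrix: it sits at (0, 3, 0) in the world.
camera_model = translation(0, 3, 0)

# The view matrix is its inverse: leave the camera at the origin and
# move the entire world down 3 units instead.
view = translation(0, -3, 0)

# A vertex at the camera's height ends up at eye level (y = 0).
print(transform(view, (5, 3, -10, 1)))  # (5, 0, -10, 1)
```

For a pure translation the inverse is just the negated offset; once the camera also rotates, you invert the whole model matrix (or compose the negated rotation and translation in reverse order).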

We only do this when rendering. All of the game physics calculations, for example, aren't done in the rotated world. The coordinates of the stuff in the world don't get changed when we rotate the camera - except inside the rendering system. Usually, we tell the GPU the normal world coordinates of the objects, and we get the GPU to move and rotate them for us, using a matrix.


Projection matrix: You know the view frustum? This shape you've probably seen before:

[Image: view frustum]

Everything inside the cut-off pyramid shape (frustum) is displayed on the screen. You know this.

Except the computer doesn't actually render in a frustum. It renders a cube. The projection matrix transforms the frustum into a cube.

If you're familiar with linear algebra, you may notice that a 3D matrix can't turn a frustum into a cube. That's what the 4th coordinate (w) is for. After this calculation, the x, y and z coordinates are all divided by w. By using a projection matrix that makes w depend on z, the coordinates of far-away points get divided by a larger number, so they get pushed towards the middle of the screen - that's how the frustum is able to turn into a cube.

You don't have to have a frustum - that's what you get with a perspective projection. You can also use an orthographic projection, which turns a box into a cube, by not changing w.

Unless you want to do a bunch of math yourself, I'd recommend you just use the library functions to generate projection matrices.


If you're multiplying vertices by several matrices in a row it's more efficient to combine them into one matrix, and then multiply the vertices by the combined matrix - hence you will often see MVP, MV and VP matrices used. (M = model matrix - I think it's the same thing you called a world matrix)
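The payoff of combining matrices can be shown directly: multiply once per vertex instead of three times, with the same result. The matrices here are deliberately simple stand-ins (the "projection" is just a scale, for illustration):

```python
def mat_mul(a, b):
    # 4x4 matrix product (row-major, column-vector convention)
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transform(m, v):
    return tuple(sum(m[i][k] * v[k] for k in range(4)) for i in range(4))

def translation(tx, ty, tz):
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]

def scale(s):
    return [[s, 0, 0, 0], [0, s, 0, 0], [0, 0, s, 0], [0, 0, 0, 1]]

model = translation(1, 0, 0)   # M: place the object in the world
view  = translation(0, 0, -5)  # V: move the world so the camera is at the origin
proj  = scale(2)               # P: stand-in projection, for illustration only

# Combine once (note the order: P * V * M for column vectors)...
mvp = mat_mul(proj, mat_mul(view, model))

v = (1.0, 2.0, 3.0, 1.0)
# ...then one multiply per vertex matches three separate multiplies.
step_by_step = transform(proj, transform(view, transform(model, v)))
combined = transform(mvp, v)
print(combined == step_by_step)  # True
```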
