API-agnostic row-/column-major matrix representation

Posted on 2025-01-23 21:31:44 · Word count: 1471 · Views: 3 · Comments: 0

In D3D/HLSL we use row vectors (1xN matrices) and therefore pre-multiplication (vector * matrix), so we store the translation part in the 4th row of the matrix:

m00 m01 m02 0
m10 m11 m12 0
m20 m21 m22 0
Tx  Ty  Tz  1

so the transformed x coordinate is x' = x*m00 + y*m10 + z*m20 + 1*Tx. If the matrix contains only a translation, this reduces to x' = x*1 + 1*Tx = x + Tx.
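
To make the convention concrete, here is a minimal C++ sketch of the row-vector product described above. The names `Vec4`, `Mat4`, `makeTranslation` and `mulRowVector` are made up for illustration and are not taken from any particular library.

```cpp
// Row-vector convention: v' = v * M, with the translation in the 4th row.
// All type and function names here are hypothetical, for illustration only.
struct Vec4 { float v[4]; };
struct Mat4 { float m[4][4]; };  // indexed as m[row][col]

// Identity matrix with (tx, ty, tz) placed in the 4th row.
Mat4 makeTranslation(float tx, float ty, float tz) {
    return Mat4{{
        {1.0f, 0.0f, 0.0f, 0.0f},
        {0.0f, 1.0f, 0.0f, 0.0f},
        {0.0f, 0.0f, 1.0f, 0.0f},
        {tx,   ty,   tz,   1.0f},
    }};
}

// v' = v * M: component c of the result is dot(v, column c of M).
Vec4 mulRowVector(const Vec4& p, const Mat4& mat) {
    Vec4 out{};  // zero-initialized accumulator
    for (int c = 0; c < 4; ++c) {
        for (int r = 0; r < 4; ++r) {
            out.v[c] += p.v[r] * mat.m[r][c];
        }
    }
    return out;
}
```

With a pure translation matrix and w = 1, each output component is the input component plus the corresponding Tx, Ty or Tz, matching x' = x + Tx.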

Based on the HLSL docs and some past experiments, uniform matrices are loaded by columns, so one register will contain {m00,m10,m20,m30}. This is good for vector-matrix multiplication because it translates into 4 dot products (x' = vec-register dot mat-register_0), each of which probably maps to a single hardware instruction.
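
The "one column per register" idea can be sketched on the CPU as follows. This is plain scalar C++, not actual shader or SIMD code; `Mat4ColMajor`, `dot4` and `mulRowVector` are hypothetical names, and `dot4` merely stands in for whatever dot-product or multiply-add sequence the hardware provides.

```cpp
#include <cstddef>

// Column-major storage: element (row r, column c) lives at index c * 4 + r,
// so each column occupies 4 consecutive floats (one register's worth).
// Hypothetical type for illustration.
struct Mat4ColMajor {
    float data[16];
    const float* column(std::size_t c) const { return data + c * 4; }
};

// Scalar 4-component dot product.
inline float dot4(const float* a, const float* b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

// Row vector times matrix as 4 dot products: out[c] = dot(v, column c).
inline void mulRowVector(const float v[4], const Mat4ColMajor& m, float out[4]) {
    for (std::size_t c = 0; c < 4; ++c) {
        out[c] = dot4(v, m.column(c));
    }
}
```

Because each column is contiguous, every output component reads exactly one 4-float chunk of the matrix, which is what makes this layout convenient for the row-vector convention.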

On the other hand, OGL/GLSL uses column vectors and post-multiplication, which means that the translation part is stored in the 4th column instead (the transpose of the matrix above). According to the wiki, "GLSL matrices are always column-major".

Some key points:

On the CPU side of the matrix operations, I'd like to vectorize them.
This means that the memory layout is important, so that the appropriate columns can be loaded into registers quickly.
Currently my matrix is stored by columns {m00,m10,m20,m30,m01,...,m33}, where m30, m31, m32 store the translation part (I'm using row vectors -> pre-multiplication).
Additionally, I'd like to use the same memory layout for passing uniform data to the graphics API (memcpy into a buffer without transposing the matrix).
I examined the matrix implementation of UE4 and I can't see any separation based on the underlying rendering API (which is the expected result).
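
As a rough sketch of the layout and upload path described in these points: with column-major storage of a row-vector matrix, the translation ends up at flat indices 3, 7 and 11, and the 16 floats can be copied as-is. The names `Mat4`, `indexOf` and `uploadMatrix` are hypothetical, and `dst` stands for any mapped uniform or constant buffer memory obtained from the graphics API; no real API call is shown.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical CPU-side matrix: 16 floats in column-major order,
// {m00,m10,m20,m30, m01,m11,m21,m31, ..., m33}.
struct Mat4 {
    float data[16];
};
static_assert(sizeof(Mat4) == 16 * sizeof(float), "Mat4 must be tightly packed");
static_assert(std::is_trivially_copyable<Mat4>::value, "Mat4 must be memcpy-safe");

// Flat index of element (row, col) in column-major order.
constexpr std::size_t indexOf(std::size_t row, std::size_t col) {
    return col * 4 + row;
}

// Copy the matrix unchanged into a destination buffer (assumed to be at
// least 64 bytes of mapped uniform memory); no transpose is performed.
inline void uploadMatrix(void* dst, const Mat4& m) {
    std::memcpy(dst, m.data, sizeof(m.data));
}
```

The static assertions guard the assumption that the CPU type has no padding, which is what makes a straight memcpy safe.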

What is the best way to handle these differences in an API-agnostic way?

I'd imagine that I could keep my current matrix implementation (DX-style, row vectors, pre-multiplication, column-major storage for SSE) and, in the GLSL code, set the row_major layout so that it reads the data the other way around. If that works, what is the performance impact, if any?
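
The arithmetic behind this idea can be checked on the CPU. The sketch below only models how the same 16 floats would be interpreted under a column-major versus a row-major reading; it is not shader code, the function names are hypothetical, and it says nothing about performance on real hardware.

```cpp
// Element (r, c) when the 16 floats are interpreted as column-major.
inline float atColMajor(const float* d, int r, int c) { return d[c * 4 + r]; }

// Element (r, c) when the same 16 floats are interpreted as row-major.
// This is the transpose of the column-major interpretation.
inline float atRowMajor(const float* d, int r, int c) { return d[r * 4 + c]; }

// Row-vector convention on the column-major reading: out = v * M.
inline void mulRowVecColMajor(const float v[4], const float* d, float out[4]) {
    for (int c = 0; c < 4; ++c) {
        out[c] = 0.0f;
        for (int r = 0; r < 4; ++r) {
            out[c] += v[r] * atColMajor(d, r, c);
        }
    }
}

// Column-vector convention on the row-major reading: out = M' * v,
// where M' is the transpose of M.
inline void mulRowMajorColVec(const float* d, const float v[4], float out[4]) {
    for (int r = 0; r < 4; ++r) {
        out[r] = 0.0f;
        for (int c = 0; c < 4; ++c) {
            out[r] += atRowMajor(d, r, c) * v[c];
        }
    }
}
```

Both functions produce the same result for the same buffer, since reading the data with the other majorness yields the transposed matrix and v * M equals transpose(M) * v. In other words, the same bytes can serve a `vector * matrix` shader under one reading and a `matrix * vector` shader under the other.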

The main target platforms are Vulkan and D3D11/12.
