表示下/上三角矩阵的有效方法
我正在 C/C++ 程序中处理我的数据,这是二维的。这里我的值是成对计算的,并且 foo[i][j]
和 foo[j][i]
的值相同。
因此,如果我使用一个简单的二维数组来实现它,我的空间将被浪费一半。那么表示这个下/上三角矩阵的最佳数据结构是什么?
问候,
I am working on my data in a C/C++ program, which is 2 dimensional. Here my value is calculated for pair-wise and here values would be same for foo[i][j]
and foo[j][i]
.
Thus if I implement it by using a simple 2 dimensional array, half of my space would be wasted. So what would be best data structure to represent this lower/upper triangular matrix.
Regards,
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(7)
如果有 N 个项目,则没有主对角线的下三角数组将具有 (N - 1) * N / 2 个元素,或具有主对角线的 (N + 1) * N / 2 个元素。没有主对角线,(I, J) (I,J ∈ 0..N-1, I > J) ⇒ (I * (I - 1) / 2 + J)。对于主对角线,(I,J ∈ 0..N-1, I ≥ J) ⇒ ((I + 1) * I / 2 + J)。
(是的,当您在 2.5 GB 的计算机上分配 4 GB 时,将其减少一半确实会产生巨大的差异。)
If you have N items then a lower triangular array without the main diagonal will have (N - 1) * N / 2 elements, or (N + 1) * N / 2 elements with the main diagonal. Without the main diagonal, (I, J) (I,J ∈ 0..N-1, I > J) ⇒ (I * (I - 1) / 2 + J). With the main diagonal, (I,J ∈ 0..N-1, I ≥ J) ⇒ ((I + 1) * I / 2 + J).
(And yes, when you're allocating 4 gigabytes on a 2.5 gigabyte machine, cutting it half does make a huge difference.)
实际上,你最好只使用常规的二维矩阵。内存相当便宜。如果您确实不想这样做,那么您可以构建一个具有正确数量元素的一维数组,然后弄清楚如何访问每个元素。例如,如果数组的结构如下:
并且将其存储为一维数组,从左到右,您将访问元素
C
(2, 2)
与数组[3]
。您可以编写一个从[i][j]
到[n]
的函数,但我不会破坏您的乐趣。但你不必这样做,除非所讨论的三角形阵列真的很大或者你非常关心空间。Really, you're best off just using a regular two dimensional matrix. RAM is pretty cheap. If you really don't want to do that, then you can build a one-dimensional array with the right number of elements and then figure out how to access each element. For example, if the array is structured like this:
and you have it stored as a one dimensional array, left to right, you'd access element
C
(2, 2)
witharray[3]
. You can work out a function to go from[i][j]
to[n]
but I won't spoil your fun. But you don't have to do this unless the triangular array in question is really huge or you're very concerned about space.正如 Dan 和 Praxeolitic 提出的具有对角线但具有修正的转移规则的下三角矩阵。
对于矩阵 n × n,您需要数组
(n+1)*n/2
长度,转换规则为Matrix[i][j] = Array[i*(i+1)/ 2+j]
。我用
double
来做,但你可以把它做成template
。这只是基本框架,所以不要忘记实现析构函数。As Dan and Praxeolitic proposed for lower triangular matrix with diagonal but with corrected transition rule.
For matrix n by n you need array
(n+1)*n/2
length and transition rule isMatrix[i][j] = Array[i*(i+1)/2+j]
.I did it with
double
but you can make it astemplate
. This is just basic skeleton so don't forget to implement destructor.使用锯齿状数组:
它将创建类似的数组
Use a jagged array:
it will create array like
需要在 n × n 对称矩阵中表示的唯一元素的数量 m:
对于主对角线
m = (n*(n + 1))/2
没有对角线(对于 OP 描述的对称矩阵,需要主对角线,但只是为了更好的测量......)
m = (n*(n - 1))/2
。如果使用带截断的整数算术,则直到最后一个运算才除以 2 是很重要的。
您还需要执行一些算术来查找与对角矩阵中的 x 行和 y 列相对应的已分配内存中的索引 i。
上对角矩阵中 x 行和 y 列的已分配内存 i 中的索引:
有对角线
没有对角线
对于下对角矩阵,翻转方程中的 x 和 y。对于对称矩阵,只需在内部选择 x>=y 或 y>=x,并根据需要翻转成员函数。
The number of unique elements, m, needed to be represented in an n by n symmetric matrix:
With the main diagonal
m = (n*(n + 1))/2
Without the diagonal (for symmetric matrix as the OP describes, main diagonal is needed, but just for good measure...)
m = (n*(n - 1))/2
.Not dividing by 2 until the last operation is important if integer arithmetic with truncation is used.
You also need to do some arithmetic to find the index, i, in the allocated memory corresponding to row x and column y in the diagonal matrix.
Index in allocated memory, i, of row x and column y in upper diagonal matrix:
With the diagonal
Without the diagonal
For a lower diagonal matrix flip x and y in the equations. For a symmetric matrix just choose either x>=y or y>=x internally and have member functions flip as needed.
重复 Dani 的答案...
您可以分配一个数组来保存数据,并分配一个小数组来保存指向第一个数组中的行的指针,而不是分配许多不同大小的数组,这可能会导致内存碎片或奇怪的缓存访问模式分配。
现在您可以使用
table
就好像它是一个锯齿状数组,如 Dani 的答案所示:但所有数据都在一个块中,否则它可能不会取决于您的分配器的策略。
使用行指针表可能会或可能不会比使用 Praxeolitic 公式计算偏移量更快。
Riffing on Dani's answer...
Instead of allocating many arrays of various sizes, which could lead to memory fragmentation or weird cache access patterns, you could allocate one array to hold the data and one small array to hold pointers to the rows within the first allocation.
Now you can use
table
as though it was a jagged array as shown in Dani's answer:But all the data is in a single block, which it might not otherwise be depending on your allocator's strategy.
Using the table of row pointers may or may not be faster than computing the offset using Praxeolitic's formula.