捕获稀疏矩阵的非零元素、计数和索引
我有以下稀疏矩阵 A。
2 3 0 0 0
3 0 4 0 6
0 -1 -3 2 0
0 0 1 0 0
0 4 2 0 1
然后我想从那里捕获以下信息:
条目的累积计数,因为矩阵是按列扫描的。 产量:
Ap = [ 0, 2, 5, 9, 10, 12 ];
条目的行索引,因为矩阵是按列扫描的。 产量:
Ai = [0, 1, 0, 2, 4, 1, 2, 3, 4, 2, 1, 4 ];
条目非零矩阵条目,因为矩阵是按列扫描的。 产量:
Ax = [2, 3, 3, -1, 4, 4, -3, 1, 2, 2, 6, 1];
由于实际矩阵 A 可能非常大,有什么有效的方法 在 Perl 中可以捕获那些元素?特别是在不吸食所有矩阵 A 的情况下 进入内存。
我被下面的代码困住了。这并没有给出我想要的。
use strict;
use warnings;
my (@Ax, @Ai, @Ap) = ();
while (<>) {
chomp;
my @elements = split /\s+/;
my $i = 0;
my $new_line = 1;
while (defined(my $element = shift @elements)) {
$i++;
if ($element) {
push @Ax, 0 + $element;
if ($new_line) {
push @Ai, scalar @Ax;
$new_line = 0;
}
push @Ap, $i;
}
}
}
push @Ai, 1 + @Ax;
print('@Ax = [', join(" ", @Ax), "]\n");
print('@Ai = [', join(" ", @Ai), "]\n");
print('@Ap = [', join(" ", @Ap), "]\n");
I have the following sparse matrix A.
2 3 0 0 0
3 0 4 0 6
0 -1 -3 2 0
0 0 1 0 0
0 4 2 0 1
Then I would like to capture the following information from there:
cumulative count of entries, as matrix is scanned columnwise.
Yielding:Ap = [ 0, 2, 5, 9, 10, 12 ];
row indices of entries, as matrix is scanned columnwise.
Yielding:Ai = [0, 1, 0, 2, 4, 1, 2, 3, 4, 2, 1, 4 ];
Non-zero matrix entries, as matrix is scanned columnwise.
Yielding:Ax = [2, 3, 3, -1, 4, 4, -3, 1, 2, 2, 6, 1];
Since the actual matrix A is potentially very2 large, is there any efficient way
in Perl that can capture those elements? Especially without slurping all matrix A
into RAM.
I am stuck with the following code. Which doesn't give what I want.
use strict;
use warnings;
my (@Ax, @Ai, @Ap) = ();
while (<>) {
chomp;
my @elements = split /\s+/;
my $i = 0;
my $new_line = 1;
while (defined(my $element = shift @elements)) {
$i++;
if ($element) {
push @Ax, 0 + $element;
if ($new_line) {
push @Ai, scalar @Ax;
$new_line = 0;
}
push @Ap, $i;
}
}
}
push @Ai, 1 + @Ax;
print('@Ax = [', join(" ", @Ax), "]\n");
print('@Ai = [', join(" ", @Ai), "]\n");
print('@Ap = [', join(" ", @Ap), "]\n");
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(5)
存储稀疏数据的常见策略是删除您不关心的值(零),并将行索引和列索引与您关心的每个值一起存储,从而保留它们的位置信息:
在您的情况下,您可以进一步节省,因为通过逐列处理数据可以满足您的所有需求,这意味着我们不必为每个值重复 COLUMN。
A common strategy for storing sparse data is to drop the values you don't care about (the zeroes) and to store the row and column indexes with each value that you do care about, thus preserving their positional information:
In your case, you can economize further since all of your needs can be met by processing the data column-by-column, which means we don't have to repeat COLUMN for every value.
我想这就是您正在寻找的:
This is what you are looking for, I guess:
根据 FM 的评论更新。如果您不想存储任何原始数据:
输出:
Updated based on FM's comment. If you do not want to store any of the original data:
Output:
Ap 很简单:只需从零开始,并在每次遇到非零数字时递增。我没有看到你试图在 @Ap 中写入任何内容,所以它最终没有如你所愿也就不足为奇了。
Ai 和 Axe 比较棘手:在按行扫描时需要按列排序。您将无法就地执行任何操作,因为您还不知道列将产生多少个元素,因此您无法提前知道元素的位置。
显然,如果您可以更改要求以按行排序,那就容易多了。否则,您可能会变得复杂并收集 (i, j, x) 三元组。在收集时,它们自然会按 (i, j) 排序。收集后,您只想按 (j, i) 对它们进行排序。
Ap is easy: simply start with zeroes and increment each time you meet a nonzero number. I don't see you trying to write anything into @Ap, so it's no surprise it doesn't end up as you wish.
Ai and Ax are trickier: you want a columnwise ordering while you're scanning rowwise. You won't be able to do anything in-place since you don't know yet how many elements the columns will yield, so you can't know in advance the elements' position.
Obviously, it would be a hell lot easier if you could just alter the requirement to have a rowwise ordering instead. Failing that, you could get complex and collect (i, j, x) triplets. While collecting, they'd naturally be ordered by (i, j). Post-collection, you'd just want to sort them by (j, i).
您提供的代码可以逐行运行。要按列顺序获取结果,您必须将值累积到单独的数组中,每一列一个数组:
另请参阅 列表::Flatten。
The code you provided works on a row-by-row basis. To get results sequential by columns you have to accumulate your values into separate arrays, one for each column:
See also List::Flatten.