Capturing the non-zero elements, counts, and indices of a sparse matrix

Posted on 2024-08-03 00:31:15

I have the following sparse matrix A.

   2   3   0   0   0
   3   0   4   0   6
   0  -1  -3   2   0
   0   0   1   0   0
   0   4   2   0   1

Then I would like to capture the following information from there:

cumulative count of entries, as matrix is scanned columnwise.
Yielding:
Ap = [ 0, 2, 5, 9, 10, 12 ];
row indices of entries, as matrix is scanned columnwise.
Yielding:
Ai = [0, 1, 0, 2, 4, 1, 2, 3, 4, 2, 1, 4 ];
Non-zero matrix entries, as matrix is scanned columnwise.
Yielding:
Ax = [2, 3, 3, -1, 4, 4, -3, 1, 2, 2, 6, 1];
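
To make the target layout concrete, here is a minimal sketch of how the three arrays are read back: the entries of column j sit at positions Ap[j] through Ap[j+1]-1 of Ai and Ax. The helper name `csc_to_dense` and the hard-coded row count of 5 are assumptions made for illustration, not part of the question.

```perl
use strict;
use warnings;

# The three arrays from the question.
my @Ap = (0, 2, 5, 9, 10, 12);
my @Ai = (0, 1, 0, 2, 4, 1, 2, 3, 4, 2, 1, 4);
my @Ax = (2, 3, 3, -1, 4, 4, -3, 1, 2, 2, 6, 1);

# Rebuild the dense matrix to check that the arrays describe it.
# csc_to_dense is a hypothetical helper name.
sub csc_to_dense {
    my ($n_rows, $Ap, $Ai, $Ax) = @_;
    my $n_cols = @$Ap - 1;
    my @dense  = map { [ (0) x $n_cols ] } 1 .. $n_rows;
    for my $col (0 .. $n_cols - 1) {
        # Entries of column $col occupy positions Ap[col] .. Ap[col+1]-1.
        for my $k ($Ap->[$col] .. $Ap->[$col + 1] - 1) {
            $dense[ $Ai->[$k] ][$col] = $Ax->[$k];
        }
    }
    return \@dense;
}

my $dense = csc_to_dense(5, \@Ap, \@Ai, \@Ax);
print join(' ', map { sprintf('%3d', $_) } @$_), "\n" for @$dense;
```

Running this prints the 5x5 matrix A shown above, which confirms that Ap holds column start offsets rather than per-column counts.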

Since the actual matrix A is potentially very large, is there an efficient way
in Perl to capture those elements, especially without slurping the whole matrix A
into RAM?

I am stuck with the following code, which doesn't give what I want.

use strict;
use warnings;

my (@Ax, @Ai, @Ap) = ();
while (<>) {
    chomp;
    my @elements = split /\s+/;
    my $i = 0;
    my $new_line = 1;
    while (defined(my $element = shift @elements)) {
        $i++;
        if ($element) {
            push @Ax, 0 + $element;
            if ($new_line) {
                push @Ai, scalar @Ax;
                $new_line = 0;
            }

            push @Ap, $i;
        }
    }
}
push @Ai, 1 + @Ax;
print('@Ax  = [', join(" ", @Ax), "]\n");
print('@Ai = [', join(" ", @Ai), "]\n");
print('@Ap = [', join(" ", @Ap), "]\n");

情绪少女 2024-08-10 00:31:15

A common strategy for storing sparse data is to drop the values you don't care about (the zeroes) and to store the row and column indexes with each value that you do care about, thus preserving their positional information:

[VALUE, ROW, COLUMN]

In your case, you can economize further since all of your needs can be met by processing the data column-by-column, which means we don't have to repeat COLUMN for every value.

use strict;
use warnings;
use Data::Dumper;

my ($r, $c, @dataC, @Ap, @Ai, @Ax, $cumul);

# Read data row by row, storing non-zero values by column.
#    $dataC[COLUMN] = [
#        [VALUE, ROW],
#        [VALUE, ROW],
#        etc.
#    ]
$r = -1;
while (<DATA>) {
    chomp;
    $r ++;
    $c = -1;
    for my $v ( split ' ', $_ ){    # ' ' also skips leading whitespace
        $c ++;
        push @{$dataC[$c]}, [$v, $r] if $v;
    }
}

# Iterate through the data column by column
# to compute the three result arrays.
$cumul = 0;
@Ap = ($cumul);
for my $column (@dataC){
    $column //= [];    # a column with no non-zero entries was never created
    $cumul += @$column;
    push @Ap, $cumul;
    for my $value (@$column){
        push @Ax, $value->[0];
        push @Ai, $value->[1];
    }
}

# Show the three result arrays.
print Dumper(\@Ap, \@Ai, \@Ax);

__DATA__
2   3   0   0   0
3   0   4   0   6
0  -1  -3   2   0
0   0   1   0   0
0   4   2   0   1

你怎么这么可爱啊 2024-08-10 00:31:15

This is what you are looking for, I guess:

#!/usr/bin/perl

use strict;
use warnings;

use Data::Dumper::Simple;

my @matrix;

# Populate @matrix
while (<>) {
    push @matrix, [ split ' ' ];    # ' ' also skips leading whitespace
}

my $columns = @{ $matrix[0] };
my $rows    = @matrix;

my ( @Ai, @Ax );
my @Ap = (0);    # cumulative counts start at 0
my $ap = 0;

# Outer loop over columns ($j), inner loop over rows ($i).
for ( my $j = 0 ; $j < $columns ; $j++ ) {
    for ( my $i = 0 ; $i < $rows ; $i++ ) {
        if ( $matrix[$i]->[$j] ) {
            $ap++;
            push @Ai, $i;
            push @Ax, $matrix[$i]->[$j];
        }
    }
    push @Ap, $ap;
}

print Dumper @Ap;
print Dumper @Ai;
print Dumper @Ax;

屌丝范 2024-08-10 00:31:15

Updated based on FM's comment. If you do not want to store any of the original data:

#!/usr/bin/perl

use strict;
use warnings;

my %matrix_info;

while ( <DATA> ) {
    chomp;
    last unless /[0-9]/;
    my @v = map {0 + $_ } split;
    for (my $i = 0; $i < @v; ++$i) {
        if ( $v[$i] ) {
            push @{ $matrix_info{$i}->{indices} }, $. - 1;
            push @{ $matrix_info{$i}->{nonzero} }, $v[$i];
        }
    }
}

my @cum_count = (0);
my @row_indices;
my @nonzero;

for my $i ( sort {$a <=> $b } keys %matrix_info ) {
    my $mi = $matrix_info{$i};
    push @nonzero, @{ $mi->{nonzero} };

    my @i = @{ $mi->{indices} };

    push @cum_count, $cum_count[-1] + @i;
    push @row_indices, @i;
}

print(
    "\@Ap = [@cum_count]\n",
    "\@Ai = [@row_indices]\n",
    "\@Ax = [@nonzero]\n",
);

__DATA__
2   3   0   0   0
3   0   4   0   6
0  -1  -3   2   0
0   0   1   0   0
0   4   2   0   1

Output:

C:\Temp> m
@Ap = [0 2 5 9 10 12]
@Ai = [0 1 0 2 4 1 2 3 4 2 1 4]
@Ax = [2 3 3 -1 4 4 -3 1 2 2 6 1]

毁虫ゝ 2024-08-10 00:31:15

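Ap is easy: just start from zero and increment each time you encounter a non-zero number. I don't see you trying to write anything into @Ap, so it's no surprise that it doesn't end up the way you want.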

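Ai and Ax are trickier: you need column-wise ordering while scanning row by row. You cannot do anything in place, because you do not yet know how many elements each column will yield, so you cannot know the elements' positions in advance.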
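
If the input can be read twice, one possible way around not knowing the positions in advance is a two-pass counting scheme. This is a sketch and not part of the answer itself: the first pass counts non-zeros per column to build Ap, and the second pass writes each entry directly into its final slot. The name `csc_two_pass` and the in-memory sample input are assumptions for illustration, and the handle must be seekable (a regular file, not a pipe).

```perl
use strict;
use warnings;

# csc_two_pass is a hypothetical helper name.
sub csc_two_pass {
    my ($fh) = @_;

    # Pass 1: count the non-zero entries of each column.
    my @count;
    while (my $line = <$fh>) {
        my @v = split ' ', $line;
        for my $col (0 .. $#v) {
            $count[$col] //= 0;
            $count[$col]++ if $v[$col] != 0;
        }
    }

    # Prefix sums of the counts give the column start offsets.
    my @Ap = (0);
    push @Ap, $Ap[-1] + $_ for @count;

    # Pass 2: $next[$col] is the next free slot for that column.
    seek $fh, 0, 0 or die "cannot rewind input: $!";
    my @next = @Ap[0 .. $#count];
    my (@Ai, @Ax);
    my $row = 0;
    while (my $line = <$fh>) {
        my @v = split ' ', $line;
        next unless @v;    # ignore blank lines
        for my $col (0 .. $#v) {
            next if $v[$col] == 0;
            my $slot = $next[$col]++;
            $Ai[$slot] = $row;
            $Ax[$slot] = 0 + $v[$col];
        }
        $row++;
    }
    return (\@Ap, \@Ai, \@Ax);
}

# Sample input held in memory; a real run would open a file instead.
my $text = <<'END';
   2   3   0   0   0
   3   0   4   0   6
   0  -1  -3   2   0
   0   0   1   0   0
   0   4   2   0   1
END
open my $fh, '<', \$text or die "cannot open in-memory input: $!";
my ($Ap, $Ai, $Ax) = csc_two_pass($fh);
print "Ap = [@$Ap]\n", "Ai = [@$Ai]\n", "Ax = [@$Ax]\n";
```

Besides the three output arrays, this keeps only one counter per column in memory, at the cost of reading the input twice.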

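Obviously, it would be much easier if you could change the requirement to row-wise ordering. Failing that, you could get more elaborate and collect (i, j, x) triplets. While being collected, they are naturally ordered by (i, j). After collection, you just need to sort them by (j, i).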
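
The triplet idea could look roughly like the following sketch. The helper name `csc_from_triplets` and the in-memory sample input are assumptions made for illustration. All non-zero entries are held in memory as triplets before sorting, so this only avoids storing the zeros.

```perl
use strict;
use warnings;

# csc_from_triplets is a hypothetical helper name.
sub csc_from_triplets {
    my ($fh) = @_;

    # Collect [row, col, value] triplets while scanning row by row.
    my @triplets;
    my $n_cols = 0;
    my $row    = 0;
    while (my $line = <$fh>) {
        my @v = split ' ', $line;
        next unless @v;    # ignore blank lines
        $n_cols = @v if @v > $n_cols;
        for my $col (0 .. $#v) {
            push @triplets, [ $row, $col, 0 + $v[$col] ] if $v[$col] != 0;
        }
        $row++;
    }

    # Re-order from (row, col) to (col, row).
    my @sorted = sort { $a->[1] <=> $b->[1] or $a->[0] <=> $b->[0] } @triplets;

    # Count entries per column, then accumulate the counts into Ap.
    my @count = (0) x $n_cols;
    $count[ $_->[1] ]++ for @sorted;
    my @Ap = (0);
    push @Ap, $Ap[-1] + $_ for @count;

    my @Ai = map { $_->[0] } @sorted;
    my @Ax = map { $_->[2] } @sorted;
    return (\@Ap, \@Ai, \@Ax);
}

# Sample input held in memory; a real run would read a file or STDIN.
my $text = <<'END';
   2   3   0   0   0
   3   0   4   0   6
   0  -1  -3   2   0
   0   0   1   0   0
   0   4   2   0   1
END
open my $fh, '<', \$text or die "cannot open in-memory input: $!";
my ($Ap, $Ai, $Ax) = csc_from_triplets($fh);
print "Ap = [@$Ap]\n", "Ai = [@$Ai]\n", "Ax = [@$Ax]\n";
```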

孤独难免 2024-08-10 00:31:15

The code you provided works on a row-by-row basis. To get the results in column order, you have to accumulate the values into separate arrays, one per column:

# will look like ([], [], [] ...), one [] for each column.
my @columns;

# MATRIX is assumed to be an already-opened filehandle.
while (<MATRIX>) {
    my @row = split ' ';
    for (my $col = 0; $col < @row; $col++) {

        # push each non-zero value into its column
        # (!= 0 rather than > 0, so negative entries are kept)
        push @{$columns[$col]}, $row[$col] if $row[$col] != 0;

    }
}

# now you only need to flatten it to get the desired kind of output:
use List::Flatten;
my @non_zero = flat @columns;

See also List::Flatten.