对存储在哈希中的 AoA 进行操作。 PDL 与无 PDL

发布于 2024-11-16 06:31:12 字数 1458 浏览 6 评论 0原文

我有一个 AoAs 的哈希值：

$hash{$key} = [ 
               [0.0,1.0,2.0],
               10.0,
               [1.5,9.5,5.5],
              ];

我需要按如下方式进行处理：

$err += (($hash{$key}[0][$_]-$hash{key}[2][$_])*$hash{$key}[1])**2 foreach (0 .. 2);

计算两个数组之间的平方加权差。由于我的哈希值很大，我希望 PDL 能够帮助加快计算速度，但由于某种原因却没有。我还是 PDL 新手，所以我可能搞砸了一些事情。下面使用 PDL 的脚本大约慢 10 倍。描述：以下两个脚本是我尝试简单地表示我的程序中发生的情况。我将一些参考值读入哈希值，然后将观察结果（即时拉入哈希值）与这些值进行多次比较并具有一定的权重。在脚本中，我将参考数组、权重和观察数组设置为一些任意固定值，但在运行时情况并非如此。

这里有两个简单的脚本，不带 PDL 和带 PDL：

不带 PDL

use strict;
use warnings;
use Time::HiRes qw(time);

my $t1 = time;
my %hash;
my $error = 0;

foreach (0 .. 10000){
  $hash{$_} = [
               [0.000, 1.000, 2.0000],
               10.0,
               [1.5,9.5,5.5],
              ];
  foreach my $i (0 .. 2){
    $error += (($hash{$_}[0][$i]-$hash{$_}[2][$i])*$hash{$_}[1])**2;
  }
}

my $t2 = time;

printf ( "total time: %10.4f error: %10.4f\n", $t2-$t1,$error);

带 PDL

use strict;
use warnings;
use PDL;
use Time::HiRes qw(time);

my $t1 = time;
my %hash;
my $error = 0;

foreach (0 .. 10000){
  $hash{$_}[0] = pdl[0.000, 1.000, 2.0000];
  $hash{$_}[1] = pdl[10.0];
  $hash{$_}[2] = pdl[1.5,9.5,5.5];
  my $e = ($hash{$_}[0]-$hash{$_}[2])*$hash{$_}[1];
  $error += inner($e,$e);
}

my $t2 = time;

printf ( "total time: %10.4f error: %10.4f\n", $t2-$t1, $error);

原文

I have a hash of AoAs:

$hash{$key} = [ 
               [0.0,1.0,2.0],
               10.0,
               [1.5,9.5,5.5],
              ];

that I need to crunch as follows:

$err += (($hash{$key}[0][$_]-$hash{key}[2][$_])*$hash{$key}[1])**2 foreach (0 .. 2);

calculating the squared weighted difference between the two arrays. Since my hash is large, I was hoping PDL would help speed up the calculation, but it doesn't for some reason. I'm still new to PDL so I'm probably messing something up. the script below with PDL is ~10 times slower. Description: The following two scripts are my attempt to represent, simply, what is going of in my program. I read in some reference values into the hash, and then I compare observations (pulled into the hash on the fly) to those values a bunch of times with some weight. In the scripts, I set the reference array, weight, and observation array to some arbitrary fixed values, but that won't be the case at run time.

here are two simple scripts without and with PDL:

without PDL

use strict;
use warnings;
use Time::HiRes qw(time);

my $t1 = time;
my %hash;
my $error = 0;

foreach (0 .. 10000){
  $hash{$_} = [
               [0.000, 1.000, 2.0000],
               10.0,
               [1.5,9.5,5.5],
              ];
  foreach my $i (0 .. 2){
    $error += (($hash{$_}[0][$i]-$hash{$_}[2][$i])*$hash{$_}[1])**2;
  }
}

my $t2 = time;

printf ( "total time: %10.4f error: %10.4f\n", $t2-$t1,$error);

with PDL

use strict;
use warnings;
use PDL;
use Time::HiRes qw(time);

my $t1 = time;
my %hash;
my $error = 0;

foreach (0 .. 10000){
  $hash{$_}[0] = pdl[0.000, 1.000, 2.0000];
  $hash{$_}[1] = pdl[10.0];
  $hash{$_}[2] = pdl[1.5,9.5,5.5];
  my $e = ($hash{$_}[0]-$hash{$_}[2])*$hash{$_}[1];
  $error += inner($e,$e);
}

my $t2 = time;

printf ( "total time: %10.4f error: %10.4f\n", $t2-$t1, $error);

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

番薯 2024-11-23 06:31:12

PDL 经过优化以处理数组计算。您正在对数据使用哈希，但由于键是数字，因此可以根据 PDL 数组对象重新编写它，从而在性能方面取得巨大胜利。以下所有 PDL 版本的示例代码的运行速度比原始无 PDL 代码快约 36 倍（比原始快 300 倍）与 PDL 代码）。

所有 PDL

use strict;
use warnings;
use PDL;
use Time::HiRes qw(time);

my $t1 = time;
my %hash;
my $error = 0;

my $pdl0 = zeros(3,10001);  # create a [3,10001] pdl
$pdl0 .= pdl[0.000, 1.000, 2.0000];

my $pdl1 = zeros(1,10001);  # create a [1,10001] pdl
$pdl1 .= pdl[10.0];

my $pdl2 = zeros(3,10001);  # create a [3,10001] pdl
$pdl2 .= pdl[1.5,9.5,5.5];

my $e = ($pdl0 - $pdl2)*$pdl1;
$error = sum($e*$e);

my $t2 = time;

printf ( "total time: %10.4f error: %10.4f\n", $t2-$t1, $error);

有关使用 PDL 的深入介绍，请参阅PDL 书籍用于计算。 PDL 主页也是了解所有 PDL 内容的良好起点。

PDL is optimized to handle array computations. You are using a hash for your data but since the keys are numbers, it can be reformulated in terms of PDL array objects for a big win in performance. The following all PDL version of the example code runs about 36X faster than the original without PDL code (and 300X faster than the original with PDL code).

all PDL

use strict;
use warnings;
use PDL;
use Time::HiRes qw(time);

my $t1 = time;
my %hash;
my $error = 0;

my $pdl0 = zeros(3,10001);  # create a [3,10001] pdl
$pdl0 .= pdl[0.000, 1.000, 2.0000];

my $pdl1 = zeros(1,10001);  # create a [1,10001] pdl
$pdl1 .= pdl[10.0];

my $pdl2 = zeros(3,10001);  # create a [3,10001] pdl
$pdl2 .= pdl[1.5,9.5,5.5];

my $e = ($pdl0 - $pdl2)*$pdl1;
$error = sum($e*$e);

my $t2 = time;

printf ( "total time: %10.4f error: %10.4f\n", $t2-$t1, $error);

See the PDL Book for an in-depth intro to using PDL for computation. The PDL homepage is also a good starting point for all things PDL.

回复收藏 0 原文

无名指的心愿 2024-11-23 06:31:12

首先，除非数组很大，否则 PDL 不会有太大帮助。因此，您是否可以创建七个 PDL 向量，每个向量包含 10001 个元素，并使用向量运算对这些向量进行运算，而不是使用索引为 0 到 10000 的哈希值（基本上）有 7 个标量元素？

其次，每次命名表达式 $hash{$_} 时都会对它进行求值，因此您应该将其分解出来。例如，在标准 Perl 代码中，您应该这样做：

my $vec = $hash{$_};
foreach my $i (0 .. 2){
    $error += (($vec->[0][$i]-$vec->[2][$i])*$vec->[1])**2;
}

First, PDL is not going to help much unless the arrays are large. So instead of using a hash indexed by 0 to 10000, each with (basically) seven scalar elements, can you instead create seven PDL vectors of 10001 elements each and operate on those using vector operations?

Second, the expression $hash{$_} is being evaluated every time you name it, so you should factor it out. In your standard Perl code, for instance, you should do this:

my $vec = $hash{$_};
foreach my $i (0 .. 2){
    $error += (($vec->[0][$i]-$vec->[2][$i])*$vec->[1])**2;
}

回复收藏 0 原文

乖乖 2024-11-23 06:31:12

我多次重构了你的代码，首先将尽可能多的复杂性移到循环之外。其次，我删除了一层左右的抽象。这大大简化了表达式，在我的系统上将运行时间缩短了约 60%，同时保持了相同的结果。

use Modern::Perl;
use Time::HiRes qw(time);

my $t1 = time;
my $error = 0;

my @foo = ( 0.000, 1.000, 2.0000 );
my $bar = 10.0;
my @baz = ( 1.5, 9.5, 5.5 );

foreach ( 0 .. 10000 ) {
    $error += ( ( $foo[$_] - $baz[$_] ) * $bar )**2 for 0 .. 2
}

my $t2 = time;

printf ( "total time: %10.4f error: %10.4f\n", $t2-$t1,$error);

这只是普通的老式 Perl；没有 PDL。希望这对您的项目有帮助。

顺便说一句，在计算一段代码运行所需的时间时，我碰巧更喜欢 Benchmark 模块，及其 timethis()、timethese() 和 cmpthese() 函数。您可以从中获得更多信息。

I refactored your code several times over, first moving as much complexity outside of the loop as possible. Second, I removed a layer or so of abstraction. This simplified the expression considerably, and cut the runtime by about 60% on my system while maintaining the same result.

use Modern::Perl;
use Time::HiRes qw(time);

my $t1 = time;
my $error = 0;

my @foo = ( 0.000, 1.000, 2.0000 );
my $bar = 10.0;
my @baz = ( 1.5, 9.5, 5.5 );

foreach ( 0 .. 10000 ) {
    $error += ( ( $foo[$_] - $baz[$_] ) * $bar )**2 for 0 .. 2
}

my $t2 = time;

printf ( "total time: %10.4f error: %10.4f\n", $t2-$t1,$error);

This is just plain old Perl; no PDL. Hopefully this is helpful to your project.

By the way, when calculating the time it takes for a section of code to run, I happen to prefer the Benchmark module, with its timethis(), timethese(), and cmpthese() functions. You get more information out of it.

回复收藏 0 原文

私野 2024-11-23 06:31:12

根据 Nemo 的建议，这里有一个 PDL 脚本，它在速度上取得了一定的进步。我仍然是 PDL 绿色，所以可能有更好的方法。我还将向散列添加的值拆分为引用/权重和观察的循环，以使 OP 更像更大的程序中发生的情况，请参阅上面的“描述”。

use strict;
use warnings;
use PDL;
use PDL::NiceSlice;
use Time::HiRes qw(time);

my $t1 = time;
my %hash;
my $nvals=10000;

#construct hash of references and weights
foreach (0 .. $nvals){
  $hash{$_} = [
                 [0.000, 1.000, 2.0000],
                 [10.0, 10.0, 10.0],
               ];
}

#record observations
foreach (0 .. $nvals){
  $hash{$_}[2] = [1.5,9.5,5.5]; 
}

my $tset = time;

my @ref;
my @obs;
my @w;

foreach (0 .. $nvals){
  my $mat = $hash{$_};
  push @ref, @{$mat->[0]};
  push @w,   @{$mat->[1]};
  push @obs, @{$mat->[2]};
}

my $ref = pdl[@ref];
my $obs = pdl[@obs];
my $w   = pdl[@w];

my $diff = (($ref-$obs)*$w)**2;
my $error = sum($diff);

my $t2 = time;

printf ( "$nvals time setup: %10.4f crunch: %10.4f total: %10.4f error: %10.4f\n", $tset-$t1,$t2-$tset, $t2-$t1,$error);

Based on Nemo's suggestion, here is a PDL script that achieves modest gains in speed. I'm still PDL green so there probably is a better way. I also split up adding values to the hash into loops for references/weights and observations to make OP more like what is happening in the bigger program, see "description" above.

use strict;
use warnings;
use PDL;
use PDL::NiceSlice;
use Time::HiRes qw(time);

my $t1 = time;
my %hash;
my $nvals=10000;

#construct hash of references and weights
foreach (0 .. $nvals){
  $hash{$_} = [
                 [0.000, 1.000, 2.0000],
                 [10.0, 10.0, 10.0],
               ];
}

#record observations
foreach (0 .. $nvals){
  $hash{$_}[2] = [1.5,9.5,5.5]; 
}

my $tset = time;

my @ref;
my @obs;
my @w;

foreach (0 .. $nvals){
  my $mat = $hash{$_};
  push @ref, @{$mat->[0]};
  push @w,   @{$mat->[1]};
  push @obs, @{$mat->[2]};
}

my $ref = pdl[@ref];
my $obs = pdl[@obs];
my $w   = pdl[@w];

my $diff = (($ref-$obs)*$w)**2;
my $error = sum($diff);

my $t2 = time;

printf ( "$nvals time setup: %10.4f crunch: %10.4f total: %10.4f error: %10.4f\n", $tset-$t1,$t2-$tset, $t2-$t1,$error);

回复收藏 0 原文

~没有更多了~