Is it inefficient to return a whole array from a Perl subroutine?

Posted on 2024-07-13 10:47:55

I often have a subroutine in Perl that fills an array with some information. Since I'm also used to hacking in C++, I often find myself doing it like this in Perl, using references:

my @array;
getInfo(\@array);

sub getInfo {
   my ($arrayRef) = @_;
   push @$arrayRef, "obama";
   # ...
}

instead of the more straightforward version:

my @array = getInfo();

sub getInfo {
   my @array;
   push @array, "obama";
   # ...
   return @array;
}

The reason, of course, is that I don't want the array to be created locally in the subroutine and then copied on return.

Is that right? Or does Perl optimize that away anyway?

Comments (8)

冷月断魂刀 2024-07-20 10:47:56

If I look at your example and think about what you want to do, I'm used to writing it this way:

sub getInfo {
  my @array;
  push @array, 'obama';
  # ...
  return \@array;
}

It seems to me the straightforward version when I need to return a large amount of data. There is no need to allocate the array outside the sub, as you do in your first code snippet, because the my declaration does it for you. Anyway, you should avoid premature optimization, as Leon Timmermans suggests.
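A minimal sketch of the return-a-reference style, using a hypothetical getInfo with placeholder strings: because my @array creates a new array on each call, every caller receives its own array, and only a single reference is handed back.

```perl
use strict;
use warnings;

# Hypothetical example: build the data in a lexical array and return a reference to it.
sub getInfo {
    my @array;                  # a fresh array is allocated on every call
    push @array, 'foo', 'bar';
    return \@array;             # only one scalar (the reference) is returned
}

my $first  = getInfo();
my $second = getInfo();

push @$first, 'baz';            # modifying one result does not affect the other

print scalar(@$first), "\n";    # element count via dereference: 3
print $first->[0], "\n";        # single element via arrow syntax: foo
```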

玩世 2024-07-20 10:47:56

To answer the final rumination, no, Perl does not optimize this away. It can't, really, because returning an array and returning a scalar are fundamentally different.

If you're dealing with large amounts of data or if performance is a major concern, then your C habits will serve you well - pass and return references to data structures rather than the structures themselves so that they won't need to be copied. But, as Leon Timmermans pointed out, the vast majority of the time, you're dealing with smaller amounts of data and performance isn't that big a deal, so do it in whatever way seems most readable.
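To see which form copies and which shares, here is a small sketch with a file-scoped @data array standing in for a large structure (the names data_copy and data_ref are hypothetical): the list return gives the caller copies of the elements, while the reference return gives access to the very same array.

```perl
use strict;
use warnings;

# File-scoped array standing in for a large data structure.
my @data = (1 .. 5);

sub data_copy { return @data }    # list return: caller receives copies of the elements
sub data_ref  { return \@data }   # reference return: caller shares the same array

my @copy = data_copy();
$copy[0] = 100;                   # changes only the caller's copy

my $ref = data_ref();
$ref->[1] = 200;                  # changes @data itself; nothing was copied

print "@data\n";                  # 1 200 3 4 5
```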

南薇 2024-07-20 10:47:56

This is the way I would normally return an array.

sub getInfo {
  my @array;
  push @array, 'foo';
  # ...
  return @array if wantarray;
  return \@array;
}

This way it will work the way you want, in scalar, or list contexts.

my $array = getInfo;
my @array = getInfo;

$array->[0] eq $array[0];

# same length
@$array == @array;

I wouldn't try to optimize it unless you know it is a slow part of your code. Even then I would use benchmarks to see which subroutine is actually faster.
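A sketch of such a benchmark using the core Benchmark module's cmpthese. The array size $N and the three sub names are assumptions chosen for illustration; the relative numbers depend on your perl build, machine and data size, so run it yourself rather than relying on any particular result.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $N = 10_000;    # assumed array size; adjust to match your real data

# The three conventions discussed in this thread (hypothetical names).
sub fill_out_param { my ($out) = @_; push @$out, 1 .. $N; return }
sub return_list    { my @a; push @a, 1 .. $N; return @a }
sub return_ref     { my @a; push @a, 1 .. $N; return \@a }

# Run each candidate for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    out_param   => sub { my @a; fill_out_param(\@a) },
    return_list => sub { my @a = return_list() },
    return_ref  => sub { my $r = return_ref() },
});
```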

相对绾红妆 2024-07-20 10:47:56

There are two considerations. The obvious one is how big your array is going to get. If it's less than a few dozen elements, then size is not a factor (unless you're micro-optimizing some rapidly called function, but you'd have to do some memory profiling to prove that first).

That's the easy part. The oft overlooked second consideration is the interface. How is the returned array going to be used? This is important because whole array dereferencing is kinda awful in Perl. For example:

for my $info (@{ getInfo($some, $args) }) {
    ...
}

That's ugly. This is much better.

for my $info ( getInfo($some, $args) ) {
    ...
}

It also lends itself to mapping and grepping.

my @info = grep { ... } getInfo($some, $args);
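A runnable sketch of the two interfaces, with hypothetical info_list and info_ref subs returning the same names: the list-returning sub drops straight into for and grep, while the reference-returning one needs an @{ ... } dereference first.

```perl
use strict;
use warnings;

# Hypothetical pair of subs returning the same data in two styles.
sub info_list { return ('alice', 'bob', 'carol') }
sub info_ref  { return ['alice', 'bob', 'carol'] }

# List return: usable directly by for and grep.
my @long = grep { length($_) > 3 } info_list();

for my $name (info_list()) {
    print "$name\n";
}

# Reference return: the whole array has to be dereferenced first.
my @long_too = grep { length($_) > 3 } @{ info_ref() };
```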

But returning an array ref can be handy if you're going to pick out individual elements:

my $address = getInfo($some, $args)->[2];

That's simpler than:

my $address = (getInfo($some, $args))[2];

Or:

my @info = getInfo($some, $args);
my $address = $info[2];

But at that point, you should question whether @info is truly a list or a hash.

my $address = getInfo($some, $args)->{address};
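A small sketch comparing the ways of picking out one field, using a hypothetical record (the sub names and the field names first, last and address are assumptions): all three expressions yield the same value, but the hash-ref version names the field instead of relying on position 2.

```perl
use strict;
use warnings;

# Hypothetical record returned in three shapes.
sub info_list { return ('Jane', 'Doe', '1 Main St') }
sub info_aref { return ['Jane', 'Doe', '1 Main St'] }
sub info_href { return { first => 'Jane', last => 'Doe', address => '1 Main St' } }

my $addr1 = info_aref()->[2];          # arrow on a returned array ref
my $addr2 = (info_list())[2];          # list slice on a returned list
my $addr3 = info_href()->{address};    # named field: position no longer matters
```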

What you should not do is have getInfo() return an array ref in scalar context and an array in list context. This muddles the traditional use of scalar context as array length which will surprise the user.
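A sketch of that surprise, with hypothetical plain_info and dual_info subs: the plain version yields the element count in scalar context, as Perl arrays normally do, while the wantarray-based version suggested in another answer yields an array reference instead.

```perl
use strict;
use warnings;

# Plain version: an array in the return statement gives its length in scalar context.
sub plain_info {
    my @array = ('foo', 'bar', 'baz');
    return @array;
}

# Context-sensitive version: array in list context, array ref in scalar context.
sub dual_info {
    my @array = ('foo', 'bar', 'baz');
    return @array if wantarray;
    return \@array;
}

my $n_plain = plain_info();   # 3, the element count callers usually expect
my $n_dual  = dual_info();    # an ARRAY reference, not a count
```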

Finally, I will plug my own module, Method::Signatures, because it offers a compromise for passing in array references without having to use the array ref syntax.

use Method::Signatures;

method foo(\@args) {
    print "@args";      # @args is not a copy
    push @args, 42;   # this alters the caller array
}

my @nums = (1,2,3);
Class->foo(\@nums);   # prints 1 2 3
print "@nums";        # prints 1 2 3 42

This is done through the magic of Data::Alias.
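Method::Signatures is a CPAN module, so as a rough core-Perl comparison here is a sketch using a different technique: the refaliasing feature (Perl 5.22 and later, still marked experimental), which aliases a lexical array to the array behind a reference. The sub name add_answer is hypothetical, and this is not how Method::Signatures or Data::Alias is implemented.

```perl
use strict;
use warnings;
use feature 'refaliasing';
no warnings 'experimental::refaliasing';

sub add_answer {
    my ($aref) = @_;
    \my @args = $aref;    # @args now IS the caller's array, not a copy
    print "@args\n";
    push @args, 42;       # this alters the caller's array
    return;
}

my @nums = (1, 2, 3);
add_answer(\@nums);       # prints 1 2 3
print "@nums\n";          # prints 1 2 3 42
```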

‖放下 2024-07-20 10:47:56

Three other potentially LARGE performance improvements if you are reading an entire, largish file and slicing it into an array:

  1. Turn off buffering by using sysread() instead of read() (the manual warns against mixing the two on the same filehandle)
  2. Pre-extend the array by assigning a value to its last element, which saves repeated memory allocations
  3. Use unpack() to split data quickly, e.g. uint16_t graphics channel data

Passing an array ref to the function lets the main program deal with a simple array, while the write-once-and-forget worker function uses the more complicated @$ dereference and arrow ->[$II] access forms. Being quite C-ish, it is likely to be fast!
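A sketch combining the three tips in a reference-filling worker, under stated assumptions: the file holds raw little-endian uint16 samples, the caller's array starts empty, and read_u16_file is a hypothetical name. The usage part writes a temporary file with the core File::Temp module so the example runs on its own.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical worker: read a file of little-endian uint16 samples into
# the caller's (initially empty) array, passed by reference.
sub read_u16_file {
    my ($path, $out) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $size = -s $fh || 0;
    my ($buf, $got) = ('', 0);

    # 1. sysread() bypasses PerlIO buffering; don't mix it with read() or <$fh> here.
    while ($got < $size) {
        my $n = sysread($fh, $buf, $size - $got, $got);
        die "sysread failed: $!" unless defined $n;
        last if $n == 0;    # unexpected end of file
        $got += $n;
    }
    close $fh;

    my $count = int($got / 2);
    # 2. Pre-extend by assigning the last element, so the array is sized once.
    $out->[$count - 1] = undef if $count > 0;
    # 3. Decode every sample in one unpack() call ('v' = unsigned 16-bit little-endian).
    @{$out}[0 .. $count - 1] = unpack("v$count", $buf);
    return $count;
}

# Usage: write three samples to a temporary file, then read them back.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
binmode $tmp_fh;
print {$tmp_fh} pack('v*', 1, 2, 65535);
close $tmp_fh;

my @samples;
my $n = read_u16_file($tmp_name, \@samples);
```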

等你爱我 2024-07-20 10:47:56

I know nothing about Perl so this is a language-neutral answer.

It is, in a sense, inefficient to copy an array from a subroutine into the calling program. The inefficiency arises in the extra memory used and the time taken to copy the data from one place to another. On the other hand, for all but the largest arrays, you might not give a damn, and might prefer to copy arrays out for elegance, cussedness or any other reason.

The efficient solution is for the subroutine to pass the calling program the address of the array. As I say, I haven't a clue about Perl's default behaviour in this respect. But some languages provide the programmer the option to choose which approach.

合久必婚 2024-07-20 10:47:55

What about returning an array reference in the first place?

sub getInfo {
  my $array_ref = [];
  push @$array_ref, 'foo';
  # ...
  return $array_ref;
}

my $a_ref = getInfo();
# or if you want the array expanded
my @array = @{getInfo()};

Edit according to dehmann's comment:

It's also possible to use a normal array in the function and return a reference to it.

sub getInfo {
  my @array;
  push @array, 'foo';
  # ...
  return \@array;
}

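A sketch of the expansion step with a getInfo shaped like the one above, also showing postfix dereference (getInfo()->@*), which has been stable since Perl 5.24 and reads left to right; the data values are placeholders.

```perl
use strict;
use warnings;

# Same shape as the sub above; the values are placeholders.
sub getInfo {
    my @array;
    push @array, 'foo', 'bar';
    return \@array;
}

my @circumfix = @{ getInfo() };    # classic whole-array dereference
my @postfix   = getInfo()->@*;     # postfix dereference (Perl 5.24+)
my $count     = @{ getInfo() };    # element count in scalar context
```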
你曾走过我的故事 2024-07-20 10:47:55

Passing references is more efficient, but the difference is not as big as in C++. The argument values themselves (that means: the values in the array) are always passed by reference anyway (returned values are copied though).

The question is: does it matter? Most of the time, it doesn't. If you're returning 5 elements, don't bother about it. If you're returning/passing 100,000 elements, use references. Only optimize it if it's a bottleneck.
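A small sketch of the aliasing behaviour described above, with hypothetical sub names: modifying elements of @_ changes the caller's array because no copy is made on the way in, while a list return hands back copies.

```perl
use strict;
use warnings;

# Elements of @_ alias the caller's values, so nothing is copied on the way in.
sub double_in_place {
    $_ *= 2 for @_;    # $_ aliases each element of @_, which aliases the caller's element
    return;
}

# A list return, by contrast, hands back new copies.
sub doubled_copy {
    my @out = map { $_ * 2 } @_;
    return @out;
}

my @values = (1, 2, 3);
double_in_place(@values);            # @values is now (2, 4, 6)
my @copy = doubled_copy(@values);    # (4, 8, 12); @values is unchanged
```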
