Is returning a whole array from a Perl subroutine inefficient?
I often have a subroutine in Perl that fills an array with some information. Since I'm also used to hacking in C++, I often find myself doing it like this in Perl, using references:
```perl
my @array;
getInfo(\@array);

sub getInfo {
    my ($arrayRef) = @_;
    push @$arrayRef, "obama";    # fill the caller's array through the reference
    # ...
}
```
instead of the more straightforward version:
```perl
my @array = getInfo();

sub getInfo {
    my @array;
    push @array, "obama";
    # ...
    return @array;    # the elements are copied into the caller's @array
}
```
The reason, of course, is that I don't want the array to be created locally in the subroutine and then copied on return.
Is that right? Or does Perl optimize that away anyway?
If I look at your example and think about what you want to do, I'm used to writing it in this manner:
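A minimal sketch of this style, assuming the array is declared with `my` inside the sub and a reference to it is returned (a reconstruction for illustration, not necessarily the answerer's exact code):

```perl
use strict;
use warnings;

# Hypothetical getInfo: the array is created inside the sub with `my`,
# so the caller does not need to pre-allocate anything.
sub getInfo {
    my @array;
    push @array, "obama";
    # ... fill in more data ...
    return \@array;    # hand back a reference instead of copying every element
}

my $info = getInfo();
print scalar(@$info), " element(s), first is $info->[0]\n";
```

Each call creates a fresh `@array`, so returning a reference to it is safe: the data lives on as long as the caller holds `$info`.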
To me this is the straightforward version when I need to return a large amount of data. There is no need to allocate the array outside the `sub`, as you did in your first code snippet, because `my` does it for you. Anyway, you should not do premature optimization, as Leon Timmermans suggests.
To answer the final rumination, no, Perl does not optimize this away. It can't, really, because returning an array and returning a scalar are fundamentally different.
If you're dealing with large amounts of data or if performance is a major concern, then your C habits will serve you well - pass and return references to data structures rather than the structures themselves so that they won't need to be copied. But, as Leon Timmermans pointed out, the vast majority of the time, you're dealing with smaller amounts of data and performance isn't that big a deal, so do it in whatever way seems most readable.
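To make the difference between returning the structure and returning a reference concrete, here is a small sketch (the names `@store`, `as_list` and `as_ref` are hypothetical). Returning `@store` gives the caller copies of the elements, while returning `\@store` hands back a single scalar that points at the original data:

```perl
use strict;
use warnings;

# Data owned by the subroutines below (file-scoped lexical).
my @store = (1, 2, 3);

# List context: the caller ends up with its own copies of the elements.
sub as_list { return @store }

# One scalar comes back; none of the elements are copied.
sub as_ref { return \@store }

my @copy = as_list();
$copy[0] = 99;            # modifies only the caller's copy

my $ref = as_ref();
$ref->[1] = 42;           # modifies @store itself through the reference

print "store: @store\n";  # store: 1 42 3
print "copy:  @copy\n";   # copy:  99 2 3
```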
This is the way I would normally return an array.
This way it will work the way you want, in either scalar or list context.
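A minimal sketch of one common way to get that behaviour, using the built-in `wantarray` (an assumption about what this answer had in mind; `getInfo` and its data are hypothetical):

```perl
use strict;
use warnings;

# Hypothetical getInfo that adapts to the caller's context.
sub getInfo {
    my @array = ("obama", "biden");
    # ...
    return @array if wantarray;   # list context: return the elements
    return \@array;               # scalar (or void) context: return a reference
}

my @list = getInfo();   # list context
my $ref  = getInfo();   # scalar context
print scalar(@list), " ", ref($ref), "\n";   # 2 ARRAY
```

Note that another answer here argues against exactly this pattern, because a scalar-context call no longer returns the element count.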
I wouldn't try to optimize it unless you know it is a slow part of your code. Even then I would use benchmarks to see which subroutine is actually faster.
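A minimal benchmark sketch using `cmpthese` from the core `Benchmark` module, comparing the two styles from the question. The array size `$n` and the sub names are assumptions; timings depend on your Perl version, machine and data, so no numbers are claimed here:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $n = 10_000;   # assumed array size; adjust to match your real data

# Builds the array locally and returns it as a list (copied on return).
sub fill_list {
    my @array;
    push @array, $_ for 1 .. $n;
    return @array;
}

# Fills an array supplied by the caller through a reference.
sub fill_ref {
    my ($array_ref) = @_;
    push @$array_ref, $_ for 1 .. $n;
    return;
}

# Run each variant for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    return_list => sub { my @arr = fill_list() },
    pass_ref    => sub { my @arr; fill_ref(\@arr) },
});
```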
There are two considerations. The obvious one is: how big is your array going to get? If it's less than a few dozen elements, then size is not a factor (unless you're micro-optimizing for some rapidly called function, but you'd have to do some memory profiling to prove that first).
That's the easy part. The oft-overlooked second consideration is the interface. How is the returned array going to be used? This is important because whole-array dereferencing is kinda awful in Perl. For example:
That's ugly. This is much better.
It also lends itself to mapping and grepping.
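The contrast described above can be sketched with two hypothetical versions of `getInfo` (`getInfo_list` and `getInfo_ref`, names made up for this illustration) that return the same data:

```perl
use strict;
use warnings;

sub getInfo_list { return ("alpha", "beta", "gamma") }
sub getInfo_ref  { return ["alpha", "beta", "gamma"] }

# With a reference, the caller has to dereference the whole array:
for my $info (@{ getInfo_ref() }) {
    print "ref:  $info\n";
}

# With a plain list, the call drops straight into the loop...
for my $info (getInfo_list()) {
    print "list: $info\n";
}

# ...and straight into grep and map.
my @long  = grep { length($_) > 4 } getInfo_list();   # ("alpha", "gamma")
my @upper = map  { uc($_) } getInfo_list();            # ("ALPHA", "BETA", "GAMMA")
```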
But returning an array ref can be handy if you're going to pick out individual elements:
That's simpler than:
Or:
But at that point, you should question whether @info is truly a list or a hash.
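The element-picking forms, plus the hash alternative, might look like this. The subs `info_list`, `info_ref` and `info_hash` and their data are hypothetical:

```perl
use strict;
use warnings;

# Index 2 holds an address in the list and array-ref versions.
sub info_list { return ("Jane", "Doe", "12 Main St") }
sub info_ref  { return [ info_list() ] }
sub info_hash { return +{ first => "Jane", last => "Doe", address => "12 Main St" } }

# Array ref: index straight off the call.
my $addr1 = info_ref()->[2];

# Plain list: needs a list slice...
my $addr2 = (info_list())[2];

# ...or a temporary array.
my @info  = info_list();
my $addr3 = $info[2];

# If the positions really have names, a hash ref reads better still.
my $addr4 = info_hash()->{address};
```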
What you should not do is have `getInfo()` return an array ref in scalar context and an array in list context. This muddles the traditional use of scalar context as array length, which will surprise the user.

Finally, I will plug my own module, Method::Signatures, because it offers a compromise for passing in array references without having to use the array ref syntax.
This is done through the magic of Data::Alias.
3 other potentially LARGE performance improvements if you are reading an entire, largish file and slicing it into an array:
about mixing)
saves memory allocations
Passing an array ref to the function allows the main program to deal with a simple array, while the write-once-and-forget worker function uses the more complicated `@$` dereference and arrow `->[$II]` access forms. Being quite C-ish, it is likely to be fast!
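A minimal sketch of that pattern, assuming a hypothetical worker `read_lines_into` that appends every line of a filehandle to an array owned by the caller (an in-memory filehandle stands in for the large file):

```perl
use strict;
use warnings;

# Hypothetical worker: the caller owns the array; the worker only sees a reference.
sub read_lines_into {
    my ($fh, $out) = @_;
    my $i = @$out;                 # append after any existing elements
    while (my $line = <$fh>) {
        chomp $line;
        $out->[$i++] = $line;      # arrow form writes into the caller's array
    }
    return scalar @$out;           # number of lines now stored
}

# An in-memory filehandle stands in for a real file here.
my $data = "first\nsecond\nthird\n";
open my $fh, '<', \$data or die "cannot open in-memory file: $!";

my @lines;                          # plain array in the main program
my $count = read_lines_into($fh, \@lines);
close $fh;

print "$count lines; last is '$lines[-1]'\n";   # 3 lines; last is 'third'
```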
I know nothing about Perl so this is a language-neutral answer.
It is, in a sense, inefficient to copy an array from a subroutine into the calling program. The inefficiency arises in the extra memory used and the time taken to copy the data from one place to another. On the other hand, for all but the largest arrays, you might not give a damn, and might prefer to copy arrays out for elegance, cussedness or any other reason.
The efficient solution is for the subroutine to pass the calling program the address of the array. As I say, I haven't a clue about Perl's default behaviour in this respect. But some languages provide the programmer the option to choose which approach.
What about returning an array reference in the first place?
Edit according to dehmann's comment:
It's also possible to use a normal array in the function and return a reference to it.
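Both variants might look like this (hypothetical sub names; the data mirrors the question):

```perl
use strict;
use warnings;

# Option 1: build an anonymous array and return the reference.
sub getInfo_anon {
    my $array_ref = [];
    push @$array_ref, "obama";
    # ...
    return $array_ref;
}

# Option 2: use a normal array and return a reference to it.
# Each call creates a fresh @array, so callers never share data.
sub getInfo_named {
    my @array;
    push @array, "obama";
    # ...
    return \@array;
}

my $r1 = getInfo_anon();
my $r2 = getInfo_named();
my @expanded = @{ getInfo_named() };   # copy out only if you really need a plain array
```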
Passing references is more efficient, but the difference is not as big as in C++. The argument values themselves (that is, the elements of the array) are always passed by reference anyway, because the elements of `@_` are aliases to the caller's values. Returned values, however, are copied.
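A small sketch of the aliasing behaviour described above, using hypothetical subs `double_in_place` and `first_two`: writing through `@_` changes the caller's array, while values returned from a sub are copies:

```perl
use strict;
use warnings;

# Elements of @_ are aliases to the caller's values, so nothing is copied
# on the way in; modifying them changes the caller's data directly.
sub double_in_place {
    $_ *= 2 for @_;    # $_ aliases each element of @_, which aliases the caller's
    return;
}

my @nums = (1, 2, 3);
double_in_place(@nums);     # the array is flattened, but each element is aliased
print "@nums\n";            # 2 4 6

# Return values, by contrast, are copied into the caller's variables.
sub first_two { return @_[0, 1] }
my @out = first_two(@nums);
$out[0] = 100;              # does not touch @nums
print "@nums\n";            # 2 4 6
```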
The question is: does it matter? Most of the time, it doesn't. If you're returning 5 elements, don't worry about it. If you're returning or passing 100,000 elements, use references. Only optimize it if it's a bottleneck.