Is the Perl Goatse "secret operator" efficient?

Posted on 2024-09-28 12:34:19

The "goatse operator" or the =()= idiom in Perl causes an expression to be evaluated in list context.

An example is:

my $str = "5 and 4 and a 3 and 2 1 BLAST OFF!!!";
my $count =()= $str =~ /\d/g; # 5 matches...
print "There are $count numbers in your countdown...\n\n";

As I interpret it, this is what happens:

  1. $str =~ /\d/g matches all the digits. The g switch and list context produce a list of those matches. Let this be the "List Producer" example; in Perl this could be many things.
  2. The =()= causes an assignment to an empty list, so all the actual matches are copied to an empty list.
  3. The assignment of the list produced in step 2 to $count, in scalar context, gives the count of the list, here 5.
  4. The reference count of the empty list =()= goes to zero after the scalar assignment. Perl then deletes the copy of the list elements.
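To make the parse concrete, here is a minimal sketch that spells out the two assignments hiding inside =()=. The helper name count_digits is made up for illustration; the only behavior relied on is that a list assignment evaluated in scalar context yields the number of elements on its right-hand side.

```perl
use strict;
use warnings;

# Hypothetical helper: count the digits in a string, writing the
# =()= idiom as the two chained assignments it really is.
sub count_digits {
    my ($str) = @_;

    # Inner assignment: () = LIST puts the match in list context.
    # Outer assignment: the list assignment itself, evaluated in
    # scalar context, yields the number of right-hand-side elements.
    my $count = ( () = $str =~ /\d/g );
    return $count;
}

my $str = "5 and 4 and a 3 and 2 1 BLAST OFF!!!";
print count_digits($str), "\n";    # prints 5
```

Written this way, =()= is visibly not a single operator, just an empty-list assignment nested inside a scalar assignment.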

The questions on efficiency are these:

  1. Am I wrong in how I am parsing this?
  2. If you have some List Producer and all you are interested in is the count, is there a more efficient way to do this?

It works great with this trivial list, but what if the list had hundreds of thousands of matches? With this method you produce a full copy of every match and then delete it just to count them.

Comments (3)

吾性傲以野 2024-10-05 12:34:19

Perl 5 is smart about copying lists. It only copies as many items as there are on the left-hand side. The idiom works because list assignment in scalar context yields the number of items on the right-hand side. So n items will be created by the regex, but they won't be copied and then discarded, just discarded. You can see the difference the extra copy makes in the naive case in the benchmark below.
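As a small illustration of that rule, the sketch below assigns to a one-element list instead of an empty one. The helper name first_and_count is hypothetical; the point is that the scalar value of the list assignment still reflects the right-hand side, no matter how few variables sit on the left.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first digit found and the total
# number of digits, using a single list assignment.
sub first_and_count {
    my ($str) = @_;
    my $first;

    # Only one element is stored on the left, yet in scalar context
    # the assignment yields the size of the right-hand side.
    my $count = ( ($first) = $str =~ /\d/g );
    return ($first, $count);
}

my ($first, $count) = first_and_count("5 and 4 and a 3 and 2 1 BLAST OFF!!!");
print "first=$first count=$count\n";    # prints first=5 count=5
```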

As for efficiency, an iterative solution is often easier on memory and CPU usage, but this must be weighed against the succinctness of the goatse secret operator. Here are the results of benchmarking the various solutions:

naive: 10
iterative: 10
goatse: 10

for 0 items:
               Rate iterative    goatse     naive
iterative 4365983/s        --       -7%      -12%
goatse    4711803/s        8%        --       -5%
naive     4962920/s       14%        5%        --

for 1 items:
               Rate     naive    goatse iterative
naive      749594/s        --      -32%      -69%
goatse    1103081/s       47%        --      -55%
iterative 2457599/s      228%      123%        --

for 10 items:
              Rate     naive    goatse iterative
naive      85418/s        --      -33%      -82%
goatse    127999/s       50%        --      -74%
iterative 486652/s      470%      280%        --

for 100 items:
             Rate     naive    goatse iterative
naive      9309/s        --      -31%      -83%
goatse    13524/s       45%        --      -76%
iterative 55854/s      500%      313%        --

for 1000 items:
            Rate     naive    goatse iterative
naive     1018/s        --      -31%      -82%
goatse    1478/s       45%        --      -75%
iterative 5802/s      470%      293%        --

for 10000 items:
           Rate     naive    goatse iterative
naive     101/s        --      -31%      -82%
goatse    146/s       45%        --      -75%
iterative 575/s      470%      293%        --

Here is the code that generated it:

#!/usr/bin/perl

use strict;
use warnings;

use Benchmark;

my $s = "a" x 10;

my %subs = (
    naive => sub {
        my @matches = $s =~ /a/g;
        return scalar @matches;
    },
    goatse => sub {
        my $count =()= $s =~ /a/g;
        return $count;
    },
    iterative => sub {
        my $count = 0;
        $count++ while $s =~ /a/g;
        return $count;
    },
);

for my $sub (keys %subs) {
    print "$sub: @{[$subs{$sub}()]}\n";
}

for my $n (0, 1, 10, 100, 1_000, 10_000) {
    $s = "a" x $n;
    print "\nfor $n items:\n";
    Benchmark::cmpthese -1, \%subs;
}

巷雨优美回忆 2024-10-05 12:34:19

In your particular example, a benchmark is useful:

my $str = "5 and 4 and a 3 and 2 1 BLAST OFF!!!";

use Benchmark 'cmpthese';

cmpthese -2 => {
    goatse => sub {
        my $count =()= $str =~ /\d/g;
        $count == 5 or die
    },
    while => sub {
        my $count; 
        $count++ while $str =~ /\d/g;
        $count == 5 or die
    },
};

which returns:

           Rate goatse  while
goatse 285288/s     --   -57%
while  661659/s   132%     --

The $str =~ /\d/g in list context is capturing the matched substring even though it is not needed. The while example has the regex in scalar (boolean) context, so the regex engine just has to return true or false, and not the actual matches.
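To show what the scalar-context loop is doing, here is a rough sketch that records where each match starts instead of just counting. The helper name match_offsets is made up for illustration. It relies on a //g match in scalar context resuming from where the previous one ended, and on $-[0] holding the start offset of the last successful match.

```perl
use strict;
use warnings;

# Hypothetical helper: walk the string one match at a time and
# collect the start offset of every digit.
sub match_offsets {
    my ($str) = @_;
    my @starts;

    # Scalar context: each evaluation finds the next match only,
    # returning true or false, so no list of matches is built.
    while ( $str =~ /\d/g ) {
        push @starts, $-[0];
    }
    return @starts;
}

my @starts = match_offsets("5 and 4 and a 3 and 2 1 BLAST OFF!!!");
print join(',', @starts), "\n";    # prints 0,6,14,20,22
```

One caveat with the benchmarked while version: its $count is declared without a value, so it stays undef rather than 0 when nothing matches.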

And in general, if you have a list producing function and only care about the number of items, writing a short count function is faster:

sub make_list {map {$_**2} 0 .. 1000}

sub count {scalar @_}

use Benchmark 'cmpthese';

cmpthese -2 => {
    goatse => sub {my $count =()= make_list; $count == 1001 or die},
    count  => sub {my $count = count make_list; $count == 1001 or die},
};

which gives:

         Rate goatse  count
goatse 3889/s     --   -26%
count  5276/s    36%     --

My guess is that the sub is faster because subroutine calls are optimized to pass lists without copying them (the arguments are passed as aliases).
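The aliasing itself is easy to observe, even though this sketch does not prove it is what causes the speed difference. The helper name double_in_place is hypothetical; it changes its arguments through @_ and returns how many it received.

```perl
use strict;
use warnings;

# Hypothetical helper: the elements of @_ are aliases to the
# caller's arguments, so changing them changes the originals.
# (Calling it with literal constants would die, because those
# aliases are read-only.)
sub double_in_place {
    $_ *= 2 for @_;
    return scalar @_;
}

my @nums = (1, 2, 3);
my $n = double_in_place(@nums);
print "$n items: @nums\n";    # prints 3 items: 2 4 6
```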

小嗷兮 2024-10-05 12:34:19

If you need to run something in list context you have to run it in list context. In some cases, like the one you present, you might be able to work around it with another technique, but in most cases you won't.

Before you benchmark, however, the most important question is "Does it even matter?". Profile before you benchmark, and only worry about these sorts of things when you've run out of real problems to solve. :)

If you're looking for the ultimate in efficiency though, Perl's a bit too high level. :)
