Perl 中字符串最快的校验位例程是什么？

发布于 2024-11-15 15:38:49 字数 779 浏览 6 评论 0原文

给定一串数字，我必须使用 Perl 尽快对所有数字求和。

我的第一个实现使用 unpack() 解压数字，然后使用 List::Utils 的 sum() 对数字列表求和。它非常快，但是是否有更快的打包/解包方法来完成此任务？

我尝试了打包/解包组合，并对两种实现进行了基准测试。使用的CPU时间几乎相同；也许有一些我不知道的快速技巧？

这是我进行基准测试的方法：

#!/usr/bin/env perl

use 5.012;
use strict;
use List::Util qw/sum/;
use Benchmark qw/timethese/;

timethese ( 1000000, {
    list_util => sub {
        my $CheckDigit = "999989989";
        do {
            $CheckDigit = sum( unpack( 'AAAAAAAAA', $CheckDigit ) );
        } while ( $CheckDigit > 9 );
    },
    perl_only => sub {
        my $CheckDigit = "999989989";
        do {
            $CheckDigit = unpack( '%16S*', pack( 'S9', unpack( 'AAAAAAAAA', $CheckDigit ) ) );
        } while ( $CheckDigit > 9 );
    },
} );

原文

Given a string of digits, I have to sum all digits as fast as possible using Perl.

My first implementation unpacks digits with unpack(), then sums the list of digits with List::Utils' sum().
It's pretty fast but is there a faster pack/unpack recipe for this task?

I tried with a pack/unpack combination, and benchmarked the two implementations.
Used CPU time is almost the same; maybe is there some fast trick I'm not aware of?

Here is how I did my benchmark:

#!/usr/bin/env perl

use 5.012;
use strict;
use List::Util qw/sum/;
use Benchmark qw/timethese/;

timethese ( 1000000, {
    list_util => sub {
        my $CheckDigit = "999989989";
        do {
            $CheckDigit = sum( unpack( 'AAAAAAAAA', $CheckDigit ) );
        } while ( $CheckDigit > 9 );
    },
    perl_only => sub {
        my $CheckDigit = "999989989";
        do {
            $CheckDigit = unpack( '%16S*', pack( 'S9', unpack( 'AAAAAAAAA', $CheckDigit ) ) );
        } while ( $CheckDigit > 9 );
    },
} );

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

探春 2024-11-22 15:38:49

unpack 不是分割字符串的最快方法：

#!/usr/bin/env perl

use strict;
use List::Util qw/sum/;
use Benchmark qw/cmpthese/;

cmpthese ( -3, {
    list_util => sub {
        my $CheckDigit = "999989989";
        do {
            $CheckDigit = sum( unpack( 'AAAAAAAAA', $CheckDigit ) );
        } while ( $CheckDigit > 9 );
    },
    unpack_star => sub {
        my $CheckDigit = "999989989";
        do {
            $CheckDigit = sum( unpack( '(A)*', $CheckDigit ) );
        } while ( $CheckDigit > 9 );
    },
    re => sub {
        my $CheckDigit = "999989989";
        do {
            $CheckDigit = sum( $CheckDigit =~ /(.)/g );
        } while ( $CheckDigit > 9 );
    },
    split => sub {
        my $CheckDigit = "999989989";
        do {
            $CheckDigit = sum( split //, $CheckDigit );
        } while ( $CheckDigit > 9 );
    },
    perl_only => sub {
        my $CheckDigit = "999989989";
        do {
            $CheckDigit = unpack( '%16S*', pack( 'S9', unpack( 'AAAAAAAAA', $CheckDigit ) ) );
        } while ( $CheckDigit > 9 );
    },
    modulo => sub {
        my $CheckDigit = "999989989";
        $CheckDigit = ($CheckDigit+0) && ($CheckDigit % 9 || 9);
    },
} );

产生：

                 Rate perl_only list_util       re unpack_star    split   modulo
perl_only     89882/s        --      -15%     -30%        -45%     -54%     -97%
list_util    105601/s       17%        --     -17%        -35%     -45%     -97%
re           127656/s       42%       21%       --        -21%     -34%     -96%
unpack_star  162308/s       81%       54%      27%          --     -16%     -95%
split        193405/s      115%       83%      52%         19%       --     -94%
modulo      3055254/s     3299%     2793%    2293%       1782%    1480%       --

因此，如果您必须将字符串分割为字符，那么 split 看起来是您的最佳选择。

但反复对数字求和几乎与取数字 mod 9 相同（如米罗德指出）。不同之处在于 $Digits % 9 生成 0 而不是 9。解决该问题的一个公式是 ($Digits-1) % 9 + 1，但是（在 Perl 中，至少）这不适用于全零情况（它产生 9 而不是 0）。 Perl 中有效的表达式是 ($Digits+0) && （$Digits % 9 || 9）。第一项处理全零情况，第二项处理正常情况，第三项将 0 更改为 9。

unpack isn't the fastest way to split a string:

#!/usr/bin/env perl

use strict;
use List::Util qw/sum/;
use Benchmark qw/cmpthese/;

cmpthese ( -3, {
    list_util => sub {
        my $CheckDigit = "999989989";
        do {
            $CheckDigit = sum( unpack( 'AAAAAAAAA', $CheckDigit ) );
        } while ( $CheckDigit > 9 );
    },
    unpack_star => sub {
        my $CheckDigit = "999989989";
        do {
            $CheckDigit = sum( unpack( '(A)*', $CheckDigit ) );
        } while ( $CheckDigit > 9 );
    },
    re => sub {
        my $CheckDigit = "999989989";
        do {
            $CheckDigit = sum( $CheckDigit =~ /(.)/g );
        } while ( $CheckDigit > 9 );
    },
    split => sub {
        my $CheckDigit = "999989989";
        do {
            $CheckDigit = sum( split //, $CheckDigit );
        } while ( $CheckDigit > 9 );
    },
    perl_only => sub {
        my $CheckDigit = "999989989";
        do {
            $CheckDigit = unpack( '%16S*', pack( 'S9', unpack( 'AAAAAAAAA', $CheckDigit ) ) );
        } while ( $CheckDigit > 9 );
    },
    modulo => sub {
        my $CheckDigit = "999989989";
        $CheckDigit = ($CheckDigit+0) && ($CheckDigit % 9 || 9);
    },
} );

Produces:

                 Rate perl_only list_util       re unpack_star    split   modulo
perl_only     89882/s        --      -15%     -30%        -45%     -54%     -97%
list_util    105601/s       17%        --     -17%        -35%     -45%     -97%
re           127656/s       42%       21%       --        -21%     -34%     -96%
unpack_star  162308/s       81%       54%      27%          --     -16%     -95%
split        193405/s      115%       83%      52%         19%       --     -94%
modulo      3055254/s     3299%     2793%    2293%       1782%    1480%       --

So split looks like your best bet if you have to split the string into characters.

But repeatedly summing the digits is almost the same as taking the number mod 9 (as mirod pointed out). The difference is that $Digits % 9 produces 0 instead of 9. One formula that fixes that is ($Digits-1) % 9 + 1, but (in Perl at least) that doesn't work for the all-zeros case (it produces 9 instead of 0). An expression that works in Perl is ($Digits+0) && ($Digits % 9 || 9). The first term handles the all-zero case, the second the normal case, and the third changes 0 to 9.

回复收藏 0 原文

赢得她心 2024-11-22 15:38:49

是否不要太聪明地使用打包/解包并使用简单的拆分，或者在数学上稍微聪明并使用模数，这比所有其他方法都好？

#!/usr/bin/env perl

use strict;
use List::Util qw/sum/;
use Benchmark qw/timethese/;

my $D="99949596";

timethese ( 1000000, {
    naive => sub {
        my $CheckDigit= $D;
        do {
            $CheckDigit = sum( split//, $CheckDigit );
        } while ( $CheckDigit > 9 ); 
    },  
    list_util => sub {
        my $CheckDigit = $D;
        do {
            $CheckDigit = sum( unpack( 'AAAAAAAAA', $CheckDigit ) );
        } while ( $CheckDigit > 9 );
    },
    perl_only => sub {
        my $CheckDigit = $D;
        do {
            $CheckDigit = unpack( '%16S*', pack( 'S9', unpack( 'AAAAAAAAA', $CheckDigit ) ) );
        } while ( $CheckDigit > 9 );
    },
    modulo => sub {
        my $CheckDigit = $D % 9;
    },
} );

结果：

Benchmark: timing 1000000 iterations of list_util, modulo, naive, perl_only...
 list_util:  5 wallclock secs ( 4.62 usr +  0.00 sys =  4.62 CPU) @ 216450.22/s (n=1000000)
    modulo: -1 wallclock secs ( 0.07 usr +  0.00 sys =  0.07 CPU) @ 14285714.29/s (n=1000000)
            (warning: too few iterations for a reliable count)
     naive:  3 wallclock secs ( 2.79 usr +  0.00 sys =  2.79 CPU) @ 358422.94/s (n=1000000)
 perl_only:  6 wallclock secs ( 5.18 usr +  0.00 sys =  5.18 CPU) @ 193050.19/s (n=1000000)

How about not being too clever with pack/unpack and using a simple split, or being mildly mathematically clever and using modulo, which beats the crap out of all the other methods?

#!/usr/bin/env perl

use strict;
use List::Util qw/sum/;
use Benchmark qw/timethese/;

my $D="99949596";

timethese ( 1000000, {
    naive => sub {
        my $CheckDigit= $D;
        do {
            $CheckDigit = sum( split//, $CheckDigit );
        } while ( $CheckDigit > 9 ); 
    },  
    list_util => sub {
        my $CheckDigit = $D;
        do {
            $CheckDigit = sum( unpack( 'AAAAAAAAA', $CheckDigit ) );
        } while ( $CheckDigit > 9 );
    },
    perl_only => sub {
        my $CheckDigit = $D;
        do {
            $CheckDigit = unpack( '%16S*', pack( 'S9', unpack( 'AAAAAAAAA', $CheckDigit ) ) );
        } while ( $CheckDigit > 9 );
    },
    modulo => sub {
        my $CheckDigit = $D % 9;
    },
} );

Results:

Benchmark: timing 1000000 iterations of list_util, modulo, naive, perl_only...
 list_util:  5 wallclock secs ( 4.62 usr +  0.00 sys =  4.62 CPU) @ 216450.22/s (n=1000000)
    modulo: -1 wallclock secs ( 0.07 usr +  0.00 sys =  0.07 CPU) @ 14285714.29/s (n=1000000)
            (warning: too few iterations for a reliable count)
     naive:  3 wallclock secs ( 2.79 usr +  0.00 sys =  2.79 CPU) @ 358422.94/s (n=1000000)
 perl_only:  6 wallclock secs ( 5.18 usr +  0.00 sys =  5.18 CPU) @ 193050.19/s (n=1000000)

回复收藏 0 原文

~没有更多了~