如何在 Perl 中创建所有大小小于 n 的子集？

发布于 2024-10-01 17:40:29 字数 1326 浏览 4 评论 0原文

我有一套。我想创建所有集合，这些集合最多从每个原始集合中获取一个元素。例如，如果我的原始集合是 ((x,y),(A),(1,2)) 那么解决方案是：

(x)
(y)
(A)
(1)
(2)
(x,A)
(x,1)
(x,2)
(y,A)
(y,A)
(y,1)
(y,2)
(A,1)
(A,2)
(x,A,1)
(x,A,2)
(y,A,1)
(y,A,2)

我使用我编写的以下代码来递归计算this：

# gets an array of arrays (aoa)
# returns an array of arrays with all subsets where zero or one element is
# taken from each array, e.g. in = [[a,b],[5],[X,Y,Z]], out =
# [[],[a],[b],[5],[X],[Y],[Z],[a,5],[b,5],[a,X],[a,Y],...,[b,5,Y],[b,5,Z]]
# note the order of elelemnts in each arry is immaterial (an array is
# considered an unordered set)
sub sets_aoa_to_subsets_aoa {
    my $aoa = shift // confess;

    if ( scalar( @{$aoa} ) == 0 ) {
        return [ [] ];
    }

    my $a           = shift @{$aoa};
    my $subsets_aoa = sets_aoa_to_subsets_aoa($aoa);
    my @new_subsets = ();
    foreach my $subset_a ( @{$subsets_aoa} ) {

        # leave subset as-is
        push @new_subsets, $subset_a;

        # add one element from $a
        foreach my $e ( @{$a} ) {
            push @new_subsets, [ $e, @{$subset_a} ];
        }
    }
    return \@new_subsets;

}

但是，我想添加对子集大小的限制。例如，如果我设置 max_size=2 那么最后四个解决方案将被忽略。我不能简单地生成所有解决方案，然后过滤掉那些太大的解决方案，因为有时我有超过 100 个集合，每个集合有 2-3 个元素，而 2^100 不是一个很好处理的数字，特别是当我只想要尺寸 5 或更小。

原文

I have a set of sets. I want to create all sets that take at most one element from each original set.
For example, if my original set of sets is ((x,y),(A),(1,2)) then the solutions are:

(x)
(y)
(A)
(1)
(2)
(x,A)
(x,1)
(x,2)
(y,A)
(y,A)
(y,1)
(y,2)
(A,1)
(A,2)
(x,A,1)
(x,A,2)
(y,A,1)
(y,A,2)

I use the following code I have written to recursively calculate this:

# gets an array of arrays (aoa)
# returns an array of arrays with all subsets where zero or one element is
# taken from each array, e.g. in = [[a,b],[5],[X,Y,Z]], out =
# [[],[a],[b],[5],[X],[Y],[Z],[a,5],[b,5],[a,X],[a,Y],...,[b,5,Y],[b,5,Z]]
# note the order of elelemnts in each arry is immaterial (an array is
# considered an unordered set)
sub sets_aoa_to_subsets_aoa {
    my $aoa = shift // confess;

    if ( scalar( @{$aoa} ) == 0 ) {
        return [ [] ];
    }

    my $a           = shift @{$aoa};
    my $subsets_aoa = sets_aoa_to_subsets_aoa($aoa);
    my @new_subsets = ();
    foreach my $subset_a ( @{$subsets_aoa} ) {

        # leave subset as-is
        push @new_subsets, $subset_a;

        # add one element from $a
        foreach my $e ( @{$a} ) {
            push @new_subsets, [ $e, @{$subset_a} ];
        }
    }
    return \@new_subsets;

}

however, I would like to add a limit on the size of the subset. For example, if I set max_size=2 then the last four solutions will be ignored. I can't simply generate all solutions then filter those who are too large, since sometimes I have more then 100 sets each with 2-3 elements, and 2^100 is not a nice number to handle, especially when I only want subsets of size 5 or less.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

谈场末日恋爱 2024-10-08 17:40:29

正如我所怀疑的，正则表达式可以解决这个问题。

具体解决方案

这是针对所提出问题的具体解决方案。有80个答案。

my %seen;

"xy=a=12" =~ m{
        [^=]* (x|y)* [^=]*
    =
        [^=]* (a)*   [^=]*
    =
        [^=]* (1|2)* [^=]*

    (?{ 
         my $size = grep { length } $1, $2, $3;
         print "<$1> <$2> <$3>\n"
            if $size >= 1 && 
               $size <= 2 &&
             ! $seen{$1,$2,$3}++;
    })
    (*FAIL)
}x;

运行通过管道传输到 cat -n 的命令，您将看到 80 个答案。

当然，您需要通用且可扩展的东西，以便您可以将其应用到您的一百套情况。制定通用解决方案总是比制定特定解决方案花费更长的时间，因此我将进行概括，并在看起来不错时立即回复您。

通用解决方案

这是通用解决方案；这不是我最漂亮的作品，但它确实有效：

#!/usr/bin/perl

use 5.010;
use strict;
use warnings;

our($MIN_PICK, $MAX_PICK) = (1, 2);

our @List_of_Sets = (
    [ qw[ x y ] ],
    [ qw[ a   ] ],
    [ qw[ 1 2 ] ],
);

sub dequeue($) {
    my($leader, $body) = @_;
    $body =~ s/^\s*\Q$leader\E ?//gm;
    return $body;
}

################################

my $gunk     = " (?&gunk) ";
my $alter_rx = join("\n\t(?&post)\n" => map {
                  " $gunk ( "
                . join(" | " => map { quotemeta } @$_)
                . " ) * $gunk "
              } @List_of_Sets);
##print "ALTER_RX <\n$alter_rx\n>\n";

my $string = join(" = ", map { join(" ", @$_) } @List_of_Sets);
##print "STRING: $string\n";

my $numbers_list    = join(", " => map {  '
我有点关心你期望从中获得多少可能性。也许你应该把我上面概述的这个过程放在管道的另一端，这样你就可以从中读取你想要的内容，然后挂断电话，可以这么说，当你读完的时候。
我意识到这是一个相当不寻常的解决方案；我的代码经常是。 :)
我只是认为您不妨让正则表达式回溯的详尽排列性质为您完成工作。
也许其他人会拿出 Some::Abstruse::Module 来为您完成这项工作。你只需要权衡你更喜欢哪一个。
 编辑：提高易读性，处理重复项和额外的最小/最大标准。
 . $_        } 1 .. @List_of_Sets);
my $numbers_bracket = join(" "  => map { '<
我有点关心你期望从中获得多少可能性。也许你应该把我上面概述的这个过程放在管道的另一端，这样你就可以从中读取你想要的内容，然后挂断电话，可以这么说，当你读完的时候。
我意识到这是一个相当不寻常的解决方案；我的代码经常是。 :)
我只是认为您不妨让正则表达式回溯的详尽排列性质为您完成工作。
也许其他人会拿出 Some::Abstruse::Module 来为您完成这项工作。你只需要权衡你更喜欢哪一个。
 编辑：提高易读性，处理重复项和额外的最小/最大标准。
 . $_  . '>' } 1 .. @List_of_Sets);

my $print_statement = dequeue "|QQ|" => <<"PRINT_STATEMENT";

    |QQ|
    |QQ|    (?{
    |QQ|        no warnings qw(uninitialized);
    |QQ|        my \$size = grep { length } $numbers_list;
    |QQ|        print "$numbers_bracket\\n"
    |QQ|            if \$size >= $MIN_PICK &&
    |QQ|               \$size <= $MAX_PICK &&
    |QQ|             ! \$seen{$numbers_list}++;
    |QQ|    })
    |QQ|

PRINT_STATEMENT   
## print "PRINT $print_statement\n";

my $search_rx = do {
    use re "eval";
    my %seen;
    qr{
        ^
    $alter_rx
        $

    $print_statement

        (*FAIL)

        (?(DEFINE)
            (?<post>   =    )
            (?<gunk> [^=] * )
        )
    }x;
};
## print qq(SEARCH:\n"$string" =~ $search_rx\n);

# run, run, run!!
$string =~ $search_rx;

我有点关心你期望从中获得多少可能性。也许你应该把我上面概述的这个过程放在管道的另一端，这样你就可以从中读取你想要的内容，然后挂断电话，可以这么说，当你读完的时候。

我意识到这是一个相当不寻常的解决方案；我的代码经常是。 :)

我只是认为您不妨让正则表达式回溯的详尽排列性质为您完成工作。

也许其他人会拿出 Some::Abstruse::Module 来为您完成这项工作。你只需要权衡你更喜欢哪一个。

编辑：提高易读性，处理重复项和额外的最小/最大标准。

As I suspected, a regex works for this.

Specific Solution

Here’s the specific solution to the question precisely as posed. There are 80 answers.

my %seen;

"xy=a=12" =~ m{
        [^=]* (x|y)* [^=]*
    =
        [^=]* (a)*   [^=]*
    =
        [^=]* (1|2)* [^=]*

    (?{ 
         my $size = grep { length } $1, $2, $3;
         print "<$1> <$2> <$3>\n"
            if $size >= 1 && 
               $size <= 2 &&
             ! $seen{$1,$2,$3}++;
    })
    (*FAIL)
}x;

Run that piped to cat -n and you’ll see your 80 answers.

Of course, you’ll want something that’s generalized and extensible so that you can apply it to your situation of a hundred sets. It always takes longer to craft a general solution than a specific one, so I’ll work on that generalization and get back to you as soon as it looks pretty.

General Solution

Here’s the general solution; it’s hardly my prettiest piece of work, but it does work:

#!/usr/bin/perl

use 5.010;
use strict;
use warnings;

our($MIN_PICK, $MAX_PICK) = (1, 2);

our @List_of_Sets = (
    [ qw[ x y ] ],
    [ qw[ a   ] ],
    [ qw[ 1 2 ] ],
);

sub dequeue($) {
    my($leader, $body) = @_;
    $body =~ s/^\s*\Q$leader\E ?//gm;
    return $body;
}

################################

my $gunk     = " (?&gunk) ";
my $alter_rx = join("\n\t(?&post)\n" => map {
                  " $gunk ( "
                . join(" | " => map { quotemeta } @$_)
                . " ) * $gunk "
              } @List_of_Sets);
##print "ALTER_RX <\n$alter_rx\n>\n";

my $string = join(" = ", map { join(" ", @$_) } @List_of_Sets);
##print "STRING: $string\n";

my $numbers_list    = join(", " => map {  '
I am somewhat concerned with the number of possibilities you expect to pull out of this. It may be that you should put this process I’ve outlined above on the other end of a pipe so that you can read from it however much you want and then hang up the phone, so to speak, when you’ve had your fill.
I realize this is a rather unusual solution; my code often is. :) 
I just figure you might as well make the exhaustively permutational nature of regex backtracking do the work for you.
Perhaps others will pull out Some::Abstruse::Module to do the job for you. You’ll just have to weigh which you prefer.
EDIT: Improved legibility, handled duplicates and extra min/max criteria.
 . $_        } 1 .. @List_of_Sets);
my $numbers_bracket = join(" "  => map { '<
I am somewhat concerned with the number of possibilities you expect to pull out of this. It may be that you should put this process I’ve outlined above on the other end of a pipe so that you can read from it however much you want and then hang up the phone, so to speak, when you’ve had your fill.
I realize this is a rather unusual solution; my code often is. :) 
I just figure you might as well make the exhaustively permutational nature of regex backtracking do the work for you.
Perhaps others will pull out Some::Abstruse::Module to do the job for you. You’ll just have to weigh which you prefer.
EDIT: Improved legibility, handled duplicates and extra min/max criteria.
 . $_  . '>' } 1 .. @List_of_Sets);

my $print_statement = dequeue "|QQ|" => <<"PRINT_STATEMENT";

    |QQ|
    |QQ|    (?{
    |QQ|        no warnings qw(uninitialized);
    |QQ|        my \$size = grep { length } $numbers_list;
    |QQ|        print "$numbers_bracket\\n"
    |QQ|            if \$size >= $MIN_PICK &&
    |QQ|               \$size <= $MAX_PICK &&
    |QQ|             ! \$seen{$numbers_list}++;
    |QQ|    })
    |QQ|

PRINT_STATEMENT   
## print "PRINT $print_statement\n";

my $search_rx = do {
    use re "eval";
    my %seen;
    qr{
        ^
    $alter_rx
        $

    $print_statement

        (*FAIL)

        (?(DEFINE)
            (?<post>   =    )
            (?<gunk> [^=] * )
        )
    }x;
};
## print qq(SEARCH:\n"$string" =~ $search_rx\n);

# run, run, run!!
$string =~ $search_rx;

I am somewhat concerned with the number of possibilities you expect to pull out of this. It may be that you should put this process I’ve outlined above on the other end of a pipe so that you can read from it however much you want and then hang up the phone, so to speak, when you’ve had your fill.

I realize this is a rather unusual solution; my code often is. :)

I just figure you might as well make the exhaustively permutational nature of regex backtracking do the work for you.

Perhaps others will pull out Some::Abstruse::Module to do the job for you. You’ll just have to weigh which you prefer.

EDIT: Improved legibility, handled duplicates and extra min/max criteria.

回复收藏 0 原文

倒数 2024-10-08 17:40:29

这也是一个递归解决方案，但传递子集构建的到目前为止，这样您就可以在达到最大大小时立即停止。

#!/opt/perl/bin/perl

use strict;
use warnings;
use 5.010;

sub subsets
{
    my ($sets, $maxSize, $subset) = @_;
    $subset //= [ ];

    # If we already have $maxSize elements, we're done
    return ($subset) if @$subset == $maxSize;

    # If we have no sets left to pick from, we're done
    return ($subset) if !@$sets;

    # Consider the next set
    my @remainingSets = @$sets;
    my $nextSet = shift(@remainingSets);

    # We can choose either 0 or 1 element from this set, continue with the rest
    return (subsets(\@remainingSets, $maxSize, $subset),
            map { subsets(\@remainingSets, $maxSize, [@$subset, $_]) }
                @$nextSet);
}

my $sets = [ [qw(x y)], [qw(A)], [qw(1 2)] ];
my @subsets = subsets($sets, 2);

foreach my $subset (@subsets) {
    say '(', join(', ', @$subset), ')';
}

Also a recursive solution, but passing the subset-built-sofar along, so you can stop as soon as you reach the maximum size.

#!/opt/perl/bin/perl

use strict;
use warnings;
use 5.010;

sub subsets
{
    my ($sets, $maxSize, $subset) = @_;
    $subset //= [ ];

    # If we already have $maxSize elements, we're done
    return ($subset) if @$subset == $maxSize;

    # If we have no sets left to pick from, we're done
    return ($subset) if !@$sets;

    # Consider the next set
    my @remainingSets = @$sets;
    my $nextSet = shift(@remainingSets);

    # We can choose either 0 or 1 element from this set, continue with the rest
    return (subsets(\@remainingSets, $maxSize, $subset),
            map { subsets(\@remainingSets, $maxSize, [@$subset, $_]) }
                @$nextSet);
}

my $sets = [ [qw(x y)], [qw(A)], [qw(1 2)] ];
my @subsets = subsets($sets, 2);

foreach my $subset (@subsets) {
    say '(', join(', ', @$subset), ')';
}

回复收藏 0 原文

尝蛊 2024-10-08 17:40:29

您可以创建一个“状态变量”，它会跟踪调用sets_aoa_to_subsets_aoa的次数，然后检查您的treminal条件：

{
    my $count=0;
    sub sets_aoa_to_subsets_aoa {
        $count++;
        my ($aoa,$number_of_calls) = @_ // confess;
    if ( (scalar( @{$aoa} ) == 0) or ($count == $number_or_calls)) {
            return [ [] ];
        }
    ......
    }
    }

you could create a "state variable" which would track the number of calls to sets_aoa_to_subsets_aoa and then check for that in your treminal condition:

{
    my $count=0;
    sub sets_aoa_to_subsets_aoa {
        $count++;
        my ($aoa,$number_of_calls) = @_ // confess;
    if ( (scalar( @{$aoa} ) == 0) or ($count == $number_or_calls)) {
            return [ [] ];
        }
    ......
    }
    }

回复收藏 0 原文

海未深 2024-10-08 17:40:29

    foreach my $e ( @{$a} ) {
        push @new_subsets, [ $e, @{$subset_a} ];
    }

只需传递 $items_wanted 参数，并在 @{$subset_a} > 时跳过突出显示的代码位。 $items_wanted。由于上面的行已经生成了所有不添加其他项目的组合，因此无需任何进一步的更改即可工作。

    foreach my $e ( @{$a} ) {
        push @new_subsets, [ $e, @{$subset_a} ];
    }

simply pass down a $items_wanted paramter and skip the highlighted bit of code if @{$subset_a} > $items_wanted. Since the lines above already generate all of the combinations that don't add additional items, this will work without any further changes.

回复收藏 0 原文

~没有更多了~