使用 Perl 确定非重叠位置

发布于 10-10 17:30 字数 1057 浏览 8 评论 0原文

我有一个位置集合 - 这是数据结构的示例。

my $locations =
{
  loc_1 =>
  {
    start => 1,
    end   => 193,
  },
  loc_2 =>
  {
    start => 180,
    end   => 407,
  },
  loc_3 =>
  {
    start => 329,
    end   => 684,
  },
  loc_4 =>
  {
    start => 651,
    end   => 720,
  },
};

确定非重叠位置的每种可能组合的最佳方法是什么？这个例子的答案看起来像这样。请记住，可能有一个或多个位置，并且这些位置可能重叠也可能不重叠。

my $non_overlapping_locations =
[
  {
    loc_1 =>
    {
      start => 1,
      end   => 193,
    },
    loc_3 =>
    {
      start => 329,
      end   => 684,
    },
  },
  {
    loc_1 =>
    {
      start => 1,
      end   => 193,
    },
    loc_4 =>
    {
      start => 651,
      end   => 720,
    },
  },
  {
    loc_2 =>
    {
      start => 180,
      end   => 407,
    },
    loc_4 =>
    {
      start => 651,
      end   => 720,
    },
  }
];

更新：ysth的回复帮助我发现了措辞中的缺陷。我想我对//所有可能的//非重叠位置的组合不感兴趣，我只对不是其他解决方案子集的解决方案感兴趣。

原文

I have a collection of locations--here is an example of the data structure.

my $locations =
{
  loc_1 =>
  {
    start => 1,
    end   => 193,
  },
  loc_2 =>
  {
    start => 180,
    end   => 407,
  },
  loc_3 =>
  {
    start => 329,
    end   => 684,
  },
  loc_4 =>
  {
    start => 651,
    end   => 720,
  },
};

What is the best way to determine every possible combination of non-overlapping locations? The answer for this example would look something like this. Keep in mind there may be one or more locations, and these locations may or may not overlap.

my $non_overlapping_locations =
[
  {
    loc_1 =>
    {
      start => 1,
      end   => 193,
    },
    loc_3 =>
    {
      start => 329,
      end   => 684,
    },
  },
  {
    loc_1 =>
    {
      start => 1,
      end   => 193,
    },
    loc_4 =>
    {
      start => 651,
      end   => 720,
    },
  },
  {
    loc_2 =>
    {
      start => 180,
      end   => 407,
    },
    loc_4 =>
    {
      start => 651,
      end   => 720,
    },
  }
];

Update: ysth's response helped me see a flaw in my wording. I guess I'm not interested in //every possible// combination of non-overlapping locations, I'm only interested in the solutions that are not subsets of other solutions.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

情深缘浅2024-10-17 17:30:44

use strict;
use warnings;

my $locations =
{
  loc_1 =>
  {
    start => 1,
    end   => 193,
  },
  loc_2 =>
  {
    start => 180,
    end   => 407,
  },
  loc_3 =>
  {
    start => 329,
    end   => 684,
  },
  loc_4 =>
  {
    start => 651,
    end   => 720,
  },
};
my $non_overlapping_locations = [];
my @locations = sort keys %$locations;

get_location_combinations( $locations, $non_overlapping_locations, [], @locations );

use Data::Dumper;
print Data::Dumper::Dumper($non_overlapping_locations);

sub get_location_combinations {
    my ($locations, $results, $current, @remaining) = @_;

    if ( ! @remaining ) {
        if ( not_a_subset_combination( $results, $current ) ) {
            push @$results, $current;
        }
    }
    else {
        my $next = shift @remaining;
        if (can_add_location( $locations, $current, $next )) {
            get_location_combinations( $locations, $results, [ @$current, $next ], @remaining );
        }
        get_location_combinations( $locations, $results, [ @$current ], @remaining );
    }
}

sub can_add_location {
    my ($locations, $current, $candidate) = @_;

    # not clear if == is an overlap; modify to use >=  and <= if so.
    0 == grep $locations->{$candidate}{end} > $locations->{$_}{start} && $locations->{$candidate}{start} < $locations->{$_}{end}, @$current;
}

sub not_a_subset_combination {
    my ($combinations, $candidate) = @_;

    for my $existing (@$combinations) {
        my %candidate;
        @candidate{@$candidate} = ();
        delete @candidate{@$existing};
        if ( 0 == keys %candidate ) {
            return 0;
        }
    }
    return 1;
}

一个相对简单的优化是按开始和结束对 @locations 进行排序，并预先计算并存储在每个位置的哈希值中（或仅存储在 $locations->{foo} 中），以下位置中有多少个与该位置冲突。然后在 can_add... 的情况下，在递归之前将该数字与 @remaining 拼接起来。

或者为每个位置预先计算所有后续冲突位置的哈希值，并在递归之前使用 grep 将它们全部删除。（尽管使用这种方法，让剩余的散列开始变得更有意义。）

更新：解决方案的另一种方法是构建要排除的位置树，其中叶子代表解决方案，内部节点代表仍然存在冲突的组合;顶部节点是所有位置，每个节点都有子节点，这些子节点表示删除剩余冲突位置之一，该位置（在某种任意排序方案中）大于父节点（如果有）删除的位置。

use strict;
use warnings;

my $locations =
{
  loc_1 =>
  {
    start => 1,
    end   => 193,
  },
  loc_2 =>
  {
    start => 180,
    end   => 407,
  },
  loc_3 =>
  {
    start => 329,
    end   => 684,
  },
  loc_4 =>
  {
    start => 651,
    end   => 720,
  },
};
my $non_overlapping_locations = [];
my @locations = sort keys %$locations;

get_location_combinations( $locations, $non_overlapping_locations, [], @locations );

use Data::Dumper;
print Data::Dumper::Dumper($non_overlapping_locations);

sub get_location_combinations {
    my ($locations, $results, $current, @remaining) = @_;

    if ( ! @remaining ) {
        if ( not_a_subset_combination( $results, $current ) ) {
            push @$results, $current;
        }
    }
    else {
        my $next = shift @remaining;
        if (can_add_location( $locations, $current, $next )) {
            get_location_combinations( $locations, $results, [ @$current, $next ], @remaining );
        }
        get_location_combinations( $locations, $results, [ @$current ], @remaining );
    }
}

sub can_add_location {
    my ($locations, $current, $candidate) = @_;

    # not clear if == is an overlap; modify to use >=  and <= if so.
    0 == grep $locations->{$candidate}{end} > $locations->{$_}{start} && $locations->{$candidate}{start} < $locations->{$_}{end}, @$current;
}

sub not_a_subset_combination {
    my ($combinations, $candidate) = @_;

    for my $existing (@$combinations) {
        my %candidate;
        @candidate{@$candidate} = ();
        delete @candidate{@$existing};
        if ( 0 == keys %candidate ) {
            return 0;
        }
    }
    return 1;
}

A relatively simple optimization would be to sort @locations by start and then end and pre-calculate and store in a hash (or just in $locations->{foo}) for each location how many of the following locations conflict with that location. Then in the can_add... case, splice that number off of @remaining before recursing.

Or pre-calculate for each location a hash of all following locations that conflict and strip them all out with a grep before recursing. (Though with that approach, having remaining be a hash begins to make more sense.)

Update: another approach to a solution would be to build up a tree of locations to exclude, where leaves represent solutions and inner nodes represent combinations that still have conflicts; the top node is all locations, and each node has children that represent removing one of the remaining conflicting locations that's greater (in some arbitrary ordering scheme) than the location removed by the parent node (if any).

回复收藏 0 原文

善良天后2024-10-17 17:30:44

首先，我会收集所有单独的点（每个位置的起点和终点），对它们进行排序并将它们保存在列表中。在您的情况下，这将是：

1,180,193,329,407,651,684,720.

对于该列表中的每个间隔，找出有多少段与其重叠。在您的情况下，这将是：

1, 180 -> 1
180, 193 -> 2
193, 329 -> 1
329, 407 -> 2
407, 651 -> 1
651, 684 -> 2
684, 720 -> 1

并在哪些段超过 1 个（在本例中为 3 个）处循环。因此，案例总数为 2 x 2 x 2 = 8 个解决方案（您只能选择一个解决方案中匹配多区间的一段）。

我们找到了 2, 2, 2（或 2, 3, 4）。将它们放在一个数组中并从最后一个开始。递减它直到达到 0。当达到 1 时，递减前一个数字并将第一个数字设置为初始值减 1。

假设我们已经对初始段进行了编号：（在本例中 1,2,3,4 ,5,6)。重叠的段将包含以下段[1,2]、[2,3]、[3,4]。所以我们有 3 个重叠的部分。现在我们开始一个选择/消除的递归过程：
在每一步中，我们都会查看一个具有多个片段的重叠片段。我们迭代选择，对于每个选择，我们做两件事：从每个后续重叠段中消除我们现在没有选择的段，并在有此选择可能性的每个后续重叠段中强制当前段选择。每一个不重叠的片段都将被视为一个新的选择。搜索下一个多项选择并递归。一旦我们找不到选择，我们就有部分解决方案。我们需要向其中添加不涉及任何重叠的段。打印出来。

在这种情况下，它会像这样：第一步：

we are here [1,2], [2,3], [3,4]:
  chose 1 -> // eliminate 2 from rest and force 1 (3 is a single choice so we do the same)
      [1], [3], [3] -> [1, 3] solution 
  chose 2 -> // eliminate 1 from the rest and force 2 (2 single choice so we do the same). 
      [2], [2], [4] -> [2, 4] solution

这应该可以正常工作。

现在实现这个的代码（我认为这不是最漂亮的 Perl 代码，但我真的不是一个 Perl 人）：

#!/bin/perl

use strict;
use warnings;
use 5.010;
use Data::Dumper;

my $locs = {
  loc_1 => {
    start => 1,
    end   => 193,
  },
  loc_2 => {
    start => 180,
    end   => 407,
  },
  loc_3 => {
    start => 329,
    end   => 684,
  },
  loc_4 => {
            start => 651,
    end   => 720,
  }
};

my (%starts, %ends);
map {
        my ($start, $end) = ($locs->{$_}->{start}, $locs->{$_}->{end});

        push @{ $starts{$start} }, $_;
        push @{ $ends{$end} }, $_;
} keys %$locs;

my @overlaps, my %tmp;

map {
        map { $tmp{$_} = 1 } @{$starts{$_}};
        map { delete $tmp{$_} } @{$ends{$_}};

        my @segs = keys %tmp;
        push @overlaps, \@segs if 1 < @segs
} sort (keys %starts, keys %ends);

sub parse_non_overlapping {
  my ($array,$pos)=($_[0], $_[1]);

  my @node = @{$array->[$pos]};
  foreach my $value ( @node ) {

    my @work = map { [@$_] } @$array;
    $work[$pos] = [ $value ];

    my ($removed, $forced) = ( {}, {$value => 1});
    map { $removed->{$_} = 1 if $_ ne $value } @node;

    my ($i, $new_pos) = (0, -1);
    for ( $i = $pos + 1; $i <= $#work; $i++ ) {
        $_ = $work[$i];

        #apply map
        @$_ = grep { not defined($removed->{$_}) } @$_;
        if ( $#$_ == 0 ) { $forced->{@$_[0]} = 1 }

        #apply force
            my @tmp = grep { defined $forced->{$_} } @$_;
        if ( $#tmp == 0 ) {
             map { $removed->{$_} = 1 if $tmp[0] ne $_ } @$_;
             @$_ = @tmp;
        }

        if ( $#$_ > 0 && $new_pos == -1 ) {
                $new_pos = $i;
        }

        $work[$i] = $_;
    }

    if ( $new_pos != -1 ) {
      parse_non_overlapping(\@work, $new_pos);
    } else {
      print Dumper \@work
       # @work has the partial solution minux completely non overlapping segments.
    }
  }
}    

parse_non_overlapping(\@overlaps, 0);

First i would gather all the individual points (the start and end of each location), sort them and keep them in a list. In your case that would be:

1,180,193,329,407,651,684,720.

For each interval in that list find out how many segments are overlapping it. In your case this would be:

1, 180 -> 1
180, 193 -> 2
193, 329 -> 1
329, 407 -> 2
407, 651 -> 1
651, 684 -> 2
684, 720 -> 1

and loop at what segments have more than 1 (in this case there are 3). So the total number of cases is 2 x 2 x 2 = 8 solutions (you can pick only one segment maching a multi interval in a solution).

we found 2, 2, 2 (or 2, 3, 4). Keep them in an array and start from the last. Decrement it until you reach 0. When you reach 1 decrement the previous number and set the first number at the initial value minus 1.

Let's assume we have numbered the initial segments: (in this case 1,2,3,4,5,6). The overlapping segments will have the following segments in them [1,2], [2,3], [3,4]. So we have 3 overlapping segments. Now we start a recursive process of choice/elimination:
At each step we are looking at an overlapping segment that has multiple segments. We iterate the choices and for each choice we do two things: eliminate from each subsequent overlapping segment the segments we didn't choose now, and force the current segment choice in each subsequent overlapping segments that have this choice as a possibility. Every segment that becomes a non overlapping will be treated as a new choice. Search for next multiple choice and recurse. Once we can't find a choice we have partial solution. We need to add to it the segments that aren't involved in any overlapping. Print it.

In this case it would go like this: First step:

we are here [1,2], [2,3], [3,4]:
  chose 1 -> // eliminate 2 from rest and force 1 (3 is a single choice so we do the same)
      [1], [3], [3] -> [1, 3] solution 
  chose 2 -> // eliminate 1 from the rest and force 2 (2 single choice so we do the same). 
      [2], [2], [4] -> [2, 4] solution

This should work properly.

Now the code implementing this (it's not the prettiest perl code around i assume but i'm really not a perl guy):

#!/bin/perl

use strict;
use warnings;
use 5.010;
use Data::Dumper;

my $locs = {
  loc_1 => {
    start => 1,
    end   => 193,
  },
  loc_2 => {
    start => 180,
    end   => 407,
  },
  loc_3 => {
    start => 329,
    end   => 684,
  },
  loc_4 => {
            start => 651,
    end   => 720,
  }
};

my (%starts, %ends);
map {
        my ($start, $end) = ($locs->{$_}->{start}, $locs->{$_}->{end});

        push @{ $starts{$start} }, $_;
        push @{ $ends{$end} }, $_;
} keys %$locs;

my @overlaps, my %tmp;

map {
        map { $tmp{$_} = 1 } @{$starts{$_}};
        map { delete $tmp{$_} } @{$ends{$_}};

        my @segs = keys %tmp;
        push @overlaps, \@segs if 1 < @segs
} sort (keys %starts, keys %ends);

sub parse_non_overlapping {
  my ($array,$pos)=($_[0], $_[1]);

  my @node = @{$array->[$pos]};
  foreach my $value ( @node ) {

    my @work = map { [@$_] } @$array;
    $work[$pos] = [ $value ];

    my ($removed, $forced) = ( {}, {$value => 1});
    map { $removed->{$_} = 1 if $_ ne $value } @node;

    my ($i, $new_pos) = (0, -1);
    for ( $i = $pos + 1; $i <= $#work; $i++ ) {
        $_ = $work[$i];

        #apply map
        @$_ = grep { not defined($removed->{$_}) } @$_;
        if ( $#$_ == 0 ) { $forced->{@$_[0]} = 1 }

        #apply force
            my @tmp = grep { defined $forced->{$_} } @$_;
        if ( $#tmp == 0 ) {
             map { $removed->{$_} = 1 if $tmp[0] ne $_ } @$_;
             @$_ = @tmp;
        }

        if ( $#$_ > 0 && $new_pos == -1 ) {
                $new_pos = $i;
        }

        $work[$i] = $_;
    }

    if ( $new_pos != -1 ) {
      parse_non_overlapping(\@work, $new_pos);
    } else {
      print Dumper \@work
       # @work has the partial solution minux completely non overlapping segments.
    }
  }
}    

parse_non_overlapping(\@overlaps, 0);

回复收藏 0 原文

箹锭⒈辈孓2024-10-17 17:30:44

我不是 CS 人员，所以我并不反对所有最好的算法，但我想知道是否有比以下更好的方法：

my @location_keys = keys %{$locations};
while (my $key_for_checking = (shift @location_keys) {
    foreach my $key_to_compare (@location_keys) {
        if ( do_not_overlap($locations->{$key_for_checking}, 
                            $locations->{$key_to_compare} ) {
            add_to_output($key_for_checking, $key_to_compare);
        }
    }
}

使用 do_not_overlap 和 add_to_output 合适定义的。

如果您想检查是否重叠……这非常简单。
A 和 B 不会重叠，如果：

( (A->start < B->start) && (A->end < B->start) ) ||
( (A->start > B->end)   && (A->end > B->end) )

您可能需要根据共享边界是否构成重叠进行调整。另外，如果您知道 A 和 B 是否以某种方式排序（按开始或按结束），则可以简化此过程

I'm not a CS guy, so I'm not down with all of the best algorithms, but I wonder if there's a better approach than:

my @location_keys = keys %{$locations};
while (my $key_for_checking = (shift @location_keys) {
    foreach my $key_to_compare (@location_keys) {
        if ( do_not_overlap($locations->{$key_for_checking}, 
                            $locations->{$key_to_compare} ) {
            add_to_output($key_for_checking, $key_to_compare);
        }
    }
}

With do_not_overlap and add_to_output suitable defined.

If you're wondering about checking for overlap... that's pretty straightforward.
A and B do not overlap if:

( (A->start < B->start) && (A->end < B->start) ) ||
( (A->start > B->end)   && (A->end > B->end) )

You may need to tweak depending on whether a shared boundary constitutes an overlap. Also, you can simplify this if you know whether A and B are sorted in some way (either by start or by end)

回复收藏 0 原文

可爱咩2024-10-17 17:30:44

（现实生活侵入 - 抱歉，我会写一个解释 - 并摆脱那些空的 arrayrefs，尽管这是相当微不足道的 - 稍后！）

#! /usr/bin/perl
use strict;
use warnings;
use 5.010;
use List::MoreUtils qw(any);
use Data::Dumper;

my $locations = {
    loc_1 => {
        start => 1,
        end   => 193,
    },
    loc_2 => {
        start => 180,
        end   => 407,
    },
    loc_3 => {
        start => 329,
        end   => 684,
    },
    loc_4 => {
        start => 651,
        end   => 720,
    },
};

my @keys = keys %$locations;

my %final;

for my $key (@keys) {
    push @{ $final{$key} }, map {
        if (   $locations->{$key}->{start} >= $locations->{$_}->{start}
            && $locations->{$key}->{start} <= $locations->{$_}->{end}
            or $locations->{$key}->{end} >= $locations->{$_}->{start}
            && $locations->{$key}->{end} <= $locations->{$_}->{end} )
        {
            ();
        }
        else {
            my $return = [ sort $key, $_ ];
            if ( any { $return ~~ $_ } @{ $final{$_} }, @{ $final{$key} } ) {
                ();
            }
            else { $return; }
        }
    } grep { $_ ne $key } keys %$locations;
}

say Dumper \%final;

(Real life intrudes - Apologies, I'll write an explanation - and get ride of those empty arrayrefs, although that's fairly trivial - later!)

#! /usr/bin/perl
use strict;
use warnings;
use 5.010;
use List::MoreUtils qw(any);
use Data::Dumper;

my $locations = {
    loc_1 => {
        start => 1,
        end   => 193,
    },
    loc_2 => {
        start => 180,
        end   => 407,
    },
    loc_3 => {
        start => 329,
        end   => 684,
    },
    loc_4 => {
        start => 651,
        end   => 720,
    },
};

my @keys = keys %$locations;

my %final;

for my $key (@keys) {
    push @{ $final{$key} }, map {
        if (   $locations->{$key}->{start} >= $locations->{$_}->{start}
            && $locations->{$key}->{start} <= $locations->{$_}->{end}
            or $locations->{$key}->{end} >= $locations->{$_}->{start}
            && $locations->{$key}->{end} <= $locations->{$_}->{end} )
        {
            ();
        }
        else {
            my $return = [ sort $key, $_ ];
            if ( any { $return ~~ $_ } @{ $final{$_} }, @{ $final{$key} } ) {
                ();
            }
            else { $return; }
        }
    } grep { $_ ne $key } keys %$locations;
}

say Dumper \%final;

回复收藏 0 原文

~没有更多了~