交错稀疏排序数组

发布于 2024-09-09 07:43:51 字数 687 浏览 7 评论 0原文

我有一组事件列表。事件总是按给定的顺序发生，但并非每个事件总是发生。下面是一个输入示例：

[[ do, re, fa, ti ],
 [ do, re, mi ],
 [ do, la, ti, za ],
 [ mi, fa ],
 [ re, so, za ]]

输入值没有任何固有顺序。它们实际上是诸如“创建符号链接”和“重新索引搜索”之类的消息。它们在单独的列表中排序，但无法仅查看第一个列表中的“fa”和第二个列表中的“mi”并确定哪个在另一个之前。

我希望能够接受该输入并生成所有事件的排序列表：

[ do, re, mi, fa, so, la, ti, za ]

或者更好的是，有关每个事件的一些信息，例如计数：

[ [do, 3], [re, 3], [mi, 2],
  [fa, 2], [so, 1], [la, 1],
  [ti, 1], [za, 2] ]

是否有我的名称正在做？有公认的算法吗？我用 Perl 写这个，如果这很重要的话，但伪代码就可以了。

我知道，鉴于我的示例输入，我可能无法保证“正确”的顺序。但我的真实输入有吨更多的数据点，而且我相信，通过一些技巧，它的正确率将达到 95%（这确实是我所需要的）。如果没有必要，我只是不想重新发明轮子。

原文

I've got a set of lists of events. The events always happen in a given order, but not every event always happens. Here's an example input:

[[ do, re, fa, ti ],
 [ do, re, mi ],
 [ do, la, ti, za ],
 [ mi, fa ],
 [ re, so, za ]]

The input values don't have any inherent order. They're actually messages like "creating symlinks" and "reindexing search". They're sorted in the individual list, but there's no way to look at only 'fa' in the first list and 'mi' in the second and determine which comes before the other.

I'd like to be able to take that input and generate a sorted list of all events:

[ do, re, mi, fa, so, la, ti, za ]

or better yet, some information about each event, like a count:

[ [do, 3], [re, 3], [mi, 2],
  [fa, 2], [so, 1], [la, 1],
  [ti, 1], [za, 2] ]

Is there a name for what I'm doing? Are there accepted algorithms? I'm writing this in Perl, if that matters, but pseudocode will do.

I know that given my example input, I probably can't be guaranteed of the "right" order. But my real input has tons more datapoints, and I feel confident that with some cleverness it'll be 95% right (which is really all I need). I just don't want to re-invent the wheel if I don't have to.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

尐籹人 2024-09-16 07:43:51

您可以使用 tsort 推断合理的（尽管不一定是唯一的）排序顺序（称为拓扑顺序）根据您观察到的顺序。您可能有兴趣阅读 tsort 的原始用途，其结构与您的问题类似。

请注意，tsort 需要非循环图。就您的示例而言，这意味着您无法在一个序列中看到 do 后跟 re，而在另一个序列中看不到 re 后跟 do。

#! /usr/bin/perl

use warnings;
use strict;

use IPC::Open2;

sub tsort {
  my($events) = @_;

  my $pid = open2 my $out, my $in, "tsort";

  foreach my $group (@$events) {
    foreach my $i (0 .. $#$group - 1) {
      print $in map "@$group[$i,$_]\n", $i+1 .. $#$group;
    }
  }

  close $in or warn "$0: close: $!";

  chomp(my @order = <$out>);
  my %order = map +(shift @order => $_), 0 .. $#order;
  wantarray ? %order : \%order;
}

由于您将数据描述为稀疏数据，因此上面的代码为 tsort 提供了有关事件邻接矩阵的尽可能多的信息。

有了这些信息，计算直方图并对其组件进行排序就很简单了：

my $events = [ ... ];

my %order = tsort $events;

my %seen;
do { ++$seen{$_} for @$_ } for @$events;

my @counts;
foreach my $event (sort { $order{$a} <=> $order{$b} } keys %seen) {
  push @counts => [ $event, $seen{$event} ];
  print "[ $counts[-1][0], $counts[-1][1] ]\n";
}

对于您提供的问题中的输入，输出是

[ do, 3 ]
[ la, 1 ]
[ re, 3 ]
[ so, 1 ]
[ mi, 2 ]
[ fa, 2 ]
[ ti, 2 ]
[ za, 2 ]

这看起来很有趣，因为我们知道 solfège 的顺序，但是 re 和 la 在偏序由$events定义：我们只知道它们必须都在do之后。

You can use tsort to infer a reasonable—although not necessarily unique—sort order (known as a topological order) from the ordering you've observed. You may be interested in reading tsort's original use, which is similar in structure to your problem.

Note that tsort requires an acyclic graph. In terms of your example, this means you couldn't see do followed by re in one sequence and re followed by do in another.

#! /usr/bin/perl

use warnings;
use strict;

use IPC::Open2;

sub tsort {
  my($events) = @_;

  my $pid = open2 my $out, my $in, "tsort";

  foreach my $group (@$events) {
    foreach my $i (0 .. $#$group - 1) {
      print $in map "@$group[$i,$_]\n", $i+1 .. $#$group;
    }
  }

  close $in or warn "$0: close: $!";

  chomp(my @order = <$out>);
  my %order = map +(shift @order => $_), 0 .. $#order;
  wantarray ? %order : \%order;
}

Because you described the data as sparse, the code above provides tsort with as much information as possible about the events' adjacency matrix.

Having that information, computing a histogram and sorting its components is straightforward:

my $events = [ ... ];

my %order = tsort $events;

my %seen;
do { ++$seen{$_} for @$_ } for @$events;

my @counts;
foreach my $event (sort { $order{$a} <=> $order{$b} } keys %seen) {
  push @counts => [ $event, $seen{$event} ];
  print "[ $counts[-1][0], $counts[-1][1] ]\n";
}

For the input in your question you provided, the output is

[ do, 3 ]
[ la, 1 ]
[ re, 3 ]
[ so, 1 ]
[ mi, 2 ]
[ fa, 2 ]
[ ti, 2 ]
[ za, 2 ]

This looks funny because we know the order of solfège, but re and la are incomparable in the partial order defined by $events: we know only that they must both come after do.

回复收藏 0 原文

听风吹 2024-09-16 07:43:51

从理论上讲，我建议采用以下算法：

构建有向图。
对于每个输入 [ X, Y, Z ]，创建边 X->Y 和 Y->Z（如果它们尚不存在）。
对图表执行拓扑排序。
瞧！

附注
这仅假设所有事件都按特定顺序发生（始终！）。如果情况并非如此，则问题变为 NP 完全问题。

聚苯硫醚
只是为了让你有一些有用的东西： Sort::Topological （不要知道它是否真的有效，但看起来是正确的）

回复收藏 0 原文

东北女汉子 2024-09-16 07:43:51

如果您不喜欢编写太多代码，则可以使用 unix 命令行实用程序 tsort：

$ tsort -
do re
re fa
fa ti
do re
re mi
do la
la ti
ti za
mi fa
re so
so za

它是示例输入中所有对的列表。这会产生输出：

do
la
re
so
mi
fa
ti
za

这基本上就是你想要的。

If you're not into writing to much code, you could use the unix command-line utility tsort:

$ tsort -
do re
re fa
fa ti
do re
re mi
do la
la ti
ti za
mi fa
re so
so za

Which is a list of all pairs in your sample input. This produces as output:

do
la
re
so
mi
fa
ti
za

which is basically what you want.

回复收藏 0 原文

人心善变 2024-09-16 07:43:51

使用哈希值进行聚合。

my $notes= [[qw(do re fa ti)],
       [qw(do re mi)],
       [qw(do la ti za)],
       [qw(mi fa)],
       [qw(re so za)]];

my %out;
foreach my $list (@$notes)
{
  $out{$_}++ foreach @$list;
}

print "$_: $out{$_}\n" foreach sort keys %out;

如果您想要的话，%out散列

do: 3
fa: 2
la: 1
mi: 2
re: 3
so: 1
ti: 2
za: 2

可以轻松转换为列表。

my @newout;
push @newout,[$_,$out{$_}] foreach sort keys %out;

Use a hash to aggregate.

my $notes= [[qw(do re fa ti)],
       [qw(do re mi)],
       [qw(do la ti za)],
       [qw(mi fa)],
       [qw(re so za)]];

my %out;
foreach my $list (@$notes)
{
  $out{$_}++ foreach @$list;
}

print "$_: $out{$_}\n" foreach sort keys %out;

Yields

do: 3
fa: 2
la: 1
mi: 2
re: 3
so: 1
ti: 2
za: 2

The %out hash is easily converted into a list if that is what you want.

my @newout;
push @newout,[$_,$out{$_}] foreach sort keys %out;

回复收藏 0 原文

拥抱我好吗 2024-09-16 07:43:51

perl -de 0
  DB<1> @a = ( ['a','b','c'], ['c','f'], ['h'] ) 
  DB<2> map { @m{@{$_}} = @$_ } @a
  DB<3> p keys %m
chabf

我能想到的最快的捷径。无论哪种方式，你都必须至少迭代一次......

perl -de 0
  DB<1> @a = ( ['a','b','c'], ['c','f'], ['h'] ) 
  DB<2> map { @m{@{$_}} = @$_ } @a
  DB<3> p keys %m
chabf

Quickiest shortcut I can think of. Either way, you have to iterate through things at least once...

回复收藏 0 原文

彡翼 2024-09-16 07:43:51

这是合并排序的完美候选者。请访问此处的维基百科页面，了解该算法的良好表示 http://en.wikipedia.org/ wiki/Merge_sort

您所描述的实际上是合并排序的子集/小调整。您不是从未排序的数组开始，而是想要将一组已排序的数组合并在一起。只需按照维基百科页面中所述对数组和合并函数的结果调用“合并”函数，直到获得单个数组（将被排序）。

要将输出调整为您想要的方式，您需要定义一个比较函数，该函数可以在一个事件小于、等于或大于另一个事件时返回。然后，当您的合并函数发现两个相等的事件时，您可以将它们折叠成一个事件并保留该事件的计数。

回复收藏 0 原文

万水千山粽是情ミ 2024-09-16 07:43:51

粗略地说，我给它起的名字是“散列”。您将事物放入名称值对中。如果你想保持某种表面上的顺序，你必须用一个保持顺序的数组来补充哈希。这个命令对我来说就是“遭遇命令”。

use strict;
use warnings;

my $all 
    = [[ 'do', 're', 'fa', 'ti' ],
       [ 'do', 're', 'mi' ],
       [ 'do', 'la', 'ti', 'za' ],
       [ 'mi', 'fa' ],
       [ 're', 'so', 'za' ]
     ];

my ( @order, %counts );

foreach my $list ( @$all ) { 
    foreach my $item ( @$list ) { 
        my $ref = \$counts{$item}; # autovivs to an *assignable* scalar.
        push @order, $item unless $ref;
        $ref++;
    }
}

foreach my $key ( @order ) { 
    print "$key: $counts{$key}\n";
}

# do: 3
# re: 3
# fa: 2
# ti: 2
# mi: 2
# la: 1
# za: 2
# so: 1

还有其他类似的答案，但我的包含了这种巧妙的自动生存技巧。

Roughly, the name I would give it is "hashing". You are putting things into name value pairs. If you want to keep some semblance of order, you have to supplement the hash with an array that keeps order. That order is "encounter order" for me.

use strict;
use warnings;

my $all 
    = [[ 'do', 're', 'fa', 'ti' ],
       [ 'do', 're', 'mi' ],
       [ 'do', 'la', 'ti', 'za' ],
       [ 'mi', 'fa' ],
       [ 're', 'so', 'za' ]
     ];

my ( @order, %counts );

foreach my $list ( @$all ) { 
    foreach my $item ( @$list ) { 
        my $ref = \$counts{$item}; # autovivs to an *assignable* scalar.
        push @order, $item unless $ref;
        $ref++;
    }
}

foreach my $key ( @order ) { 
    print "$key: $counts{$key}\n";
}

# do: 3
# re: 3
# fa: 2
# ti: 2
# mi: 2
# la: 1
# za: 2
# so: 1

There are other answers like this one, but mine contains this neat autovivification trick.

回复收藏 0 原文

半窗疏影 2024-09-16 07:43:51

我也不太确定这会被称为什么，但我找到了一种方法来查找给定数组数组作为输入的顺序。本质上，伪代码是：

10 在所有数组中查找最早的项目
20 将其推入列表
30 从所有数组中删除该项目
40 如果还有剩余项目，则转到 10

这是一个工作原型：

#!/usr/bin/perl

use strict;

sub InList {
    my ($x, @list) = @_;
    for (@list) {
        return 1 if $x eq $_;
    }
    return 0;
}

sub Earliest {
    my @lists = @_;
    my $earliest;
    for (@lists) {
        if (@$_) {
            if (!$earliest
                || ($_->[0] ne $earliest && InList($earliest, @$_))) {

                $earliest = $_->[0];
            }
        }
    }
    return $earliest;
}

sub Remove {
    my ($x, @lists) = @_;

    for (@lists) {
        my $n = 0;
        while ($n < @$_) {
            if ($_->[$n] eq $x) {
                splice(@$_,$n,1);
            }
            else {
                $n++
            }
        }
    }
}

my $list = [
    [ 'do', 're', 'fa', 'ti' ],
    [ 'do', 're', 'mi' ],
    [ 'do', 'la', 'ti', 'za' ],
    [ 'mi', 'fa' ],
    [ 're', 'so', 'za' ]
];

my @items;

while (my $earliest = Earliest(@$list)) {
    push @items, $earliest;
    Remove($earliest, @$list);
}

print join(',', @items);

输出：

do,re,mi,fa,la,ti,so,za

I'm not really sure what this would be called either, but I figured out a way to find the order given the array of arrays as an input. Essentially the pseudo-code is:

10 Find earliest item in all arrays
20 Push that onto a list
30 Remove that item from all arrays
40 Goto 10 if there are any items left

Here's a working prototype:

#!/usr/bin/perl

use strict;

sub InList {
    my ($x, @list) = @_;
    for (@list) {
        return 1 if $x eq $_;
    }
    return 0;
}

sub Earliest {
    my @lists = @_;
    my $earliest;
    for (@lists) {
        if (@$_) {
            if (!$earliest
                || ($_->[0] ne $earliest && InList($earliest, @$_))) {

                $earliest = $_->[0];
            }
        }
    }
    return $earliest;
}

sub Remove {
    my ($x, @lists) = @_;

    for (@lists) {
        my $n = 0;
        while ($n < @$_) {
            if ($_->[$n] eq $x) {
                splice(@$_,$n,1);
            }
            else {
                $n++
            }
        }
    }
}

my $list = [
    [ 'do', 're', 'fa', 'ti' ],
    [ 'do', 're', 'mi' ],
    [ 'do', 'la', 'ti', 'za' ],
    [ 'mi', 'fa' ],
    [ 're', 'so', 'za' ]
];

my @items;

while (my $earliest = Earliest(@$list)) {
    push @items, $earliest;
    Remove($earliest, @$list);
}

print join(',', @items);

Output:

do,re,mi,fa,la,ti,so,za

回复收藏 0 原文

翻身的咸鱼 2024-09-16 07:43:51

刚刚意识到你的问题说他们没有预定的顺序，所以这可能不相关。

Perl 代码：

$list = [
    ['do', 're', 'fa', 'ti' ],
    ['do', 're', 'mi' ],
    ['do', 'la', 'ti', 'za' ],
    ['mi', 'fa' ],
    ['re', 'so', 'za' ]
];
%sid = map{($_,$n++)}qw/do re mi fa so la ti za/;

map{map{$k{$_}++}@$_}@$list;
push @$result,[$_,$k{$_}] for sort{$sid{$a}<=>$sid{$b}}keys%k;

print "[@$_]\n" for(@$result);

输出：

[do 3]
[re 3]
[mi 2]
[fa 2]
[so 1]
[la 1]
[ti 2]
[za 2]

Just realized your question said their is no predetermined order, so this may not be relevent.

Perl code:

$list = [
    ['do', 're', 'fa', 'ti' ],
    ['do', 're', 'mi' ],
    ['do', 'la', 'ti', 'za' ],
    ['mi', 'fa' ],
    ['re', 'so', 'za' ]
];
%sid = map{($_,$n++)}qw/do re mi fa so la ti za/;

map{map{$k{$_}++}@$_}@$list;
push @$result,[$_,$k{$_}] for sort{$sid{$a}<=>$sid{$b}}keys%k;

print "[@$_]\n" for(@$result);

output:

[do 3]
[re 3]
[mi 2]
[fa 2]
[so 1]
[la 1]
[ti 2]
[za 2]

回复收藏 0 原文

只有一腔孤勇 2024-09-16 07:43:51

解决方案：

这解决了提问者修改之前的原始问题。

#!/usr/local/bin/perl -w
use strict; 

   main();
    
   sub main{
      # Changed your 3-dimensional array to a 2-dimensional array
      my @old = (
                   [ 'do', 're', 'fa', 'ti' ],
                   [ 'do', 're', 'mi' ],
                   [ 'do', 'la', 'ti', 'za' ],
                   [ 'mi', 'fa' ],
                   [ 're', 'so', 'za' ]
                );
      my %new;

      foreach my $row (0.. $#old ){                           # loop through each record (row)
         foreach my $col (0..$#{$old[$row]} ){                # loop through each element (col)                    
            $new{ ${$old[$row]}[$col] }{count}++;
            push @{ $new{${$old[$row]}[$col]}{position} } , [$row,$col];
         }
      }

      foreach my $key (sort keys %new){
         print "$key : $new{$key} " , "\n";                   # notice each value is a hash that we use for properties 
      }      
   }

如何检索信息：

   local $" = ', ';                       # pretty print ($") of array in quotes
   print $new{za}{count} , "\n";          # 2    - how many there were
   print "@{$new{za}{position}[1]} \n";   # 4,2  - position of the second occurrence
                                          #        remember it starts at 0

基本上，我们在哈希中创建一个唯一的元素列表。对于每个元素，我们都有一个“属性”哈希，其中包含一个标量计数和一个位置数组。数组中元素的数量应根据元素在原始元素中出现的次数而变化。

标量属性并不是真正必要的，因为您始终可以采用 position 数组的标量来检索相同的数字。注意：如果您从数组中添加/删除元素，count 和 position 的含义将不相关。

例如： print scalar @{$new{za}{position}}; 将为您提供与 print $new{za}{count}; 相同的结果

Solution:

This solves the original question before it was modified by the asker.

#!/usr/local/bin/perl -w
use strict; 

   main();
    
   sub main{
      # Changed your 3-dimensional array to a 2-dimensional array
      my @old = (
                   [ 'do', 're', 'fa', 'ti' ],
                   [ 'do', 're', 'mi' ],
                   [ 'do', 'la', 'ti', 'za' ],
                   [ 'mi', 'fa' ],
                   [ 're', 'so', 'za' ]
                );
      my %new;

      foreach my $row (0.. $#old ){                           # loop through each record (row)
         foreach my $col (0..$#{$old[$row]} ){                # loop through each element (col)                    
            $new{ ${$old[$row]}[$col] }{count}++;
            push @{ $new{${$old[$row]}[$col]}{position} } , [$row,$col];
         }
      }

      foreach my $key (sort keys %new){
         print "$key : $new{$key} " , "\n";                   # notice each value is a hash that we use for properties 
      }      
   }

How to Retrieve Info:

   local 
quot; = ', ';                       # pretty print (quot;) of array in quotes
   print $new{za}{count} , "\n";          # 2    - how many there were
   print "@{$new{za}{position}[1]} \n";   # 4,2  - position of the second occurrence
                                          #        remember it starts at 0

Basically, we create a unique list of elements in the hash. For each of those elements we have a "property" hash, that contains a scalar count and an array for the position. The number of elements in the array should vary, based on how many occurrences of the element were in the original.

The scalar property isn't really necessary since you could always take the scalar of the position array to retrieve the same number. Note: if you ever add/remove elements from the array count and position will not be correlate in their meaning.