当前位置：文江博客话题详情

Perl readdir sorting

在 Perl 中对目录进行排序，考虑数字

发布于 2024-09-03 13:14:29 字数 624 浏览 4 评论 0 原文

我想我需要某种 Schwartzian Transform 才能正常工作，但我很难弄清楚，因为 perl 不是我最强的语言。

我有一个目录，其内容如下：

album1.htm
album2.htm
album3.htm
....
album99.htm
album100.htm

我试图从此目录中获取编号最高的专辑（在本例中为album100.htm）。请注意，文件上的时间戳并不是确定事物的可靠方法，因为人们会在事后添加旧的“丢失”专辑。

以前的开发人员只是使用了下面的代码片段，但是一旦目录中的专辑超过 9 个，这显然就会失败。

opendir(DIR, PATH) || print $!;
@files = readdir(DIR);
foreach $file ( sort(@files) ) {
    if ( $file =~ /album/ ) {
        $last_file = $file;
    }
}

原文

I think I need some sort of Schwartzian Transform to get this working, but I'm having trouble figuring it out, as perl isn't my strongest language.

I have a directory with contents as such:

album1.htm
album2.htm
album3.htm
....
album99.htm
album100.htm

I'm trying to get the album with the highest number from this directory (in this case, album100.htm). Note that timestamps on the files are not a reliable means of determining things, as people are adding old "missing" albums after the fact.

The previous developer simply used the code snippet below, but this clearly breaks down once there are more than 9 albums in a directory.

opendir(DIR, PATH) || print $!;
@files = readdir(DIR);
foreach $file ( sort(@files) ) {
    if ( $file =~ /album/ ) {
        $last_file = $file;
    }
}

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

情话难免假 2024-09-10 13:14:29

如果您只需要找到编号最高的专辑，则实际上不需要对列表进行排序，只需遍历它并跟踪最大值即可。

#!/usr/bin/perl 

use strict;
use warnings;

my $max = 0;

while ( <DATA> ) {
    my ($album) = $_ =~ m/album(\d+)/;
    $max = $album if $album > $max;
}

print "album$max.htm";

__DATA__
album1.htm
album100.htm
album2.htm
album3.htm
album99.htm

If you just need to find the album with the highest number, you don't really need to sort the list, just run through it and keep track of the maximum.

#!/usr/bin/perl 

use strict;
use warnings;

my $max = 0;

while ( <DATA> ) {
    my ($album) = $_ =~ m/album(\d+)/;
    $max = $album if $album > $max;
}

print "album$max.htm";

__DATA__
album1.htm
album100.htm
album2.htm
album3.htm
album99.htm

回复收藏 0 原文

地狱即天堂 2024-09-10 13:14:29

要找到最高的数字，请尝试自定义排序...

sub sort_files {
    (my $num_a = $a) =~ s/^album(\d+)\.htm$/$1/;
    (my $num_b = $b) =~ s/^album(\d+)\.htm$/$1/;
    return $num_a <=> $num_b;
}

my @sorted = sort \&sort_files @files;
my $last = pop @sorted;

另外，请查看 File::Next 模块。它会让您只挑选以“专辑”一词开头的文件。我发现它比 readdir 更容易一些。

To find the highest number, try a custom sort...

sub sort_files {
    (my $num_a = $a) =~ s/^album(\d+)\.htm$/$1/;
    (my $num_b = $b) =~ s/^album(\d+)\.htm$/$1/;
    return $num_a <=> $num_b;
}

my @sorted = sort \&sort_files @files;
my $last = pop @sorted;

Also, take a look at the File::Next module. It will let you pick out just the files that begin with the word "album". I find it a little easier than readdir.

回复收藏 0 原文

染墨丶若流云 2024-09-10 13:14:29

您遇到困难的原因是运算符，<=> 是数字比较，cmp 是默认，它是字符串比较。

$ perl -E'say for sort qw/01 1 02 200/';
01
02
1
200

通过稍加修改，我们得到的结果更接近正确：

$ perl -E'say for sort { $a <=> $b } qw/01 1 02 200/';
01
1
02
200

但是，在您的情况下，您需要删除非数字。

$ perl -E'say for sort { my $s1 = $a =~ m/(\d+)/; my $s2 = $b =~ /(\d+)/; $s1 <=> $s2  } qw/01 1 02 200/';
01
1
02
200

这是更漂亮的：

sort {
  my $s1 = $a =~ m/(\d+)/;
  my $s2 = $b =~ /(\d+)/;
  $s1 <=> $s2
}

这并不完美，但它应该让您对排序问题有一个很好的了解。

哦，作为后续，Shcwartzian 变换解决了一个不同的问题：它使您不必在搜索算法。它的代价是必须缓存结果（这并不意外），从而产生内存成本。本质上，您所做的是将问题的输入映射到输出（通常在数组中）[$input, $output]，然后对输出进行排序$a-> [1] <=> $b->[1]。现在您的内容已经排序完毕，您可以重新映射以获得原始输入$_->[0]。

map $_->[0],
sort { $a->[1] <=> $b->[1] }
map [ $_, fn($_) ]
, qw/input list here/
;

它很酷，因为它非常紧凑，同时又非常高效。

The reason why you're encountering difficulties is the operator, <=> is the numeric comparison, cmp is the default and it is string comparison.

$ perl -E'say for sort qw/01 1 02 200/';
01
02
1
200

With a slight modification we get something much closer to correct:

$ perl -E'say for sort { $a <=> $b } qw/01 1 02 200/';
01
1
02
200

However, in your case you need to remove the non digits.

$ perl -E'say for sort { my $s1 = $a =~ m/(\d+)/; my $s2 = $b =~ /(\d+)/; $s1 <=> $s2  } qw/01 1 02 200/';
01
1
02
200

Here is it more pretty:

sort {
  my $s1 = $a =~ m/(\d+)/;
  my $s2 = $b =~ /(\d+)/;
  $s1 <=> $s2
}

This isn't flawless, but it should give you a good idea of your issue with sort.

Oh, and as a follow up, the Shcwartzian Transform solves a different problem: it stops you from having to run a complex task (unlike the one you're needing -- a regex) multiple times in the search algorithm. It comes at a memory cost of having to cache the results (not to be unexpected). Essentially, what you do is map the input of the problem, to the output (typically in an array) [$input, $output] then you sort on the outputs $a->[1] <=> $b->[1]. With your stuff now sorted you map back over to get your original inputs $_->[0].

map $_->[0],
sort { $a->[1] <=> $b->[1] }
map [ $_, fn($_) ]
, qw/input list here/
;

It is cool because it is so compact while being so efficient.

回复收藏 0 原文

早乙女 2024-09-10 13:14:29

在这里，使用 Schwartzian 变换：

my @files = <DATA>;

print join '',
    map  { $_->[1] }
    sort { $a->[0] <=> $b->[0] }
    map  { [ m/album(\d+)/, $_ ] }
    @files;


 __DATA__
album12.htm
album1.htm
album2.htm
album10.htm

Here you go, using Schwartzian Transform:

my @files = <DATA>;

print join '',
    map  { $_->[1] }
    sort { $a->[0] <=> $b->[0] }
    map  { [ m/album(\d+)/, $_ ] }
    @files;


 __DATA__
album12.htm
album1.htm
album2.htm
album10.htm

回复收藏 0 原文

无戏配角 2024-09-10 13:14:29

这是使用 reduce 的替代解决方案：

use strict;
use warnings;
use List::Util 'reduce';

my $max = reduce {
    my ($aval, $bval) = ($a =~ m/album(\d+)/, $b =~ m/album(\d+)/);
    $aval > $bval ? $a : $b
} <DATA>;
print "max album is $max\n";

__DATA__
album1.htm
album100.htm
album2.htm
album3.htm
album99.htm

Here's an alternative solution using reduce:

use strict;
use warnings;
use List::Util 'reduce';

my $max = reduce {
    my ($aval, $bval) = ($a =~ m/album(\d+)/, $b =~ m/album(\d+)/);
    $aval > $bval ? $a : $b
} <DATA>;
print "max album is $max\n";

__DATA__
album1.htm
album100.htm
album2.htm
album3.htm
album99.htm

回复收藏 0 原文

赏烟花じ飞满天 2024-09-10 13:14:29

这是一个通用的解决方案：

my @sorted_list
    = map  { $_->[0] } # we stored it at the head of the list, so we can pull it out
      sort {
          # first test a normalized version
          my $v = $a->[1] cmp $b->[1];
          return $v if $v;

          my $lim = @$a > @$b ? @$a : @$b;

          # we alternate between ascii sections and numeric
          for ( my $i = 2; $i < $lim; $i++ ) {
              $v  =  ( $a->[$i] || '' ) cmp ( $b->[$i] || '' );
              return $v if $v;

              $i++;
              $v = ( $a->[$i] || 0 ) <=> ( $b->[$i] || 0 );
              return $v if $v;
          }
          return 0;

      }
      map {
          # split on digits and retain captures in place.
          my @parts = split /(\d+)/;
          my $nstr  = join( '', map { m/\D/ ? $_ : '0' x length() } @parts );
          [ $_, $nstr, @parts ];
      } @directory_names
      ;

Here's a generic solution:

my @sorted_list
    = map  { $_->[0] } # we stored it at the head of the list, so we can pull it out
      sort {
          # first test a normalized version
          my $v = $a->[1] cmp $b->[1];
          return $v if $v;

          my $lim = @$a > @$b ? @$a : @$b;

          # we alternate between ascii sections and numeric
          for ( my $i = 2; $i < $lim; $i++ ) {
              $v  =  ( $a->[$i] || '' ) cmp ( $b->[$i] || '' );
              return $v if $v;

              $i++;
              $v = ( $a->[$i] || 0 ) <=> ( $b->[$i] || 0 );
              return $v if $v;
          }
          return 0;

      }
      map {
          # split on digits and retain captures in place.
          my @parts = split /(\d+)/;
          my $nstr  = join( '', map { m/\D/ ? $_ : '0' x length() } @parts );
          [ $_, $nstr, @parts ];
      } @directory_names
      ;

回复收藏 0 原文

~没有更多了~