How do I sort timestamps in dd:mm:yyyy hh24:mi:ss format in descending order in Perl?

Posted on 2024-12-08 21:36:14

I have to sort my hash keys, which are timestamps (dd:mm:yyyy hh24:mi:ss), in descending order.

sort { $b <=> $a } keys %time_spercent

This does not do what I intend. Instead, it ends up putting the entries with the higher hours and minutes first, even when their date is not the latest. For example, this is what I get when I sort as shown above:

21:01:2011 16:51:09
21:01:2011 16:49:54
26:01:2011 11:02:55
26:01:2011 11:01:40
05:04:2011 11:51:13
05:04:2011 11:51:13
05:04:2011 11:48:37
05:04:2011 11:48:37

Instead, I want them in this order, arranged by date as well as by time:

05:04:2011 11:51:13
05:04:2011 11:51:13
05:04:2011 11:48:37
26:01:2011 11:02:55
26:01:2011 11:01:40
05:04:2011 11:48:37
21:01:2011 16:51:09
21:01:2011 16:49:54

Any pointers or suggestions on how this could be done would be gratefully received.

Update

foreach my $status_date( 
     map  { $_->[0] }
     sort { $b->[1] cmp $a->[1] }
     map  { [$_, sorting_desc($_)] } keys % {$com_sam->{ $s1 } } )

and

sub sorting_desc {
    $_ = shift;
    if (/(\d+):(\d+):(\d+) (\d+):(\d+):(\d+)/) {
        return "$2:$1:$3:$4:$5:$6";
    }
}

is the subroutine for sorting.

I also tried

foreach my $status_date( 
    map  { $_->[0] }
    sort { $b->[1] cmp $a->[1] }
    map { [$_, (split/[:\s][1]] } keys % {$com_sam->{ $s1 } } )

but that did not give the expected results either.

All I get is:

WGA_PD7124a WGA_PD7124a     95(2)   95(2)   95      100.00  193     Unknown(Unknown)        192654  01:07:2011 16:13:55
WGA_PD7124a     WGA_PD7124a     95(2)   95(2)   95      100.00  193     Unknown(Unknown)        192655  01:07:2011 16:11:23
WGA_PD7124a     WGA_PD7124a     95(2)   95(2)   95      100.00  193     Male(Unknown)   192656  01:07:2011 11:04:26
WGA_PD6355b     WGA_PD6355b     96(1)   96(1)   96      100.00  388     Unknown(Unknown)        184558  04:05:2011 17:35:52
WGA_PD6355b     WGA_PD6355a     96(1)   66(31)  66      95.45   388     Unknown(Unknown)        184558  04:05:2011 17:35:52
WGA_PD6355b     WGA_PD6355b     96(1)   96(1)   96      100.00  388     Unknown(Unknown)        184557  04:05:2011 17:34:27
WGA_PD6355b     WGA_PD6355a     96(1)   66(31)  66      95.45   388     Unknown(Unknown)        184557  04:05:2011 17:34:27
3074    3074    87(10)  87(10)  87      100.00  109     Unknown(Unknown)        174878  15:02:2011 09:24:31
3074    3074    87(10)  87(10)  87      100.00  109     Unknown(Unknown)        174970  15:02:2011 09:21:19
3074    3074    87(10)  87(10)  87      100.00  109     Female(Unknown) 174860  15:02:2011 09:16:32
3163    3163    90(7)   90(7)   90      100.00  176     Unknown(Unknown)        173382  09:02:2011 09:54:48
3163    3163    90(7)   90(7)   90      100.00  176     Unknown(Unknown)        173284  09:02:2011 09:51:02
CHP-212 CHP-212 94(3)   94(3)   94      100.00  269     Unknown(Unknown)        173382  09:02:2011 09:54:48
CHP-212 CHP-212 94(3)   94(3)   94      100.00  269     Unknown(Unknown)        173284  09:02:2011 09:51:02
MGH_2631        MGH_2631        90(8)   90(8)   90      100.00  211     Male(Unknown)   200943  01:09:2011 10:48:18
MGH_2631        MGH_2631        90(8)   90(8)   90      100.00  211     Unknown(Unknown)        200944  25:08:2011 10:20:16
MGH_2631        MGH_2631        90(8)   90(8)   90      100.00  211     Unknown(Unknown)        200945  25:08:2011 10:19:05
MGH_2631        MGH_2631        90(8)   90(8)   90      100.00  211     Male(Unknown)   200946  25:08:2011 10:17:26
MGH_2101        MGH_2101        80(18)  80(18)  80      100.00  359     Male(Unknown)   200943  01:09:2011 10:48:18
MGH_2101        MGH_2101        80(18)  80(18)  80      100.00  359     Unknown(Unknown)        200944  25:08:2011 10:20:16
MGH_2101        MGH_2101        80(18)  80(18)  80      100.00  359     Unknown(Unknown)        200945  25:08:2011 10:19:05
MGH_2101        MGH_2101        80(18)  80(18)  80      100.00  359     Male(Unknown)   200946  25:08:2011 10:17:26
PD4294c PD4294c 95(2)   95(2)   95      100.00  221     Unknown(Unknown)        179502  23:03:2011 10:03:23
PD4294c PD4294c 95(2)   95(2)   95      100.00  221     Unknown(Unknown)        179470  23:03:2011 10:02:30

苦笑流年记忆 2024-12-15 21:36:14

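Can you change the format to yyyy:mm:dd hh24:mi:ss? Then you would have a natural sort order. Basically, putting everything in descending order of significance is more machine-friendly :)

Edit: Then just sort using string comparison, because it will naturally sort the right way.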
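
A minimal sketch of that idea, assuming the timestamps can be stored year-first. The values are the question's sample data rewritten in that order, and `@stamps` and `@newest_first` are hypothetical names:

```perl
use strict;
use warnings;

# Timestamps stored as yyyy:mm:dd hh24:mi:ss (most significant field first).
my @stamps = (
    '2011:01:21 16:51:09',
    '2011:04:05 11:48:37',
    '2011:01:26 11:02:55',
    '2011:04:05 11:51:13',
);

# Plain string comparison is enough; putting $b before $a gives descending order.
my @newest_first = sort { $b cmp $a } @stamps;

print "$_\n" for @newest_first;
```

Because every field has a fixed width and is zero-padded, comparing the strings character by character is the same as comparing the moments in time they describe.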

梦回旧景 2024-12-15 21:36:14

From your question it is unclear to me how you really want to sort and how you produced the examples. I cannot detect any order in the example of your expected sort order.
A likely solution is at the bottom.

Let me clarify:

Given a textfile "ts" with the following content (your example):

> cat ts
21:01:2011 16:51:09
21:01:2011 16:49:54
26:01:2011 11:02:55
26:01:2011 11:01:40
05:04:2011 11:51:13
05:04:2011 11:51:13
05:04:2011 11:48:37
05:04:2011 11:48:37

A standard sort produces the following output:

> perl -e '@a = <>; print sort @a' ts
05:04:2011 11:48:37
05:04:2011 11:48:37
05:04:2011 11:51:13
05:04:2011 11:51:13
21:01:2011 16:49:54
21:01:2011 16:51:09
26:01:2011 11:01:40
26:01:2011 11:02:55

While the numerically descending sort you proposed produces the following order:

> perl -e '@a = <>; print sort { $b <=> $a } @a' ts
26:01:2011 11:02:55
26:01:2011 11:01:40
21:01:2011 16:51:09
21:01:2011 16:49:54
05:04:2011 11:51:13
05:04:2011 11:51:13
05:04:2011 11:48:37
05:04:2011 11:48:37

To clarify the numerical sort: the spaceship operator <=> enforces numerical interpretation of its two operands. So the strings $a and $b, each containing the date and time, are interpreted as if they were numbers. To do this, Perl reads the leading digits (the day of the month) and stops at the first ':'. That is why the time, and even the month and year, are completely ignored, and we are only sorting by the day of the month in descending order.
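
The numeric interpretation can be checked in isolation. This is a small sketch, not part of the original answer; `$stamp` and `$as_number` are hypothetical names, and the warning that Perl would normally emit for such a conversion is silenced on purpose:

```perl
use strict;
use warnings;

my $stamp = '21:01:2011 16:51:09';

# In numeric context Perl uses only the leading digits and ignores the rest.
# With warnings enabled the conversion complains that the string isn't numeric,
# so that warning category is switched off just for this expression.
my $as_number = do { no warnings 'numeric'; $stamp + 0 };

print "$as_number\n";    # only the day of the month survives
```

Running the original sort with warnings enabled is a quick way to notice this kind of problem, since the same "isn't numeric" warning is raised for each comparison.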

Finally, if you really want to reverse sort by date, then by time, and need to keep the format, you can use this code:

> perl -e '@a = <>; sub dmyt2ymdt { my $dmyt=shift; $ymdt=join(q(), (split(/[:\s]+/,$dmyt))[2,1,0,3,4,5])}   print sort { dmyt2ymdt($b) <=> dmyt2ymdt($a) } @a' ts
05:04:2011 11:51:13
05:04:2011 11:51:13
05:04:2011 11:48:37
05:04:2011 11:48:37
26:01:2011 11:02:55
26:01:2011 11:01:40
21:01:2011 16:51:09
21:01:2011 16:49:54

Here's a nicer formatted version (which I did not test):

sub dmyt2ymdt {
  my $dmyt = shift;
  my ($day, $mon, $year, $h, $m, $s) = split(/[:\s]+/, $dmyt);
  return join('', $year, $mon, $day, $h, $m, $s);
}

This sort function

sort { dmyt2ymdt($b) <=> dmyt2ymdt($a) }

then calls the above helper quite a lot. In your example we have 8 entries in the list to sort and the function gets called 24 times. So it is not efficient. But for small lists of up to a couple of hundred or even a thousand entries it may be all right for you.
If you have large lists, you should do the format conversion only once per element, but that costs memory. So for large lists you need to trade off memory against execution time, as is often the case.
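
One way to convert each element only once while still returning the original strings is to cache the converted keys in a hash. This is a sketch under that assumption; `%sort_key`, `@stamps` and `@newest_first` are hypothetical names, and the helper repeats the conversion from above:

```perl
use strict;
use warnings;

# Same conversion as the helper above: dd:mm:yyyy hh:mi:ss -> yyyymmddhhmiss
sub dmyt2ymdt {
    my ($dmyt) = @_;
    my ($day, $mon, $year, $h, $m, $s) = split /[:\s]+/, $dmyt;
    return join '', $year, $mon, $day, $h, $m, $s;
}

my @stamps = (
    '21:01:2011 16:51:09',
    '26:01:2011 11:02:55',
    '05:04:2011 11:51:13',
    '05:04:2011 11:48:37',
);

# Convert each timestamp exactly once and remember the result.
my %sort_key = map { $_ => dmyt2ymdt($_) } @stamps;

# The comparison is now a cheap hash lookup; the original strings are kept.
my @newest_first = sort { $sort_key{$b} cmp $sort_key{$a} } @stamps;

print "$_\n" for @newest_first;
```

The cache is the memory cost mentioned above: one extra string per distinct timestamp.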

If performance is the optimization criterion, you could do the transformation on the fly, as has been shown in other answers and comments, like this:

sort { $b <=> $a }  map { dmyt2ymdt($_) } @a

...for my example above. Now you do the conversion only once per element, although the result is the list of converted keys rather than the original strings. Still, we have to hold a temporary list in memory. I'm not exactly sure how well Perl can optimize the above construct. One might think that the following is easier to optimize:

reverse sort map { dmyt2ymdt($_) } @a

which would work for the test set, too. Without a comparison block, sort falls back to string comparison, which gives the same result as a numerical comparison for strings of identical length that do not use spaces in positions where other strings have digits.
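
The equivalence claimed above holds only for fixed-width keys. A short sketch with made-up values shows both the case where the two comparisons agree and a case where they do not:

```perl
use strict;
use warnings;

# Fixed-width, digits-only keys: string and numeric comparison agree.
my ($older, $newer) = ('20110126110255', '20110405115113');
print "agree\n" if ($older cmp $newer) == ($older <=> $newer);

# Different lengths: string comparison goes character by character,
# so '9' sorts after '10' even though 9 is the smaller number.
my $by_string = '9' cmp '10';
my $by_number = '9' <=> '10';
print "differ\n" if $by_string != $by_number;
```

This is why the converted keys need zero-padded fields, which the dd:mm:yyyy hh24:mi:ss input already provides.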

Hope this helps!

半城柳色半声笛 2024-12-15 21:36:14

Jon Skeet's answer is better! (i.e., just change your time stamp, if you can, to the ISO 8601 format.)

But if you can't change the format, you could do something like:

#!/usr/bin/perl -w
use strict;

my %h;

while(<DATA>) {
    chomp;
    $h{$_}++;
}

sub iso_8601 {
    $_ = shift;
    if (/(\d+):(\d+):(\d+) (\d+):(\d+):(\d+)/) {
        return "$3:$2:$1:$4:$5:$6";
    }    
}

foreach my $key (sort {iso_8601($a) cmp iso_8601($b)} keys %h) { 
    print "$key -- $h{$key}\n";
}

__DATA__
21:01:2011 16:51:09
21:01:2011 16:49:54
26:01:2011 11:02:55
26:01:2011 11:01:40
05:04:2011 11:51:13
05:04:2011 11:51:13
05:04:2011 11:48:37
05:04:2011 11:48:37

(I assume you have your own logic for dealing with the duplicate time stamps. Hashing them counts the duplicates, and I am just printing the counts...)

Result:

21:01:2011 16:49:54 -- 1
21:01:2011 16:51:09 -- 1
26:01:2011 11:01:40 -- 1
26:01:2011 11:02:55 -- 1
05:04:2011 11:48:37 -- 2
05:04:2011 11:51:13 -- 2

Edit

OK, if you are concerned about efficiency, the (sort {iso_8601($a) cmp iso_8601($b)} keys %h) is not the best since the iso_8601() function is called many times per hash element.

For a form of "Schwartzian Transform" you can do:

print join("\n",
    map { $_->[0].' -- '.$h{$_->[0]} }
    sort { $a->[1] cmp $b->[1] }
    map {[$_,iso_8601($_)]} 
        keys %h);

This produces the same output as above, but iso_8601() is now called only once per hash key instead of many times.

To dissect that (it goes right to left, bottom to top):

keys %h                         # list of all the keys of the hash
map {[$_,iso_8601($_)]}         # create anon array with 2 elements:
                                # original stamp and ISO 8601 stamp
sort { $a->[1] cmp $b->[1] }    # list sorted on the ISO 8601 stamp
map { $_->[0].' -- '.$h{$_->[0]} }  # a list of strings with original stamp
                                    # and hash count
join("\n",                      # join the list into a string with a "\n"
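
The same pipeline can also be written as separate steps, which may be easier to follow. This is a sketch rather than the code from the answer: it sorts in descending order as the question asks, and `year_first`, `@pairs`, `@sorted` and the sample hash contents are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical sample hash: timestamp => number of occurrences.
my %h = (
    '21:01:2011 16:51:09' => 1,
    '26:01:2011 11:02:55' => 1,
    '05:04:2011 11:51:13' => 2,
);

# Rearrange dd:mm:yyyy hh:mi:ss into a year-first string that sorts correctly.
sub year_first {
    my ($stamp) = @_;
    my ($d, $mo, $y, $hr, $mi, $s) =
        $stamp =~ /(\d+):(\d+):(\d+) (\d+):(\d+):(\d+)/
        or return '';    # unparsable input gets an empty sort key
    return "$y:$mo:$d:$hr:$mi:$s";
}

# Step 1: pair every key with its sort key (computed once per key).
my @pairs = map { [ $_, year_first($_) ] } keys %h;
# Step 2: sort the pairs on the computed key; $b before $a means descending.
my @sorted = sort { $b->[1] cmp $a->[1] } @pairs;
# Step 3: drop the sort keys and keep the original timestamps.
my @newest_first = map { $_->[0] } @sorted;

print "$_ -- $h{$_}\n" for @newest_first;
```

Chaining the three steps without the intermediate arrays gives the usual one-expression form of the transform.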

EDIT 2

I am having a hard time understanding what you want. Try this:

#!/usr/bin/perl -w
use strict;

my %h;
my $i=0;

while(<DATA>) {
    chomp;
    $h{$_}++;
}

sub iso_8601 {
    $_ = shift;
    if (/(\d+):(\d+):(\d+) (\d+):(\d+):(\d+)$/) {
        $i++;
        return "$3-$2-$1 $4:$5:$6";
    }    
}

foreach my $key (sort {iso_8601($b) cmp iso_8601($a)} keys %h) { 
    print iso_8601($key).":\t\t"."$key -- $h{$key}\n";
}

print "\n";

Output:

YYYY-MM-DD HH:MM:SS                 your record... 
2011-09-01 10:48:18:        MGH_2631        MGH_2631        90(8)   90(8)   90      100.00  211     Male(Unknown)   200943  01:09:2011 10:48:18 -- 1
2011-09-01 10:48:18:        MGH_2101        MGH_2101        80(18)  80(18)  80      100.00  359     Male(Unknown)   200943  01:09:2011 10:48:18 -- 1
2011-08-25 10:20:16:        MGH_2101        MGH_2101        80(18)  80(18)  80      100.00  359     Unknown(Unknown)        200944  25:08:2011 10:20:16 -- 1
2011-08-25 10:20:16:        MGH_2631        MGH_2631        90(8)   90(8)   90      100.00  211     Unknown(Unknown)        200944  25:08:2011 10:20:16 -- 1
2011-08-25 10:19:05:        MGH_2631        MGH_2631        90(8)   90(8)   90      100.00  211     Unknown(Unknown)        200945  25:08:2011 10:19:05 -- 1
2011-08-25 10:19:05:        MGH_2101        MGH_2101        80(18)  80(18)  80      100.00  359     Unknown(Unknown)        200945  25:08:2011 10:19:05 -- 1
2011-08-25 10:17:26:        MGH_2101        MGH_2101        80(18)  80(18)  80      100.00  359     Male(Unknown)   200946  25:08:2011 10:17:26 -- 1
2011-08-25 10:17:26:        MGH_2631        MGH_2631        90(8)   90(8)   90      100.00  211     Male(Unknown)   200946  25:08:2011 10:17:26 -- 1
2011-07-01 16:13:55:        WGA_PD7124a WGA_PD7124a     95(2)   95(2)   95      100.00  193     Unknown(Unknown)        192654  01:07:2011 16:13:55 -- 1
2011-07-01 16:11:23:        WGA_PD7124a     WGA_PD7124a     95(2)   95(2)   95      100.00  193     Unknown(Unknown)        192655  01:07:2011 16:11:23 -- 1
2011-07-01 11:04:26:        WGA_PD7124a     WGA_PD7124a     95(2)   95(2)   95      100.00  193     Male(Unknown)   192656  01:07:2011 11:04:26 -- 1
2011-05-04 17:35:52:        WGA_PD6355b     WGA_PD6355b     96(1)   96(1)   96      100.00  388     Unknown(Unknown)        184558  04:05:2011 17:35:52 -- 1
2011-05-04 17:35:52:        WGA_PD6355b     WGA_PD6355a     96(1)   66(31)  66      95.45   388     Unknown(Unknown)        184558  04:05:2011 17:35:52 -- 1
2011-05-04 17:34:27:        WGA_PD6355b     WGA_PD6355b     96(1)   96(1)   96      100.00  388     Unknown(Unknown)        184557  04:05:2011 17:34:27 -- 1
2011-05-04 17:34:27:        WGA_PD6355b     WGA_PD6355a     96(1)   66(31)  66      95.45   388     Unknown(Unknown)        184557  04:05:2011 17:34:27 -- 1
2011-03-23 10:03:23:        PD4294c PD4294c 95(2)   95(2)   95      100.00  221     Unknown(Unknown)        179502  23:03:2011 10:03:23 -- 1
2011-03-23 10:02:30:        PD4294c PD4294c 95(2)   95(2)   95      100.00  221     Unknown(Unknown)        179470  23:03:2011 10:02:30 -- 1
2011-02-15 09:24:31:        3074    3074    87(10)  87(10)  87      100.00  109     Unknown(Unknown)        174878  15:02:2011 09:24:31 -- 1
2011-02-15 09:21:19:        3074    3074    87(10)  87(10)  87      100.00  109     Unknown(Unknown)        174970  15:02:2011 09:21:19 -- 1
2011-02-15 09:16:32:        3074    3074    87(10)  87(10)  87      100.00  109     Female(Unknown) 174860  15:02:2011 09:16:32 -- 1
2011-02-09 09:54:48:        CHP-212 CHP-212 94(3)   94(3)   94      100.00  269     Unknown(Unknown)        173382  09:02:2011 09:54:48 -- 1
2011-02-09 09:54:48:        3163    3163    90(7)   90(7)   90      100.00  176     Unknown(Unknown)        173382  09:02:2011 09:54:48 -- 1
2011-02-09 09:51:02:        3163    3163    90(7)   90(7)   90      100.00  176     Unknown(Unknown)        173284  09:02:2011 09:51:02 -- 1
2011-02-09 09:51:02:        CHP-212 CHP-212 94(3)   94(3)   94      100.00  269     Unknown(Unknown)        173284  09:02:2011 09:51:02 -- 1

Is this what you are thinking? It parses the time stamp at the end of line and sorts those records in descending order. What is the issue with this?

旧伤还要旧人安 2024-12-15 21:36:14

I had the same problem some time ago, and I solved it by transforming the format when sorting the list, in the same way Jon Skeet has proposed. This is my piece of code:

my @source = <DATA>;
my @data =  sort {$a<=>$b} map { m!(\d+):(\d+):(\d+) (\d+):(\d+):(\d+)!; "$3$2$1$4$5$6";} @source;
foreach ( @data ) {
    s!(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})!$3:$2:$1 $4:$5:$6!;
    print $_, "\n";
}

__DATA__
05:04:2011 11:48:37
21:01:2011 16:49:54
26:01:2011 11:02:55
26:01:2011 11:01:40
05:04:2011 11:51:13
05:04:2011 11:51:13
05:04:2011 11:48:37
21:01:2011 16:51:09
15:04:2012 11:48:37

The result is:

21:01:2011 16:49:54
21:01:2011 16:51:09
26:01:2011 11:01:40
26:01:2011 11:02:55
05:04:2011 11:48:37
05:04:2011 11:48:37
05:04:2011 11:51:13
05:04:2011 11:51:13
15:04:2012 11:48:37

望她远 2024-12-15 21:36:14

First, understand what you are trying to do. Next, get it to work. Then, if necessary, optimize.

One way to easily compare the time stamps is to convert them to offsets from an epoch. You can use Time::Local. Given that you are not getting arbitrary values, but rather well defined timestamps, you could engage in a little premature optimization and use the _nocheck version of timelocal or timegm.
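
As a small illustration of the epoch idea on bare timestamps, here is a sketch that uses timegm so the result does not depend on the local time zone. The name `stamp_to_epoch` and the sample list are assumptions for the example:

```perl
use strict;
use warnings;
use Time::Local 'timegm';

# Convert "dd:mm:yyyy hh:mi:ss" to seconds since the epoch, treating it as UTC.
sub stamp_to_epoch {
    my ($stamp) = @_;
    my ($day, $mon, $year, $hr, $min, $sec) = $stamp =~ /(\d+)/g;
    # Time::Local counts months from 0 (January) to 11 (December).
    return timegm($sec, $min, $hr, $day, $mon - 1, $year);
}

my @stamps = (
    '21:01:2011 16:51:09',
    '05:04:2011 11:48:37',
    '26:01:2011 11:01:40',
);

# Numeric comparison on the epoch values; $b before $a gives newest first.
my @newest_first = sort { stamp_to_epoch($b) <=> stamp_to_epoch($a) } @stamps;

print "$_\n" for @newest_first;
```

Unlike plain string rearrangement, this approach rejects impossible dates, because timegm validates its arguments.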

Here is one way to do it using the sample data you provided:

#!/usr/bin/env perl

use strict; use warnings;

use Time::Local 'timelocal';

my @data;

while (my $line = <DATA>) {
    last unless $line =~ /\S/;
    chomp $line;
    push @data, [ split ' ', $line ];
}

@data = sort compare_records_descending_time @data;

print join("\t", @$_), "\n" for @data;

sub compare_records_descending_time {
    return ts2time($b) <=> ts2time($a);
}

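sub ts2time {
    my ($record) = @_;
    my $ts = "@{ $record }[-2, -1]";

    # timestamp is day:mon:year hr:min:sec
    # timelocal expects arguments in sec, min, hr, day, mon, year,
    # with mon in the range 0..11, so subtract 1 from the parsed month

    my ($day, $mon, $year, $hr, $min, $sec) = $ts =~ /([0-9]+)/g;
    return timelocal($sec, $min, $hr, $day, $mon - 1, $year);
}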

__DATA__
124a WGA_PD7124a     95(2)   95(2)   95      100.00  193     Unknown(Unknown)        192654  01:07:2011 16:13:55
WGA_PD7124a     WGA_PD7124a     95(2)   95(2)   95      100.00  193     Unknown(Unknown)        192655  01:07:2011 16:11:23
WGA_PD7124a     WGA_PD7124a     95(2)   95(2)   95      100.00  193     Male(Unknown)   192656  01:07:2011 11:04:26
WGA_PD6355b     WGA_PD6355b     96(1)   96(1)   96      100.00  388     Unknown(Unknown)        184558  04:05:2011 17:35:52
WGA_PD6355b     WGA_PD6355a     96(1)   66(31)  66      95.45   388     Unknown(Unknown)        184558  04:05:2011 17:35:52
WGA_PD6355b     WGA_PD6355b     96(1)   96(1)   96      100.00  388     Unknown(Unknown)        184557  04:05:2011 17:34:27
WGA_PD6355b     WGA_PD6355a     96(1)   66(31)  66      95.45   388     Unknown(Unknown)        184557  04:05:2011 17:34:27
3074    3074    87(10)  87(10)  87      100.00  109     Unknown(Unknown)        174878  15:02:2011 09:24:31
3074    3074    87(10)  87(10)  87      100.00  109     Unknown(Unknown)        174970  15:02:2011 09:21:19
3074    3074    87(10)  87(10)  87      100.00  109     Female(Unknown) 174860  15:02:2011 09:16:32
3163    3163    90(7)   90(7)   90      100.00  176     Unknown(Unknown)        173382  09:02:2011 09:54:48
3163    3163    90(7)   90(7)   90      100.00  176     Unknown(Unknown)        173284  09:02:2011 09:51:02
CHP-212 CHP-212 94(3)   94(3)   94      100.00  269     Unknown(Unknown)        173382  09:02:2011 09:54:48
CHP-212 CHP-212 94(3)   94(3)   94      100.00  269     Unknown(Unknown)        173284  09:02:2011 09:51:02
MGH_2631        MGH_2631        90(8)   90(8)   90      100.00  211     Male(Unknown)   200943  01:09:2011 10:48:18
MGH_2631        MGH_2631        90(8)   90(8)   90      100.00  211     Unknown(Unknown)        200944  25:08:2011 10:20:16
MGH_2631        MGH_2631        90(8)   90(8)   90      100.00  211     Unknown(Unknown)        200945  25:08:2011 10:19:05
MGH_2631        MGH_2631        90(8)   90(8)   90      100.00  211     Male(Unknown)   200946  25:08:2011 10:17:26
MGH_2101        MGH_2101        80(18)  80(18)  80      100.00  359     Male(Unknown)   200943  01:09:2011 10:48:18
MGH_2101        MGH_2101        80(18)  80(18)  80      100.00  359     Unknown(Unknown)        200944  25:08:2011 10:20:16
MGH_2101        MGH_2101        80(18)  80(18)  80      100.00  359     Unknown(Unknown)        200945  25:08:2011 10:19:05
MGH_2101        MGH_2101        80(18)  80(18)  80      100.00  359     Male(Unknown)   200946  25:08:2011 10:17:26
PD4294c PD4294c 95(2)   95(2)   95      100.00  221     Unknown(Unknown)        179502  23:03:2011 10:03:23
PD4294c PD4294c 95(2)   95(2)   95      100.00  221     Unknown(Unknown)        179470  23:03:2011 10:02:30
