How can we sort user data on column 1 and column 2 using Perl?

Posted 2024-12-20 05:13:18

I am new to Perl programming.
I want to read data from a file, sort the records on column 1 and then on column 2 (removing repeated records), and store the sorted records in another file. The following is my data.

The first and second columns are separated by a tab.

 user1 name       user2 name

    abc               xyz
    adc               xyz
    abc               xyz
    pqr               tyu
    xyz               abc
    tyu               pqr
    abc               pqr

In this example I want to sort the records first on user1 name and then on user2 name, and to remove repeated records while sorting.

The output should be as follows:

user1 name        user2 name
  abc              pqr
  abc              xyz
  adc              xyz
  pqr              tyu
  tyu              pqr
  xyz              abc

Please let me know how we can implement this in Perl.

Comments (4)

半边脸i 2024-12-27 05:13:18
#!/usr/bin/env perl
use strict;
use warnings;
my @list = <DATA>;
my $prev;
for (sort @list) {
    next if $prev && $_ eq $prev;
    $prev = $_;
    print;
}
__DATA__
    abc               xyz
    adc               xyz
    abc               xyz
    pqr               tyu
    xyz               abc
    tyu               pqr
    abc               pqr
秋日私语 2024-12-27 05:13:18

It all depends on how you store your data. I'm not sure how you plan to store your information, since you're in a class and may or may not have learned about references. For example, if you don't know references, you might do something like this:

my @array;
foreach my $value (<INPUT>) {
   chomp $value;
   my ($user1, $user2) = split (" ", $value);
   push (@array, "$user1:$user2");
}

This stores both values as a single string, which is quite common if you don't know about references.

If you know about references, you'd probably do this:

my @array;
foreach my $value (<INPUT>) {
   chomp $value;
   my @line = split (" ", $value);
   push (@array, \@line);
}

What I can tell you is that the sort function lets you supply your own subroutine to compare values. When you use your own subroutine in sort, you get two values, $a and $b, which represent the two items being compared. You can manipulate these, and then you return a negative number (usually -1) if $a should sort before $b, a positive number (usually 1) if $a should sort after $b, or zero if they are equal. Perl gives you two operators, <=> (numeric comparison) and cmp (string comparison), to make this a bit easier.
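
As a small illustration of those return values, here is a sketch using made-up strings and numbers (they are not from the question's data):

```perl
use strict;
use warnings;

# cmp compares as strings, <=> compares as numbers.
# Both return -1, 0, or 1.
my @string_results = ('abc' cmp 'abd', 'abc' cmp 'abc', 'abd' cmp 'abc');
my @number_results = (2 <=> 10, 10 <=> 10, 10 <=> 2);

print "cmp: @string_results\n";
print "<=>: @number_results\n";

# The choice of operator matters: as strings, "10" sorts before "2".
my @as_strings = sort { $a cmp $b } (10, 9, 2);
my @as_numbers = sort { $a <=> $b } (10, 9, 2);

print "string order:  @as_strings\n";
print "numeric order: @as_numbers\n";
```

The user names in the question are text, so cmp (or lt and gt) is the right choice there.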

Let's assume you're storing the values as $user1:$user2 since you haven't learned about references yet. Your sort routine might look like this.

# Comparison routine for sort; $a and $b are set by sort itself.
sub by_columns {
    my ($a_col1, $a_col2) = split (/:/, $a);
    my ($b_col1, $b_col2) = split (/:/, $b);

    # Now we compare $a to $b. First, we can compare the
    # User 1 column:

    if ($a_col1 lt $b_col1) {
        return -1;    #$a < $b
    }
    elsif ($a_col1 gt $b_col1) {
        return 1;     #$a > $b
    }

    # If we're down here, it's because column 1 matches
    # for both $a and $b. We'll have to compare column #2
    # to see which one is bigger.

    if ($a_col2 lt $b_col2) {
        return -1;   #$a < $b
    }
    elsif ($a_col2 gt $b_col2) {
        return 1;    #$a > $b
    }

    #We're down here because both column #1 and column #2 match for both
    #$a and $b. They must be equal

    return 0;
}

Now, my sort will look something like this (note that there is no comma between the subroutine name and the list):

my @new_array = sort by_columns @array;

Note: This is not the way I'd personally do it. I'd probably use the built-in cmp operator and take some shortcuts. However, I wanted to take this apart piece by piece, so you can understand it.
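
For comparison, the cmp shortcut might look something like the sketch below. The subroutine name by_user1_then_user2 and the sample entries are only illustrative; the data is assumed to be in the same "$user1:$user2" form as above.

```perl
use strict;
use warnings;

# Illustrative sample data in "$user1:$user2" form.
my @array = ('david:fu', 'david:bar', 'albert:foofoo', 'sandy:barbar', 'albert:foobar');

# cmp returns 0 when column 1 ties, so "||" falls through to column 2.
sub by_user1_then_user2 {
    my ($a_col1, $a_col2) = split /:/, $a;
    my ($b_col1, $b_col2) = split /:/, $b;
    return $a_col1 cmp $b_col1 || $a_col2 cmp $b_col2;
}

my @sorted = sort by_user1_then_user2 @array;
print "$_\n" for @sorted;
```

This behaves the same as the longer if/elsif version; it is just more compact.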

By the way, if the teacher decides you should sort on the second column before the first, you can easily modify your sort subroutine by comparing column 2 first and column 1 second. (Swapping the lt and gt operators would instead reverse the sort order.)


Here's my test program:

#! /usr/bin/env perl

use strict;
use warnings;

#Putting my data in `@array`

my @array;
while (my $entry = <DATA>) {
    chomp $entry;
    my ($user1, $user2) = split " ", $entry;
    push @array, "$user1:$user2";
}

# Sorting my data (no comma between the subroutine name and the list)

my @new_array = sort by_columns @array;

#Now printing out my data nice and sorted...

foreach my $element (@new_array) {
    my ($user1, $user2) = split (/:/, $element);
    print "$user1\t\t$user2\n";
}

#
# END OF PROGRAM
##################################################

##################################################
# Sort subroutine I'm using to sort the data
#
sub by_columns {
    my ($a_col1, $a_col2) = split (/:/, $a);
    my ($b_col1, $b_col2) = split (/:/, $b);

    # Now we compare $a to $b. First, we can compare the
    # User 1 column:

    if ($a_col1 lt $b_col1) {
        return -1;    #$a < $b
    }
    elsif ($a_col1 gt $b_col1) {
        return 1;     #$a > $b
    }

    # If we're down here, it's because column 1 matches
    # for both $a and $b. We'll have to compare column #2
    # to see which one is bigger.

    if ($a_col2 lt $b_col2) {
        return -1;   #$a < $b
    }
    elsif ($a_col2 gt $b_col2) {
        return 1;    #$a > $b
    }

    #We're down here because both column #1 and column #2 match for both
    #$a and $b. They must be equal

    return 0;
}

__DATA__
david       fu
david       bar
albert      foofoo
sandy       barbar
albert      foobar
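
The question also asks for repeated records to be removed. A minimal sketch of that, assuming there is no header line and that names contain no whitespace, could look like this. The subroutine name sorted_unique_rows is made up for the example, and an in-memory string stands in for the real input file.

```perl
use strict;
use warnings;

# Read "user1<TAB>user2" lines from a filehandle, drop repeats, and
# return the rows sorted by column 1 and then column 2.
sub sorted_unique_rows {
    my ($in) = @_;
    my %seen;
    my @rows;
    while (my $line = <$in>) {
        chomp $line;
        next unless $line =~ /\S/;               # skip blank lines
        my ($user1, $user2) = split ' ', $line;  # assumes no whitespace inside a name
        next if $seen{"$user1\t$user2"}++;       # this pair was already stored
        push @rows, [$user1, $user2];
    }
    return sort { $a->[0] cmp $b->[0] || $a->[1] cmp $b->[1] } @rows;
}

# In-memory stand-in for the input file, using the question's records.
my $input = "abc\txyz\nadc\txyz\nabc\txyz\npqr\ttyu\nxyz\tabc\ntyu\tpqr\nabc\tpqr\n";

open my $in, '<', \$input or die "Cannot open input: $!";
my @rows = sorted_unique_rows($in);
close $in;

print join("\t", @$_), "\n" for @rows;
```

To work with real files, open the input file for reading instead of the in-memory string, open a second filehandle with mode '>' for the output file, and print to that handle.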
皓月长歌 2024-12-27 05:13:18

Maybe not production-worthy code, but here's an approach:

#!/usr/bin/perl

use strict;
use warnings;

my %seen;
print join "",
    grep {$_ !~ /^\s+$/ && !$seen{$_}++}
    sort {$a !~ /^ user/ <=> $b !~ /^ user/ || 
    $a cmp $b} <DATA>;

__DATA__
 user1 name       user2 name

    abc               xyz
    adc               xyz
    abc               xyz
    pqr               tyu
    xyz               abc
    tyu               pqr
    abc               pqr

Output:

 user1 name       user2 name
    abc               pqr
    abc               xyz
    adc               xyz
    pqr               tyu
    tyu               pqr
    xyz               abc

The most unconventional part here is the $a !~ /^ user/ <=> $b !~ /^ user/ sort condition. $a !~ /^ user/ evaluates to 1 (true) for every line except the header line, where it evaluates to 0 (false). The header is therefore put first, and the remaining lines fall through to the second sort condition, $a cmp $b, which produces the desired result.
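
If the regex comparison feels too clever, one alternative (a sketch, not the approach used in this answer) is to take the header off before sorting and print it back first. The lines are held in an array here rather than read from DATA, and the header is assumed to be the first line.

```perl
use strict;
use warnings;

# The question's lines, header first.
my @lines = (
    " user1 name       user2 name\n",
    "\n",
    "    abc               xyz\n",
    "    adc               xyz\n",
    "    abc               xyz\n",
    "    pqr               tyu\n",
    "    xyz               abc\n",
    "    tyu               pqr\n",
    "    abc               pqr\n",
);

my $header = shift @lines;    # keep the header out of the sort

# Sort the rest, then drop blank lines and repeated lines.
my %seen;
my @body = grep { /\S/ && !$seen{$_}++ } sort @lines;

print $header, @body;
```

This keeps the comparator as the default string sort, at the cost of treating the header as a special case.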

悟红尘 2024-12-27 05:13:18

Or it could be as simple as:

print sort <DATA>;

__DATA__
    abc xyz
    pqr tyu
    xyz abc
    adc xyz
    tyu pqr
    abc pqr
    abc xyz

But only if your data is as simple as this. If the data in each column varies in length,
each column must be padded to the width of its longest item, like so:

__DATA__
    abc              |xyz       |<-- other data in record...
    pqrwf            |tyu       |<-- other data in record...
    xyzsder          |abc       |<-- other data in record...
    adca             |xyzghrt   |<-- other data in record...
    tyuvdfcg         |pqr       |<-- other data in record...
    abcvfgfaqrt      |pqrbb     |<-- other data in record...
    abcaaaaaaaaaaa   |xyz       |<-- other data in record...

In this case the simple sort still works, but note that these columns are padded out with spaces, not tabs.
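
One way to produce that kind of padding is sprintf with a left-justified field. The sketch below uses made-up two-column records and only pads the first column; the variable names are illustrative.

```perl
use strict;
use warnings;

# Made-up records whose first column varies in length.
my @records = (['abc', 'xyz'], ['ab', 'zzz'], ['abcd', 'pqr'], ['abc', 'pqr']);

# Find the widest entry in column 1 ...
my $width = 0;
for my $rec (@records) {
    my $len = length($rec->[0]);
    $width = $len if $len > $width;
}

# ... and left-justify column 1 to that width, padding with spaces.
my @lines = map { sprintf("%-*s |%s\n", $width, $_->[0], $_->[1]) } @records;

# With equal-width columns, a plain string sort orders by column 1, then column 2.
my @sorted = sort @lines;
print @sorted;
```

The padding matters because the plain sort compares whole lines character by character, so the separator must line up at the same position in every record.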
