How to check for files with two different extensions in Perl

Posted 2024-12-20 11:27:47

I have a file reflog with the content below. It contains items with the same name but different extensions. For each item (file1, file2 and file3 here, for example), I want to check that it exists with both extensions (.abc and .def). If both extensions exist, the script will perform some regex processing and print the name. Otherwise it will just report the file name together with its extension (i.e., if only one of file1.abc or file1.def exists, that one is printed).

reflog:

file1.abc


file2.abc

file2.def 

file3.abc
file3.def

file4.abc 

file5.abc 
file5.def
file6.def
file8abc.def
file7.abc

file1.def
file9abc.def
file10def.abc

My script is below (edited from yb007's script), but I have some issues with the output that I don't know how to resolve. The output goes wrong when the reflog file contains any file named *abc.def (such as file8abc.def and file9abc.def). The script strips the tail of the name and reports the wrong extension (file8abc.def comes out as file.abc, but I think the extension should be .def).

    #! /usr/bin/perl 
    use strict; 
    use warnings; 
    my @files_abc ;
    my @files_def ;
    my $line;
    open(FILE1, 'reflog') || die ("Could not open reflog") ;
    open (FILE2, '>log') || die ("Could not open log") ;
    while ($line = <FILE1>) {   
        if($line=~ /(.*).abc/) {       
            push(@files_abc,$1);   
        } elsif ($line=~ /(.*).def/) { 
            push(@files_def,$1);     } 
    } 
    close(FILE1);

    my %first = map { $_ => 1 } @files_def ;
    my @same = grep { $first{$_} } @files_abc ;
    my @abc_only = grep { !$first{$_} } @files_abc ;
    foreach my $abc (sort @abc_only) {
        $abc .= ".abc";
    }   

    my %second = map {$_=>1} @files_abc; 
    my @same2 = grep { $second{$_} } @files_def; #@same and same2 are equal.
    my @def_only = grep { !$second{$_} } @files_def;
    foreach my $def (sort @def_only) {
        $def .= ".def";
    }

    my @combine_all = sort (@same, @abc_only, @def_only);
    print "\nCombine all:-\n @combine_all\n" ;
    print "\nList of files with same extension\n @same";  
    print "\nList of files with abc only\n @abc_only"; 
    print "\nList of files with def only\n @def_only"; 
    foreach my $item (sort @combine_all) {
        print FILE2 "$item\n" ;
    }
    close (FILE2) ;

My output looks like this, which is wrong.
First, the screen output:
Combine all:-
file.abc file.abc file1 file10def.abc file2 file3 file4.abc file5 file6.def file7.abc

List of files with same extension
 file1 file2 file3 file5
List of files with abc only
 file4.abc file.abc file7.abc file.abc file10def.abc
List of files with def only
 file6.def

Log output as below:
    **file.abc
    file.abc**
    file1
    file10def.abc
    file2
    file3
    file4.abc
    file5
    file6.def
    file7.abc

Can you please help me see where it goes wrong? Thanks heaps.

凤舞天涯 2024-12-27 11:27:47

ALWAYS add

use strict;
use warnings;

to the head of your program. They will catch most simple errors before you need to ask for help.

You should always check whether a file open succeeded with open FILE, "reflog" or die $!;
You are using a variable $ine that doesn't exist. You mean $line
The lines you read into the array contain a trailing newline. Write chomp @lines; to remove them
Your regular expressions are wrong and you need || instead of &&. Instead write if ($line =~ /\.(iif|isp)$/)

If you still have problems when these are fixed then please ask again.
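
The point about the regular expression can be illustrated with a small sketch. It assumes the two extensions from the question (.abc and .def) rather than the ones in the pattern above, and `split_name` is a hypothetical helper name, not something from the original script. Escaping the dot and anchoring the pattern to the whole line is what stops file8abc.def from being treated as an .abc file; the `\s*` parts are there because some lines in reflog carry trailing spaces.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split a reflog line into (base, extension).
# Returns an empty list when the line is blank or has another extension.
sub split_name {
    my ($line) = @_;
    # \. is a literal dot; ^ and $ anchor the match to the whole line,
    # and \s* tolerates surrounding whitespace, including the newline.
    return $line =~ /^\s*(.+)\.(abc|def)\s*$/ ? ($1, $2) : ();
}

for my $line ("file1.abc\n", "file8abc.def\n", "file10def.abc \n", "\n") {
    my ($base, $ext) = split_name($line);
    if (defined $base) {
        print "$base -> $ext\n";
    } else {
        print "(skipped)\n";
    }
}
```

With the unescaped, unanchored pattern from the question, `/(.*).abc/`, the line file8abc.def matches as "file" + any character + "abc", which is where the stray file.abc entries come from.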

耀眼的星火 2024-12-27 11:27:47

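In addition to the errors already pointed out, you appear to be loading @lines from FUNC rather than FILE. Is that also a typo?

Also, if reflog really contains a series of lines with one file name on each, why would you expect the condition "if ($line =~ /.abc/ && $line =~ /.def/)" to evaluate to true?

It would be very helpful if you could post a sample of the actual file you are reading and the actual code you are debugging. Or at least edit the question to fix the typos already mentioned.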

初见 2024-12-27 11:27:47

use strict;
use warnings;

my @files_abc;
my @files_def;
my $line;

open(FILE,'reflog') || die ("could not open reflog");

while ($line = <FILE>) {
    if($line=~ /(.*)\.abc/) {
        push(@files_abc,$1);
    }
    elsif($line=~ /(.*)\.def/) {
        push(@files_def,$1);
    }
}

close(FILE);

my %second = map {$_=>1} @files_def;
my @same = grep { $second{$_} } @files_abc;

print "\nList of files with same extension\n @same";


foreach my $abc (@files_abc) {
           $abc .= ".abc";
         }
foreach my $def (@files_def) {
           $def .= ".def";
         }

print "\nList of files with abc extension\n @files_abc";
print "\nList of files with def extension\n @files_def";

Output is

List of files with same extension
file1 file2 file3 file5

List of files with abc extension
file1.abc file2.abc file3.abc file4.abc file5.abc file7.abc file10def.abc

List of files with def extension
file2.def file3.def file5.def file6.def file8abc.def file1.def file9abc.def

Hope this helps...
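
If the three separate lists from the question (both extensions, .abc only, .def only) are wanted, one possible sketch is to record the extensions per base name in a hash of hashes and classify afterwards. This is an alternative under stated assumptions, not the code above: `classify` and `%ext_of` are hypothetical names, the sample lines are a subset of the question's reflog, and the pattern is anchored so that names such as file8abc.def keep their full base name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: group names by base and record which extensions occur.
sub classify {
    my (@lines) = @_;
    my %ext_of;    # base name => { abc => 1, def => 1 }
    for my $line (@lines) {
        # Skip blank lines and anything without a .abc or .def extension.
        next unless $line =~ /^\s*(.+)\.(abc|def)\s*$/;
        $ext_of{$1}{$2} = 1;
    }
    my (@both, @abc_only, @def_only);
    for my $base (sort keys %ext_of) {
        my $seen = $ext_of{$base};
        if    ($seen->{abc} && $seen->{def}) { push @both,     $base }
        elsif ($seen->{abc})                 { push @abc_only, "$base.abc" }
        else                                 { push @def_only, "$base.def" }
    }
    return (\@both, \@abc_only, \@def_only);
}

my @sample = ("file1.abc\n", "\n", "file2.abc\n", "file2.def \n",
              "file8abc.def\n", "file1.def\n", "file10def.abc\n");
my ($both, $abc_only, $def_only) = classify(@sample);
print "both: @$both\n";
print "abc only: @$abc_only\n";
print "def only: @$def_only\n";
```

Because each extension is stored as a hash key, a line that appears twice in reflog does not get mistaken for a matching pair.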

你曾走过我的故事 2024-12-27 11:27:47

You don't need to slurp the whole file; you can read one line at a time. I think this code works on this extended version of your reflog file:

xx.pl

#!/usr/bin/env perl

use strict;
use warnings;

open my $file, '<', "reflog" or die "Failed to open file reflog for reading ($!)";
open my $func, '>', 'log'    or die "Failed to create file log for writing ($!)";

my ($oldline, $oldname, $oldextn) = ("", "", "");
while (my $newline = <$file>)
{
    chomp $newline;
    $newline =~ s/^\s*//;
    my ($newname, $newextn) = ($newline =~ m/(.*)([.][^.]*)$/);
    if ($oldname eq $newname)
    {
        # Found the same file - presumably $oldextn eq ".abc" and $newextn eq ".def"
        print $func "$newname\n";
        print "$newname\n";
        $oldline = "";
        $oldname = "";
        $oldextn = "";
    }
    else
    {
        print $func "$oldline\n" if ($oldline);
        print "$oldline\n" if ($oldline);
        $oldline = $newline;
        $oldname = $newname;
        $oldextn = $newextn;
    }
}
print $func "$oldline\n" if ($oldline);
print "$oldline\n" if ($oldline);

#unlink "reflog" ;
chmod 0644, "log";
close $func;
close $file;

Since the code does not actually check the extensions, it would be feasible to omit $oldextn and $newextn; on the other hand, you might well want to check the extensions if you're sufficiently worried about the input format to need to deal with leading white space.

I very seldom find it good for a processing script like this to remove its own input, hence I've left unlink "reflog"; commented out; your mileage may vary. I would also often just read from standard input and write to standard output; that would simplify the code quite a bit. This code writes to both the log file and to standard output; obviously, you can omit either output stream. I was too lazy to write a function to handle the writing, so the print statements come in pairs.
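
For what it is worth, the paired print statements could be folded into a small function along these lines. This is only a sketch: `emit` is a hypothetical name, and the in-memory handles stand in for the log file and standard output so that the example runs without touching any files.

```perl
use strict;
use warnings;

# Hypothetical helper: write one line to every handle in the list.
sub emit {
    my ($handles, $text) = @_;
    print {$_} "$text\n" for @$handles;
}

# Demonstration with in-memory handles standing in for the log and STDOUT.
open my $log, '>', \my $log_buf or die "Cannot open in-memory log ($!)";
open my $out, '>', \my $out_buf or die "Cannot open in-memory output ($!)";
emit([$log, $out], "file1");
emit([$log, $out], "file4.abc");
close $log;
close $out;
print $log_buf;
```

In the script above, the equivalent call would be something like `emit([$func, \*STDOUT], $newname);` in place of each pair of prints.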

This is a variant on control-break reporting.

reflog

file1.abc
file1.def
file2.abc
file2.def
file3.abc
file3.def
file4.abc
file5.abc
file5.def
file6.def
file7.abc

Output

$ perl xx.pl
file1
file2
file3
file4.abc
file5
file6.def
file7.abc
$ cat log
file1
file2
file3
file4.abc
file5
file6.def
file7.abc
$

To handle unsorted file names with blank lines

#!/usr/bin/env perl

use strict;
use warnings;

open my $file, '<', "reflog" or die "Failed to open file reflog for reading ($!)";
open my $func, '>', 'log'    or die "Failed to create file log for writing ($!)";

my @lines;

while (<$file>)
{
    chomp;
    next if m/^\s*$/;
    push @lines, $_;
}

@lines = sort @lines;

my ($oldline, $oldname, $oldextn) = ("", "", "");
foreach my $newline (@lines)
{
    chomp $newline;
    $newline =~ s/^\s*//;
    my ($newname, $newextn) = ($newline =~ m/(.*)([.][^.]*)$/);
    if ($oldname eq $newname)
    {
        # Found the same file - presumably $oldextn eq ".abc" and $newextn eq ".def"
        print $func "$newname\n";
        print "$newname\n";
        $oldline = "";
        $oldname = "";
        $oldextn = "";
    }
    else
    {
        print $func "$oldline\n" if ($oldline);
        print "$oldline\n" if ($oldline);
        $oldline = $newline;
        $oldname = $newname;
        $oldextn = $newextn;
    }
}
print $func "$oldline\n" if ($oldline);
print "$oldline\n" if ($oldline);

#unlink "reflog" ;
chmod 0644, "log";
close $func;
close $file;

This is very similar to the original code I posted. The new lines are these:

my @lines;

while (<$file>)
{
    chomp;
    next if m/^\s*$/;
    push @lines, $_;
}

@lines = sort @lines;

my ($oldline, $oldname, $oldextn) = ("", "", "");    # Old
foreach my $newline (@lines)

This reads the 'reflog' file, skipping blank lines, saving the rest in the @lines array. When the lines are all read, they're sorted. Then, instead of a loop reading from the file, the new code reads entries from the sorted array of lines. The rest of the processing is as before. For your described input file, the output is:

file1
file2
file3

Urgh: the chomp $newline; is not needed, though it is not otherwise harmful. The old-fashioned chop (a precursor to chomp) would have been dangerous. Score one for modern Perl.

九八野马 2024-12-27 11:27:47

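use strict;
use warnings;

open( FILE, '<', "reflog" ) or die "Could not open reflog: $!";
open( FUNC, '>', "log" )    or die "Could not open log: $!";
my %seen;
while ( defined( my $line = <FILE> ) ) {
    chomp $line;
    $line =~ s/^\s*//;
    # Escaped dot and anchors: file8abc.def yields the base name file8abc.
    if ( $line =~ /^(.+)\.(abc|def)\s*$/ ) {
        $seen{$1}++;
    }
}

foreach my $file ( keys %seen ) {
    if ( $seen{$file} > 1 ) {
        ## do whatever you want to
    }
}
unlink "reflog";
chmod( 0750, "log" );
close(FUNC);
close(FILE);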
