How to check for files with two different extensions in Perl
I have a file reflog with the content below. It contains items with the same name but different extensions. For each item (file1, file2 and file3 here as examples), I want to check that it exists with both extensions (.abc and .def). If both extensions exist, the script will perform some regex and print the result. Otherwise it will just report the file name together with its extension (i.e., if only one of file1.abc or file1.def exists, it will be printed).
reflog:
file1.abc
file2.abc
file2.def
file3.abc
file3.def
file4.abc
file5.abc
file5.def
file6.def
file8abc.def
file7.abc
file1.def
file9abc.def
file10def.abc
My script is below (edited from yb007's script), but I have some issues with the output that I don't know how to resolve. I notice the output goes wrong when the reflog file contains any file named *abc.def (such as file8abc.def and file9abc.def). The script trims off the last 4 characters of the name and returns the wrong extension (.abc here, but I think it should be .def).
#! /usr/bin/perl
use strict;
use warnings;
my @files_abc ;
my @files_def ;
my $line;
open(FILE1, 'reflog') || die ("Could not open reflog") ;
open (FILE2, '>log') || die ("Could not open log") ;
while ($line = <FILE1>) {
    if($line=~ /(.*).abc/) {
        push(@files_abc,$1);
    } elsif ($line=~ /(.*).def/) {
        push(@files_def,$1);
    }
}
close(FILE1);
my %first = map { $_ => 1 } @files_def ;
my @same = grep { $first{$_} } @files_abc ;
my @abc_only = grep { !$first{$_} } @files_abc ;
foreach my $abc (sort @abc_only) {
    $abc .= ".abc";
}
my %second = map {$_=>1} @files_abc;
my @same2 = grep { $second{$_} } @files_def; #@same and same2 are equal.
my @def_only = grep { !$second{$_} } @files_def;
foreach my $def (sort @def_only) {
    $def .= ".def";
}
my @combine_all = sort (@same, @abc_only, @def_only);
print "\nCombine all:-\n @combine_all\n" ;
print "\nList of files with same extension\n @same";
print "\nList of files with abc only\n @abc_only";
print "\nList of files with def only\n @def_only";
foreach my $item (sort @combine_all) {
    print FILE2 "$item\n" ;
}
close (FILE2) ;
My output is like this, which is wrong.
First, the screen output is as below:
Combine all:-
file.abc file.abc file1 file10def.abc file2 file3 file4.abc file5 file6.def file7.abc
List of files with same extension
file1 file2 file3 file5
List of files with abc only
file4.abc file.abc file7.abc file.abc file10def.abc
List of files with def only
file6.def
Log output as below:
**file.abc
file.abc**
file1
file10def.abc
file2
file3
file4.abc
file5
file6.def
file7.abc
Can you please help me take a look at where it goes wrong? Thanks heaps.
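ALWAYS add use strict; and use warnings; to the head of your program. They will catch most simple errors before you need to ask for help.
Write open FILE, "reflog" or die $!; to check that the file open succeeded.
You use a variable $ine that doesn't exist. You mean $line.
The lines you read keep their trailing newlines. Use chomp @lines; to remove them.
Your test needs || instead of &&. Instead write if ($line =~ /\.(iif|isp)$/)
If you still have problems when these are fixed then please ask again.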
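As a minimal sketch of the anchored match suggested above, applied to the .abc and .def extensions from the question: the dot is escaped so it only matches a literal dot, and $ ties the extension to the end of the name. The helper name split_name is hypothetical and does not come from the original posts.

```perl
use strict;
use warnings;

# Hypothetical helper: split "base.ext" into (base, ext) for the two
# extensions in the question; returns an empty list for anything else.
sub split_name {
    my ($name) = @_;
    # \. matches a literal dot; $ anchors the extension to the end.
    return $name =~ /^(.+)\.(abc|def)$/ ? ($1, $2) : ();
}

for my $name (qw(file1.abc file8abc.def file10def.abc)) {
    my ($base, $ext) = split_name($name);
    print "$name -> base=$base ext=$ext\n";
}
```

With the unescaped, unanchored pattern /(.*).abc/ from the question, file8abc.def matches as "file", any character ("8"), then "abc", which is why the asker sees file.abc in the output.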
Aside from the errors already pointed out, you appear to be loading @lines from FUNC instead of FILE. Is that also a typo?
Also, if reflog truly contains a series of lines with one filename on each line, why would you ever expect the conditional "if ($line =~ /.abc/ && $line =~ /.def/)" to evaluate true?
It would really help if you could post an example from the actual file you are reading from, along with the actual code you are debugging. Or at least edit the question to fix the typos already mentioned.
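To illustrate the point about the conditional: each line holds one file name, so a per-line test for both extensions cannot succeed. One alternative, sketched here under the assumption of one name per line, is to record the extensions seen for each base name in a hash and inspect it once all names are read. The function name classify is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: given a list of file names, return a hash ref
# mapping each base name to a hash of the extensions seen for it.
sub classify {
    my %seen;
    for my $name (@_) {
        # Ignore names that do not end in .abc or .def.
        next unless $name =~ /^(.+)\.(abc|def)$/;
        $seen{$1}{$2} = 1;
    }
    return \%seen;
}

my $seen = classify(qw(file1.abc file2.abc file2.def file6.def file8abc.def));
for my $base (sort keys %$seen) {
    my @exts = sort keys %{ $seen->{$base} };
    if (@exts == 2) {
        print "$base: both extensions present\n";
    } else {
        print "$base.$exts[0]: only one extension present\n";
    }
}
```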
Output is
Hope this helps...
You don't need to slurp the whole file; you can read one line at a time. I think this code works on this extended version of your reflog file:
xx.pl
Since the code does not actually check the extensions, it would be feasible to omit $oldextn and $newextn; on the other hand, you might well want to check the extensions if you're sufficiently worried about the input format to need to deal with leading white space.
I very seldom find it good for a processing script like this to remove its own input, hence I've left unlink "reflog"; commented out; your mileage may vary. I would also often just read from standard input and write to standard output; that would simplify the code quite a bit. This code writes to both the log file and to standard output; obviously, you can omit either output stream. I was too lazy to write a function to handle the writing, so the print statements come in pairs.
This is a variant on control-break reporting.
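Control-break reporting walks sorted input and acts whenever the key changes. The following is only an illustrative sketch of that idea, not the answer's original script. The function name report_pairs is hypothetical; the variable names $oldextn and $newextn are borrowed from the text above. It assumes the input is already sorted, contains only .abc and .def names, and that the two names of a pair sort next to each other.

```perl
use strict;
use warnings;

# Hypothetical helper: takes a sorted list of names and returns report
# lines. A base name seen with both extensions is reported without an
# extension; an unpaired name is reported in full.
sub report_pairs {
    my @sorted = @_;
    my @report;
    my ($oldbase, $oldextn);                 # pending record, if any
    for my $name (@sorted) {
        next unless $name =~ /^(.+)\.(abc|def)$/;
        my ($newbase, $newextn) = ($1, $2);
        if (defined $oldbase && $oldbase eq $newbase) {
            push @report, $newbase;          # both extensions exist
            ($oldbase, $oldextn) = ();       # the pair is consumed
            next;
        }
        # The key changed: the pending record had no partner.
        push @report, "$oldbase.$oldextn" if defined $oldbase;
        ($oldbase, $oldextn) = ($newbase, $newextn);
    }
    # Flush the last pending record.
    push @report, "$oldbase.$oldextn" if defined $oldbase;
    return @report;
}

my @names = ("file1.abc", "file4.abc", "file1.def", "file6.def");
print "$_\n" for report_pairs(sort @names);
```

Only one pending record is held at a time, which is what lets this approach read the input line by line instead of loading it all into memory.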
reflog
Output
To handle unsorted file names with blank lines
This is very similar to the original code I posted. The new lines are these:
This reads the 'reflog' file, skipping blank lines, saving the rest in the @lines array. When the lines are all read, they're sorted. Then, instead of a loop reading from the file, the new code reads entries from the sorted array of lines. The rest of the processing is as before. For your described input file, the output is:
Urgh: the chomp $newline; is not needed, though it is not otherwise harmful. The old-fashioned chop (a precursor to chomp) would have been dangerous. Score one for modern Perl.
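The lines the answer refers to are not shown here. A minimal sketch of the step it describes (read, skip blank lines, sort) might look like the following. The function name read_sorted_lines is hypothetical, and it takes any filehandle so that it can be tried on an in-memory string standing in for the reflog file.

```perl
use strict;
use warnings;

# Hypothetical helper: read all lines from a filehandle, strip the
# newlines, drop blank (whitespace-only) lines, and return them sorted.
sub read_sorted_lines {
    my ($fh) = @_;
    my @lines;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*$/;    # skip blank lines
        push @lines, $line;
    }
    return sort @lines;
}

# Usage with an in-memory file standing in for 'reflog'.
my $data = "file2.def\n\nfile1.abc\n  \nfile2.abc\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
print "$_\n" for read_sorted_lines($fh);
close $fh;
```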