How to find files that exist in different directories under a given path in Perl
I'm looking for a way to find files that reside in several directories under a given path. In other words, those directories contain files with the same filename. My script seems to have a hierarchy problem when looking for the correct path in which to grep the filename for processing. I have a fixed path as input, and the script needs to look into that path and find the files from there. However, my script seems to be stuck two tiers up and processes from there, rather than descending to the last directories in the tree (in my case it processes "ln" and "nn" and starts running the subroutine there).
The fixed input path is:
/nfs/disks/version_2.0/
The files that I want to post-process with a subroutine exist under several directories, as shown below. Basically, I want to check whether file1.abc exists in all of the directories temp1, temp2 and temp3 under the ln directory, and likewise whether file2.abc exists in temp1, temp2 and temp3 under the nn directory.
The full paths of the files I want to check look like this:
/nfs/disks/version_2.0/dir_a/ln/temp1/file1.abc
/nfs/disks/version_2.0/dir_a/ln/temp2/file1.abc
/nfs/disks/version_2.0/dir_a/ln/temp3/file1.abc
/nfs/disks/version_2.0/dir_a/nn/temp1/file2.abc
/nfs/disks/version_2.0/dir_a/nn/temp2/file2.abc
/nfs/disks/version_2.0/dir_a/nn/temp3/file2.abc
My script is below:
    #! /usr/bin/perl -w
    my $dir = '/nfs/fm/disks/version_2.0/' ;
    opendir(TEMP, $dir) || die $! ;
    foreach my $file (readdir(TEMP)) {
        next if ($file eq "." || $file eq "..") ;
        if (-d "$dir/$file") {
            my $d = "$dir/$file";
            print "Directory:- $d\n" ;
            &getFile($d);
            &compare($file) ;
        }
    }
Note that I put the print "Directory:- $d\n" ; statement there for debugging purposes, and it printed this:
/nfs/disks/version_2.0/dir_a/
/nfs/disks/version_2.0/dir_b/
So I know it is taking the wrong path when it goes on to process the subroutines that follow.
Can somebody help point out where the error in my script is? Thanks!
Comments (2)
To be clear: the script is supposed to recurse through a directory and look for files with a particular filename? In that case, I think the problem is the pair of calls inside your if (-d "$dir/$file") block.
I'm assuming that &getFile($d) is meant to step into a directory (i.e., the recursive step). This is fine. However, it looks like &compare($file) is the action that you want to take when the object you're looking at isn't a directory, so it belongs in the other branch of the test. The general pseudo-code should look like this: for each entry in the directory, if the entry is a directory, recurse into it; otherwise, run the per-file action on it.
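The recurse-or-process pattern described above can be sketched in Perl roughly as follows. This is only an illustration under stated assumptions: walk and process_file are hypothetical names standing in for the asker's getFile and compare, whose real bodies were not shown, and the demo runs on a throwaway temporary tree shaped like the one in the question rather than on the real /nfs path.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Path qw(make_path);

# Hypothetical stand-in for compare(): just remember every plain file seen.
my @seen;
sub process_file {
    my ($path) = @_;
    push @seen, $path;
}

# Recurse into directories; hand everything else to process_file().
sub walk {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my @entries = grep { $_ ne '.' and $_ ne '..' } readdir($dh);
    closedir($dh);
    for my $entry (sort @entries) {
        my $path = "$dir/$entry";
        if (-d $path) {
            walk($path);            # recursive step
        }
        else {
            process_file($path);    # per-file action
        }
    }
}

# Demo: build a temporary tree like dir_a/{ln,nn}/temp{1,2,3}/fileN.abc.
my $root = tempdir(CLEANUP => 1);
for my $sub (qw(ln nn)) {
    my $name = $sub eq 'ln' ? 'file1.abc' : 'file2.abc';
    for my $tmp (qw(temp1 temp2 temp3)) {
        my $d = "$root/dir_a/$sub/$tmp";
        make_path($d);
        open(my $fh, '>', "$d/$name") or die "Cannot create $d/$name: $!";
        close($fh);
    }
}

walk($root);
print "$_\n" for @seen;
```

Note that -d follows symbolic links, so a tree containing a link back to one of its own ancestors would make this sketch loop; a real script would need to guard against that.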
As a side note: this script is reinventing the wheel. You only need to write a script that does the processing on an individual file; the rest can be done entirely from the shell.
Wow, it's like reliving the 1990s! Perl code has evolved somewhat, and you really need to learn the new stuff. It looks like you learned Perl in version 3.0 or 4.0. Here are some pointers:
- Put use warnings; in the script instead of passing -w on the command line.
- Add use strict;. This requires you to predeclare variables using my, which scopes them to the enclosing block, or to the file if they're not inside a block. This helps catch a lot of errors.
- Don't put & in front of subroutine names.
- Use and, or, and not instead of &&, ||, and !.
When someone says detect duplicates, I immediately think of hashes. If you use a hash keyed on the file's name, you can easily see whether there are duplicate files.
Of course, a hash can only have a single value for each key. Fortunately, in Perl 5.x, that value can be a reference to another data structure.
So, I recommend you use a hash whose values are references to arrays. You can push each instance of the file onto the array for its name.
Using your example, you'd have a data structure that looks like this:
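A sketch of what that hash of array references could look like, filled in by hand with the paths listed in the question (the variable name %file_hash follows the one used later in this answer; the layout itself is an illustration, not the answerer's original listing):

```perl
use strict;
use warnings;

# File name => reference to an array of every full path where it was found.
my %file_hash = (
    'file1.abc' => [
        '/nfs/disks/version_2.0/dir_a/ln/temp1/file1.abc',
        '/nfs/disks/version_2.0/dir_a/ln/temp2/file1.abc',
        '/nfs/disks/version_2.0/dir_a/ln/temp3/file1.abc',
    ],
    'file2.abc' => [
        '/nfs/disks/version_2.0/dir_a/nn/temp1/file2.abc',
        '/nfs/disks/version_2.0/dir_a/nn/temp2/file2.abc',
        '/nfs/disks/version_2.0/dir_a/nn/temp3/file2.abc',
    ],
);

# Walk the structure: dereference each value with @{ ... } to get the list.
for my $name (sort keys %file_hash) {
    my $count = scalar @{ $file_hash{$name} };
    print "$name was found in $count places\n";
    print "    $_\n" for @{ $file_hash{$name} };
}
```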
And, here's a program to do it:
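The original listing is not reproduced here, so what follows is a reconstruction based on the description in this answer, not the answerer's exact code. It assumes a throwaway temporary tree (built with File::Temp) in place of /nfs/disks/version_2.0/, and adds one file, only_here.abc, purely to show that names with a single instance are skipped.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);
use File::Path qw(make_path);

my %file_hash;    # file name => array ref of full paths

# find() calls this once per entry, with $_ set to the bare name.
sub wanted {
    return unless -f $_;                                # only plain files
    $file_hash{$_} = [] unless exists $file_hash{$_};   # optional, see text
    push @{ $file_hash{$_} }, $File::Find::name;        # record the full path
}

# Demo tree (an assumption for illustration; point find() at your own root).
my $root = tempdir(CLEANUP => 1);
for my $sub (qw(ln nn)) {
    my $name = $sub eq 'ln' ? 'file1.abc' : 'file2.abc';
    for my $tmp (qw(temp1 temp2 temp3)) {
        my $d = "$root/dir_a/$sub/$tmp";
        make_path($d);
        open(my $fh, '>', "$d/$name") or die "Cannot create $d/$name: $!";
        close($fh);
    }
}
open(my $extra, '>', "$root/dir_a/ln/temp1/only_here.abc")
    or die "Cannot create extra file: $!";
close($extra);

find(\&wanted, $root);

# Print only the names that occur in more than one place.
for my $name (sort keys %file_hash) {
    my @paths = sort @{ $file_hash{$name} };
    next unless @paths > 1;
    print "$name:\n";
    print "    $_\n" for @paths;
}
```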
Here are a few things about File::Find:
- The work is done in the subroutine wanted.
- $_ is the name of the file, and I can use this to see whether it is a file or a directory.
- $File::Find::name is the full name of the file, including the path.
- $File::Find::dir is the name of the directory.
If the array reference doesn't exist, I create it with $file_hash{$_} = [];. This isn't necessary, because Perl creates the array automatically on the first push, but I find it comforting, and it can prevent errors. To use $file_hash{$_} as an array, I have to dereference it. I do that by wrapping it in @{ }, as in @{ $file_hash{$_} }. The shorter form @$file_hash{$_} does not work here: Perl parses it as a slice of a hash reference held in a scalar named $file_hash.
Once all the files are found, I can print out the entire structure. The only thing I do is check that there is more than one member in each array. If there's only a single member, then there are no duplicates.
Response to Grace
I'm just printing the file names in my sample code, but instead of printing them, you could open the files and process them. After all, you now have the file name and the directory. Here's the heart of my program again. This time, I'm opening the file and looking at the content:
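A minimal sketch of a wanted subroutine that opens each file, offered as an illustration rather than the original listing. Counting lines is a hypothetical placeholder for whatever processing is actually needed, and write_file and the temporary tree are assumptions made so the example runs on its own.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);
use File::Path qw(make_path);

my %line_count;    # full path => number of lines read from that file

sub wanted {
    return unless -f $_;
    # find() has already changed into the file's directory, so the bare
    # name in $_ is enough to open it.
    open(my $fh, '<', $_) or die "Cannot open $File::Find::name: $!";
    my $lines = 0;
    # Read into a lexical so that $_ (owned by File::Find) is left alone.
    while (my $line = <$fh>) {
        $lines++;    # placeholder for real per-line processing
    }
    close($fh);
    $line_count{$File::Find::name} = $lines;
}

# Hypothetical helper used only to build the demo tree.
sub write_file {
    my ($path, $content) = @_;
    open(my $fh, '>', $path) or die "Cannot create $path: $!";
    print {$fh} $content;
    close($fh);
}

my $root = tempdir(CLEANUP => 1);
make_path("$root/dir_a/ln/temp1", "$root/dir_a/ln/temp2");
write_file("$root/dir_a/ln/temp1/file1.abc", "one\ntwo\n");
write_file("$root/dir_a/ln/temp2/file1.abc", "one\ntwo\nthree\n");

find(\&wanted, $root);

for my $path (sort keys %line_count) {
    print "$path: $line_count{$path} line(s)\n";
}
```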
If you are only looking for certain files, you could modify the wanted function to skip over files you don't want. For example, here I am only looking for files that match the file*.txt pattern. Note that I use the regular expression /^file.*\.txt$/ to match the name of the file. As you can see, it's the same as the previous wanted subroutine. The only difference is my test: I'm looking for something that is a file (-f) and has the correct name (file*.txt).
If you are looking at the file contents, you can use an MD5 hash to determine whether the contents of two files match. This reduces a file to a 16-byte digest (32 characters when written in hexadecimal), which could even be used as the hash key instead of the file name. This way, files that have matching MD5 hashes (and thus, almost certainly, matching contents) would be in the same hash list.
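Both ideas above, the file*.txt filter and the MD5 digest as hash key, can be combined in one sketch. This is an illustration under assumptions: %by_digest, md5_of and write_file are hypothetical names, and the temporary tree stands in for the real directories. Digest::MD5 is a core module.

```perl
use strict;
use warnings;
use File::Find;
use Digest::MD5;
use File::Temp qw(tempdir);
use File::Path qw(make_path);

my %by_digest;    # MD5 hex digest => array ref of full paths with that content

# Return the MD5 digest of a file's contents as 32 hexadecimal characters.
sub md5_of {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    binmode($fh);
    my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
    close($fh);
    return $digest;
}

sub wanted {
    # Keep only plain files whose name matches file*.txt.
    return unless -f $_ and /^file.*\.txt$/;
    push @{ $by_digest{ md5_of($_) } }, $File::Find::name;
}

# Hypothetical helper used only to build the demo tree.
sub write_file {
    my ($path, $content) = @_;
    open(my $fh, '>', $path) or die "Cannot create $path: $!";
    print {$fh} $content;
    close($fh);
}

my $root = tempdir(CLEANUP => 1);
make_path("$root/a", "$root/b", "$root/c");
write_file("$root/a/file1.txt", "hello\n");
write_file("$root/b/file1.txt", "hello\n");        # same content as a/file1.txt
write_file("$root/c/file2.txt", "different\n");
write_file("$root/a/notes.log", "hello\n");        # skipped: name does not match

find(\&wanted, $root);

for my $digest (sort keys %by_digest) {
    my @paths = sort @{ $by_digest{$digest} };
    print "$digest: ", scalar(@paths), " file(s)\n";
    print "    $_\n" for @paths;
}
```

Keying on the digest means two files land in the same list whenever their contents agree, regardless of what they are called.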
You talk about a "master list" of files, and it seems this master list needs to match the content of the files you're looking for. So, I'm making a slight modification to my program. I first take that master list you talked about and generate an MD5 sum for each file. Then I look at all the files in that directory, but only keep the ones with a matching MD5 hash...
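One way the master-list variant could look is sketched below. The shape of the master list was not shown in the thread, so this assumes it is simply a list of file paths; %master_digest, %matches, md5_of and write_file are hypothetical names, and the demo uses a temporary directory with one master file, one matching copy and one altered copy.

```perl
use strict;
use warnings;
use File::Find;
use Digest::MD5;
use File::Temp qw(tempdir);
use File::Path qw(make_path);

my %master_digest;    # MD5 hex digest => path of the master file
my %matches;          # master file path => array ref of matching paths found

# Return the MD5 digest of a file's contents as a hex string.
sub md5_of {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    binmode($fh);
    my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
    close($fh);
    return $digest;
}

sub wanted {
    return unless -f $_;
    my $digest = md5_of($_);
    # Keep only files whose content matches something on the master list.
    return unless exists $master_digest{$digest};
    push @{ $matches{ $master_digest{$digest} } }, $File::Find::name;
}

# Hypothetical helper used only to build the demo files.
sub write_file {
    my ($path, $content) = @_;
    open(my $fh, '>', $path) or die "Cannot create $path: $!";
    print {$fh} $content;
    close($fh);
}

my $base = tempdir(CLEANUP => 1);
make_path("$base/master", "$base/tree/ln/temp1", "$base/tree/ln/temp2");
write_file("$base/master/file1.abc",        "alpha\n");
write_file("$base/tree/ln/temp1/file1.abc", "alpha\n");      # matches master
write_file("$base/tree/ln/temp2/file1.abc", "changed\n");    # does not match

# Step 1: digest every file on the (assumed) master list.
my @master_list = ("$base/master/file1.abc");
$master_digest{ md5_of($_) } = $_ for @master_list;

# Step 2: walk the tree and collect files with a matching digest.
find(\&wanted, "$base/tree");

for my $master (sort keys %matches) {
    print "$master matches:\n";
    print "    $_\n" for sort @{ $matches{$master} };
}
```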
By the way, this has not been tested.