Traversing a directory without using recursion?
The Problem
I need to write a simple program that, given certain constraints, appends a series of files to a list.
The user can choose between two "types" of directory: one with a * wildcard, meaning it should also explore subdirectories, and the classic one without wildcards, which just gets the files present in that directory.
What I'm doing
Right now I'm doing the stupidest thing:
import java.io.File;

public class Eseguibile {

    private static void displayIt(File node){
        System.out.println(node.getAbsoluteFile());
        if(node.isDirectory()){
            String[] subNote = node.list();
            for(String filename : subNote){
                displayIt(new File(node, filename));
            }
        }
    }

    public static void main(String[] args){
        System.out.println("ciao");
        displayIt( new File("/home/dierre/") );
    }
}
I do not need to build a tree because I just need the list of files, so I was thinking maybe there's a more efficient way to do it.
I was reading about TreeModel but, as I understand it, it's just an interface for implementing a JTree.
Comments (8)
Recursion is neither "stupid" nor necessarily inefficient. Indeed, in this particular case, a recursive solution is likely to be more efficient than a non-recursive one. And of course, the recursive solution is easier to code and to understand than the alternatives.
The only potential problem with recursion is that you could overflow the stack if the directory tree is pathologically deep.
If you really want to avoid recursion, then the natural way to do it is to use a "stack of list of File" data structure. Each place where you would have recursed, you push the list containing the current directory's (remaining) File objects onto the stack, read the new directory and start working on them. Then when you are finished, pop the stack and continue with the parent directory. This will give you a depth-first traversal. If you want a breadth-first traversal, use a "queue of File" data structure instead of a stack.
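The stack-based approach described above can be sketched in Java roughly as follows. This is a minimal illustration, not code from the answer: the class name `IterativeWalk` and the helper `children` are hypothetical, the stack holds iterators over each directory's remaining entries, and entries are sorted only to make the order predictable. Note that, like the recursive version in the question, it follows symbolic links and so does not guard against link cycles.

```java
import java.io.File;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

public class IterativeWalk {

    // Depth-first listing without recursion. Each stack entry is an iterator
    // over the not-yet-visited entries of one directory on the current path.
    static List<File> listDepthFirst(File root) {
        List<File> result = new ArrayList<>();
        Deque<Iterator<File>> stack = new ArrayDeque<>();
        result.add(root);
        stack.push(children(root));
        while (!stack.isEmpty()) {
            Iterator<File> remaining = stack.peek();
            if (!remaining.hasNext()) {
                stack.pop();                 // directory finished: resume its parent
                continue;
            }
            File next = remaining.next();
            result.add(next);
            if (next.isDirectory()) {
                stack.push(children(next));  // where the recursive call would have been
            }
        }
        return result;
    }

    // File.listFiles() returns null for non-directories and on I/O errors,
    // so that case is mapped to an empty iterator.
    private static Iterator<File> children(File dir) {
        File[] entries = dir.listFiles();
        if (entries == null) {
            return Collections.emptyIterator();
        }
        Arrays.sort(entries);                // assumption: sorted for a predictable order
        return Arrays.asList(entries).iterator();
    }

    public static void main(String[] args) {
        File root = new File(args.length > 0 ? args[0] : ".");
        for (File f : listDepthFirst(root)) {
            System.out.println(f.getAbsolutePath());
        }
    }
}
```

The result is in the same pre-order as the recursive `displayIt`, with the depth of the explicit stack bounded by the depth of the directory tree.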
My iterative solution:
If it helps, here's a C# version of iterative file-system traversal:
@Stephen C:
Below, as per your request, is the benchmark code that I talked about in the comments (C#, not Java).
Note that it should use Stopwatch instead of DateTime for better accuracy, but otherwise it's fine.
I didn't test whether the iteration delivers the same number of files as the recursion, but it should.
Actually, if you pay attention to the median, you'll notice that this already starts showing with only very few files. (My Desktop folder contains 2210 files, 415 folders, 3.2 GB total, most of it large files in the download folder, AppData, and a higher number of files due to one larger C# project [mail-server] on my desktop.)
To get the numbers I talked about in the comment, install cygwin (with everything [that's about 100GB, I think] ), and index the cygwin folder.
As mentioned in the comments, it is not entirely correct to say it doesn't matter.
While for a small directory tree, recursion is negligibly more efficient than iteration (on the order of several tens of milliseconds), for a very large tree, recursion is minutes (and therefore noticeably) slower than iteration. It isn't all too difficult to grasp why, either. If you have to allocate and return a new set of stack variables each time, call a function, and store all previous results until you return, you're of course slower than when you initialize a stack structure on the heap once and use that for each iteration.
The tree doesn't need to be pathologically deep for this effect to be noticed (while slow speed isn't a stack overflow, its very negative consequences are not much different from a StackOverflow bug). Also, I wouldn't call having a lot of files "pathological", because if you index your main drive, you'll naturally have a lot of files. Have some HTML documentation, and the number of files explodes. You'll find that on a whole lot of files, an iteration completes in less than 30 seconds, while a recursion needs approx. 3 minutes.
If you need to preserve the traversal order, a simplified version goes like this:
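As a rough Java illustration of an order-preserving iterative traversal (Java being the language of the question; this is not the C# code referred to above), a single stack of `File` objects is enough if each directory's entries are pushed in reverse. The class name `OrderedWalk` and the sorting step are assumptions made for this sketch.

```java
import java.io.File;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

public class OrderedWalk {

    // Iterative pre-order walk: visits entries in the same order a recursive
    // walk over the sorted entries would.
    static List<File> walk(File root) {
        List<File> visited = new ArrayList<>();
        Deque<File> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            File current = stack.pop();
            visited.add(current);
            File[] entries = current.listFiles(); // null unless a readable directory
            if (entries == null) {
                continue;
            }
            Arrays.sort(entries);                 // assumption: sorted for a predictable order
            // Push in reverse so that the first entry is popped (visited) first.
            for (int i = entries.length - 1; i >= 0; i--) {
                stack.push(entries[i]);
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        for (File f : walk(new File(args.length > 0 ? args[0] : "."))) {
            System.out.println(f.getAbsolutePath());
        }
    }
}
```

Without the reverse push, the walk is still depth-first, but the entries of each directory come out last-to-first.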
Recursion can always be transformed into a loop.
A quick and dirty possible solution (not tested) follows:
Please note that the source node should be a directory for anything to be displayed.
Also, this is a breadth-first display. If you want depth-first, you should change the "append" to put the files just after the current node in the array list.
I'm not sure about the memory consumption, however.
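A minimal sketch of the loop described here, written for illustration rather than taken from the post: an `ArrayList` doubles as the work list, and the loop index walks forward while new entries are appended behind it. The names `BreadthFirstList` and `listBreadthFirst` are hypothetical, and unlike the description above the returned list also includes the source node itself.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BreadthFirstList {

    // Breadth-first listing: the list grows while it is being scanned,
    // so the loop ends once no unvisited directory is left.
    static List<File> listBreadthFirst(File source) {
        List<File> nodes = new ArrayList<>();
        nodes.add(source);
        for (int i = 0; i < nodes.size(); i++) {
            File[] entries = nodes.get(i).listFiles(); // null unless a readable directory
            if (entries == null) {
                continue;
            }
            Arrays.sort(entries);                      // assumption: sorted for a predictable order
            nodes.addAll(Arrays.asList(entries));      // append at the end: breadth-first
            // Depth-first variant: nodes.addAll(i + 1, Arrays.asList(entries));
        }
        return nodes;
    }

    public static void main(String[] args) {
        for (File f : listBreadthFirst(new File(args.length > 0 ? args[0] : "."))) {
            System.out.println(f.getAbsolutePath());
        }
    }
}
```

On memory: the list ends up holding one `File` per entry in the tree, which is the same as the final result list the question asks for. The depth-first variant inserts into the middle of an `ArrayList`, which shifts the following elements on every insert.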
Regards
Guillaume
If you choose to use recursion, I found an example that may be close to the one you are currently using, to eliminate any ambiguity.
This is a very simple example; process() can be where you do your handling or operations on the directory.
I am a real novice, but after working for a week on this problem... I have a clean solution... thanks for all the help from PATRY and etbal.
PATRY, thank you for the advice. I modified your code a little, and this is what I've got:
Based on PATRY Guillaume's solution: