Node.js fs.readdir recursive directory search
Any ideas on an async directory search using fs.readdir? I realize that we could introduce recursion and call the read directory function with the next directory to read, but I'm a little worried about it not being async...
Any ideas? I've looked at node-walk, which is great, but it doesn't give me just the files in an array, like readdir does.
Looking for output like...
['file1.txt', 'file2.txt', 'dir/file3.txt']
There are basically two ways of accomplishing this. In an async environment you'll notice that there are two kinds of loops: serial and parallel. A serial loop waits for one iteration to complete before it moves on to the next one; this guarantees that every iteration of the loop completes in order. In a parallel loop, all the iterations are started at the same time, and one may complete before another; however, for I/O-bound work like this it is usually much faster than a serial loop. So in this case, it's probably better to use a parallel loop, because it doesn't matter in what order the walk completes, as long as it completes and returns the results (unless you want them in order).
A parallel loop would look like this:
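A minimal sketch of a parallel walk, using only the built-in fs and path modules with Node-style callbacks (the name walk and the choice to return absolute paths are assumptions for illustration). Every entry of a directory is stat'ed at once, and a pending counter detects when the last one, including any subdirectory walk, has finished:

```javascript
const fs = require('fs');
const path = require('path');

// Parallel walk: stat every entry of `dir` at once and call
// done(err, files) when all entries (and their subtrees) are finished.
function walk(dir, done) {
  let results = [];
  fs.readdir(dir, (err, list) => {
    if (err) return done(err);
    let pending = list.length;
    if (pending === 0) return done(null, results);
    let failed = false; // report only the first error, then ignore the rest
    list.forEach((name) => {
      const file = path.resolve(dir, name);
      fs.stat(file, (err, stat) => {
        if (failed) return;
        if (err) { failed = true; return done(err); }
        if (stat.isDirectory()) {
          walk(file, (err, res) => {
            if (failed) return;
            if (err) { failed = true; return done(err); }
            results = results.concat(res);
            if (--pending === 0) done(null, results);
          });
        } else {
          results.push(file);
          if (--pending === 0) done(null, results);
        }
      });
    });
  });
}

// Usage:
// walk(process.env.HOME, (err, files) => {
//   if (err) throw err;
//   console.log(files);
// });
```

Results arrive in completion order, not alphabetical order; sort them afterwards if that matters.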
A serial loop would look like this:
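For comparison, a sketch of the serial variant under the same assumptions (walkSerial is a hypothetical name). An index plus a recursive next() function make sure each entry, and its whole subtree, is finished before the following entry is touched:

```javascript
const fs = require('fs');
const path = require('path');

// Serial walk: entries are processed strictly one after another.
function walkSerial(dir, done) {
  const results = [];
  fs.readdir(dir, (err, list) => {
    if (err) return done(err);
    let i = 0;
    (function next() {
      // All entries handled: hand back the collected paths.
      if (i >= list.length) return done(null, results);
      const file = path.resolve(dir, list[i++]);
      fs.stat(file, (err, stat) => {
        if (err) return done(err);
        if (stat.isDirectory()) {
          // Finish the whole subdirectory before moving on.
          walkSerial(file, (err, res) => {
            if (err) return done(err);
            for (const f of res) results.push(f);
            next();
          });
        } else {
          results.push(file);
          next();
        }
      });
    })();
  });
}
```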
And to test it out on your home directory (WARNING: the results list will be huge if you have a lot of stuff in your home directory):
EDIT: Improved examples.
This one uses the maximum amount of new, buzzwordy features available in Node 8, including Promises, util/promisify, destructuring, async-await, map+reduce and more, making your co-workers scratch their heads as they try to figure out what is going on.
Node 8+
No external dependencies.
Usage
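A minimal sketch of what such a Node 8 version can look like (the name getFiles is an assumption). util.promisify turns fs.readdir and fs.stat into promise-returning functions, Promise.all stats the entries of a directory in parallel, and reduce flattens the nested per-directory arrays:

```javascript
const { promisify } = require('util');
const { resolve } = require('path');
const fs = require('fs');

const readdir = promisify(fs.readdir);
const stat = promisify(fs.stat);

// Resolves to a flat array of absolute file paths under `dir`.
async function getFiles(dir) {
  const subdirs = await readdir(dir);
  const files = await Promise.all(subdirs.map(async (subdir) => {
    const res = resolve(dir, subdir);
    // Recurse into directories; plain files are returned as-is.
    return (await stat(res)).isDirectory() ? getFiles(res) : res;
  }));
  // Flatten one level: nested arrays from subdirectories plus plain paths.
  return files.reduce((a, f) => a.concat(f), []);
}

// Usage:
// getFiles(__dirname)
//   .then((files) => console.log(files))
//   .catch((e) => console.error(e));
```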
Node 10.10+
Updated for node 10+ with even more whizbang:
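A sketch of the Node 10.10+ idea, assuming fs.promises and the withFileTypes option (getFiles is again a hypothetical name). Each fs.Dirent already knows whether it is a directory, which saves one stat call per entry. Note that a Dirent does not follow symbolic links, so a symlinked directory is reported as a non-directory here:

```javascript
const { resolve } = require('path');
const { readdir } = require('fs').promises;

// Resolves to a flat array of absolute file paths under `dir`.
async function getFiles(dir) {
  const dirents = await readdir(dir, { withFileTypes: true });
  const files = await Promise.all(dirents.map((dirent) => {
    const res = resolve(dir, dirent.name);
    return dirent.isDirectory() ? getFiles(res) : res;
  }));
  // Flatten one level (files.flat() does the same on newer Node versions).
  return Array.prototype.concat(...files);
}
```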
Note that starting with Node 11.15.0 you can use files.flat() instead of Array.prototype.concat(...files) to flatten the files array.
Node 11+
If you want to blow everybody's head up completely, you can use the following version using async iterators. In addition to being really cool, it also allows consumers to pull results one-at-a-time, making it better suited for really large directories.
Usage has changed because the return type is now an async iterator instead of a promise
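A sketch of the async-iterator variant, assuming fs.promises with withFileTypes (Node 10.10+; getFiles is a hypothetical name). The function is an async generator, so each path is yielded as soon as it is found and the consumer pulls results one at a time with for await...of:

```javascript
const { resolve } = require('path');
const { readdir } = require('fs').promises;

// Yields absolute file paths one by one; subdirectories are delegated
// to a nested generator with yield*.
async function* getFiles(dir) {
  const dirents = await readdir(dir, { withFileTypes: true });
  for (const dirent of dirents) {
    const res = resolve(dir, dirent.name);
    if (dirent.isDirectory()) {
      yield* getFiles(res);
    } else {
      yield res;
    }
  }
}

// Usage (inside an async function):
// for await (const f of getFiles('.')) {
//   console.log(f);
// }
```

Because results are pulled lazily, breaking out of the loop early also stops the walk.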
In case somebody is interested, I've written more about async iterators here: https://qwtel.com/posts/software/async-generators-in-the-wild/
Node 20+
As of Node 20, fs.readdir has a { recursive: true } option.
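A sketch using that option, assuming Node 20.1+ (the helper name listFiles is hypothetical). The returned names are relative to the starting directory, which matches the ['file1.txt', 'file2.txt', 'dir/file3.txt'] shape asked for in the question, but directories themselves are listed too, so each entry is stat'ed to keep only files:

```javascript
const path = require('path');
const { readdir, stat } = require('fs/promises');

// Resolves to file paths relative to `dir`, e.g. 'dir/file3.txt' on POSIX.
async function listFiles(dir) {
  // readdir performs the recursion itself; entries include subdirectories.
  const names = await readdir(dir, { recursive: true });
  // Stat all entries in parallel and remember which ones are files.
  const isFile = await Promise.all(
    names.map(async (name) => (await stat(path.join(dir, name))).isFile())
  );
  return names.filter((_, i) => isFile[i]);
}
```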
Just in case anyone finds it useful, I also put together a synchronous version.
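A minimal sketch of such a synchronous version (walkSync is a hypothetical name). It uses fs.readdirSync and fs.statSync and therefore blocks the event loop until the whole tree has been read, which is usually acceptable in scripts and CLI tools but not in a server:

```javascript
const fs = require('fs');
const path = require('path');

// Returns an array of file paths under `dir` (prefixed with `dir`).
function walkSync(dir) {
  let results = [];
  for (const name of fs.readdirSync(dir)) {
    const file = path.join(dir, name);
    const stat = fs.statSync(file);
    if (stat.isDirectory()) {
      // Recurse and merge the subdirectory's files.
      results = results.concat(walkSync(file));
    } else {
      results.push(file);
    }
  }
  return results;
}

// Usage:
// console.log(walkSync('.'));
```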
Tip: To use fewer resources when filtering, filter within this function itself. E.g. replace
results.push(file);
with the code below. Adjust as required:
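A sketch of that tip, assuming you only want .json files (the extension and the name walkSyncJson are placeholders to adjust). The check sits exactly where results.push(file) used to be, so non-matching paths are never collected:

```javascript
const fs = require('fs');
const path = require('path');

// Synchronous walk that keeps only files with a '.json' extension.
function walkSyncJson(dir) {
  let results = [];
  for (const name of fs.readdirSync(dir)) {
    const file = path.join(dir, name);
    if (fs.statSync(file).isDirectory()) {
      results = results.concat(walkSyncJson(file));
    } else if (path.extname(file) === '.json') {
      // Filter in place instead of pushing every file unconditionally.
      results.push(file);
    }
  }
  return results;
}
```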
A. Have a look at the file module. It has a function called walk:
This may be for you! And yes, it is async. However, I think you would have to aggregate the full paths yourself if you need them.
B. An alternative, and one of my favourites: use the Unix find command for that. Why do something again that has already been programmed? Maybe not exactly what you need, but still worth checking out.
Repeated searches tend to be very fast as long as only a few folders have changed, since the operating system caches directory entries.
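A sketch of calling find from Node, assuming a POSIX system with find on the PATH and file names without embedded newlines (findFiles is a hypothetical name). child_process.execFile runs the command without a shell, and -type f restricts the output to regular files:

```javascript
const { execFile } = require('child_process');

// Resolves to the file paths printed by `find <dir> -type f`.
function findFiles(dir) {
  return new Promise((resolve, reject) => {
    // Raise maxBuffer so large listings are not cut off (default is 1 MiB).
    execFile('find', [dir, '-type', 'f'], { maxBuffer: 64 * 1024 * 1024 }, (err, stdout) => {
      if (err) return reject(err);
      // One path per line; drop the empty string after the final newline.
      resolve(stdout.split('\n').filter(Boolean));
    });
  });
}
```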
I recommend using node-glob to accomplish that task.
Another nice npm package is glob.
npm install glob
It is very powerful and should cover all your recursing needs.
Edit:
I actually wasn't perfectly happy with glob, so I created readdirp.
I'm very confident that its API makes finding files and directories recursively and applying specific filters very easy.
Read through its documentation to get a better idea of what it does and install via:
npm install readdirp
Short, Modern and Efficient:
Special thanks to Function for hinting at {withFileTypes: true}.
This automatically keeps the tree structure of the source directory (which you may need). For example if:
then allFiles would be a TREE like this:
Flatten it if you want:
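A minimal sketch of this tree-preserving approach (getFiles is a hypothetical name; fs.promises with withFileTypes needs Node 10.10+, and Array.prototype.flat needs Node 11+). Files come back as strings and every subdirectory becomes a nested array, which flat(Infinity) collapses into a plain list:

```javascript
const path = require('path');
const { readdir } = require('fs').promises;

// Resolves to a nested array mirroring the directory tree.
async function getFiles(dir) {
  const dirents = await readdir(dir, { withFileTypes: true });
  return Promise.all(dirents.map((dirent) => {
    const res = path.join(dir, dirent.name);
    // A subdirectory becomes an inner array; a file stays a string.
    return dirent.isDirectory() ? getFiles(res) : res;
  }));
}

// Usage (inside an async function):
// const allFiles = await getFiles('.');       // nested tree
// const flatFiles = allFiles.flat(Infinity);  // plain list of paths
```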
If you want to use an npm package, wrench is pretty good.
EDIT (2018):
Anyone reading through in recent time: The author deprecated this package in 2015:
Async
Sync
Async readable
Note: both versions will follow symlinks (same as the original fs.readdir).
With Recursion
Calling
I loved the answer from chjj above and would not have been able to create my version of the parallel loop without that start.
I created a Gist as well. Comments welcome. I am still starting out in the NodeJS realm so that is one way I hope to learn more.
Vanilla ES6 + async/await + small & readable
I didn't find the answer I was looking for in this thread; there were a few similar elements spread across different answers, but I just wanted something simple and readable.
Just in case it helps anyone in the future (i.e. myself in a couple of months), this is what I ended up using:
Use node-dir to produce exactly the output you like
Here is a simple synchronous recursive solution
Usage:
You could write it asynchronously, but there is no need. Just make sure that the input directory exists and is accessible.
Modern, promise-based recursive readdir version:
qwtel's answer variant, in TypeScript
Shortest native solution, available with the v20.1 release:
The recursive option is also supported by the fs.readdir and fs.readdirSync functions.
Simple, Async Promise Based
Usage:
await getDirRecursive("./public");
Using async/await, this should work:
You can use Bluebird's Promise.promisify or this:
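A sketch of the do-it-yourself option: a small hand-written promisify (our own helper, not Bluebird's API; Node 8+ also ships util.promisify) used with async/await to build the recursive listing. getFiles is a hypothetical name:

```javascript
const fs = require('fs');
const path = require('path');

// Wraps a Node-style function whose last argument is an (err, value)
// callback so that it returns a Promise instead.
function promisify(fn) {
  return (...args) =>
    new Promise((resolve, reject) => {
      fn(...args, (err, value) => (err ? reject(err) : resolve(value)));
    });
}

const readdir = promisify(fs.readdir);
const stat = promisify(fs.stat);

// Resolves to a flat array of file paths under `dir`.
async function getFiles(dir) {
  const names = await readdir(dir);
  const nested = await Promise.all(names.map(async (name) => {
    const file = path.join(dir, name);
    return (await stat(file)).isDirectory() ? getFiles(file) : [file];
  }));
  return [].concat(...nested);
}
```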
See my other answer for a generator approach that can give results even faster.
I've coded this recently, and thought it would make sense to share this here. The code makes use of the async library.
You can use it like this:
A library called Filehound is another option. It will recursively search a given directory (working directory by default). It supports various filters, callbacks, promises and sync searches.
For example, search the current working directory for all files (using callbacks):
Or promises and specifying a specific directory:
Consult the docs for further use cases and examples of usage: https://github.com/nspragg/filehound
Disclaimer: I'm the author.
Check out the final-fs library. It provides a readdirRecursive function:
Standalone promise implementation
I am using the when.js promise library in this example.
I've included an optional parameter includeDir which will include directories in the file listing if set to true.
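For comparison, a sketch of the same includeDir idea using native Promises and fs.promises (Node 10.10+) instead of when.js; listRecursive is a hypothetical name. When includeDir is true, each directory path is listed before its contents:

```javascript
const fs = require('fs');
const path = require('path');

// Resolves to a flat array of paths; directories are included only
// when includeDir is true.
function listRecursive(dir, includeDir = false) {
  return fs.promises
    .readdir(dir, { withFileTypes: true })
    .then((dirents) =>
      Promise.all(
        dirents.map((dirent) => {
          const full = path.join(dir, dirent.name);
          if (!dirent.isDirectory()) return [full];
          return listRecursive(full, includeDir).then((children) =>
            includeDir ? [full].concat(children) : children
          );
        })
      )
    )
    // Each entry produced an array; merge them into one list.
    .then((nested) => [].concat(...nested));
}
```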
The recursive-readdir module has this functionality.
klaw and klaw-sync are worth considering for this sort of thing. These were part of node-fs-extra.
For Node 10.3+, here is a for-await solution:
The benefit of this solution is that you can start processing the results immediately; e.g. it takes 12 seconds to read all the files in my media directory, but if I do it this way I can get the first result within a few milliseconds.
Here's yet another implementation. None of the above solutions have any limiters, and so if your directory structure is large, they're all going to thrash and eventually run out of resources.
Using a concurrency of 50 works pretty well, and is almost as fast as simpler implementations for small directory structures.
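A sketch of a walk with a concurrency limit, assuming Node 10.10+ (createLimiter and walkLimited are hypothetical names, and the default of 50 simply mirrors the figure above). The limiter holds a slot only while a readdir call is in flight, so parent directories waiting on their children never block the queue:

```javascript
const fs = require('fs');
const path = require('path');

// At most `limit` tasks run at once; extra tasks wait in a FIFO queue
// and start as soon as a running task settles.
function createLimiter(limit) {
  let active = 0;
  const queue = [];
  function next() {
    if (active >= limit || queue.length === 0) return;
    active++;
    const { task, resolve, reject } = queue.shift();
    Promise.resolve()
      .then(task)
      .then(resolve, reject)
      .finally(() => {
        active--;
        next();
      });
  }
  return (task) =>
    new Promise((resolve, reject) => {
      queue.push({ task, resolve, reject });
      next();
    });
}

// Walks `dir` with at most `concurrency` readdir calls outstanding.
async function walkLimited(dir, concurrency = 50) {
  const run = createLimiter(concurrency);
  const files = [];
  async function visit(current) {
    const dirents = await run(() =>
      fs.promises.readdir(current, { withFileTypes: true })
    );
    const pending = [];
    for (const dirent of dirents) {
      const full = path.join(current, dirent.name);
      if (dirent.isDirectory()) pending.push(visit(full));
      else files.push(full);
    }
    await Promise.all(pending);
  }
  await visit(dir);
  return files;
}
```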
I modified Trevor Senior's Promise based answer to work with Bluebird
For fun, here is a flow based version that works with highland.js streams library. It was co-authored by Victor Vu.
Using Promises (Q) to solve this in a Functional style:
It returns a promise of an array, so you can use it as: