Fast (low-level) method to recursively process files in a folder
My application indexes the contents of all hard drives on end users' computers.
I am using Directory.GetFiles and Directory.GetDirectories to recursively process the whole folder structure. I am indexing only a few selected file types (up to 10 file types).
I can see in the profiler that most of the indexing time is spent enumerating files and folders - depending on the ratio of files that actually get indexed, up to 90 percent of the total time.
I would like to make the indexing as fast as possible. I have already optimized the indexing itself and the processing of the indexed files.
I was thinking of using Win32 API calls, but the profiler shows that most of the processing time is already spent in these API calls as made by .NET.
Is there a (possibly low-level) method accessible from C# that would make the enumeration of files/folders at least partially faster?
As requested in the comments, my current code (just a sketch with the irrelevant parts trimmed):
private IEnumerable<IndexedEntity> RecurseFolder(string indexedFolder)
{
    // for a single extension:
    string[] files = Directory.GetFiles(indexedFolder, extensionFilter);
    foreach (string file in files)
    {
        yield return ProcessFile(file);
    }
    foreach (string directory in Directory.GetDirectories(indexedFolder))
    {
        // recursively process all subdirectories
        foreach (var ie in RecurseFolder(directory))
        {
            yield return ie;
        }
    }
}
2 Answers
In .NET 4.0, there are inbuilt enumerable file listing methods; since that isn't far away, I would try using those. This might be a factor in particular if you have any massively populated folders (requiring a large array allocation).
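For illustration, a minimal sketch of the .NET 4.0 approach, reusing the question's extensionFilter and ProcessFile (the method name RecurseFolder40 is invented for this example); Directory.EnumerateFiles streams results as the tree is walked instead of materializing an array per folder:

private IEnumerable<IndexedEntity> RecurseFolder40(string indexedFolder)
{
    // Streams matches lazily; note that an UnauthorizedAccessException in any
    // subfolder aborts the whole walk, so a manual stack (below) may still be useful.
    foreach (string file in Directory.EnumerateFiles(
        indexedFolder, extensionFilter, SearchOption.AllDirectories))
    {
        yield return ProcessFile(file);
    }
}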
If depth is the issue, I would consider flattening your method to use a local stack/queue and a single iterator block. This will reduce the code path used to enumerate the deep folders:
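A minimal sketch of that flattened approach, assuming the question's single-extension extensionFilter as the search pattern (GetFilesFlat is a name invented here; requires System.Collections.Generic and System.IO; error handling omitted):

private static IEnumerable<string> GetFilesFlat(string root, string searchPattern)
{
    // A local stack replaces the call stack: one iterator block handles any depth.
    Stack<string> pending = new Stack<string>();
    pending.Push(root);
    while (pending.Count != 0)
    {
        string path = pending.Pop();
        foreach (string file in Directory.GetFiles(path, searchPattern))
        {
            yield return file;
        }
        foreach (string subdir in Directory.GetDirectories(path))
        {
            pending.Push(subdir);
        }
    }
}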
Iterate that, creating your ProcessFile entries from the results.
If you believe that the .NET implementation is causing the problem, then I suggest that you use the winapi calls _findfirst, _findnext etc.
It seems to me that .NET requires a lot of memory here because the listing is completely copied into an array at each directory level - so if your directory structure is 10 levels deep, you have 10 versions of the files array in play at any given moment, plus an allocation/deallocation of that array for every directory in the structure.
Using the same recursive technique with _findfirst etc. only requires that a handle to a position in the directory structure be kept at each level of recursion.
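For reference, _findfirst/_findnext are C-runtime wrappers over the Win32 FindFirstFile/FindNextFile APIs, which can be reached from C# via P/Invoke. A rough sketch (the class name, the flattened loop, and the simple EndsWith extension check standing in for the question's extensionFilter are all choices made for this example):

using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.InteropServices;

static class NativeFileEnumerator
{
    private const int FILE_ATTRIBUTE_DIRECTORY = 0x10;
    private static readonly IntPtr INVALID_HANDLE_VALUE = new IntPtr(-1);

    [StructLayout(LayoutKind.Sequential, CharSet = CharSet.Auto)]
    private struct WIN32_FIND_DATA
    {
        public int dwFileAttributes;
        public System.Runtime.InteropServices.ComTypes.FILETIME ftCreationTime;
        public System.Runtime.InteropServices.ComTypes.FILETIME ftLastAccessTime;
        public System.Runtime.InteropServices.ComTypes.FILETIME ftLastWriteTime;
        public int nFileSizeHigh;
        public int nFileSizeLow;
        public int dwReserved0;
        public int dwReserved1;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 260)]
        public string cFileName;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 14)]
        public string cAlternateFileName;
    }

    [DllImport("kernel32.dll", CharSet = CharSet.Auto, SetLastError = true)]
    private static extern IntPtr FindFirstFile(string fileName, out WIN32_FIND_DATA findData);

    [DllImport("kernel32.dll", CharSet = CharSet.Auto, SetLastError = true)]
    private static extern bool FindNextFile(IntPtr hFindFile, out WIN32_FIND_DATA findData);

    [DllImport("kernel32.dll", SetLastError = true)]
    private static extern bool FindClose(IntPtr hFindFile);

    // Walks the tree holding one find handle per directory on the pending stack;
    // no arrays of file names are ever materialized.
    public static IEnumerable<string> EnumerateFiles(string root, string extension)
    {
        Stack<string> pending = new Stack<string>();
        pending.Push(root);
        while (pending.Count != 0)
        {
            string dir = pending.Pop();
            WIN32_FIND_DATA data;
            IntPtr handle = FindFirstFile(Path.Combine(dir, "*"), out data);
            if (handle == INVALID_HANDLE_VALUE) continue; // e.g. access denied
            try
            {
                do
                {
                    if (data.cFileName == "." || data.cFileName == "..") continue;
                    string full = Path.Combine(dir, data.cFileName);
                    if ((data.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) != 0)
                        pending.Push(full);
                    else if (full.EndsWith(extension, StringComparison.OrdinalIgnoreCase))
                        yield return full;
                } while (FindNextFile(handle, out data));
            }
            finally
            {
                FindClose(handle);
            }
        }
    }
}

You would then iterate NativeFileEnumerator.EnumerateFiles(drive, ".ext") once per indexed extension (or widen the filter to check all 10 extensions in one pass), calling ProcessFile on each result.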