从文件路径列表构建目录树

发布于 2024-08-25 12:19:53 字数 511 浏览 3 评论 0原文

我正在寻找一种省时的方法将文件列表解析为树。可能有数亿个文件路径。

强力解决方案是在出现目录分隔符时分割每个路径,并通过进行字符串比较来遍历树添加目录和文件条目,但这会非常慢。

输入数据通常按字母顺序排序,因此列表类似于:

C:\Users\Aaron\AppData\Amarok\Afile

C:\Users\Aaron\AppData\Amarok\Afile2

C:\Users\Aaron\AppData\Amarok\Afile3

C:\Users\Aaron\AppData\Blender\alibrary.dll

C:\Users\Aaron\AppData\Blender\and_so_on.txt

根据这个顺序,我的自然反应是将目录列表分区为组......以某种方式......在进行缓慢的字符串比较之前。我真的不确定。我将不胜感激任何想法。

编辑:如果可能的话,如果这棵树从上到下延迟加载,那就更好了。

I am looking for a time efficient method to parse a list of files into a tree. There can be hundreds of millions of file paths.

The brute force solution would be to split each path on occurrence of a directory separator, and traverse the tree adding in directory and file entries by doing string comparisons but this would be exceptionally slow.

The input data is usually sorted alphabetically, so the list would be something like:

C:\Users\Aaron\AppData\Amarok\Afile

C:\Users\Aaron\AppData\Amarok\Afile2

C:\Users\Aaron\AppData\Amarok\Afile3

C:\Users\Aaron\AppData\Blender\alibrary.dll

C:\Users\Aaron\AppData\Blender\and_so_on.txt

From this ordering my natural reaction is to partition the directory listings into groups... somehow... before doing the slow string comparisons. I'm really not sure. I would appreciate any ideas.

Edit: It would be better if this tree were lazy loaded from the top down if possible.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

何以笙箫默 2024-09-01 12:19:53

您别无选择,只能进行完整的字符串比较,因为您无法保证字符串可能在哪里不同。有一些技巧可能会加快速度:

  • 正如 David 所说,形成一棵树,但从前一个树中搜索新的插入点(也许借助某种 matchingPrefix 例程这会告诉你新的不同之处)。
  • 如果树中可能有很多文件并且需要计算重复项,请对树的每个级别使用哈希表。 (否则,追加到堆栈就可以了。)

You have no choice but to do full string comparisons since you can't guarantee where the strings might differ. There are a couple tricks that might speed things up a little:

  • As David said, form a tree, but search for the new insertion point from the previous one (perhaps with the aid of some sort of matchingPrefix routine that will tell you where the new one differs).
  • Use a hash table for each level of the tree if there may be very many files within and you need to count duplicates. (Otherwise, appending to a stack is fine.)
庆幸我还是我 2024-09-01 12:19:53

如果可能的话,您可以使用 tree 命令生成树结构,这里

if its possible, you can generate your tree structure with the tree command, here

你好,陌生人 2024-09-01 12:19:53

要利用输入数据的“通常排序”属性,请从插入最后一个文件的目录开始遍历:将当前路径名的目录名与前一个路径名进行比较。如果它们匹配,您可以在此处插入,否则弹出一个级别并重试。

To take advantage of the "usually sorted" property of your input data, begin your traversal at the directory where your last file was inserted: compare the directory name of current pathname to the previous one. If they match, you can just insert here, otherwise pop up a level and try again.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文