从文件路径列表构建目录树
我正在寻找一种省时的方法将文件列表解析为树。可能有数亿个文件路径。
强力解决方案是在出现目录分隔符时分割每个路径,并通过进行字符串比较来遍历树添加目录和文件条目,但这会非常慢。
输入数据通常按字母顺序排序,因此列表类似于:
C:\Users\Aaron\AppData\Amarok\Afile
C:\Users\Aaron\AppData\Amarok\Afile2
C:\Users\Aaron\AppData\Amarok\Afile3
C:\Users\Aaron\AppData\Blender\alibrary.dll
C:\Users\Aaron\AppData\Blender\and_so_on.txt
根据这个顺序,我的自然反应是将目录列表分区为组......以某种方式......在进行缓慢的字符串比较之前。我真的不确定。我将不胜感激任何想法。
编辑:如果可能的话,如果这棵树从上到下延迟加载,那就更好了。
I am looking for a time efficient method to parse a list of files into a tree. There can be hundreds of millions of file paths.
The brute force solution would be to split each path on occurrence of a directory separator, and traverse the tree adding in directory and file entries by doing string comparisons but this would be exceptionally slow.
The input data is usually sorted alphabetically, so the list would be something like:
C:\Users\Aaron\AppData\Amarok\Afile
C:\Users\Aaron\AppData\Amarok\Afile2
C:\Users\Aaron\AppData\Amarok\Afile3
C:\Users\Aaron\AppData\Blender\alibrary.dll
C:\Users\Aaron\AppData\Blender\and_so_on.txt
From this ordering my natural reaction is to partition the directory listings into groups... somehow... before doing the slow string comparisons. I'm really not sure. I would appreciate any ideas.
Edit: It would be better if this tree were lazy loaded from the top down if possible.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
您别无选择,只能进行完整的字符串比较,因为您无法保证字符串可能在哪里不同。有一些技巧可能会加快速度:
matchingPrefix
例程这会告诉你新的不同之处)。You have no choice but to do full string comparisons since you can't guarantee where the strings might differ. There are a couple tricks that might speed things up a little:
matchingPrefix
routine that will tell you where the new one differs).如果可能的话,您可以使用
tree
命令生成树结构,这里if its possible, you can generate your tree structure with the
tree
command, here要利用输入数据的“通常排序”属性,请从插入最后一个文件的目录开始遍历:将当前路径名的目录名与前一个路径名进行比较。如果它们匹配,您可以在此处插入,否则弹出一个级别并重试。
To take advantage of the "usually sorted" property of your input data, begin your traversal at the directory where your last file was inserted: compare the directory name of current pathname to the previous one. If they match, you can just insert here, otherwise pop up a level and try again.