Concatenating a huge number of files
I have around 3-4 million files in a directory, with filenames ending in, say, type1.txt or type2.txt
(the files are 1type1.txt, 1type2.txt, 2type1.txt, 2type2.txt, etc.).
Now I want to concatenate all files ending with type1.txt and all files ending with type2.txt.
Currently I am doing cat *type1.txt > allType1.txt, and similarly for type2.txt.
I want to preserve the order in both final output files; my guess is that cat does that. But it is too slow.
Please suggest some faster method to do the same.
Thanks,
Ravi
2 Answers
You can do this using a command along these lines:
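A minimal sketch, assuming GNU find, sort, and xargs (the output names follow the question):

    # Stream the matching names to cat in batches instead of expanding
    # one huge glob; the byte-wise sort keeps the output in name order.
    find . -maxdepth 1 -name '*type1.txt' -print0 | sort -z | xargs -0 cat > allType1.txt
    find . -maxdepth 1 -name '*type2.txt' -print0 | sort -z | xargs -0 cat > allType2.txt

If the output name could itself match the pattern, write it into another directory instead.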
But as snap said in his answer, each time cat needs to open a file, it has to do an inode lookup, which takes a long time in a directory with lots of files. To try to speed things up, you could cat by inode, using icat from the Sleuth Kit (http://www.sleuthkit.org/sleuthkit/). And even better, you can put the resulting files in another directory, as in the sketch below.
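A sketch of that approach; the device name /dev/sda1 and the output directory /tmp/out are assumptions (use the block device your filesystem lives on), and reading a mounted device with icat can return stale data, so treat this as illustrative:

    # List the huge directory once (ls sorts by name, preserving order),
    # take the inode number of every name ending in type1.txt, and read
    # each file's data directly by inode with icat, skipping the per-name
    # lookups. The output goes into another directory, not the huge one.
    mkdir -p /tmp/out
    ls -i | awk '/type1\.txt$/ {print $1}' |
    while read inode; do
        icat /dev/sda1 "$inode"
    done > /tmp/out/allType1.txt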
cat itself is not slow. But every time you expand a shell wildcard (? and *), the shell will read and search through all the file names in that directory, which is very slow. Also, the kernel will take time finding the file when you open it by name, which you cannot avoid. This depends on the file system in use (unspecified in the question): some file systems are more intelligent with huge directories than others.
To sort this out you might benefit from taking a file listing once:
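For example (a sketch; saving the list under /tmp is an assumption that keeps it out of the huge directory):

    # Read the directory once; ls sorts by name, so the saved list keeps
    # the same order the shell globs would have produced.
    ls > /tmp/filelist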
...and then using grep or similar for selecting the files out of that list (see the sketch after this paragraph). After you have sorted this mess out, make sure to structure your storage/application in such a way that this does not ever happen again. :) Also make sure to rmdir the existing directory after you have gotten your files out of it (reusing it for any purpose will not be efficient even if there is just a single file left in it).
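A sketch of the selection step, assuming the listing above and GNU xargs:

    # Pull each type's names out of the saved list and hand them to cat
    # in batches; the list is already in name order, so order is kept.
    # Writing under /tmp keeps the output out of the huge directory.
    grep 'type1\.txt$' /tmp/filelist | xargs -d '\n' cat > /tmp/allType1.txt
    grep 'type2\.txt$' /tmp/filelist | xargs -d '\n' cat > /tmp/allType2.txt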