Quick ls command
I've got to get a directory listing that contains about 2 million files, but when I do an ls command on it nothing comes back. I've waited 3 hours. I've tried ls | tee directory.txt, but that seems to hang forever.
I assume the server is doing a lot of inode sorting. Is there any way to speed up the ls command to just get a directory listing of filenames? I don't care about size, dates, permissions or the like at this time.
This is probably not a helpful answer, but if you don't have find you may be able to make do with tar.
I am told by people older than me that, "back in the day", single-user and recovery environments were a lot more limited than they are nowadays. That's where this trick comes from.
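A rough sketch of the tar trick (directory and output paths are placeholders): archive the directory to /dev/null and keep only the verbose listing. Note that tar still reads every file's contents, and some tar implementations print the -v listing to stderr rather than stdout.

tar cvf /dev/null /path/to/bigdir > directory.txt   # GNU tar: member names go to stdout, archive data is discarded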
I'm assuming you are using GNU ls?
Try the following; it will unalias the usual ls (ls --color=auto).
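Two common ways to do that in bash (the path is a placeholder); both the backslash prefix and the command builtin skip alias expansion, and unalias drops the alias for the rest of the session:

\ls /path/to/bigdir         # backslash suppresses alias expansion for this one invocation
command ls /path/to/bigdir  # command bypasses aliases and shell functions
unalias ls                  # remove the alias entirely for this shell session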
Things to try:
Check ls isn't aliased? (see the quick check below)
Perhaps try find instead?
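For example (the path is a placeholder):

type ls                            # reports whether ls is an alias, a function, or the plain binary
find /path/to/bigdir -maxdepth 1   # find prints names in directory order, with no sorting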
Hope this helps.
Some followup:
You don't mention what OS you're running on, which would help indicate which version of ls you're using. This probably isn't a 'bash' question as much as an ls question. My guess is that you're using GNU ls, which has some features that are useful in some contexts, but kill you on big directories.
GNU ls tries to do a prettier arrangement of the columns, which means arranging all the filenames at once. In a huge directory, that takes time and memory.
To 'fix' this, you can try:
ls -1 (no columns at all)
Find a BSD ls someplace (http://www.freebsd.org/cgi/cvsweb.cgi/src/bin/ls/) and use that on your big directories.
Use other tools, such as find.
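For instance (the output file is just an example), a single-column listing sent to a file avoids both the column layout and terminal rendering:

ls -1 /path/to/bigdir > directory.txt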
There are several ways to get a list of files:
Use an unsorted listing, or send the list of files straight to a file; for example:
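A hedged reconstruction, assuming GNU ls, where -U disables sorting (the output file name is a placeholder):

ls -U /path/to/bigdir                  # unsorted listing to the terminal
ls -U /path/to/bigdir > directory.txt  # same listing redirected to a file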
What partition type are you using?
With millions of small files in one directory, it might be a good idea to use JFS or ReiserFS, which have better performance with many small files.
How about
find ./ -type f
(which will find all files in the current directory)? Take off the -type f
to find everything.
You should provide information about what operating system and the type of filesystem you are using. On certain flavours of UNIX and certain filesystems you might be able to use the commands
ff
and ncheck
as alternatives.
I had a directory with timestamps in the file names. I wanted to check the date of the latest file and found
find . -type f -maxdepth 1 | sort | tail -n 1
to be about twice as fast as ls -alh.
Lots of other good solutions here, but in the interest of completeness:
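Presumably this refers to plain shell globbing (the echo ./* variant that the benchmark answer further down measures); a sketch:

echo ./*                            # let the shell expand the names instead of running ls
printf '%s\n' ./* > directory.txt   # same idea, one name per line into a file (printf is a builtin, so the long argument list is fine)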
You can also make use of xargs. Just pipe the output of ls through xargs.
If that doesn't work and the find examples above aren't working, try piping them to xargs, as it can help with the memory usage that might be causing your problems.
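A sketch of the xargs idea (paths are placeholders); xargs hands the names to echo in batches, so each output line holds a batch of names:

ls -f /path/to/bigdir | xargs -n 1000 echo > directory.txt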
If a process "doesn't come back", I recommend strace to analyze how it is interacting with the operating system.
In case of ls:
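For example (the path is a placeholder; strace writes its trace to stderr, so the listing itself can be thrown away):

strace ls /path/to/bigdir > /dev/null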
you would have seen that it reads all the directory entries (getdents(2)) before it actually outputs anything (because of the sorting already mentioned here).
Using ls -1 -f is about 10 times faster, and it is easy to do (I tested with 1 million files, but my original problem had 6 800 000 000 files).
But in my case I needed to check whether some specific directory contained more than 10 000 files. If there were more than 10 000 files, I was no longer interested in how many there were; I just quit the program so that it runs faster and won't try to read the rest one by one. If there are fewer than 10 000, I print the exact amount. The speed of my program is quite similar to ls -1 -f if you specify a bigger value for the parameter than the number of files.
You can use my program find_if_more.pl in the current directory by typing:
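A hypothetical invocation (the threshold argument is assumed from the description above), plus an early-exit one-liner with standard tools that behaves similarly, since head closes the pipe and find then stops early:

perl find_if_more.pl 10000

find /path/to/bigdir -maxdepth 1 -type f | head -n 10001 | wc -l
# prints 10001 when there are more than 10 000 files, otherwise the exact count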
If you are just interested in whether there are more than n files, the script will finish faster than ls -1 -f when there is a very large number of files.
This would be the fastest option AFAIK:
ls -1 -f
-1 (no columns)
-f (no sorting)
You can redirect output and run the ls process in the background.
This would allow you to go on about your business while it's running. It wouldn't lock up your shell.
Not sure what the options are for running ls and getting less data back. You could always run
man ls
to check.
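A sketch of the redirect-and-background idea (file names are placeholders); nohup keeps the job running even if the shell session ends:

nohup ls -1 -f /path/to/bigdir > directory.txt 2> ls-errors.log &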
Try using:
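Presumably a find restricted to regular files, along these lines (the path is a placeholder):

find /path/to/bigdir -type f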
This will only list the files in the directory; leave out the
-type f
argument if you want to list files and directories.
I have a directory with 4 million files in it, and the only way I got ls to spit out files immediately, without a lot of churning first, was the following.
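The command was presumably an unsorted listing, in line with the -U variants mentioned just below (the path is a placeholder):

ls -U /path/to/bigdir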
That will do the ls without sorting.
Another source of slowness is --color. On some Linux machines there is a convenience alias which adds --color=auto to the ls call, making it look up file attributes for each file found (slow) in order to color the display. This can be avoided by ls -U --color=never or \ls -U.
This question seemed interesting, and I went through the multiple answers that were posted. To understand their efficiency, I executed them on 2 million files and found the results below.
To summarize the results:
The ls -f command ran a bit faster than ls -U; disabling color might have caused this improvement.
The find command came third, with an average of 2.738 seconds.
Plain ls took 42.16 seconds; on my system ls is an alias for ls --color=auto.
The shell expansion echo ./* ran for 50.80 seconds.
The tar-based solution took about 37 minutes.
All tests were done separately while the system was idle.
One important thing to note here is that the file lists were not printed to the terminal; rather, they were redirected to a file, and the file count was calculated later with the wc command.
Commands ran too slowly if the output was printed on the screen.
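A sketch of that measurement approach (directory and output paths are placeholders); the counting happens after the timed command, so it does not affect the timing:

time ls -f /path/to/bigdir > /tmp/filelist.txt
wc -l < /tmp/filelist.txt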
Any ideas why this happens?