What is the fastest/easiest way to count a large number of files in a directory (in Linux)?

Posted on 2024-11-09 01:47:34 | 447 words | 3 views | 0 comments

I had a directory with a large number of files. Every time I tried to list the files in it, I either could not do it or there was a significant delay. I tried the ls command on the Linux command line, and the web interface from my hosting provider did not help either.

The problem is that when I just run ls, it takes a significant amount of time before anything is displayed. Thus, ls | wc -l would not help either.

After some research I came up with this code (in this example, it counts the number of new emails on some server):

from os import walk
print(sum(len(files) for (root, dirs, files) in walk('/home/myname/Maildir/new')))

The above code is written in Python. I ran it with Python's command-line tool and it was pretty fast (it returned the result instantly).

I am interested in the answer to the following question: is it possible to count the files in a directory (without subdirectories) faster? What is the fastest way to do that?

Comments (8)

小情绪 2024-11-16 01:47:34

ls makes a stat(2) call for every file. Other tools, such as find(1) and shell wildcard expansion, may avoid this call and only do readdir. One shell command combination that might work is find dir -maxdepth 1 | wc -l, but it also lists the directory itself (so the count is one too high) and miscounts any filename that contains a newline.
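
To make the newline problem concrete, here is a minimal Python sketch. The helper names count_by_lines and count_by_entries are made up for illustration, and the demo assumes a filesystem that allows newlines in filenames (typical on Linux).

```python
import os
import tempfile


def count_by_lines(directory):
    """Count the way `... | wc -l` does: one per newline in the printed listing."""
    listing = "".join(name + "\n" for name in os.listdir(directory))
    return listing.count("\n")


def count_by_entries(directory):
    """Count directory entries directly, whatever characters the names contain."""
    return len(os.listdir(directory))


if __name__ == "__main__":
    # Hypothetical demo data: one ordinary name and one name with a newline.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("plain.txt", "with\nnewline.txt"):
            open(os.path.join(tmp, name), "w").close()
        print(count_by_lines(tmp), count_by_entries(tmp))
```

With these two files, the line-based count comes out one higher than the entry count, because the embedded newline is read as a separator.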

From Python, the straightforward way to get just these names is os.listdir(directory). Unlike os.walk and os.path.walk (the latter exists only in Python 2), it does not need to recurse, check file types, or make further Python function calls.
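
As a rough sketch of that approach (the function name count_entries is only illustrative), counting every name in a single directory could look like this:

```python
import os


def count_entries(directory):
    """Return how many names `directory` contains.

    os.listdir returns files, subdirectories and hidden entries alike,
    but leaves out the special entries '.' and '..'.
    """
    return len(os.listdir(directory))


if __name__ == "__main__":
    print(count_entries("."))
```

Note that this counts subdirectories too. If only regular files should be counted, some kind of type check is still needed.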

Addendum: It seems ls doesn't always stat. At least on my GNU system, it makes only getdents calls when no further information (such as which names are directories) is requested. getdents is the underlying system call used to implement readdir on GNU/Linux.

Addendum 2: One reason for the delay before ls outputs results is that it sorts and tabulates the entries. ls -U1 may avoid this.

浮世清欢 2024-11-16 01:47:34

This should be pretty fast in Python:

from os import listdir
from os.path import isfile, join
directory = '/home/myname/Maildir/new'
print(sum(1 for entry in listdir(directory) if isfile(join(directory, entry))))
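
A possible variation, offered as a sketch and not as part of the answer above: on Python 3.6+, os.scandir yields entries that usually already carry the file type from the directory read, so a per-entry stat call can often be skipped. The function name count_regular_files is an assumption made for illustration. Unlike isfile, this version does not count symlinks that point to files.

```python
import os


def count_regular_files(directory):
    """Count regular files directly inside `directory`, without recursing."""
    with os.scandir(directory) as entries:
        # follow_symlinks=False: count only real regular files, not symlinks to files.
        return sum(1 for entry in entries if entry.is_file(follow_symlinks=False))


if __name__ == "__main__":
    print(count_regular_files("."))
```

On filesystems that do not report the entry type, is_file falls back to a stat call, so the saving depends on the filesystem.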

夏尔 2024-11-16 01:47:34

Total number of files in the given directory

find . -maxdepth 1 -type f | wc -l

Total number of files in the given directory and all subdirectories under it

find . -type f | wc -l

For more details, open a terminal and run man find.
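
For comparison with the Python code in the question, the same two counts can be sketched with os.walk. The function name count_files and its recursive parameter are hypothetical. The result is only roughly equivalent to find -type f, because os.walk's file list also includes symlinks to files and other non-directory entries.

```python
import os


def count_files(directory, recursive=False):
    """Count non-directory entries in `directory`, optionally including subdirectories."""
    total = 0
    for root, dirs, files in os.walk(directory):
        total += len(files)
        if not recursive:
            # The first tuple describes the top directory itself; stop there.
            break
    return total


if __name__ == "__main__":
    print(count_files("."), count_files(".", recursive=True))
```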

倥絔 2024-11-16 01:47:34

I'm not sure about speed, but if you want to just use shell builtins this should work:

#!/bin/sh
COUNT=0
for file in /path/to/directory/*
do
    COUNT=$((COUNT+1))
done
echo "$COUNT"
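
Two properties of the glob are worth knowing: * does not match hidden files (names starting with a dot), and in sh an unmatched pattern is left as-is, so the loop above reports 1 for an empty directory. A minimal Python sketch of the same pattern-based count follows; the function name count_visible is an assumption for illustration. Python's glob also skips dotfiles, but returns an empty list when nothing matches.

```python
import glob
import os


def count_visible(directory):
    """Count entries matching `directory/*`; as in the shell, `*` skips dotfiles."""
    # glob.escape keeps special characters in the directory name itself literal.
    pattern = os.path.join(glob.escape(directory), "*")
    return len(glob.glob(pattern))


if __name__ == "__main__":
    print(count_visible("."))
```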

む无字情书 2024-11-16 01:47:34

I think ls spends most of its time before displaying the first line because it has to sort the entries, so ls -U should display the first line much faster (though the total time may not improve much).

书信已泛黄 2024-11-16 01:47:34

The fastest way would be to avoid all the overhead of interpreted languages and write some code that directly addresses your problem. This is hard to do portably, but otherwise pretty straightforward. At the moment I'm on an OS X box, but converting the following to Linux should be extremely straightforward. (I opted to ignore hidden files and count only regular files; modify as necessary, or add command-line switches to get the functionality you want.)

#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>

int
main( int argc, char **argv )
{
    DIR *d;
    struct dirent *f;
    int count = 0;
    char *path = argv[ 1 ];

    if( path == NULL ) {
        fprintf( stderr, "usage: %s path\n", argv[ 0 ] );
        exit( EXIT_FAILURE );
    }
    d = opendir( path );
    if( d == NULL ) { perror( path ); exit( EXIT_FAILURE ); }
    /* Count regular, non-hidden files. d_type avoids a stat() per entry,
       but some filesystems report DT_UNKNOWN, which is not counted here. */
    while( ( f = readdir( d ) ) != NULL ) {
        if( f->d_name[ 0 ] != '.'  &&  f->d_type == DT_REG )
            count += 1;
    }
    closedir( d );
    printf( "%d\n", count );
    return EXIT_SUCCESS;
}

困倦 2024-11-16 01:47:34

My use case is a Linux SBC (Banana Pi) counting files in a directory on a FAT32 USB stick.
In a shell, running

ls -U {dir} | wc -l

takes 6.4 s with 32k files in there (32k = max files/dir on FAT32).
From Python, running

import os, time ; t = time.time() ; print(len(os.listdir(d))) ; print(time.time() - t)

takes only 0.874 s(!). I can't see anything else in Python being quicker than that.
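
For anyone who wants to repeat such a measurement, a small timing helper might look like the sketch below. The name timed_count is hypothetical, and time.perf_counter is used here in place of time.time because it is intended for measuring short intervals.

```python
import os
import time


def timed_count(directory):
    """Return (entry_count, elapsed_seconds) for a single os.listdir pass."""
    start = time.perf_counter()
    count = len(os.listdir(directory))
    elapsed = time.perf_counter() - start
    return count, elapsed


if __name__ == "__main__":
    count, elapsed = timed_count(".")
    print(count, elapsed)
```

Keep in mind that a second run over the same directory is often faster because the kernel has cached the directory data, so timings are only comparable when the cache state is the same.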

若有似无的小暗淡 2024-11-16 01:47:34

A shorter way of counting files in a directory in bash:

files=(*) ; echo ${#files[@]}

I generated 10,000 empty files in tmpfs; counting them this way takes 0.03 s on my machine, and ls | wc -l was only slightly slower (I flushed the cache before and in between, just in case).
