What is the fastest / easiest way to count a large number of files in a directory (in Linux)?
I had a directory with a large number of files. Every time I tried to access the list of files within it, I was not able to, or there was a significant delay. I was trying to use the ls command from the command line on Linux, and the web interface from my hosting provider did not help either.

The problem is that when I just do ls, it takes a significant amount of time to even start displaying anything. Thus, ls | wc -l would not help either.
After some research I came up with this code (in this example it counts the number of new emails on some server):

from os import walk

# Sum the file counts of every directory under Maildir/new
print(sum(len(files) for (root, dirs, files) in walk('/home/myname/Maildir/new')))
The above code is written in Python. I used Python's command-line interpreter and it worked pretty fast (it returned the result instantly).
I am interested in the answer to the following question: is it possible to count files in a directory (without subdirectories) faster? What is the fastest way to do that?
8 Answers
ls does a stat(2) call for every file. Other tools, like find(1) and the shell wildcard expansion, may avoid this call and just do readdir. One shell command combination that might work is find dir -maxdepth 1 | wc -l, but it will gladly list the directory itself and miscount any filename with a newline in it.

From Python, the straightforward way to get just these names is os.listdir(directory). Unlike os.walk and os.path.walk, it does not need to recurse, check file types, or make further Python function calls.
Addendum: It seems ls doesn't always stat. At least on my GNU system, it can do only a getdents call when further information (such as which names are directories) is not requested. getdents is the underlying system call used to implement readdir in GNU/Linux.
Addition 2: One reason for a delay before ls outputs results is that it sorts and tabulates. ls -U1 may avoid this.
This should be pretty fast in Python:
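Along the lines of the following, counting the entries returned by os.listdir (the path is just a placeholder):

import os

# Non-recursive: counts the entries of a single directory.
print(len(os.listdir('/home/myname/Maildir/new')))   # placeholder path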
find can give you either count: the total number of files in the given directory, or the total number of files in the given directory and all subdirectories under it.

For more details, drop into a terminal and do man find.
I'm not sure about speed, but if you want to just use shell builtins, counting the results of a * glob expansion should work.
I think ls is spending most of its time before displaying the first line because it has to sort the entries, so ls -U should display the first line much faster (though it may not be that much better in total).
The fastest way would be to avoid all the overhead of interpreted languages and write some code that directly addresses your problem. Doing so is difficult to do in a portable way, but otherwise pretty straightforward. At the moment I'm on an OS X box, but converting such a program to Linux should be extremely straightforward. (I opted to ignore hidden files and only count regular files... modify as necessary or add command-line switches to get the functionality you want.)
My use case is a Linux SBC (a Banana Pi) counting files in a directory on a FAT32 USB stick.

In a shell, the count takes 6.4 secs with 32k files in there (32k = max files/dir on FAT32).

From Python it takes only 0.874 secs(!). I can't see anything else in Python being quicker than that.
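A sketch of that kind of comparison (the mount point /mnt/usb is a placeholder, and ls | wc -l / os.listdir are assumed as the counting methods):

import os
import subprocess
import time

directory = '/mnt/usb'   # assumed mount point of the FAT32 stick

# Shell-side count for comparison (spawns ls and wc in the target directory)
start = time.time()
out = subprocess.run('ls | wc -l', shell=True, cwd=directory,
                     capture_output=True, text=True)
print('shell : %s files in %.3f s' % (out.stdout.strip(), time.time() - start))

# Python-side count: a single os.listdir pass
start = time.time()
count = len(os.listdir(directory))
print('python: %d files in %.3f s' % (count, time.time() - start))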
A shorter way of counting files in a directory in bash:
files=(*) ; echo ${#files[@]}
I generated 10_000 empty files in tmpfs; counting them this way takes 0.03 s on my machine, and running ls | wc -l was just slightly slower (I flushed the cache before and in between, just in case).
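If you want to reproduce that benchmark, here is a small Python sketch that creates the empty test files (the tmpfs location /dev/shm/count_test is an assumed path, not from the original answer):

import os

directory = '/dev/shm/count_test'   # assumed tmpfs location; adjust as needed
os.makedirs(directory, exist_ok=True)

# Create 10_000 empty files, matching the benchmark described above.
for i in range(10_000):
    open(os.path.join(directory, 'file_%05d' % i), 'w').close()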