Python: count the files in a directory and all its subdirectories

Posted 2025-01-29 20:44:11 | Length: 1101 | Views: 4 | Comments: 0

I am trying to count all the files in a folder and all its subfolders.
For example, if my folder looks like this:

file1.txt
subfolder1/
├── file2.txt
├── subfolder2/
│   ├── file3.txt
│   ├── file4.txt
│   └── subfolder3/
│       └── file5.txt
└── file6.txt
file7.txt

I would like to get the number 7.

The first thing I tried is a recursive function that counts all files and calls itself for each folder:

import os

def get_file_count(directory: str) -> int:
    count = 0
    for filename in os.listdir(directory):
        file = os.path.join(directory, filename)
        if os.path.isfile(file):
            count += 1
        elif os.path.isdir(file):
            # Recurse into the subdirectory and add its file count
            count += get_file_count(file)
    return count

This works, but it takes a lot of time for big directories.

I also remembered this post, which shows a quick way to compute the total size of a folder using win32com, and I wondered whether this library also offered a way to do what I was looking for.
But after searching, I only found this:

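import win32com.client as com  # assumed import; the original snippet did not show it

fso = com.Dispatch("Scripting.FileSystemObject")
folder = fso.GetFolder(".")
size = folder.Files.Count  # number of files directly inside the folder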

But this only returns the number of files in the targeted folder itself (not in its subfolders).

So, do you know if there is an optimal function in Python that returns the number of files in a folder and all its subfolders?

Comments (7)

两仪 2025-02-05 20:44:11

IIUC, you can just do

sum(len(files) for _, _, files in os.walk('path/to/folder'))

or, equivalently, without len (this is unlikely to be faster, since it iterates over every file name in Python, so measure before preferring it):

sum(1 for _, _, files in os.walk('folder_test') for f in files)
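
As a quick check of the os.walk approach, here is a minimal sketch that rebuilds the tree from the question in a temporary directory and counts its files. The helper names count_files_walk and build_sample_tree are made up for this illustration.

```python
import os
import tempfile


def count_files_walk(directory: str) -> int:
    """Count the non-directory entries under `directory`, recursively."""
    # os.walk yields (dirpath, dirnames, filenames) for every directory it visits.
    return sum(len(files) for _, _, files in os.walk(directory))


def build_sample_tree(root: str) -> None:
    """Create the 7-file layout from the question (hypothetical helper)."""
    relative_paths = [
        "file1.txt",
        "file7.txt",
        "subfolder1/file2.txt",
        "subfolder1/file6.txt",
        "subfolder1/subfolder2/file3.txt",
        "subfolder1/subfolder2/file4.txt",
        "subfolder1/subfolder2/subfolder3/file5.txt",
    ]
    for rel in relative_paths:
        path = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        build_sample_tree(tmp)
        print(count_files_walk(tmp))  # prints 7
```

Note that os.walk lists symbolic links to files among the file names, so they are included in the count.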
策马西风 2025-02-05 20:44:11

Another solution using pathlib and os.path:

from pathlib import Path
from os.path import isfile

len([x for x in Path('./dir1').rglob('*') if isfile(x)])
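
The same idea can be written with pathlib alone, since Path objects have their own is_file() method. A minimal sketch (the function name count_files_rglob is an assumption for illustration); it sums over a generator, so no intermediate list is built.

```python
from pathlib import Path


def count_files_rglob(directory: str) -> int:
    """Count files below `directory` (is_file() follows symlinks)."""
    # rglob('*') yields every entry at any depth, directories included,
    # so each path is filtered with is_file().
    return sum(1 for path in Path(directory).rglob("*") if path.is_file())
```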
殊姿 2025-02-05 20:44:11

This code counts all directory entries that are not directories (e.g., plain files, symlinks) under a specified root.

It includes timing and the actual pathname used in the test:

from glob import glob, escape
import os
import time


def get_file_count(directory: str) -> int:
    count = 0
    for filename in glob(os.path.join(escape(directory), '*')):
        if os.path.isdir(filename):
            count += get_file_count(filename)
        else:
            count += 1
    return count

start = time.perf_counter()
count = get_file_count('/Volumes/G-DRIVE Thunderbolt 3')
end = time.perf_counter()

print(count)
print(f'{end-start:.2f}s')

Output:

166231
2.38s
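
One caveat worth checking on your own data: the glob module's `*` pattern does not match names that begin with a dot, so a glob-based count can come out lower than an os.walk-based one when hidden files are present. The sketch below illustrates the difference in a temporary directory; both function names are made up for this example.

```python
import os
import tempfile
from glob import escape, glob


def count_with_glob(directory: str) -> int:
    """Recursive count using glob('*'); names starting with a dot are not matched."""
    count = 0
    for name in glob(os.path.join(escape(directory), "*")):
        if os.path.isdir(name):
            count += count_with_glob(name)
        else:
            count += 1
    return count


def count_with_walk(directory: str) -> int:
    """Recursive count using os.walk; names starting with a dot are included."""
    return sum(len(files) for _, _, files in os.walk(directory))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        open(os.path.join(tmp, "visible.txt"), "w").close()
        open(os.path.join(tmp, ".hidden"), "w").close()
        print(count_with_glob(tmp), count_with_walk(tmp))  # prints: 1 2
```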
余生共白头 2025-02-05 20:44:11

I used os.walk().

Here is my sample; I hope it helps you:

import os

def file_dir():
    # Collect the full paths of all .tsv files under the current working directory
    directories = []
    res = {}
    cwd = os.getcwd()
    for root, dirs, files in os.walk(cwd):
        for file in files:
            if file.endswith(".tsv"):
                directories.append(os.path.join(root, file))
    res['dir'] = directories
    return res
影子是时光的心 2025-02-05 20:44:11

You could also use this command directly:

find DIR_NAME -type f | wc -l

This returns the count of all files.
It can be run from Python with os.system().
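
Since os.system() returns the command's exit status rather than its output, capturing the count in a Python variable is usually done with the subprocess module instead. A minimal sketch, assuming a Unix-like system where find is on the PATH; the function name and the use of -print0 (so that file names containing newlines are still counted once) are choices made for this example, not part of the original answer.

```python
import subprocess


def count_files_find(directory: str) -> int:
    """Count regular files below `directory` by delegating to find(1)."""
    result = subprocess.run(
        ["find", directory, "-type", "f", "-print0"],
        capture_output=True,  # collect stdout instead of printing it
        check=True,           # raise CalledProcessError if find fails
    )
    # Each path is terminated by a NUL byte, so counting NULs counts files.
    return result.stdout.count(b"\0")
```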

陌伤浅笑 2025-02-05 20:44:11

The proper way is to use os.walk as others have pointed out, but to give another solution which resembles your original as much as possible:

You can use os.scandir, which yields entries lazily instead of building the whole list and usually knows each entry's type without an extra system call, so it should be substantially faster:

import os

def get_file_count(directory: str) -> int:
    count = 0

    for entry in os.scandir(directory):
        if entry.is_file():
            count += 1

        elif entry.is_dir():
            # entry.path is already the directory joined with the entry name
            count += get_file_count(entry.path)

    return count
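
For very deep trees, the recursion can also be replaced by an explicit stack. The following is a sketch of that variant under a couple of assumptions of mine: symbolic links are not followed, and directories that cannot be read are skipped silently. The name count_files_scandir is hypothetical.

```python
import os


def count_files_scandir(directory: str) -> int:
    """Count regular files below `directory` without recursion."""
    count = 0
    stack = [directory]  # directories still to be scanned
    while stack:
        current = stack.pop()
        try:
            # The context manager closes the scandir iterator promptly.
            with os.scandir(current) as entries:
                for entry in entries:
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)
                    elif entry.is_file(follow_symlinks=False):
                        count += 1
        except PermissionError:
            # Assumption: unreadable directories are ignored.
            continue
    return count
```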
冬天的雪花 2025-02-05 20:44:11

Here is another way.

import os
import re
import pandas as pd
 
def count_files(top, pattern, list_files):
  top = os.path.abspath(os.path.expanduser(top))
  res = []
  for root, dirs, files in os.walk(top):
    name_space = os.path.relpath(root, top)
    level = os.path.normpath(name_space).count(os.sep) + 1 if name_space != '.' else 0
    matches = [file for file in files if re.search(pattern, file)]
    if matches:
      if list_files:
        res.append((pattern, level, name_space, len(matches), matches))
      else:
        res.append((pattern, level, name_space, len(matches)))

  if list_files:
    df = pd.DataFrame(res, columns=['pattern', 'level', 'name_space', 'count', 'files'])
  else:
    df = pd.DataFrame(res, columns=['pattern', 'level', 'name_space', 'count'])
  return df

Consider the following directory structure

rajulocal@hogwarts ~/x/x5 % tree -a 
.
├── analysis.txt
├── count_files.ipynb
├── d1
│   ├── d2
│   │   ├── d3
│   │   │   └── f5.txt
│   │   ├── f3.txt
│   │   └── f4.txt
│   ├── f2.txt
│   └── f6.txt
├── f1.txt
├── f7.txt
└── .ipynb_checkpoints
    └── count_files-checkpoint.ipynb

4 directories, 10 files

To count the text files (i.e. those ending in .txt) in each directory

rajulocal@hogwarts ~/x/x5 % ipython
Python 3.10.6 (main, Oct 24 2022, 16:07:47) [GCC 11.2.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.6.0 -- An enhanced Interactive Python. Type '?' for help.
...
In [2]: 
df = count_files("~/x/x5", r"\.txt", False)
df
Out[2]: 
  pattern  level name_space  count
0   \.txt      0          .      3
1   \.txt      1         d1      2
2   \.txt      2      d1/d2      2
3   \.txt      3   d1/d2/d3      1

To see what those files are

In [3]: 
df = count_files("~/x/x5", r"\.txt", True)
df
Out[3]: 
  pattern  level name_space  count                           files
0   \.txt      0          .      3  [analysis.txt, f1.txt, f7.txt]
1   \.txt      1         d1      2                [f6.txt, f2.txt]
2   \.txt      2      d1/d2      2                [f4.txt, f3.txt]
3   \.txt      3   d1/d2/d3      1                        [f5.txt]

To get the total number of text files

In [4]: 
df['count'].sum()
Out[4]: 
8

To count files ending with .ipynb (IPython notebook files)

In [5]: 
df = count_files("~/x/x5", r"\.ipynb", True)
df
Out[5]: 
   pattern  level          name_space  count                           files
0  \.ipynb      0                   .      1             [count_files.ipynb]
1  \.ipynb      1  .ipynb_checkpoints      1  [count_files-checkpoint.ipynb]

In [6]: 
df['count'].sum()
Out[6]: 
2

To count all the files

In [7]: 
df = count_files("~/x/x5", ".*", False)
df
Out[7]: 
  pattern  level          name_space  count
0      .*      0                   .      4
1      .*      1  .ipynb_checkpoints      1
2      .*      1                  d1      2
3      .*      2               d1/d2      2
4      .*      3            d1/d2/d3      1

In [8]: 
df['count'].sum()
Out[8]: 
10

which matches the file count from the tree command.
