How to recursively walk a directory and its subdirectories and record the top-level directory information
I have the following directory structure, in a directory named Python-Pathlib-Scan-Directory:
.
├── File_Extension_Review_20220704.ipynb
├── File_Extension_Review_SIMCARE_20220704.ipynb
├── Project1
│   ├── data_1.1.csv
│   ├── data_1.2.xlsx
│   ├── data_3.1.xlsx
│   └── info.txt
├── Project2
│   ├── data_2.1.csv
│   ├── data_2.2.xlsx
│   └── resources.docx
├── Project3
│   └── Info.txt
├── data_1.csv
├── data_2.csv
├── data_3.csv
├── output.csv
├── script_1.py
└── script_2.ipynb
3 directories, 16 files
I want to count the frequency of file types (extensions) within it using collections.Counter() and return this as a pandas DataFrame by passing the results in as a dict.
I have the following code that does this:
import collections
import pandas as pd
from pathlib import Path

dir_to_scan = Path("/Python-Pathlib-Scan-Directory")
all_files = []

# Iterate recursively using rglob()
for i in dir_to_scan.rglob('*.*'):
    if i.is_file():
        all_files.append(i.suffix)

# Count values and return key:value pairs denoting extension and count
data = collections.Counter(all_files)
data

df = pd.DataFrame.from_dict(data, orient='index').reset_index().rename(columns={"index": "Extension", 0: "Count"})
df
Output:
Extension Count
.csv 6
.ipynb 3
.py 1
.txt 2
.xlsx 3
.docx 1
My issue is that this summarises across the whole directory tree, whereas I want it summarised at each level (root directory, Project1 subdirectory, Project2 subdirectory, etc.). That way I could concat the results together into one df, with an extra column specifying the directory, and show the counts per directory so that I can group by it later. Perhaps using path.parent?
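Something along these lines is roughly what I have in mind, as a minimal sketch only (building the directory column from i.parent is my assumption of how it could work, and the names here are just illustrative):

import collections
import pandas as pd
from pathlib import Path

dir_to_scan = Path("/Python-Pathlib-Scan-Directory")

# One Counter of extensions per parent directory
counts_per_dir = collections.defaultdict(collections.Counter)
for i in dir_to_scan.rglob('*.*'):
    if i.is_file():
        counts_per_dir[i.parent][i.suffix] += 1

# One row per (directory, extension) pair, concatenated into a single df
rows = []
for directory, counter in counts_per_dir.items():
    for ext, count in counter.items():
        rows.append({"Directory": str(directory), "Extension": ext, "Count": count})

df = pd.DataFrame(rows)
df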
Any suggestions on the best way to approach this?
I'm also mindful that I may want to use something similar when concatenating the files within a given directory, rather than walking through everything and concatenating all files together at once.
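For that later use case, a per-directory concat might look something like this sketch (it assumes the CSVs within each directory share a compatible schema; the print is only there to show that each directory is handled on its own):

import pandas as pd
from pathlib import Path

dir_to_scan = Path("/Python-Pathlib-Scan-Directory")

# Treat the root and every sub-directory separately
directories = [dir_to_scan] + [d for d in dir_to_scan.rglob('*') if d.is_dir()]

for directory in directories:
    csv_files = sorted(directory.glob('*.csv'))  # non-recursive: only this directory's files
    if csv_files:
        combined = pd.concat((pd.read_csv(f) for f in csv_files), ignore_index=True)
        print(directory, combined.shape)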
Comments (1)
Using the Python standard library pathlib module and a recursive function, here is one way to do it, given a fake directory which contains several sub-directories and files, with and without extensions.
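A minimal sketch of such a recursive approach, assuming iterdir() to descend into sub-directories and one Counter per directory; the function name and structure here are illustrative, not necessarily the answer's own code:

import collections
from pathlib import Path

def count_extensions(directory, results=None):
    # Recursively collect one Counter of file extensions per directory
    if results is None:
        results = {}
    counter = collections.Counter()
    for entry in directory.iterdir():
        if entry.is_dir():
            count_extensions(entry, results)  # recurse into the sub-directory
        elif entry.is_file():
            counter[entry.suffix] += 1
    results[str(directory)] = counter
    return results

results = count_extensions(Path("/Python-Pathlib-Scan-Directory"))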