Python: Finding files in a very large folder (over 100 TB)

Posted on 2025-01-18 05:10:33

I am working on a program that compares entries in cataloguing software (Rucio) with the files in storage. From the catalogue, I get the path where it believes the file is stored. I then check that location to see whether the file exists there. I have successfully created a bash script that does this, but it would be much better if it could be redone in Python.

The problem I have run into is that Python will not find the files, even when I know they exist. I have tried things like:

from os import path

if path.exists(fulladdress):
    pass  # does stuff

Even when I give it a file I know exists, it still does not find it. I suspect this has to do with the folder being huge (over 100 TB and over 287,000 files), so it does not search the whole folder and therefore misses the file.

Is there a Python solution that works for folders that big?

Best regards
Piotr

The bash command that works (run from Python through os.system) is:

import os

os.system("cd; cd directory_with_files; test -e file_in_directory_exist && echo filename >> found.txt || echo filename >> not_found")
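For comparison, a rough Python counterpart of that shell line could look like the sketch below. It assumes `directory` and `filename` stand in for the real catalogue values (the same placeholders as in the bash line) and keeps the log names found.txt and not_found; `record_existence` is a hypothetical helper name. Unlike the shell version, which cds into the directory first, it writes the logs relative to the current working directory unless full log paths are passed.

```python
import os


def record_existence(directory, filename, found_log="found.txt", missing_log="not_found"):
    """Append `filename` to found_log if directory/filename exists, else to missing_log.

    Returns True if the file was found. `directory` and `filename` are
    placeholders for the values taken from the catalogue.
    """
    full_path = os.path.join(directory, filename)
    # os.path.exists() asks the filesystem about this one path (a stat() call);
    # it does not list the directory. `test -e` in the shell behaves the same way.
    exists = os.path.exists(full_path)
    # Append mode ("a") plays the role of ">>" in the shell command.
    with open(found_log if exists else missing_log, "a") as log:
        log.write(filename + "\n")
    return exists
```

Because the check is a single lookup of one exact path, the number of other files in the folder does not decide whether the file is seen.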

I also tried running this:

import os


def findfile(name, path):
    for dirpath, dirname, filename in os.walk(path):
        if name in filename:
            return os.path.join(dirpath, name)


def compere_checksum(not_missing_files):
    with open(not_missing_files, 'r') as not_missing_files_file:
        lines_not_missing_files_file = not_missing_files_file.readlines()

    # Extract a list of files I know exist
    for line in lines_not_missing_files_file:
        line = line.strip()
        line_list = line.split(",")
        address = line_list[0].replace("LUND: file://", "")
        fille = address[address.rindex('/') + 1:]
        # fille = the name of the file
        address = address.replace(fille, "")
        # address = path to the folder

        # search for the file using bash
        os.system("test -e {} && echo Found {}".format(line_list[0], fille))

        # search for the file using the Python function above
        filepath = findfile(address, fille)
        print(filepath)

address looks something like "/projects/dir/dir/dir/dir/dir/mc20/v12/4.0GeV/v2.2.1-3e/"

and fille looks like this: "mc_v12-4GeV-3e-inclusive_run1310195_t1601591250.root"
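One thing worth ruling out when os.path.exists() reports False for a file that is really there is the exact string being passed: any extra character, such as the trailing "\n" that readlines() keeps on each line (it survives in the first field when a line has no comma), makes it a different path. The sketch below is a hypothetical `parse_catalogue_line` helper; it assumes the first comma-separated field is "LUND: file://" followed by an absolute path, as described above, strips surrounding whitespace, and uses os.path.dirname/os.path.basename instead of slicing plus str.replace.

```python
import os

PREFIX = "LUND: file://"  # prefix as it appears in the catalogue dump (assumed)


def parse_catalogue_line(line):
    """Split one catalogue line into (directory, filename).

    Assumes the first comma-separated field is PREFIX followed by an absolute path.
    """
    # .strip() removes leading/trailing spaces and the "\n" kept by readlines().
    first_field = line.split(",")[0].strip()
    full_path = first_field.replace(PREFIX, "", 1)
    # dirname/basename avoid str.replace(fille, ""), which would also delete any
    # other occurrence of the file name elsewhere in the path.
    return os.path.dirname(full_path), os.path.basename(full_path)
```

The returned pair can be joined again with os.path.join() and passed to os.path.exists() or os.path.isfile().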

The script returns:

Found mc_v12-4GeV-3e-inclusive_run1310220_t1601591602.root
None
Found mc_v12-4GeV-3e-inclusive_run1310246_t1601592829.root
None
Found mc_v12-4GeV-3e-inclusive_run1310247_t1601591229.root
None
Found mc_v12-4GeV-3e-inclusive_run1310248_t1601591216.root
None
Found mc_v12-4GeV-3e-inclusive_run1310249_t1601591416.root
None
Found mc_v12-4GeV-3e-inclusive_run1310250_t1601591472.root
None

So the bash command finds the files, but the Python function does not.
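Looking at the call in compere_checksum, findfile is defined as findfile(name, path) but called as findfile(address, fille), so os.walk is handed the file name as the directory to walk. That relative path does not exist, and os.walk ignores such errors by default (its onerror argument is None), so the loop never runs and the function returns None, which would explain the None lines in the output. A minimal sketch with the arguments in the defined order, plus a direct lookup for when the directory is already known (`file_exists_in` is a hypothetical helper):

```python
import os


def findfile(name, path):
    """Return the path of the first file called `name` under `path`, or None."""
    for dirpath, dirnames, filenames in os.walk(path):
        if name in filenames:
            return os.path.join(dirpath, name)
    # os.walk() silently skips directories it cannot open (onerror=None),
    # so a wrong `path` simply ends here instead of raising an error.
    return None


def file_exists_in(directory, filename):
    """Direct lookup for when the catalogue already gives the exact directory."""
    return os.path.isfile(os.path.join(directory, filename))


# In compere_checksum, passing the arguments by keyword avoids the mix-up:
#     filepath = findfile(name=fille, path=address)
```

Since the catalogue already says which directory the file should be in, the direct lookup avoids walking the tree at all.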

Comments (2)

孤君无依 2025-01-25 05:10:33

I can use:

with open(file) as f:
    pass  # do stuff

I don't know why this works and not

path.exists

or

def findfile(name, path):
    for dirpath, dirname, filename in os.walk(path):
        if name in filename:
            return os.path.join(dirpath, name) 

But whatever, as long as it works, it is fine.
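Opening the file only succeeds if the path exists and is readable, so the approach in this answer can be written as an existence check in the "try it and handle the error" style. A sketch, with `can_open` as a hypothetical name; only FileNotFoundError is treated as "missing", so permission problems still raise instead of being silently counted as absent files:

```python
def can_open(path):
    """Return True if `path` can be opened for reading, False if it does not exist.

    Other errors (e.g. PermissionError) are deliberately not caught, so a
    permissions problem is reported instead of looking like a missing file.
    """
    try:
        # Opening and immediately closing the file is enough to prove it exists.
        with open(path, "rb"):
            return True
    except FileNotFoundError:
        return False
```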

oО清风挽发oО 2025-01-25 05:10:33
import os


def findfile(name, path):
    for dirpath, dirnames, filenames in os.walk(path):
        if name in filenames:
            return os.path.join(dirpath, name)


filepath = findfile("file2.txt", "/")
print(filepath)
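Walking "/" as in this example visits the whole filesystem, which on a 100 TB store with hundreds of thousands of files can take a long time. When many catalogue entries share a directory, one alternative is to list each directory once with os.scandir() and test membership in a set. A sketch under the assumption that the catalogue has already been turned into (directory, filename) pairs; `missing_files` is a hypothetical helper name:

```python
import os
from collections import defaultdict


def missing_files(entries):
    """Return the (directory, filename) pairs whose file is not present.

    `entries` is an iterable of (directory, filename) tuples. Each directory
    is listed only once, however many entries point into it.
    """
    # Group the wanted file names by directory.
    wanted = defaultdict(set)
    for directory, filename in entries:
        wanted[directory].add(filename)

    missing = []
    for directory, filenames in wanted.items():
        try:
            with os.scandir(directory) as it:
                present = {entry.name for entry in it}
        except FileNotFoundError:
            present = set()  # the whole directory is absent
        # Set membership is a constant-time check per file.
        for filename in sorted(filenames):
            if filename not in present:
                missing.append((directory, filename))
    return missing
```

Directories are reported in the order they first appear in `entries`, and file names within a directory in sorted order.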