Python: finding files in a very large folder (over 100 TB)
I am working on a program that compares entries in cataloguing software (Rucio) with the files in storage. From the catalogue, I get the path where it believes the file is stored. I then search that location to see whether the file exists there. I have successfully created a bash script that does this, but it would be a lot better if it could be redone in Python.
The problem I have encountered is that Python will not find the files, even when I know they exist. I have tried things like
if path.exists(fulladdress):
    ...  # does stuff
Even when I give it a file I know exists, it still does not find it. I suspect this is because the folder is huge (over 100 TB and more than 287,000 files), so it does not search the whole folder and therefore does not find the file.
Is there a Python solution that works for folders that big?
Best regards
Piotr
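A note on the check above: as far as I know, `os.path.exists` does not search a folder at all. It asks the operating system about that one exact path and returns False on any `OSError` (missing file, permission denied, and so on), so the folder size should not matter by itself. A minimal sketch for seeing why a particular path is rejected, where `check_path` is a hypothetical helper name and not something from the original post:

```python
import os

def check_path(path):
    """Return (exists, reason) for one path, without listing its folder."""
    try:
        os.stat(path)  # a single lookup of this exact path
    except OSError as exc:
        # os.path.exists() catches this error and only reports False
        return False, "{}: {!r}".format(type(exc).__name__, path)
    return True, "ok"
```

Calling `print(check_path(fulladdress))` shows the error type together with the `repr` of the path, which makes stray characters such as a trailing newline or a leftover prefix visible.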
The bash script that works is:
os.system("cd; cd directory_with_files; test -e file_in_directory_exist && echo filename >> found.txt || echo filename >> not_found")
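For comparison, the same found / not found bookkeeping can be sketched in pure Python. This is only an illustration: `record_presence` and its parameter names are made up here, and the default log names are simply the ones used in the shell one-liner.

```python
import os

def record_presence(paths, found_log="found.txt", missing_log="not_found"):
    """Append each path to found_log or missing_log, depending on whether it exists."""
    with open(found_log, "a") as found, open(missing_log, "a") as missing:
        for path in paths:
            # same decision as `test -e path && ... || ...` in the shell version
            target = found if os.path.exists(path) else missing
            target.write(path + "\n")
```

Each path is checked individually, so nothing here depends on how many files the folder contains.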
I tried running this:
import os

def findfile(name, path):
    for dirpath, dirname, filename in os.walk(path):
        if name in filename:
            return os.path.join(dirpath, name)

def compere_checksum(not_missing_files):
    not_missing_files_file = open(not_missing_files, 'r')
    lines_not_missing_files_file = not_missing_files_file.readlines()
    # Extract a list of files I know exist
    for line in lines_not_missing_files_file:
        line.replace(' ','')
        line_list=line.split(",")
        address=line_list[0].replace("LUND: file://", "")
        # address = path to the folder
        fille=address[address.rindex('/')+1:]
        # fille = the name of the file
        address=address.replace(fille,"")
        # search for the file using bash
        os.system("test -e {} && echo Found {}".format(line_list[0],fille))
        # search for the file using the Python function above
        filepath=findfile(address,fille)
        print(filepath)
address is something along the lines of "/projects/dir/dir/dir/dir/dir/mc20/v12/4.0GeV/v2.2.1-3e/"
and fille looks like this: "mc_v12-4GeV-3e-inclusive_run1310195_t1601591250.root"
The script returns:
Found mc_v12-4GeV-3e-inclusive_run1310220_t1601591602.root
None
Found mc_v12-4GeV-3e-inclusive_run1310246_t1601592829.root
None
Found mc_v12-4GeV-3e-inclusive_run1310247_t1601591229.root
None
Found mc_v12-4GeV-3e-inclusive_run1310248_t1601591216.root
None
Found mc_v12-4GeV-3e-inclusive_run1310249_t1601591416.root
None
Found mc_v12-4GeV-3e-inclusive_run1310250_t1601591472.root
None
So the bash script finds the file, but the Python code does not.
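One thing that stands out in the posted code: `findfile` is declared as `findfile(name, path)` but called as `findfile(address, fille)`, so the folder ends up in `name` and the file name in `path`. `os.walk` on something that is not a directory yields nothing, so the function falls through and returns None, which would match the output shown. This may well be the whole problem, though that is a guess from the snippet alone. A sketch with the arguments passed by keyword, where `split_entry` is a hypothetical helper and the `"LUND: file://"` prefix is taken from the post:

```python
import os

def findfile(name, path):
    """Walk `path` and return the full path of the first file called `name`."""
    for dirpath, dirnames, filenames in os.walk(path):
        if name in filenames:
            return os.path.join(dirpath, name)
    return None

def split_entry(entry, prefix="LUND: file://"):
    """Split one catalogue entry into (folder, file name)."""
    # strip() drops the newline that readlines() leaves on each line
    full_path = entry.strip().replace(prefix, "")
    return os.path.split(full_path)
```

With these, `folder, fille = split_entry(line_list[0])` followed by `findfile(name=fille, path=folder)` keeps the two arguments from being mixed up. When the folder is already known, `os.path.exists(os.path.join(folder, fille))` avoids walking the tree entirely.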
Comments (2)
I can use:
I don't know why this works and not
or
but whatever, as long as it works it is fine.