Searching/indexing a huge number of files

Published 2024-11-18 23:25:12

I'm struggling to find an efficient way (< 0.5 s per search) to find specific files in a huge file system when I only know a small part of the desired filename.

Here's the scenario:

Suppose you have about 15,000,000 files, all categorised by the type of information they contain and batched into numbered directories of 20,000 files each:

DATA
--TYPE_1_001
----ID_1234567_TYPE1.XML
----ID_2345678_TYPE1.XML
----[...]
--TYPE_1_002
--[...]
--TYPE_1_097
--TYPE_2_001
----ID_1234567_TYPE2.JPG
----ID_2345678_TYPE2.JPG
----ID_2345679_TYPE2.JPG
----[...]
--[...]
--TYPE_2_304
--[...]

and so on.
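Every example filename appears to follow the pattern `ID_<id>_<TYPE>.<EXT>`. Assuming that holds for all 15,000,000 files (an assumption, since only a few names are shown), the ID can be extracted from a name with a regular expression rather than by substring search. A minimal sketch; `parse_id` and `FILENAME_RE` are hypothetical names:

```python
import re

# Assumed naming convention, inferred from the examples above:
# ID_<digits>_<TYPE>.<EXT>, e.g. ID_1234567_TYPE1.XML or ID_2345678_TYPE2.JPG
FILENAME_RE = re.compile(r"^ID_(\d+)_([A-Z0-9]+)\.([A-Za-z0-9]+)$")


def parse_id(filename):
    """Return the ID embedded in a filename as a string, or None if it doesn't match."""
    match = FILENAME_RE.match(filename)
    return match.group(1) if match else None


print(parse_id("ID_1234567_TYPE1.XML"))  # 1234567
print(parse_id("ID_2345679_TYPE2.JPG"))  # 2345679
print(parse_id("README.TXT"))            # None
```

The ID is kept as a string so that any leading zeros survive, in case the real IDs have them.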

So, given an ID (e.g. 1234567), I'm trying to find all filenames that contain that ID.
This lookup will be executed for each of the 7,000,000 IDs listed in another XML file.
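Since the same lookup is repeated 7,000,000 times, one common approach (not something the question says it already does) is to walk the directory tree once, build an in-memory map from ID to file paths, and then answer every lookup with a dictionary access instead of a filesystem search. A minimal sketch, assuming the `ID_<id>_<TYPE>.<EXT>` naming seen above and that the map fits in RAM; `build_index` and `find_files` are hypothetical names:

```python
import os
import re
import tempfile
from collections import defaultdict

# Assumption: every relevant file is named ID_<digits>_<TYPE>.<EXT>,
# as in the examples from the question. Only the ID part is captured.
ID_PREFIX_RE = re.compile(r"^ID_(\d+)_")


def build_index(root):
    """Walk the whole tree once and map each ID (as a string) to its file paths."""
    index = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            match = ID_PREFIX_RE.match(name)
            if match:
                index[match.group(1)].append(os.path.join(dirpath, name))
    return dict(index)


def find_files(index, file_id):
    """Return every path recorded for one ID; unknown IDs give an empty list."""
    return index.get(file_id, [])


if __name__ == "__main__":
    # Recreate a tiny slice of the layout from the question in a temp directory.
    with tempfile.TemporaryDirectory() as root:
        for subdir, name in [
            ("TYPE_1_001", "ID_1234567_TYPE1.XML"),
            ("TYPE_1_001", "ID_2345678_TYPE1.XML"),
            ("TYPE_2_001", "ID_1234567_TYPE2.JPG"),
        ]:
            os.makedirs(os.path.join(root, subdir), exist_ok=True)
            open(os.path.join(root, subdir, name), "w").close()

        index = build_index(root)  # one pass over the tree
        for file_id in ["1234567", "9999999"]:  # stand-ins for the 7,000,000 IDs
            found = sorted(os.path.basename(p) for p in find_files(index, file_id))
            print(file_id, found)
```

Run as a script, this prints `1234567 ['ID_1234567_TYPE1.XML', 'ID_1234567_TYPE2.JPG']` followed by `9999999 []`. The tree is read only once (roughly 15,000,000 directory entries), after which each of the 7,000,000 lookups is a hash-table access. Whether an index of that size fits in memory depends on the machine; if it does not, the same ID-to-path pairs could be written to an on-disk store (for example an SQLite table with an index on the ID column) and queried from there.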

At its current speed, the process would take 405 days to handle all 7,000,000 IDs, which, as you can imagine, is unacceptable ;)
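For scale, the figures above imply roughly 5 seconds per ID today, and show that even meeting the 0.5 s target one ID at a time would still take about 40 days in total:

```latex
\begin{aligned}
\frac{405\ \text{days} \times 86\,400\ \text{s/day}}{7\,000\,000\ \text{IDs}}
  &= \frac{34\,992\,000\ \text{s}}{7\,000\,000\ \text{IDs}} \approx 5\ \text{s per ID} \\
7\,000\,000\ \text{IDs} \times 0.5\ \text{s/ID}
  &= 3\,500\,000\ \text{s} \approx 40.5\ \text{days}
\end{aligned}
```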

Any suggestions?

Thanks in advance!